Repository: alibaba/DataX
Branch: master
Commit: 60ea07b73086
Files: 1445
Total size: 5.0 MB

Directory structure:
gitextract_n5eqdejp/
├── .gitignore ├── NOTICE ├── README.md ├── adbmysqlwriter/ │ ├── doc/ │ │ └── adbmysqlwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── adbmysqlwriter/ │ │ └── AdbMysqlWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── adbpgwriter/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── doc/ │ │ └── adbpgwriter.md │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── adbpgwriter/ │ │ ├── AdbpgWriter.java │ │ ├── copy/ │ │ │ ├── Adb4pgClientProxy.java │ │ │ └── AdbProxy.java │ │ ├── package-info.java │ │ └── util/ │ │ ├── Adb4pgUtil.java │ │ ├── Constant.java │ │ └── Key.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── adswriter/ │ ├── doc/ │ │ └── adswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── adswriter/ │ │ ├── AdsException.java │ │ ├── AdsWriter.java │ │ ├── AdsWriterErrorCode.java │ │ ├── ads/ │ │ │ ├── ColumnDataType.java │ │ │ ├── ColumnInfo.java │ │ │ ├── TableInfo.java │ │ │ └── package-info.java │ │ ├── insert/ │ │ │ ├── AdsClientProxy.java │ │ │ ├── AdsInsertProxy.java │ │ │ ├── AdsInsertUtil.java │ │ │ ├── AdsProxy.java │ │ │ └── OperationType.java │ │ ├── load/ │ │ │ ├── AdsHelper.java │ │ │ ├── TableMetaHelper.java │ │ │ └── TransferProjectConf.java │ │ ├── odps/ │ │ │ ├── DataType.java │ │ │ ├── FieldSchema.java │ │ │ ├── TableMeta.java │ │ │ └── package-info.java │ │ ├── package-info.java │ │ └── util/ │ │ ├── AdsUtil.java │ │ ├── Constant.java │ │ └── Key.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── 
cassandrareader/ │ ├── doc/ │ │ └── cassandrareader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── cassandrareader/ │ │ ├── CassandraReader.java │ │ ├── CassandraReaderErrorCode.java │ │ ├── CassandraReaderHelper.java │ │ ├── Key.java │ │ ├── LocalStrings.properties │ │ ├── LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ └── LocalStrings_zh_TW.properties │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── cassandrawriter/ │ ├── doc/ │ │ └── cassandrawriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── cassandrawriter/ │ │ ├── CassandraWriter.java │ │ ├── CassandraWriterErrorCode.java │ │ ├── CassandraWriterHelper.java │ │ ├── Key.java │ │ ├── LocalStrings.properties │ │ ├── LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ └── LocalStrings_zh_TW.properties │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── clickhousereader/ │ ├── doc/ │ │ └── clickhousereader.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ └── clickhousereader/ │ │ │ └── ClickhouseReader.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ └── resources/ │ ├── basic1.json │ └── basic1.sql ├── clickhousewriter/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── clickhousewriter/ │ │ ├── ClickhouseWriter.java │ │ └── 
ClickhouseWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── common/ │ ├── pom.xml │ └── src/ │ └── main/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── common/ │ ├── base/ │ │ └── BaseObject.java │ ├── constant/ │ │ ├── CommonConstant.java │ │ └── PluginType.java │ ├── element/ │ │ ├── BoolColumn.java │ │ ├── BytesColumn.java │ │ ├── Column.java │ │ ├── ColumnCast.java │ │ ├── DateColumn.java │ │ ├── DoubleColumn.java │ │ ├── LongColumn.java │ │ ├── OverFlowUtil.java │ │ ├── Record.java │ │ └── StringColumn.java │ ├── exception/ │ │ ├── CommonErrorCode.java │ │ ├── DataXException.java │ │ └── ExceptionTracker.java │ ├── plugin/ │ │ ├── AbstractJobPlugin.java │ │ ├── AbstractPlugin.java │ │ ├── AbstractTaskPlugin.java │ │ ├── JobPluginCollector.java │ │ ├── PluginCollector.java │ │ ├── Pluginable.java │ │ ├── RecordReceiver.java │ │ ├── RecordSender.java │ │ └── TaskPluginCollector.java │ ├── spi/ │ │ ├── ErrorCode.java │ │ ├── Hook.java │ │ ├── Reader.java │ │ └── Writer.java │ ├── statistics/ │ │ ├── PerfRecord.java │ │ ├── PerfTrace.java │ │ └── VMInfo.java │ └── util/ │ ├── Configuration.java │ ├── ConfigurationUtil.java │ ├── DESCipher.java │ ├── DataXCaseEnvUtil.java │ ├── FilterUtil.java │ ├── HostUtils.java │ ├── LimitLogger.java │ ├── ListUtil.java │ ├── LocalStrings.properties │ ├── LocalStrings_en_US.properties │ ├── LocalStrings_ja_JP.properties │ ├── LocalStrings_zh_CN.properties │ ├── LocalStrings_zh_HK.properties │ ├── LocalStrings_zh_TW.properties │ ├── LoggerFunction.java │ ├── MessageSource.java │ ├── RangeSplitUtil.java │ ├── RetryUtil.java │ └── StrUtil.java ├── core/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── conf/ │ │ ├── .secret.properties │ │ ├── core.json │ │ └── logback.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ ├── core/ │ │ │ ├── AbstractContainer.java │ │ │ ├── Engine.java │ │ │ ├── LocalStrings.properties │ │ │ ├── 
LocalStrings_en_US.properties │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ ├── container/ │ │ │ │ └── util/ │ │ │ │ ├── HookInvoker.java │ │ │ │ └── JobAssignUtil.java │ │ │ ├── job/ │ │ │ │ ├── JobContainer.java │ │ │ │ ├── meta/ │ │ │ │ │ ├── ExecuteMode.java │ │ │ │ │ └── State.java │ │ │ │ └── scheduler/ │ │ │ │ ├── AbstractScheduler.java │ │ │ │ └── processinner/ │ │ │ │ ├── ProcessInnerScheduler.java │ │ │ │ └── StandAloneScheduler.java │ │ │ ├── statistics/ │ │ │ │ ├── communication/ │ │ │ │ │ ├── Communication.java │ │ │ │ │ ├── CommunicationTool.java │ │ │ │ │ └── LocalTGCommunicationManager.java │ │ │ │ ├── container/ │ │ │ │ │ ├── collector/ │ │ │ │ │ │ ├── AbstractCollector.java │ │ │ │ │ │ └── ProcessInnerCollector.java │ │ │ │ │ ├── communicator/ │ │ │ │ │ │ ├── AbstractContainerCommunicator.java │ │ │ │ │ │ ├── job/ │ │ │ │ │ │ │ └── StandAloneJobContainerCommunicator.java │ │ │ │ │ │ └── taskgroup/ │ │ │ │ │ │ ├── AbstractTGContainerCommunicator.java │ │ │ │ │ │ └── StandaloneTGContainerCommunicator.java │ │ │ │ │ └── report/ │ │ │ │ │ ├── AbstractReporter.java │ │ │ │ │ └── ProcessInnerReporter.java │ │ │ │ └── plugin/ │ │ │ │ ├── DefaultJobPluginCollector.java │ │ │ │ └── task/ │ │ │ │ ├── AbstractTaskPluginCollector.java │ │ │ │ ├── HttpPluginCollector.java │ │ │ │ ├── StdoutPluginCollector.java │ │ │ │ └── util/ │ │ │ │ └── DirtyRecord.java │ │ │ ├── taskgroup/ │ │ │ │ ├── TaskGroupContainer.java │ │ │ │ ├── TaskMonitor.java │ │ │ │ └── runner/ │ │ │ │ ├── AbstractRunner.java │ │ │ │ ├── ReaderRunner.java │ │ │ │ ├── TaskGroupContainerRunner.java │ │ │ │ └── WriterRunner.java │ │ │ ├── transport/ │ │ │ │ ├── channel/ │ │ │ │ │ ├── Channel.java │ │ │ │ │ └── memory/ │ │ │ │ │ └── MemoryChannel.java │ │ │ │ ├── exchanger/ │ │ │ │ │ ├── BufferedRecordExchanger.java │ │ │ │ │ ├── BufferedRecordTransformerExchanger.java │ │ │ │ │ 
├── RecordExchanger.java │ │ │ │ │ └── TransformerExchanger.java │ │ │ │ ├── record/ │ │ │ │ │ ├── DefaultRecord.java │ │ │ │ │ └── TerminateRecord.java │ │ │ │ └── transformer/ │ │ │ │ ├── ComplexTransformerProxy.java │ │ │ │ ├── DigestTransformer.java │ │ │ │ ├── FilterTransformer.java │ │ │ │ ├── GroovyTransformer.java │ │ │ │ ├── GroovyTransformerStaticUtil.java │ │ │ │ ├── PadTransformer.java │ │ │ │ ├── ReplaceTransformer.java │ │ │ │ ├── SubstrTransformer.java │ │ │ │ ├── TransformerErrorCode.java │ │ │ │ ├── TransformerExecution.java │ │ │ │ ├── TransformerExecutionParas.java │ │ │ │ ├── TransformerInfo.java │ │ │ │ └── TransformerRegistry.java │ │ │ └── util/ │ │ │ ├── ClassSize.java │ │ │ ├── ClassUtil.java │ │ │ ├── ConfigParser.java │ │ │ ├── ConfigurationValidate.java │ │ │ ├── ErrorRecordChecker.java │ │ │ ├── ExceptionTracker.java │ │ │ ├── FrameworkErrorCode.java │ │ │ ├── HttpClientUtil.java │ │ │ ├── LocalStrings.properties │ │ │ ├── LocalStrings_en_US.properties │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ ├── SecretUtil.java │ │ │ ├── TransformerUtil.java │ │ │ └── container/ │ │ │ ├── ClassLoaderSwapper.java │ │ │ ├── CoreConstant.java │ │ │ ├── JarLoader.java │ │ │ └── LoadUtil.java │ │ └── dataxservice/ │ │ └── face/ │ │ └── domain/ │ │ └── enums/ │ │ ├── EnumStrVal.java │ │ ├── EnumVal.java │ │ ├── ExecuteMode.java │ │ └── State.java │ ├── job/ │ │ └── job.json │ └── script/ │ └── Readme.md ├── databendwriter/ │ ├── doc/ │ │ ├── databendwriter-CN.md │ │ └── databendwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── databendwriter/ │ │ ├── DatabendWriter.java │ │ ├── DatabendWriterErrorCode.java │ │ └── util/ │ │ └── DatabendWriterUtil.java │ └── resources/ │ ├── plugin.json │ └── 
plugin_job_template.json ├── datahubreader/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── datahubreader/ │ │ ├── Constant.java │ │ ├── DatahubClientHelper.java │ │ ├── DatahubReader.java │ │ ├── DatahubReaderErrorCode.java │ │ ├── DatahubReaderUtils.java │ │ ├── DatahubWriterErrorCode.java │ │ ├── Key.java │ │ ├── LocalStrings.properties │ │ ├── LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ └── LocalStrings_zh_TW.properties │ └── resources/ │ ├── job_config_template.json │ └── plugin.json ├── datahubwriter/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── datahubwriter/ │ │ ├── DatahubClientHelper.java │ │ ├── DatahubWriter.java │ │ ├── DatahubWriterErrorCode.java │ │ ├── Key.java │ │ ├── LocalStrings.properties │ │ ├── LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ └── LocalStrings_zh_TW.properties │ └── resources/ │ ├── job_config_template.json │ └── plugin.json ├── datax-example/ │ ├── datax-example-core/ │ │ ├── pom.xml │ │ └── src/ │ │ ├── main/ │ │ │ ├── java/ │ │ │ │ └── com/ │ │ │ │ └── alibaba/ │ │ │ │ └── datax/ │ │ │ │ └── example/ │ │ │ │ ├── ExampleContainer.java │ │ │ │ ├── Main.java │ │ │ │ └── util/ │ │ │ │ ├── ExampleConfigParser.java │ │ │ │ └── PathUtil.java │ │ │ └── resources/ │ │ │ └── example/ │ │ │ └── conf/ │ │ │ └── core.json │ │ └── test/ │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── example/ │ │ │ └── util/ │ │ │ └── PathUtilTest.java │ │ └── resources/ │ │ └── pathTest.json │ ├── datax-example-neo4j/ │ │ ├── pom.xml │ │ └── src/ │ │ └── test/ │ │ ├── java/ │ │ │ └── 
com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── example/ │ │ │ └── neo4j/ │ │ │ └── StreamReader2Neo4jWriterTest.java │ │ └── resources/ │ │ └── streamreader2neo4j.json │ ├── datax-example-streamreader/ │ │ ├── pom.xml │ │ └── src/ │ │ └── test/ │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── example/ │ │ │ └── streamreader/ │ │ │ └── StreamReader2StreamWriterTest.java │ │ └── resources/ │ │ └── stream2stream.json │ ├── doc/ │ │ └── README.md │ └── pom.xml ├── dataxPluginDev.md ├── dorisreader/ │ ├── doc/ │ │ └── dorisreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── dorisreader/ │ │ ├── DorisReader.java │ │ └── DorisReaderErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── doriswriter/ │ ├── doc/ │ │ ├── doriswriter.md │ │ └── mysql2doris.json │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── doriswriter/ │ │ ├── DelimiterParser.java │ │ ├── DorisBaseCodec.java │ │ ├── DorisCodec.java │ │ ├── DorisCodecFactory.java │ │ ├── DorisCsvCodec.java │ │ ├── DorisJsonCodec.java │ │ ├── DorisStreamLoadObserver.java │ │ ├── DorisUtil.java │ │ ├── DorisWriter.java │ │ ├── DorisWriterExcetion.java │ │ ├── DorisWriterManager.java │ │ ├── Keys.java │ │ └── WriterTuple.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── drdsreader/ │ ├── doc/ │ │ └── drdsreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── drdsreader/ │ │ ├── DrdsReader.java │ │ ├── DrdsReaderErrorCode.java │ │ └── DrdsReaderSplitUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── drdswriter/ │ ├── doc/ │ │ └── drdswriter.md │ ├── 
pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── drdswriter/ │ │ └── DrdsWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── elasticsearchwriter/ │ ├── README.md │ ├── build.sh │ ├── doc/ │ │ └── elasticsearchwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── elasticsearchwriter/ │ │ ├── ElasticSearchClient.java │ │ ├── ElasticSearchColumn.java │ │ ├── ElasticSearchFieldType.java │ │ ├── ElasticSearchWriter.java │ │ ├── ElasticSearchWriterErrorCode.java │ │ ├── JsonPathUtil.java │ │ ├── JsonUtil.java │ │ ├── Key.java │ │ ├── NoReRunException.java │ │ ├── PartitionColumn.java │ │ ├── PrimaryKeyInfo.java │ │ └── jest/ │ │ ├── ClusterInfo.java │ │ ├── ClusterInfoResult.java │ │ └── PutMapping7.java │ └── resources/ │ └── plugin.json ├── ftpreader/ │ ├── doc/ │ │ └── ftpreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── ftpreader/ │ │ ├── Constant.java │ │ ├── FtpHelper.java │ │ ├── FtpReader.java │ │ ├── FtpReaderErrorCode.java │ │ ├── Key.java │ │ ├── SftpHelper.java │ │ └── StandardFtpHelper.java │ └── resources/ │ ├── plugin-template.json │ ├── plugin.json │ └── plugin_job_template.json ├── ftpwriter/ │ ├── doc/ │ │ ├── .gitkeep │ │ └── ftpwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── ftpwriter/ │ │ ├── FtpWriter.java │ │ ├── FtpWriterErrorCode.java │ │ ├── Key.java │ │ └── util/ │ │ ├── Constant.java │ │ ├── IFtpHelper.java │ │ ├── SftpHelperImpl.java │ │ └── StandardFtpHelperImpl.java │ └── resources/ │ ├── plugin.json │ └── 
plugin_job_template.json ├── gaussdbreader/ │ ├── doc/ │ │ └── gaussdbreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── gaussdbreader/ │ │ ├── Constant.java │ │ └── GaussDbReader.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── gaussdbwriter/ │ ├── doc/ │ │ └── gaussdbwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── gaussdbwriter/ │ │ └── GaussDbWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── gdbreader/ │ ├── doc/ │ │ └── gdbreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── gdbreader/ │ │ ├── GdbReader.java │ │ ├── GdbReaderErrorCode.java │ │ ├── Key.java │ │ ├── mapping/ │ │ │ ├── DefaultGdbMapper.java │ │ │ ├── MappingRule.java │ │ │ ├── MappingRuleFactory.java │ │ │ └── ValueType.java │ │ ├── model/ │ │ │ ├── AbstractGdbGraph.java │ │ │ ├── GdbElement.java │ │ │ ├── GdbGraph.java │ │ │ └── ScriptGdbGraph.java │ │ └── util/ │ │ └── ConfigHelper.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── gdbwriter/ │ ├── doc/ │ │ └── gdbwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── gdbwriter/ │ │ ├── GdbWriter.java │ │ ├── GdbWriterErrorCode.java │ │ ├── Key.java │ │ ├── client/ │ │ │ ├── GdbGraphManager.java │ │ │ └── GdbWriterConfig.java │ │ ├── mapping/ │ │ │ ├── DefaultGdbMapper.java │ │ │ ├── GdbMapper.java │ │ │ ├── MapperConfig.java │ │ │ ├── MappingRule.java │ │ │ ├── MappingRuleFactory.java │ │ │ └── ValueType.java │ │ ├── model/ │ │ │ ├── AbstractGdbGraph.java │ │ │ 
├── GdbEdge.java │ │ │ ├── GdbElement.java │ │ │ ├── GdbGraph.java │ │ │ ├── GdbVertex.java │ │ │ └── ScriptGdbGraph.java │ │ └── util/ │ │ ├── ConfigHelper.java │ │ └── GdbDuplicateIdException.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hbase094xreader/ │ ├── doc/ │ │ ├── .gitkeep │ │ └── hbase094xreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── hbase094xreader/ │ │ ├── ColumnType.java │ │ ├── Constant.java │ │ ├── Hbase094xHelper.java │ │ ├── Hbase094xReader.java │ │ ├── Hbase094xReaderErrorCode.java │ │ ├── HbaseAbstractTask.java │ │ ├── HbaseColumnCell.java │ │ ├── Key.java │ │ ├── ModeType.java │ │ ├── MultiVersionFixedColumnTask.java │ │ ├── MultiVersionTask.java │ │ └── NormalTask.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hbase094xwriter/ │ ├── doc/ │ │ ├── .gitkeep │ │ └── hbase094xwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── hbase094xwriter/ │ │ ├── ColumnType.java │ │ ├── Constant.java │ │ ├── Hbase094xHelper.java │ │ ├── Hbase094xWriter.java │ │ ├── Hbase094xWriterErrorCode.java │ │ ├── HbaseAbstractTask.java │ │ ├── Key.java │ │ ├── ModeType.java │ │ ├── NormalTask.java │ │ └── NullModeType.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hbase11xreader/ │ ├── doc/ │ │ ├── .gitkeep │ │ └── hbase11xreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── hbase11xreader/ │ │ ├── ColumnType.java │ │ ├── Constant.java │ │ ├── Hbase11xHelper.java │ │ ├── Hbase11xReader.java │ │ ├── Hbase11xReaderErrorCode.java │ │ ├── HbaseAbstractTask.java │ │ ├── HbaseColumnCell.java │ │ ├── Key.java │ 
│ ├── ModeType.java │ │ ├── MultiVersionDynamicColumnTask.java │ │ ├── MultiVersionFixedColumnTask.java │ │ ├── MultiVersionTask.java │ │ └── NormalTask.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hbase11xsqlreader/ │ ├── doc/ │ │ └── hbase11xsqlreader.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ └── hbase11xsqlreader/ │ │ │ ├── HadoopSerializationUtil.java │ │ │ ├── HbaseSQLHelper.java │ │ │ ├── HbaseSQLReader.java │ │ │ ├── HbaseSQLReaderConfig.java │ │ │ ├── HbaseSQLReaderErrorCode.java │ │ │ ├── HbaseSQLReaderTask.java │ │ │ ├── Key.java │ │ │ ├── LocalStrings.properties │ │ │ ├── LocalStrings_en_US.properties │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ └── LocalStrings_zh_CN.properties │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── plugin/ │ └── reader/ │ └── hbase11xsqlreader/ │ ├── HbaseSQLHelperTest.java │ └── HbaseSQLReaderTaskTest.java ├── hbase11xsqlwriter/ │ ├── doc/ │ │ └── hbase11xsqlwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── hbase11xsqlwriter/ │ │ ├── Constant.java │ │ ├── HbaseSQLHelper.java │ │ ├── HbaseSQLWriter.java │ │ ├── HbaseSQLWriterConfig.java │ │ ├── HbaseSQLWriterErrorCode.java │ │ ├── HbaseSQLWriterTask.java │ │ ├── Key.java │ │ ├── NullModeType.java │ │ └── ThinClientPTable.java │ └── resources/ │ └── plugin.json ├── hbase11xwriter/ │ ├── doc/ │ │ ├── .gitkeep │ │ └── hbase11xwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── hbase11xwriter/ │ │ ├── ColumnType.java │ │ ├── Constant.java │ │ ├── 
Hbase11xHelper.java │ │ ├── Hbase11xWriter.java │ │ ├── Hbase11xWriterErrorCode.java │ │ ├── HbaseAbstractTask.java │ │ ├── Key.java │ │ ├── ModeType.java │ │ ├── MultiVersionTask.java │ │ ├── NormalTask.java │ │ └── NullModeType.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hbase20xsqlreader/ │ ├── doc/ │ │ └── hbase20xsqlreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── hbase20xsqlreader/ │ │ ├── Constant.java │ │ ├── HBase20SQLReaderHelper.java │ │ ├── HBase20xSQLReader.java │ │ ├── HBase20xSQLReaderErrorCode.java │ │ ├── HBase20xSQLReaderTask.java │ │ └── Key.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hbase20xsqlwriter/ │ ├── doc/ │ │ └── hbase20xsqlwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── hbase20xsqlwriter/ │ │ ├── Constant.java │ │ ├── HBase20xSQLHelper.java │ │ ├── HBase20xSQLWriter.java │ │ ├── HBase20xSQLWriterErrorCode.java │ │ ├── HBase20xSQLWriterTask.java │ │ ├── Key.java │ │ └── NullModeType.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hdfsreader/ │ ├── doc/ │ │ └── hdfsreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── hdfsreader/ │ │ ├── Constant.java │ │ ├── DFSUtil.java │ │ ├── HdfsFileType.java │ │ ├── HdfsPathFilter.java │ │ ├── HdfsReader.java │ │ ├── HdfsReaderErrorCode.java │ │ ├── Key.java │ │ ├── ParquetMessageHelper.java │ │ └── ParquetMeta.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hdfswriter/ │ ├── doc/ │ │ └── hdfswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ 
└── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── hdfswriter/ │ │ ├── Constant.java │ │ ├── HdfsHelper.java │ │ ├── HdfsWriter.java │ │ ├── HdfsWriterErrorCode.java │ │ ├── Key.java │ │ ├── ParquetFileProccessor.java │ │ ├── ParquetFileSupport.java │ │ └── SupportHiveDataType.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── hologresjdbcwriter/ │ ├── doc/ │ │ └── hologresjdbcwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── hologresjdbcwriter/ │ │ ├── BaseWriter.java │ │ ├── Constant.java │ │ ├── HologresJdbcWriter.java │ │ ├── Key.java │ │ └── util/ │ │ ├── ConfLoader.java │ │ ├── OriginalConfPretreatmentUtil.java │ │ └── WriterUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── introduction.md ├── kingbaseesreader/ │ ├── doc/ │ │ └── kingbaseesreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── kingbaseesreader/ │ │ ├── Constant.java │ │ └── KingbaseesReader.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── kingbaseeswriter/ │ ├── doc/ │ │ └── kingbaseeswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── kingbaseeswriter/ │ │ └── KingbaseesWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── kuduwriter/ │ ├── README.md │ ├── doc/ │ │ └── kuduwirter.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── q1/ │ │ │ ├── datax/ │ │ │ │ └── plugin/ │ │ │ │ └── writer/ │ │ │ │ └── kudu11xwriter/ │ │ │ │ ├── ColumnType.java │ │ │ │ ├── Constant.java │ │ │ │ ├── InsertModeType.java │ │ │ │ ├── Key.java │ │ │ 
│ ├── Kudu11xHelper.java │ │ │ │ ├── Kudu11xWriter.java │ │ │ │ ├── Kudu11xWriterErrorcode.java │ │ │ │ └── KuduWriterTask.java │ │ │ └── kudu/ │ │ │ └── conf/ │ │ │ └── KuduConfig.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ └── java/ │ └── com/ │ └── dai/ │ └── test.java ├── license.txt ├── loghubreader/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── loghubreader/ │ │ ├── Constant.java │ │ ├── Key.java │ │ ├── LogHubReader.java │ │ └── LogHubReaderErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── loghubwriter/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── loghubwriter/ │ │ ├── Key.java │ │ ├── LogHubWriter.java │ │ └── LogHubWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── milvuswriter/ │ ├── doc/ │ │ └── milvuswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── milvuswriter/ │ │ ├── KeyConstant.java │ │ ├── MilvusBufferWriter.java │ │ ├── MilvusClient.java │ │ ├── MilvusColumn.java │ │ ├── MilvusCreateCollection.java │ │ ├── MilvusWriter.java │ │ ├── MilvusWriterErrorCode.java │ │ └── enums/ │ │ ├── SchemaCreateModeEnum.java │ │ └── WriteModeEnum.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── mongodbreader/ │ ├── doc/ │ │ └── mongodbreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── mongodbreader/ │ │ ├── KeyConstant.java │ │ ├── MongoDBReader.java │ │ ├── MongoDBReaderErrorCode.java │ │ └── util/ │ │ ├── 
CollectionSplitUtil.java │ │ └── MongoUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── mongodbwriter/ │ ├── doc/ │ │ └── mongodbwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── mongodbwriter/ │ │ ├── KeyConstant.java │ │ ├── MongoDBWriter.java │ │ ├── MongoDBWriterErrorCode.java │ │ └── util/ │ │ └── MongoUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── mysqlreader/ │ ├── doc/ │ │ └── mysqlreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── mysqlreader/ │ │ ├── MysqlReader.java │ │ └── MysqlReaderErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── mysqlwriter/ │ ├── doc/ │ │ └── mysqlwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── mysqlwriter/ │ │ └── MysqlWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── neo4jwriter/ │ ├── doc/ │ │ └── neo4jwriter.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── writer/ │ │ │ └── neo4jwriter/ │ │ │ ├── Neo4jClient.java │ │ │ ├── Neo4jWriter.java │ │ │ ├── adapter/ │ │ │ │ ├── DateAdapter.java │ │ │ │ └── ValueAdapter.java │ │ │ ├── config/ │ │ │ │ ├── ConfigConstants.java │ │ │ │ ├── Neo4jProperty.java │ │ │ │ └── Option.java │ │ │ ├── element/ │ │ │ │ └── PropertyType.java │ │ │ └── exception/ │ │ │ └── Neo4jErrorCode.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ ├── 
Neo4jWriterTest.java │ │ └── mock/ │ │ ├── MockRecord.java │ │ └── MockUtil.java │ └── resources/ │ ├── allTypeFieldNode.json │ ├── dynamicLabel.json │ ├── relationship.json │ └── streamreader2neo4j.json ├── obhbasereader/ │ ├── doc/ │ │ └── obhbasereader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── obhbasereader/ │ │ ├── Constant.java │ │ ├── HTableManager.java │ │ ├── HbaseColumnCell.java │ │ ├── HbaseReaderErrorCode.java │ │ ├── Key.java │ │ ├── LocalStrings.properties │ │ ├── LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ ├── LocalStrings_zh_TW.properties │ │ ├── ObHbaseReader.java │ │ ├── enums/ │ │ │ ├── ColumnType.java │ │ │ ├── FetchVersion.java │ │ │ └── ModeType.java │ │ ├── ext/ │ │ │ └── ServerConnectInfo.java │ │ ├── task/ │ │ │ ├── AbstractHbaseTask.java │ │ │ ├── AbstractScanReader.java │ │ │ ├── SQLNormalModeReader.java │ │ │ ├── ScanMultiVersionReader.java │ │ │ └── ScanNormalModeReader.java │ │ └── util/ │ │ ├── HbaseSplitUtil.java │ │ ├── LocalStrings.properties │ │ ├── LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ ├── LocalStrings_zh_TW.properties │ │ ├── ObHbaseReaderUtil.java │ │ └── SqlReaderSplitUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── obhbasewriter/ │ ├── doc/ │ │ └── obhbasewriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── obhbasewriter/ │ │ ├── ColumnType.java │ │ ├── Config.java │ │ ├── ConfigKey.java │ │ ├── ConfigValidator.java │ │ ├── Constant.java │ │ ├── Hbase094xWriterErrorCode.java │ │ ├── LocalStrings.properties │ │ ├── 
LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ ├── LocalStrings_zh_TW.properties │ │ ├── ModeType.java │ │ ├── NullModeType.java │ │ ├── ObHTableInfo.java │ │ ├── ObHbaseWriter.java │ │ ├── ext/ │ │ │ ├── LocalStrings.properties │ │ │ ├── LocalStrings_en_US.properties │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ ├── ObDataSourceErrorCode.java │ │ │ ├── ObHbaseTableHolder.java │ │ │ └── ServerConnectInfo.java │ │ ├── task/ │ │ │ ├── LocalStrings.properties │ │ │ ├── LocalStrings_en_US.properties │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ ├── MultiVersionWriteTask.java │ │ │ ├── NormalWriteTask.java │ │ │ ├── ObHBaseWriteTask.java │ │ │ └── PutTask.java │ │ └── util/ │ │ ├── LocalStrings.properties │ │ ├── LocalStrings_en_US.properties │ │ ├── LocalStrings_ja_JP.properties │ │ ├── LocalStrings_zh_CN.properties │ │ ├── LocalStrings_zh_HK.properties │ │ ├── LocalStrings_zh_TW.properties │ │ └── ObHbaseWriterUtils.java │ └── resources/ │ └── plugin.json ├── oceanbasev10reader/ │ ├── doc/ │ │ └── oceanbasev10reader.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ └── oceanbasev10reader/ │ │ │ ├── Config.java │ │ │ ├── OceanBaseReader.java │ │ │ ├── ext/ │ │ │ │ ├── Constant.java │ │ │ │ ├── ObReaderKey.java │ │ │ │ ├── ReaderJob.java │ │ │ │ └── ReaderTask.java │ │ │ └── util/ │ │ │ ├── ExecutorTemplate.java │ │ │ ├── ObReaderUtils.java │ │ │ ├── PartInfo.java │ │ │ ├── PartType.java │ │ │ ├── PartitionSplitUtil.java │ │ │ └── TaskContext.java │ │ └── resources/ │ │ └── plugin.json │ └── test/ │ 
└── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── plugin/ │ └── reader/ │ └── oceanbasev10reader/ │ └── util/ │ └── ObReaderUtilsTest.java ├── oceanbasev10writer/ │ ├── doc/ │ │ └── oceanbasev10writer.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── oceanbasev10writer/ │ │ ├── Config.java │ │ ├── OceanBaseV10Writer.java │ │ ├── common/ │ │ │ ├── Table.java │ │ │ └── TableCache.java │ │ ├── directPath/ │ │ │ ├── AbstractRestrictedConnection.java │ │ │ ├── AbstractRestrictedPreparedStatement.java │ │ │ ├── DirectLoaderBuilder.java │ │ │ ├── DirectPathConnection.java │ │ │ ├── DirectPathConstants.java │ │ │ ├── DirectPathPreparedStatement.java │ │ │ └── ObTableDirectLoad.java │ │ ├── ext/ │ │ │ ├── AbstractConnHolder.java │ │ │ ├── ConnHolder.java │ │ │ ├── DataBaseWriterBuffer.java │ │ │ ├── DirectPathAbstractConnHolder.java │ │ │ ├── DirectPathConnHolder.java │ │ │ ├── OBDataSourceV10.java │ │ │ ├── OCJConnHolder.java │ │ │ ├── ObClientConnHolder.java │ │ │ ├── ObDataSourceErrorCode.java │ │ │ └── ServerConnectInfo.java │ │ ├── part/ │ │ │ ├── IObPartCalculator.java │ │ │ ├── ObPartitionCalculatorV1.java │ │ │ └── ObPartitionCalculatorV2.java │ │ ├── task/ │ │ │ ├── AbstractInsertTask.java │ │ │ ├── ColumnMetaCache.java │ │ │ ├── ConcurrentTableWriterTask.java │ │ │ ├── DirectPathInsertTask.java │ │ │ ├── InsertTask.java │ │ │ ├── SingleTableWriterTask.java │ │ │ └── WriterThreadPool.java │ │ └── util/ │ │ ├── DbUtils.java │ │ └── ObWriterUtils.java │ └── resources/ │ └── plugin.json ├── ocswriter/ │ ├── doc/ │ │ └── ocswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── ocswriter/ │ │ ├── Key.java │ │ ├── OcsWriter.java │ │ └── utils/ │ │ ├── CommonUtils.java │ │ ├── ConfigurationChecker.java │ │ └── 
OcsWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── odpsreader/ │ ├── doc/ │ │ └── odpsreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── odpsreader/ │ │ ├── ColumnType.java │ │ ├── Constant.java │ │ ├── InternalColumnInfo.java │ │ ├── Key.java │ │ ├── LocalStrings.properties │ │ ├── OdpsReader.java │ │ ├── OdpsReaderErrorCode.java │ │ ├── ReaderProxy.java │ │ └── util/ │ │ ├── LocalStrings.properties │ │ ├── OdpsExceptionMsg.java │ │ ├── OdpsSplitUtil.java │ │ ├── OdpsUtil.java │ │ ├── SqliteUtil.java │ │ └── UserConfiguredPartitionClassification.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── odpswriter/ │ ├── doc/ │ │ └── odpswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── odpswriter/ │ │ ├── Constant.java │ │ ├── DateTransForm.java │ │ ├── Key.java │ │ ├── LocalStrings.properties │ │ ├── OdpsWriter.java │ │ ├── OdpsWriterErrorCode.java │ │ ├── OdpsWriterProxy.java │ │ ├── model/ │ │ │ ├── PartitionInfo.java │ │ │ ├── UserDefinedFunction.java │ │ │ └── UserDefinedFunctionRule.java │ │ └── util/ │ │ ├── CustomPartitionUtils.java │ │ ├── LocalStrings.properties │ │ ├── OdpsExceptionMsg.java │ │ └── OdpsUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── opentsdbreader/ │ ├── doc/ │ │ └── opentsdbreader.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ ├── conn/ │ │ │ │ ├── CliQuery.java │ │ │ │ ├── Connection4TSDB.java │ │ │ │ ├── DataPoint4TSDB.java │ │ │ │ ├── DumpSeries.java │ │ │ │ ├── OpenTSDBConnection.java │ │ │ │ └── OpenTSDBDump.java │ │ │ ├── opentsdbreader/ │ │ │ │ 
├── Constant.java │ │ │ │ ├── Key.java │ │ │ │ ├── OpenTSDBReader.java │ │ │ │ └── OpenTSDBReaderErrorCode.java │ │ │ └── util/ │ │ │ ├── HttpUtils.java │ │ │ ├── TSDBUtils.java │ │ │ └── TimeUtils.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── plugin/ │ └── reader/ │ ├── conn/ │ │ └── OpenTSDBConnectionTest.java │ └── util/ │ ├── Const.java │ ├── HttpUtilsTest.java │ ├── TSDBTest.java │ └── TimeUtilsTest.java ├── oraclereader/ │ ├── doc/ │ │ └── oraclereader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── oraclereader/ │ │ ├── Constant.java │ │ ├── OracleReader.java │ │ └── OracleReaderErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── oraclewriter/ │ ├── doc/ │ │ └── oraclewriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── oraclewriter/ │ │ ├── OracleWriter.java │ │ └── OracleWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── oscarwriter/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── oscarwriter/ │ │ ├── OscarWriter.java │ │ └── OscarWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── ossreader/ │ ├── doc/ │ │ └── ossreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── ossreader/ │ │ ├── Constant.java │ │ ├── Key.java │ │ ├── OssInputStream.java │ │ ├── OssReader.java │ │ ├── OssReaderErrorCode.java │ │ └── util/ │ │ ├── HdfsParquetUtil.java │ │ ├── 
OssSplitUtil.java │ │ └── OssUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── osswriter/ │ ├── doc/ │ │ └── osswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── osswriter/ │ │ ├── Constant.java │ │ ├── Key.java │ │ ├── OssSingleObject.java │ │ ├── OssWriter.java │ │ ├── OssWriterErrorCode.java │ │ ├── OssWriterProxy.java │ │ ├── parquet/ │ │ │ ├── ParquetFileProccessor.java │ │ │ └── ParquetFileSupport.java │ │ └── util/ │ │ ├── HandlerUtil.java │ │ ├── HdfsParquetUtil.java │ │ └── OssUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── otsreader/ │ ├── doc/ │ │ └── otsreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── otsreader/ │ │ ├── IOtsReaderMasterProxy.java │ │ ├── IOtsReaderSlaveProxy.java │ │ ├── OtsReader.java │ │ ├── OtsReaderMasterProxy.java │ │ ├── OtsReaderSlaveMetaProxy.java │ │ ├── OtsReaderSlaveMultiVersionProxy.java │ │ ├── OtsReaderSlaveNormalProxy.java │ │ ├── OtsReaderSlaveProxyOld.java │ │ ├── adaptor/ │ │ │ ├── ColumnAdaptor.java │ │ │ └── PrimaryKeyValueAdaptor.java │ │ ├── callable/ │ │ │ ├── GetFirstRowPrimaryKeyCallable.java │ │ │ ├── GetRangeCallable.java │ │ │ ├── GetRangeCallableOld.java │ │ │ ├── GetTableMetaCallable.java │ │ │ ├── GetTimeseriesSplitCallable.java │ │ │ └── ScanTimeseriesDataCallable.java │ │ ├── model/ │ │ │ ├── DefaultNoRetry.java │ │ │ ├── OTSColumn.java │ │ │ ├── OTSConf.java │ │ │ ├── OTSConst.java │ │ │ ├── OTSCriticalException.java │ │ │ ├── OTSErrorCode.java │ │ │ ├── OTSMode.java │ │ │ ├── OTSMultiVersionConf.java │ │ │ ├── OTSPrimaryKeyColumn.java │ │ │ └── OTSRange.java │ │ └── utils/ │ │ ├── Common.java │ │ ├── CommonOld.java │ │ ├── CompareHelper.java │ │ ├── Constant.java │ │ ├── 
DefaultNoRetry.java │ │ ├── GsonParser.java │ │ ├── Key.java │ │ ├── OtsHelper.java │ │ ├── OtsReaderError.java │ │ ├── ParamChecker.java │ │ ├── ParamCheckerOld.java │ │ ├── ParamParser.java │ │ ├── RangeSplit.java │ │ ├── ReaderModelParser.java │ │ ├── RetryHelper.java │ │ ├── RetryHelperOld.java │ │ └── TranformHelper.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── otsstreamreader/ │ ├── README.md │ ├── pom.xml │ ├── src/ │ │ └── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ └── otsstreamreader/ │ │ │ └── internal/ │ │ │ ├── LocalStrings.properties │ │ │ ├── LocalStrings_en_US.properties │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ ├── OTSReaderError.java │ │ │ ├── OTSStreamReader.java │ │ │ ├── OTSStreamReaderException.java │ │ │ ├── OTSStreamReaderMasterProxy.java │ │ │ ├── OTSStreamReaderSlaveProxy.java │ │ │ ├── config/ │ │ │ │ ├── LocalStrings.properties │ │ │ │ ├── LocalStrings_en_US.properties │ │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ │ ├── Mode.java │ │ │ │ ├── OTSRetryStrategyForStreamReader.java │ │ │ │ ├── OTSStreamReaderConfig.java │ │ │ │ ├── OTSStreamReaderConstants.java │ │ │ │ └── StatusTableConstants.java │ │ │ ├── core/ │ │ │ │ ├── CheckpointTimeTracker.java │ │ │ │ ├── IStreamRecordSender.java │ │ │ │ ├── LocalStrings.properties │ │ │ │ ├── LocalStrings_en_US.properties │ │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ │ ├── MultiVerModeRecordSender.java │ │ │ │ ├── OTSStreamReaderChecker.java │ │ │ │ ├── RecordProcessor.java │ │ │ │ ├── 
ShardStatusChecker.java │ │ │ │ └── SingleVerAndUpOnlyModeRecordSender.java │ │ │ ├── model/ │ │ │ │ ├── LocalStrings.properties │ │ │ │ ├── LocalStrings_en_US.properties │ │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ │ ├── OTSErrorCode.java │ │ │ │ ├── OTSStreamJobShard.java │ │ │ │ ├── ShardCheckpoint.java │ │ │ │ └── StreamJob.java │ │ │ └── utils/ │ │ │ ├── ColumnValueTransformHelper.java │ │ │ ├── GsonParser.java │ │ │ ├── LocalStrings.properties │ │ │ ├── LocalStrings_en_US.properties │ │ │ ├── LocalStrings_ja_JP.properties │ │ │ ├── LocalStrings_zh_CN.properties │ │ │ ├── LocalStrings_zh_HK.properties │ │ │ ├── LocalStrings_zh_TW.properties │ │ │ ├── OTSHelper.java │ │ │ ├── OTSStreamJobShardUtil.java │ │ │ ├── ParamChecker.java │ │ │ └── TimeUtils.java │ │ └── resources/ │ │ ├── log4j2.xml │ │ └── plugin.json │ └── tools/ │ ├── config.json │ ├── tablestore_streamreader_console.py │ └── tabulate.py ├── otswriter/ │ ├── doc/ │ │ └── otswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── otswriter/ │ │ ├── IOtsWriterMasterProxy.java │ │ ├── IOtsWriterSlaveProxy.java │ │ ├── Key.java │ │ ├── OTSCriticalException.java │ │ ├── OTSErrorCode.java │ │ ├── OtsWriter.java │ │ ├── OtsWriterError.java │ │ ├── OtsWriterMasterProxy.java │ │ ├── OtsWriterSlaveProxyMultiversion.java │ │ ├── OtsWriterSlaveProxyNormal.java │ │ ├── OtsWriterSlaveProxyOld.java │ │ ├── callable/ │ │ │ ├── BatchWriteRowCallable.java │ │ │ ├── GetTableMetaCallable.java │ │ │ ├── GetTableMetaCallableOld.java │ │ │ ├── PutRowChangeCallable.java │ │ │ ├── PutTimeseriesDataCallable.java │ │ │ └── UpdateRowChangeCallable.java │ │ ├── model/ │ │ │ ├── OTSAttrColumn.java │ │ │ ├── OTSBatchWriteRowTaskManager.java │ │ │ ├── OTSBatchWriterRowTask.java │ │ 
│ ├── OTSBlockingExecutor.java │ │ │ ├── OTSConf.java │ │ │ ├── OTSConst.java │ │ │ ├── OTSErrorMessage.java │ │ │ ├── OTSLine.java │ │ │ ├── OTSMode.java │ │ │ ├── OTSOpType.java │ │ │ ├── OTSSendBuffer.java │ │ │ ├── OTSTaskManagerInterface.java │ │ │ ├── OTSTimeseriesRowTask.java │ │ │ ├── OTSTimeseriesRowTaskManager.java │ │ │ ├── RowDeleteChangeWithRecord.java │ │ │ ├── RowPutChangeWithRecord.java │ │ │ └── RowUpdateChangeWithRecord.java │ │ └── utils/ │ │ ├── CalculateHelper.java │ │ ├── CollectorUtil.java │ │ ├── ColumnConversion.java │ │ ├── ColumnConversionOld.java │ │ ├── Common.java │ │ ├── CommonOld.java │ │ ├── DefaultNoRetry.java │ │ ├── GsonParser.java │ │ ├── LineAndError.java │ │ ├── ParamChecker.java │ │ ├── ParseRecord.java │ │ ├── RetryHelper.java │ │ ├── WithRecord.java │ │ ├── WriterModelParser.java │ │ └── WriterRetryPolicy.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── package.xml ├── plugin-rdbms-util/ │ ├── pom.xml │ └── src/ │ └── main/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── plugin/ │ └── rdbms/ │ ├── reader/ │ │ ├── CommonRdbmsReader.java │ │ ├── Constant.java │ │ ├── Key.java │ │ ├── ResultSetReadProxy.java │ │ └── util/ │ │ ├── HintUtil.java │ │ ├── ObVersion.java │ │ ├── OriginalConfPretreatmentUtil.java │ │ ├── PreCheckTask.java │ │ ├── ReaderSplitUtil.java │ │ └── SingleTableSplitUtil.java │ ├── util/ │ │ ├── ConnectionFactory.java │ │ ├── Constant.java │ │ ├── DBUtil.java │ │ ├── DBUtilErrorCode.java │ │ ├── DataBaseType.java │ │ ├── JdbcConnectionFactory.java │ │ ├── RdbmsException.java │ │ ├── RdbmsRangeSplitWrap.java │ │ ├── SplitedSlice.java │ │ └── TableExpandUtil.java │ └── writer/ │ ├── CommonRdbmsWriter.java │ ├── Constant.java │ ├── Key.java │ ├── MysqlWriterErrorCode.java │ └── util/ │ ├── OriginalConfPretreatmentUtil.java │ └── WriterUtil.java ├── plugin-unstructured-storage-util/ │ ├── pom.xml │ └── src/ │ └── main/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── 
plugin/ │ └── unstructuredstorage/ │ ├── FileFormat.java │ ├── LocalStrings.properties │ ├── LocalStrings_en_US.properties │ ├── LocalStrings_ja_JP.properties │ ├── LocalStrings_zh_CN.properties │ ├── LocalStrings_zh_HK.properties │ ├── LocalStrings_zh_TW.properties │ ├── reader/ │ │ ├── ColumnEntry.java │ │ ├── Constant.java │ │ ├── ExpandLzopInputStream.java │ │ ├── Key.java │ │ ├── UnstructuredStorageReaderErrorCode.java │ │ ├── UnstructuredStorageReaderUtil.java │ │ ├── ZipCycleInputStream.java │ │ ├── binaryFileUtil/ │ │ │ ├── BinaryFileReaderUtil.java │ │ │ └── ByteUtils.java │ │ └── split/ │ │ ├── StartEndPair.java │ │ └── UnstructuredSplitUtil.java │ ├── util/ │ │ ├── ColumnTypeUtil.java │ │ └── HdfsUtil.java │ └── writer/ │ ├── Constant.java │ ├── DataXCsvWriter.java │ ├── Key.java │ ├── SqlWriter.java │ ├── TextCsvWriterManager.java │ ├── UnstructuredStorageWriterErrorCode.java │ ├── UnstructuredStorageWriterUtil.java │ ├── UnstructuredWriter.java │ └── binaryFileUtil/ │ ├── BinaryFileWriterErrorCode.java │ └── BinaryFileWriterUtil.java ├── pom.xml ├── postgresqlreader/ │ ├── doc/ │ │ └── postgresqlreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── postgresqlreader/ │ │ ├── Constant.java │ │ └── PostgresqlReader.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── postgresqlwriter/ │ ├── doc/ │ │ └── postgresqlwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── postgresqlwriter/ │ │ └── PostgresqlWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── rdbmsreader/ │ ├── doc/ │ │ └── rdbmsreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ 
│ │ └── rdbmsreader/ │ │ ├── Constant.java │ │ ├── RdbmsReader.java │ │ └── SubCommonRdbmsReader.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── rdbmswriter/ │ ├── doc/ │ │ └── rdbmswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── rdbmswriter/ │ │ ├── RdbmsWriter.java │ │ └── SubCommonRdbmsWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── rpm/ │ ├── t_dp_dw_datax_3_core_all-build.sh │ └── t_dp_dw_datax_3_hook_dqc-build.sh ├── selectdbwriter/ │ ├── doc/ │ │ ├── selectdbwriter.md │ │ └── stream2selectdb.json │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── selectdbwriter/ │ │ ├── BaseResponse.java │ │ ├── CopyIntoResp.java │ │ ├── CopySQLBuilder.java │ │ ├── DelimiterParser.java │ │ ├── HttpPostBuilder.java │ │ ├── HttpPutBuilder.java │ │ ├── Keys.java │ │ ├── SelectdbBaseCodec.java │ │ ├── SelectdbCodec.java │ │ ├── SelectdbCodecFactory.java │ │ ├── SelectdbCopyIntoObserver.java │ │ ├── SelectdbCsvCodec.java │ │ ├── SelectdbJsonCodec.java │ │ ├── SelectdbUtil.java │ │ ├── SelectdbWriter.java │ │ ├── SelectdbWriterException.java │ │ ├── SelectdbWriterManager.java │ │ └── WriterTuple.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── sqlserverreader/ │ ├── doc/ │ │ └── sqlserverreader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── sqlserverreader/ │ │ ├── Constant.java │ │ ├── Key.java │ │ ├── SqlServerReader.java │ │ └── SqlServerReaderErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── sqlserverwriter/ │ ├── doc/ │ │ └── sqlserverwriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── 
assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── sqlserverwriter/ │ │ ├── SqlServerWriter.java │ │ └── SqlServerWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── starrocksreader/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── starrocksreader/ │ │ └── StarRocksReader.java │ └── resources/ │ └── plugin.json ├── starrockswriter/ │ ├── doc/ │ │ └── starrockswriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── starrocks/ │ │ └── connector/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── starrockswriter/ │ │ ├── StarRocksWriter.java │ │ ├── StarRocksWriterOptions.java │ │ ├── manager/ │ │ │ ├── StarRocksFlushTuple.java │ │ │ ├── StarRocksStreamLoadFailedException.java │ │ │ ├── StarRocksStreamLoadVisitor.java │ │ │ └── StarRocksWriterManager.java │ │ ├── row/ │ │ │ ├── StarRocksBaseSerializer.java │ │ │ ├── StarRocksCsvSerializer.java │ │ │ ├── StarRocksDelimiterParser.java │ │ │ ├── StarRocksISerializer.java │ │ │ ├── StarRocksJsonSerializer.java │ │ │ └── StarRocksSerializerFactory.java │ │ └── util/ │ │ └── StarRocksWriterUtil.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── streamreader/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── streamreader/ │ │ ├── Constant.java │ │ ├── Key.java │ │ ├── StreamReader.java │ │ └── StreamReaderErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── streamwriter/ │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── streamwriter/ │ │ 
├── Key.java │ │ ├── StreamWriter.java │ │ └── StreamWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── sybasereader/ │ ├── doc/ │ │ └── sybasereader.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ └── sybasereader/ │ │ │ ├── Constants.java │ │ │ └── SybaseReader.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── plugin/ │ └── reader/ │ └── sybasereader/ │ └── SybaseDatabaseUnitTest.java ├── sybasewriter/ │ ├── doc/ │ │ └── sybasewriter.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ └── java/ │ ├── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── sybasewriter/ │ │ └── SybaseWriter.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── tdenginereader/ │ ├── doc/ │ │ └── tdenginereader-CN.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ ├── TDengineReader.java │ │ │ └── TDengineReaderErrorCode.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ ├── TDengine2DMTest.java │ │ ├── TDengine2StreamTest.java │ │ └── TDengineReaderTest.java │ └── resources/ │ ├── t2dm.json │ ├── t2stream-1.json │ └── t2stream-2.json ├── tdenginewriter/ │ ├── doc/ │ │ ├── tdenginewriter-CN.md │ │ └── tdenginewriter.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── writer/ │ │ │ └── tdenginewriter/ │ │ │ ├── ColumnMeta.java │ │ │ ├── Constants.java │ │ │ ├── 
DataHandler.java │ │ │ ├── DefaultDataHandler.java │ │ │ ├── Key.java │ │ │ ├── OpentsdbDataHandler.java │ │ │ ├── SchemaManager.java │ │ │ ├── TDengineWriter.java │ │ │ ├── TDengineWriterErrorCode.java │ │ │ ├── TableMeta.java │ │ │ ├── TableType.java │ │ │ └── TimestampPrecision.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── tdenginewriter/ │ │ ├── Csv2TDengineTest.java │ │ ├── DM2TDengineTest.java │ │ ├── DefaultDataHandlerTest.java │ │ ├── Mongo2TDengineTest.java │ │ ├── Mysql2TDengineTest.java │ │ ├── Opentsdb2TDengineTest.java │ │ ├── SchemaManagerTest.java │ │ ├── Stream2TDengineTest.java │ │ ├── TDengine2TDengineTest.java │ │ └── TDengineWriterTest.java │ └── resources/ │ ├── csv2t.json │ ├── defaultJob.json │ ├── dm-schema.sql │ ├── dm2t-1.json │ ├── dm2t-2.json │ ├── dm2t-3.json │ ├── dm2t-4.json │ ├── incremental_sync/ │ │ ├── clean_env.sh │ │ ├── csv2t-jni.json │ │ ├── csv2t-restful.json │ │ ├── dm2t-jni.json │ │ ├── dm2t-restful.json │ │ ├── dm2t-update.json │ │ ├── dm2t_sync.sh │ │ ├── t2dm-jni.json │ │ ├── t2dm-restful.json │ │ └── upload.sh │ ├── m2t-1.json │ ├── mongo2t.json │ ├── o2t-1.json │ ├── t2t-1.json │ ├── t2t-2.json │ ├── t2t-3.json │ ├── t2t-4.json │ └── weather.csv ├── transformer/ │ ├── doc/ │ │ ├── .gitkeep │ │ └── transformer.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── transformer/ │ ├── ComplexTransformer.java │ └── Transformer.java ├── tsdbreader/ │ ├── doc/ │ │ └── tsdbreader.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── reader/ │ │ │ └── tsdbreader/ │ │ │ ├── Constant.java │ │ │ ├── Key.java │ │ │ ├── TSDBReader.java │ │ │ ├── TSDBReaderErrorCode.java │ │ │ ├── conn/ │ │ │ │ ├── 
Connection4TSDB.java │ │ │ │ ├── DataPoint4MultiFieldsTSDB.java │ │ │ │ ├── DataPoint4TSDB.java │ │ │ │ ├── MultiFieldQueryResult.java │ │ │ │ ├── QueryResult.java │ │ │ │ ├── TSDBConnection.java │ │ │ │ └── TSDBDump.java │ │ │ └── util/ │ │ │ ├── HttpUtils.java │ │ │ ├── TSDBUtils.java │ │ │ └── TimeUtils.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── plugin/ │ └── reader/ │ └── tsdbreader/ │ ├── conn/ │ │ └── TSDBConnectionTest.java │ └── util/ │ ├── Const.java │ └── TimeUtilsTest.java ├── tsdbwriter/ │ ├── doc/ │ │ └── tsdbhttpwriter.md │ ├── pom.xml │ └── src/ │ ├── main/ │ │ ├── assembly/ │ │ │ └── package.xml │ │ ├── java/ │ │ │ └── com/ │ │ │ └── alibaba/ │ │ │ └── datax/ │ │ │ └── plugin/ │ │ │ └── writer/ │ │ │ ├── conn/ │ │ │ │ ├── Connection4TSDB.java │ │ │ │ ├── DataPoint4TSDB.java │ │ │ │ └── TSDBConnection.java │ │ │ ├── tsdbwriter/ │ │ │ │ ├── Constant.java │ │ │ │ ├── Key.java │ │ │ │ ├── SourceDBType.java │ │ │ │ ├── TSDBConverter.java │ │ │ │ ├── TSDBModel.java │ │ │ │ ├── TSDBWriter.java │ │ │ │ └── TSDBWriterErrorCode.java │ │ │ └── util/ │ │ │ ├── HttpUtils.java │ │ │ └── TSDBUtils.java │ │ └── resources/ │ │ ├── plugin.json │ │ └── plugin_job_template.json │ └── test/ │ └── java/ │ └── com/ │ └── alibaba/ │ └── datax/ │ └── plugin/ │ └── writer/ │ ├── conn/ │ │ └── TSDBConnectionTest.java │ └── util/ │ ├── Const.java │ ├── HttpUtilsTest.java │ └── TSDBTest.java ├── txtfilereader/ │ ├── doc/ │ │ └── txtfilereader.md │ ├── pom.xml │ └── src/ │ └── main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── reader/ │ │ └── txtfilereader/ │ │ ├── Constant.java │ │ ├── Key.java │ │ ├── TxtFileReader.java │ │ └── TxtFileReaderErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json ├── txtfilewriter/ │ ├── doc/ │ │ └── txtfilewriter.md │ ├── pom.xml │ └── src/ │ └── 
main/ │ ├── assembly/ │ │ └── package.xml │ ├── java/ │ │ └── com/ │ │ └── alibaba/ │ │ └── datax/ │ │ └── plugin/ │ │ └── writer/ │ │ └── txtfilewriter/ │ │ ├── Key.java │ │ ├── TxtFileWriter.java │ │ └── TxtFileWriterErrorCode.java │ └── resources/ │ ├── plugin.json │ └── plugin_job_template.json └── userGuid.md ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ # Created by .ignore support plugin (hsz.mobi) .DS_Store .AppleDouble .LSOverride Icon ._* .DocumentRevisions-V100 .fseventsd .Spotlight-V100 .TemporaryItems .Trashes .VolumeIcon.icns .com.apple.timemachine.donotpresent .AppleDB .AppleDesktop Network Trash Folder Temporary Items .apdisk *.class *.log *.ctxt .mtj.tmp/ *.jar *.war *.nar *.ear *.zip *.tar.gz *.rar hs_err_pid* .idea/**/workspace.xml .idea/**/tasks.xml .idea/**/dictionaries .idea/**/shelf .idea/**/dataSources/ .idea/**/dataSources.ids .idea/**/dataSources.local.xml .idea/**/sqlDataSources.xml .idea/**/dynamic.xml .idea/**/uiDesigner.xml .idea/**/dbnavigator.xml .idea/**/gradle.xml .idea/**/libraries cmake-build-debug/ cmake-build-release/ .idea/**/mongoSettings.xml *.iws out/ .idea_modules/ atlassian-ide-plugin.xml .idea/replstate.xml com_crashlytics_export_strings.xml crashlytics.properties crashlytics-build.properties fabric.properties .idea/httpRequests target/ pom.xml.tag pom.xml.releaseBackup pom.xml.versionsBackup pom.xml.next release.properties dependency-reduced-pom.xml buildNumber.properties .mvn/timing.properties !/.mvn/wrapper/maven-wrapper.jar .idea *.iml out gen### Python template __pycache__/ *.py[cod] *$py.class *.so .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST *.manifest *.spec pip-log.txt pip-delete-this-directory.txt htmlcov/ .tox/ .coverage .coverage.* .cache 
nosetests.xml coverage.xml *.cover .hypothesis/ .pytest_cache/ *.mo *.pot *.log local_settings.py db.sqlite3 instance/ .webassets-cache .scrapy docs/_build/ target/ .ipynb_checkpoints .python-version celerybeat-schedule *.sage.py .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ .spyderproject .spyproject .ropeproject /site .mypy_cache/ .metadata bin/ tmp/ *.tmp *.bak *.swp *~.nib local.properties .settings/ .loadpath .recommenders .externalToolBuilders/ *.launch *.pydevproject .cproject .autotools .factorypath .buildpath .target .tern-project .texlipse .springBeans .recommenders/ .cache-main .scala_dependencies .worksheet

================================================
FILE: NOTICE
================================================
========================================================
DataX is the open-source edition of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group and other companies. DataX implements efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, and more.

Copyright 1999-2022 Alibaba Group Holding Ltd.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
===================================================================
File-level references, grouped by license

This product contains various third-party components under other open source licenses. This section summarizes those components and their licenses.

GNU Lesser General Public License
--------------------------------------
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/CliQuery.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/Connection4TSDB.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DataPoint4TSDB.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DumpSeries.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBConnection.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBDump.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/Constant.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/Key.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReader.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReaderErrorCode.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/HttpUtils.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TSDBUtils.java
opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TimeUtils.java
===================================================================

================================================
FILE: README.md
================================================
![Datax-logo](https://github.com/alibaba/DataX/blob/master/images/DataX-logo.jpg)

# DataX

[![Leaderboard](https://img.shields.io/badge/DataX-%E6%9F%A5%E7%9C%8B%E8%B4%A1%E7%8C%AE%E6%8E%92%E8%A1%8C%E6%A6%9C-orange)](https://opensource.alibaba.com/contribution_leaderboard/details?projectValue=datax)

DataX is the open-source edition of Alibaba Cloud [DataWorks Data Integration](https://www.aliyun.com/product/bigdata/ide), an offline data synchronization tool/platform widely used within Alibaba Group. DataX provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, Databend, and more.

# DataX Commercial Edition

Alibaba Cloud DataWorks Data Integration is the commercial product built by the DataX team on Alibaba Cloud. It provides fast, stable data movement between a wide range of heterogeneous data sources in complex network environments, along with data synchronization solutions for complex business scenarios. It currently serves nearly 3,000 customers on the cloud and synchronizes more than 3 trillion records per day. DataWorks Data Integration supports more than 50 data sources for offline synchronization and offers solutions for whole-database migration, batch migration to the cloud, incremental synchronization, and sharded databases and tables. Real-time synchronization was added in 2020 and supports any read/write combination across more than 10 data sources. It also provides one-click full and incremental synchronization from sources such as MySQL and Oracle into Alibaba Cloud big data engines such as MaxCompute and Hologres.

For the commercial edition, see: https://www.aliyun.com/product/bigdata/ide

# Features

As a data synchronization framework, DataX abstracts synchronization between data sources into Reader plugins, which read data from the source, and Writer plugins, which write data to the target. In principle, the DataX framework can therefore support synchronization for any type of data source. The plugin system also forms an ecosystem: each newly integrated data source can immediately exchange data with every existing one.

# DataX in Detail

##### See: [DataX-Introduction](https://github.com/alibaba/DataX/blob/master/introduction.md)

# Quick Start

##### Download [DataX download link](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202308/datax.tar.gz)

##### See: [Quick Start](https://github.com/alibaba/DataX/blob/master/userGuid.md)

# Support Data Channels

DataX already has a fairly comprehensive plugin system covering mainstream RDBMS databases, NoSQL stores, and big data computing systems. The currently supported data sources are listed below; for details, see the [DataX data source reference guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels).

| Type | Data Source | Reader | Writer | Docs |
|--------------|---------------------------|:---------:|:---------:|:---------:|
| RDBMS (relational databases) | MySQL | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md) |
| | Oracle | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md), [Writer](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md) |
| | OceanBase | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/oceanbasev10reader/doc/oceanbasev10reader.md), [Writer](https://github.com/alibaba/DataX/blob/master/oceanbasev10writer/doc/oceanbasev10writer.md) |
| | SQLServer | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md) |
| | PostgreSQL | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md) |
| | DRDS | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
| | Kingbase | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md) |
| | Generic RDBMS (supports all relational databases) | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/rdbmsreader/doc/rdbmsreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/rdbmswriter/doc/rdbmswriter.md) |
| Alibaba Cloud data warehouse storage | ODPS | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/odpswriter/doc/odpswriter.md) |
| | ADB | | √ | [Writer](https://github.com/alibaba/DataX/blob/master/adbmysqlwriter/doc/adbmysqlwriter.md) |
| | ADS | | √ | [Writer](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md) |
| | OSS | √ | √ | [Reader](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md) |
| | OCS | | √ | [Writer](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md) |
| | Hologres | | √ | [Writer](https://github.com/alibaba/DataX/blob/master/hologresjdbcwriter/doc/hologresjdbcwriter.md) |
| | AnalyticDB For PostgreSQL | | √ | Writer |
| Alibaba Cloud middleware
| datahub | √ | √ | 读 、写 | | | SLS | √ | √ | 读 、写 | | 图数据库 | 阿里云 GDB | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/gdbreader/doc/gdbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/gdbwriter/doc/gdbwriter.md) | | | Neo4j | | √ | [写](https://github.com/alibaba/DataX/blob/master/neo4jwriter/doc/neo4jwriter.md) | | NoSQL数据存储 | OTS | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md) | | | Hbase0.94 | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md) | | | Hbase1.1 | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md) | | | Phoenix4.x | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase11xsqlreader/doc/hbase11xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xsqlwriter/doc/hbase11xsqlwriter.md) | | | Phoenix5.x | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md) | | | MongoDB | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongodbwriter/doc/mongodbwriter.md) | | | Cassandra | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md) 、[写](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md) | | 数仓数据存储 | StarRocks | √ | √ | 读 、[写](https://github.com/alibaba/DataX/blob/master/starrockswriter/doc/starrockswriter.md) | | | ApacheDoris | | √ | [写](https://github.com/alibaba/DataX/blob/master/doriswriter/doc/doriswriter.md) | | | 
ClickHouse | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/clickhousereader/doc/clickhousereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/clickhousewriter/doc/clickhousewriter.md) | | | Databend | | √ | [写](https://github.com/alibaba/DataX/blob/master/databendwriter/doc/databendwriter.md) | | | Hive | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) | | | kudu | | √ | [写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) | | | selectdb | | √ | [写](https://github.com/alibaba/DataX/blob/master/selectdbwriter/doc/selectdbwriter.md) | | 无结构化数据存储 | TxtFile | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md) | | | FTP | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md) | | | HDFS | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md) | | | Elasticsearch | | √ | [写](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md) | | 时间序列数据库 | OpenTSDB | √ | | [读](https://github.com/alibaba/DataX/blob/master/opentsdbreader/doc/opentsdbreader.md) | | | TSDB | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/tsdbreader/doc/tsdbreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/tsdbwriter/doc/tsdbhttpwriter.md) | | | TDengine | √ | √ | [读](https://github.com/alibaba/DataX/blob/master/tdenginereader/doc/tdenginereader-CN.md) 、[写](https://github.com/alibaba/DataX/blob/master/tdenginewriter/doc/tdenginewriter-CN.md) | # 阿里云DataWorks数据集成 
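All existing DataX capabilities have been fully integrated into Alibaba Cloud Data Integration, which is more efficient and secure than DataX and also offers advanced features that DataX lacks. Data Integration can be regarded as a fully upgraded commercial edition of DataX that provides enterprises with stable, reliable, and secure data transfer services. Compared with DataX, Data Integration has the following key advantages:

Real-time synchronization:

- Overview: https://help.aliyun.com/document_detail/181912.html
- Supported data sources: https://help.aliyun.com/document_detail/146778.html
- Data processing: https://help.aliyun.com/document_detail/146777.html

A much wider range of data sources for offline synchronization:

- Newly added sources include DB2, Kafka, Hologres, MetaQ, SAP HANA, Dameng, and more, with more on the way
- Data sources supported for offline synchronization: https://help.aliyun.com/document_detail/137670.html
- Synchronization solutions:
    - Solution overview: https://help.aliyun.com/document_detail/171765.html
    - One-click full and incremental synchronization: https://help.aliyun.com/document_detail/175676.html
    - Whole-database migration: https://help.aliyun.com/document_detail/137809.html
    - Batch migration to the cloud: https://help.aliyun.com/document_detail/146671.html
- For more capabilities, see: https://help.aliyun.com/document_detail/137663.html

# Developing a New Plugin

See: [DataX Plugin Development Guide](https://github.com/alibaba/DataX/blob/master/dataxPluginDev.md)

# Release Notes

DataX plans monthly releases going forward, and pull requests from interested contributors are welcome. The monthly updates are listed below.

- [datax_v202309](https://github.com/alibaba/DataX/releases/tag/datax_v202309)
  - Phoenix synchronization supports adding a where condition
  - Added Huawei GaussDB reader and writer plugins
  - Fixed the ClickhouseReader runtime error "Can't find bundle for base name"
  - Added a DataX debugging module
  - Fixed an error on empty ORC files
  - Improved obwriter performance
  - txtfilewriter can export data as INSERT statements
  - HdfsReader/HdfsWriter support reading and writing Parquet

- [datax_v202308](https://github.com/alibaba/DataX/releases/tag/datax_v202308)
  - OTS plugin update
  - databend plugin update
  - OceanBase driver fix

- [datax_v202306](https://github.com/alibaba/DataX/releases/tag/datax_v202306)
  - Code cleanup
  - New plugins (neo4jwriter, clickhousewriter)
  - Plugin improvements and bug fixes (oceanbase, hdfs, databend, txtfile)

- [datax_v202303](https://github.com/alibaba/DataX/releases/tag/datax_v202303)
  - Code cleanup
  - New plugins (adbmysqlwriter, databendwriter, selectdbwriter)
  - Plugin improvements and bug fixes (sqlserver, hdfs, cassandra, kudu, oss)
  - Upgraded fastjson to fastjson2

- [datax_v202210](https://github.com/alibaba/DataX/releases/tag/datax_v202210)
  - Channel capability updates (OceanBase, Tdengine, Doris, etc.)

- [datax_v202209](https://github.com/alibaba/DataX/releases/tag/datax_v202209)
  - Channel capability updates (MaxCompute, Datahub, SLS, etc.), security vulnerability fixes, general packaging updates, etc.

- [datax_v202205](https://github.com/alibaba/DataX/releases/tag/datax_v202205)
  - Channel capability updates (MaxCompute, Hologres, OSS, Tdengine, etc.), security vulnerability fixes, general packaging updates, etc.

# Project Members

Core contributors: 言柏, 枕水, 秋奇, 青砾, 一斅, 云时

Thanks to 天烬, 光戈, 祁然, 巴真, and 静行 for their contributions to DataX.

# License

This software is free to use under the [Apache License](https://github.com/alibaba/DataX/blob/master/license.txt).

# Issues

Please report issues to us promptly at: [DataxIssue](https://github.com/alibaba/DataX/issues)

# Enterprise Users of Open-Source DataX

![Datax-logo](https://github.com/alibaba/DataX/blob/master/images/datax-enterprise-users.jpg)

```
Ongoing recruitment
Contact email: datax@alibabacloud.com
[Java Development Position]
Title: Senior Java Development Engineer / Expert / Senior Expert
Experience: 2+ years
Education: Bachelor's degree (not a hard requirement if your skills are solid)
Expected level: P6/P7/P8
Responsibilities:
    1. Design and develop the Alibaba Cloud big data platform (数加).
    2. Develop big-data products for government and enterprise customers;
    3. Use large-scale machine learning algorithms to mine relationships in data and explore product applications of data-mining techniques in real scenarios;
    4. One-stop big-data development platform
    5. Big-data job scheduling engine
    6. Job execution engine
    7. Job monitoring and alerting
    8. Massive heterogeneous data synchronization

Requirements:
    1. 3+ years of Java web development experience;
    2. Familiarity with Java fundamentals, including the JVM, class loading, threads, concurrency, I/O resource management, and networking;
    3. Proficiency with common Java frameworks and a keen awareness of new ones; a deep understanding of object orientation, design principles, encapsulation, and abstraction;
    4. Familiarity with HTML/HTML5 and JavaScript; familiarity with SQL;
    5. Strong execution, excellent teamwork, and professional dedication;
    6. A deep understanding of design patterns and their use cases is a plus;
    7. Candidates with strong problem analysis and solving skills, strong hands-on ability, and a passion for technology are preferred;
    8. Hands-on project or product experience with high concurrency, high availability and stability, high performance, or big-data processing is preferred;
    9. Experience with big-data products, cloud products, or middleware technical solutions is preferred.
```

User support:

The DingTalk group is currently affected by some management policies, so please submit your questions as Issues first. The DataX developers and community answer questions in Issues regularly, and the growing knowledge base will also help future users.

================================================
FILE: adbmysqlwriter/doc/adbmysqlwriter.md
================================================
# DataX AdbMysqlWriter

---

## 1 Introduction

The AdbMysqlWriter plugin writes data into a target table in ADB MySQL. Under the hood, AdbMysqlWriter connects to the remote ADB MySQL database over JDBC and executes `insert into ...` (or `replace into ...`) SQL statements to write the data into ADB MySQL, committing in batches internally.

AdbMysqlWriter is aimed at ETL developers, who use it to import data from a data warehouse into ADB MySQL. It can also serve DBAs and other users as a data migration tool.

## 2 How It Works

AdbMysqlWriter receives the protocol data produced by the Reader through the DataX framework, connects to the remote ADB MySQL database over JDBC, and writes the data into ADB MySQL with statements of one of the following forms:

* `insert into ...` (on a primary-key conflict, the incoming row is silently ignored and the existing row is not updated; equivalent to `insert ignore into`)
* `replace into ...` (behaves like `insert into` when there is no primary-key/unique-index conflict; on a conflict, the new row replaces all fields of the existing row)

For performance, the plugin uses `PreparedStatement + Batch` and sets `rewriteBatchedStatements=true`. Data is buffered in a per-thread buffer, and a write request is issued only when the buffer reaches a preset threshold.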
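The buffer-then-flush behavior described above can be illustrated with a minimal Java sketch. `BatchBufferSketch` is a hypothetical class, not part of DataX: it collects records and hands them to a callback once `batchSize` records have accumulated, plus a final flush for the remainder. In a real JDBC writer the callback would bind each record to a `PreparedStatement`, call `addBatch()`, and then `executeBatch()`; that part is omitted so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of the buffer-then-flush pattern (not DataX's CommonRdbmsWriter).
 * Records accumulate in a per-task buffer and are passed to a flush callback
 * once batchSize records have been collected.
 */
public class BatchBufferSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> buffer = new ArrayList<>();

    public BatchBufferSketch(int batchSize, Consumer<List<T>> flusher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    /** Buffers one record and flushes as soon as the threshold is reached. */
    public void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Flushes whatever is left, e.g. when the reader has no more records. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand over a copy so the callback may keep the list after the buffer is cleared.
        flusher.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> flushedSizes = new ArrayList<>();
        BatchBufferSketch<String> writer =
                new BatchBufferSketch<>(3, batch -> flushedSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            writer.add("row-" + i);
        }
        writer.flush(); // final partial batch
        System.out.println(flushedSizes); // [3, 3, 1]
    }
}
```

A larger threshold means fewer round trips but more records held in memory, which is the same trade-off the `batchSize` parameter describes.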
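Note: the job needs at least the privilege to run `insert/replace into ...`; whether other privileges are required depends on the statements you specify in `preSql` and `postSql`.

## 3 Features

### 3.1 Sample Configuration

* The following job generates data in memory and imports it into ADB MySQL.

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {"value": "DataX", "type": "string"},
                            {"value": 19880808, "type": "long"},
                            {"value": "1988-08-08 08:08:08", "type": "date"},
                            {"value": true, "type": "bool"},
                            {"value": "test", "type": "bytes"}
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "adbmysqlwriter",
                    "parameter": {
                        "writeMode": "replace",
                        "username": "root",
                        "password": "root",
                        "column": ["*"],
                        "preSql": ["truncate table @table"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://ip:port/database?useUnicode=true",
                                "table": ["test"]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the target database. At runtime, DataX appends the following properties to the jdbcUrl you provide: `yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true`.
      Notes: (1) Only one jdbcUrl can be configured per database. (2) An AdbMySQL writer job can configure only one jdbcUrl. (3) The jdbcUrl follows the official MySQL format and may carry additional connection properties. For example, to use gbk as the connection encoding, append `useUnicode=true&characterEncoding=gbk` to the jdbcUrl. For details, see the official MySQL documentation or ask your DBA.
    * Required: yes
    * Default: none

* **username**
    * Description: user name for the target database
    * Required: yes
    * Default: none

* **password**
    * Description: password for the target database
    * Required: yes
    * Default: none

* **table**
    * Description: name of the target table. Only one AdbMySQL table name can be configured.
      Note: `table` and `jdbcUrl` must be placed inside the `connection` block.
    * Required: yes
    * Default: none

* **column**
    * Description: columns of the target table to write, separated by commas, for example `"column": ["id", "name", "age"]`. To write all columns in order, use `*`, for example `"column": ["*"]`.
      **`column` must be specified and cannot be left empty.**
      Notes: (1) We strongly advise against the `*` form, because if the number or types of the target table's columns change, your job may run incorrectly or fail. (2) `column` cannot contain constant values.
    * Required: yes
    * Default: none

* **session**
    * Description: SQL statements that DataX executes when it obtains an ADB MySQL connection, used to modify the session properties of the current connection
    * Required: no
    * Default: empty

* **preSql**
    * Description: standard SQL statements executed before data is written to the target table. If a statement needs to refer to the table name, use `@table`; it is replaced with the actual table name when the statement runs. For example, to delete the table's existing data before importing, configure `"preSql":["truncate table @table"]`; DataX then runs `truncate table <table name>` before writing to each table.
    * Required: no
    * Default: none

* **postSql**
    * Description: standard SQL statements executed after data is written to the target table (works the same way as preSql)
    * Required: no
    * Default: none

* **writeMode**
    * Description: controls whether data is written to the target table with `insert into`, `replace into`, or `ON DUPLICATE KEY UPDATE` statements
    * Required: no (if omitted, the writer falls back to replace)
    * Options: insert/replace/update
    * Default: replace

* **batchSize**
    * Description: number of records submitted in one batch. A larger value greatly reduces the number of network round trips between DataX and ADB MySQL and improves overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
    * Required: no
    * Default: 2048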
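To make the `jdbcUrl` and `writeMode` parameters more concrete, the following Java sketch shows how the documented extra properties could be appended to a user-supplied URL, and which statement shape each `writeMode` value corresponds to. `AdbMysqlSqlSketch` and its methods are hypothetical, not DataX's API; the `update` branch assumes the common MySQL form `ON DUPLICATE KEY UPDATE col=VALUES(col)`, and identifier quoting is omitted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * Hypothetical helpers illustrating the jdbcUrl and writeMode parameters.
 * Not DataX's implementation; names and exact SQL shapes are assumptions.
 */
public final class AdbMysqlSqlSketch {
    // Properties that, per the documentation above, DataX appends to the user's jdbcUrl.
    public static final String EXTRA_PROPS =
            "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true";

    private AdbMysqlSqlSketch() {
    }

    /** Appends EXTRA_PROPS with '&' if the URL already has a query string, otherwise with '?'. */
    public static String appendJdbcProps(String jdbcUrl) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + EXTRA_PROPS;
    }

    /** Builds a parameterized statement for writeMode insert, replace, or update. */
    public static String buildWriteSql(String writeMode, String table, List<String> columns) {
        String columnList = String.join(",", columns);
        String placeholders = columns.stream().map(c -> "?").collect(Collectors.joining(","));
        String valuesClause = " (" + columnList + ") VALUES (" + placeholders + ")";

        switch (writeMode.toLowerCase(Locale.ROOT)) {
            case "insert":
                return "INSERT INTO " + table + valuesClause;
            case "replace":
                return "REPLACE INTO " + table + valuesClause;
            case "update": {
                // On a key conflict, overwrite each listed column with the incoming value.
                String assignments = columns.stream()
                        .map(c -> c + "=VALUES(" + c + ")")
                        .collect(Collectors.joining(","));
                return "INSERT INTO " + table + valuesClause + " ON DUPLICATE KEY UPDATE " + assignments;
            }
            default:
                throw new IllegalArgumentException("unsupported writeMode: " + writeMode);
        }
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name");
        System.out.println(appendJdbcProps("jdbc:mysql://ip:port/database?useUnicode=true"));
        System.out.println(buildWriteSql("replace", "test", columns));
        System.out.println(buildWriteSql("update", "test", columns));
    }
}
```

For table `test` with columns `id` and `name`, `replace` yields `REPLACE INTO test (id,name) VALUES (?,?)`; the placeholders are what a `PreparedStatement` batch would bind per record.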
### 3.3 Type Conversion

AdbMysqlWriter currently supports most MySQL types, but a few are not supported, so please check your column types.

The type mapping used by AdbMysqlWriter is listed below:

| DataX internal type | AdbMysql data type |
|---------------|---------------------------------|
| Long | tinyint, smallint, int, bigint |
| Double | float, double, decimal |
| String | varchar |
| Date | date, time, datetime, timestamp |
| Boolean | boolean |
| Bytes | binary |

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

The TPC-H `lineitem` table, with 17 columns and 59,986,052 randomly generated rows. Total uncompressed size: 7.3 GiB.

Table definition:

    CREATE TABLE `datax_adbmysqlwriter_perf_lineitem` (
        `l_orderkey` bigint NOT NULL COMMENT '',
        `l_partkey` int NOT NULL COMMENT '',
        `l_suppkey` int NOT NULL COMMENT '',
        `l_linenumber` int NOT NULL COMMENT '',
        `l_quantity` decimal(15,2) NOT NULL COMMENT '',
        `l_extendedprice` decimal(15,2) NOT NULL COMMENT '',
        `l_discount` decimal(15,2) NOT NULL COMMENT '',
        `l_tax` decimal(15,2) NOT NULL COMMENT '',
        `l_returnflag` varchar(1024) NOT NULL COMMENT '',
        `l_linestatus` varchar(1024) NOT NULL COMMENT '',
        `l_shipdate` date NOT NULL COMMENT '',
        `l_commitdate` date NOT NULL COMMENT '',
        `l_receiptdate` date NOT NULL COMMENT '',
        `l_shipinstruct` varchar(1024) NOT NULL COMMENT '',
        `l_shipmode` varchar(1024) NOT NULL COMMENT '',
        `l_comment` varchar(1024) NOT NULL COMMENT '',
        `dummy` varchar(1024),
        PRIMARY KEY (`l_orderkey`, `l_linenumber`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='datax perf test';

A sample row:

    l_orderkey: 2122789
    l_partkey: 1233571
    l_suppkey: 8608
    l_linenumber: 1
    l_quantity: 35.00
    l_extendedprice: 52657.85
    l_discount: 0.02
    l_tax: 0.07
    l_returnflag: N
    l_linestatus: O
    l_shipdate: 1996-11-03
    l_commitdate: 1996-12-07
    l_receiptdate: 1996-11-16
    l_shipinstruct: COLLECT COD
    l_shipmode: FOB
    l_comment: ld, regular theodolites.
    dummy:

#### 4.1.2 Machine Specs

* DataX ECS: 24 cores, 48 GB
* ADB MySQL database
    * Compute resources: 16 cores, 64 GB (cluster edition)
    * Elastic I/O resources: 3

#### 4.1.3 DataX JVM Options

    -Xms1G -Xmx10G -XX:+HeapDumpOnOutOfMemoryError

### 4.2 Test Results

| Channels | Batch size (rows) | DataX speed (rec/s) | DataX throughput (MB/s) | Import time (s) |
|-----|-------|------------------|---------------|---------|
| 1 | 512 | 23071 | 2.34 | 2627 |
| 1 | 1024 | 26080 | 2.65 | 2346 |
| 1 | 2048 | 28162 | 2.86 | 2153 |
| 1 | 4096 | 28978 | 2.94 | 2119 |
| 4 | 512 | 56590 | 5.74 | 1105 |
| 4 | 1024 | 81062 | 8.22 | 763 |
| 4 | 2048 | 107117 | 10.87 | 605 |
| 4 | 4096 | 113181 | 11.48 | 579 |
| 8 | 512 | 81062 | 8.22 | 786 |
| 8 | 1024 | 127629 | 12.95 | 519 |
| 8 | 2048 | 187456 | 19.01 | 369 |
| 8 | 4096 | 206848 | 20.98 | 341 |
| 16 | 512 | 130404 | 13.23 | 513 |
| 16 | 1024 | 214235 | 21.73 | 335 |
| 16 | 2048 | 299930 | 30.42 | 253 |
| 16 | 4096 | 333255 | 33.80 | 227 |
| 32 | 512 | 206848 | 20.98 | 347 |
| 32 | 1024 | 315716 | 32.02 | 241 |
| 32 | 2048 | 399907 | 40.56 | 199 |
| 32 | 4096 | 461431 | 46.80 | 184 |
| 64 | 512 | 333255 | 33.80 | 231 |
| 64 | 1024 | 399907 | 40.56 | 204 |
| 64 | 2048 | 428471 | 43.46 | 199 |
| 64 | 4096 | 461431 | 46.80 | 187 |
| 128 | 512 | 333255 | 33.80 | 235 |
| 128 | 1024 | 399907 | 40.56 | 203 |
| 128 | 2048 | 425432 | 43.15 | 197 |
| 128 | 4096 | 387006 | 39.26 | 211 |

Notes:

1. DataX reads local files with txtfilereader so that the source side is not a performance bottleneck.

#### Performance Summary

1. The number of channels and `batchSize` have a large impact on performance.
2. When writing to a database, more than 32 channels is generally not recommended.

## 5 Constraints and Limitations

## FAQ

***

**Q: AdbMysqlWriter failed while executing a postSql statement. Has the data already been imported into the target database?**

A: A DataX import consists of three steps: the pre operation, the import itself, and the post operation. If any of them fails, the DataX job fails. Because DataX cannot guarantee that these steps run in a single transaction, the data may already have landed in the target.

***

**Q: In that case, some dirty data may be imported. What if it affects the production database?**

A: There are currently two approaches. First, configure a pre statement that cleans up the data imported that day, so that each DataX run removes the previous run's data and then imports a complete data set. Second, import the data into a temporary table and rename it to the production table after the import completes.

***

**Q: The second approach avoids affecting production data. How exactly do I do it?**

A: Configure the job to import into a temporary table.

================================================
FILE: adbmysqlwriter/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT adbmysqlwriter adbmysqlwriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java 5.1.40 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: adbmysqlwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/adbmysqlwriter target/ adbmysqlwriter-0.0.1-SNAPSHOT.jar plugin/writer/adbmysqlwriter false plugin/writer/adbmysqlwriter/libs runtime

================================================
FILE: adbmysqlwriter/src/main/java/com/alibaba/datax/plugin/writer/adbmysqlwriter/AdbMysqlWriter.java
================================================
package com.alibaba.datax.plugin.writer.adbmysqlwriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.rdbms.writer.Key;
import org.apache.commons.lang3.StringUtils;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class AdbMysqlWriter extends Writer {
    private static final DataBaseType DATABASE_TYPE = DataBaseType.ADB;

    public static class Job extends Writer.Job {
        private Configuration originalConfig = null;
        private CommonRdbmsWriter.Job commonRdbmsWriterJob;

        @Override
        public void preCheck() {
            this.init();
            this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE);
        }

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE);
            this.commonRdbmsWriterJob.init(this.originalConfig);
        }

        // In general, pre execution has to be deferred to the tasks (the single-table case is an exception)
        @Override
        public void prepare() {
            // Privilege validation is not supported in actual runs yet
            //this.commonRdbmsWriterJob.privilegeValid(this.originalConfig, DATABASE_TYPE);
            this.commonRdbmsWriterJob.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber);
        }

        // In general, post execution has to be deferred to the tasks (the single-table case is an exception)
        @Override
        public void post() {
            this.commonRdbmsWriterJob.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterJob.destroy(this.originalConfig);
        }
    }

    public static class Task extends Writer.Task {

        private Configuration writerSliceConfig;
        private CommonRdbmsWriter.Task commonRdbmsWriterTask;

        public static class DelegateClass extends CommonRdbmsWriter.Task {
            private long writeTime = 0L;
            private long writeCount = 0L;
            private long lastLogTime = 0;

            public DelegateClass(DataBaseType dataBaseType) {
                super(dataBaseType);
            }

            @Override
            protected void doBatchInsert(Connection connection, List<Record> buffer)
                    throws SQLException {
                long startTime = System.currentTimeMillis();

                super.doBatchInsert(connection, buffer);

                writeCount = writeCount + buffer.size();
                writeTime = writeTime + (System.currentTimeMillis() - startTime);

                // log write metrics every 10 seconds
                if (System.currentTimeMillis() - lastLogTime > 10000) {
                    lastLogTime = System.currentTimeMillis();
                    logTotalMetrics();
                }
            }

            public void logTotalMetrics() {
                LOG.info(Thread.currentThread().getName() + ", AdbMySQL writer take " + writeTime + " ms, write " + writeCount + " records.");
            }
        }

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();

            if (StringUtils.isBlank(this.writerSliceConfig.getString(Key.WRITE_MODE))) {
                this.writerSliceConfig.set(Key.WRITE_MODE, "REPLACE");
            }

            this.commonRdbmsWriterTask = new DelegateClass(DATABASE_TYPE);
            this.commonRdbmsWriterTask.init(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterTask.prepare(this.writerSliceConfig);
        }

        // TODO: switch to a connection pool so that every acquired connection is usable
        // (note: each connection may need its session re-initialized)
        public void startWrite(RecordReceiver recordReceiver) {
            this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig,
                    super.getTaskPluginCollector());
        }

        @Override
        public void post() {
            this.commonRdbmsWriterTask.post(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterTask.destroy(this.writerSliceConfig);
        }

        @Override
        public boolean supportFailOver() {
            String writeMode = writerSliceConfig.getString(Key.WRITE_MODE);
            return "replace".equalsIgnoreCase(writeMode);
        }
    }
}

================================================
FILE: adbmysqlwriter/src/main/resources/plugin.json
================================================
{
    "name": "adbmysqlwriter",
    "class": "com.alibaba.datax.plugin.writer.adbmysqlwriter.AdbMysqlWriter",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba"
}

================================================
FILE: adbmysqlwriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "adbmysqlwriter",
    "parameter": {
        "username": "username",
        "password": "password",
        "column": ["col1", "col2", "col3"],
        "connection": [
            {
                "jdbcUrl": "jdbc:mysql://:[/]",
                "table": ["table1", "table2"]
            }
        ],
        "preSql": [],
        "postSql": [],
        "batchSize": 65536,
        "batchByteSize": 134217728,
        "dryRun": false,
        "writeMode": "insert"
    }
}

================================================
FILE: adbpgwriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 adbpgwriter adbpgwriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j mysql mysql-connector-java com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.alibaba druid com.alibaba druid 1.1.17 org.slf4j slf4j-api org.apache.commons commons-exec 1.3 ch.qos.logback logback-classic commons-configuration commons-configuration 1.10 com.alibaba.cloud.analyticdb adb4pgclient 1.0.0 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: adbpgwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/adbpgwriter target/ adbpgwriter-0.0.1-SNAPSHOT.jar plugin/writer/adbpgwriter false plugin/writer/adbpgwriter/libs runtime

================================================
FILE: adbpgwriter/src/main/doc/adbpgwriter.md
================================================
# DataX ADB PG Writer

---

## 1 Introduction

The AdbpgWriter plugin writes data into an ADB PG (AnalyticDB for PostgreSQL) database. Under the hood, AdbpgWriter first buffers the data to be written; once the buffered data reaches commitSize, the plugin connects to the remote ADB PG database over JDBC and runs a COPY command to write the data into ADB PG.

AdbpgWriter can serve users as a data migration tool.

## 2 How It Works

AdbpgWriter receives the protocol data produced by the Reader through the DataX framework and buffers it first. When the buffered data reaches commitSize, the plugin generates the corresponding COPY statement from your configuration and runs the COPY command to write the data into the ADB PG database.

## 3 Features

### 3.1 Sample Configuration

* The following job generates data in memory and imports it with AdbpgWriter.

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 32
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {"value": "DataX", "type": "string"},
                            {"value": 19880808, "type": "long"},
                            {"value": "1988-08-08 08:08:08", "type": "date"},
                            {"value": true, "type": "bool"},
                            {"value": "test", "type": "bytes"}
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "adbpgwriter",
                    "parameter": {
                        "username": "",
                        "password": "",
                        "host": "127.0.0.1",
                        "port": "1234",
                        "database": "database",
                        "schema": "schema",
                        "table": "table",
                        "preSql": ["delete from @table"],
                        "postSql": ["select * from @table"],
                        "column": ["*"]
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **name**
    * Description: plugin name
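    * Required: yes
    * Default: none

* **username**
    * Description: user name for the target database
    * Required: yes
    * Default: none

* **password**
    * Description: password for the target database
    * Required: yes
    * Default: none

* **host**
    * Description: host name of the target database
    * Required: yes
    * Default: none

* **port**
    * Description: port of the target database
    * Required: yes
    * Default: none

* **database**
    * Description: name of the database that contains the target table
    * Required: yes
    * Default: none

* **schema**
    * Description: name of the schema that contains the target table
    * Required: yes
    * Default: none

* **table**
    * Description: name of the target table
    * Required: yes
    * Default: none

* **column**
    * Description: columns of the target table to write, separated by commas, for example `"column": ["id","name","age"]`. To write all columns in order, use `*`, for example `"column": ["*"]`.
      Notes: (1) We strongly advise against the `*` form, because if the number or types of the target table's columns change, your job may run incorrectly or fail. (2) `column` cannot contain constant values. (3) For upper-case column names, do not wrap them in the escape character `\"` here.
    * Required: yes
    * Default: none

* **preSql**
    * Description: standard SQL statements executed before data is written to the target table. If a statement needs to refer to the table name, you can use `@table`; it is replaced with the actual table name when the statement runs. For example, if your job writes to 100 homogeneous shard tables on the target (named datax_00, datax_01, ... datax_98, datax_99) and you want to delete the existing data before importing, configure `"preSql":["delete from @table"]`; DataX then runs `delete from <table name>` before writing to each table.
    * Required: no
    * Default: none

* **postSql**
    * Description: standard SQL statements executed after data is written to the target table. If a statement needs to refer to the table name, you can use `@table`; it is replaced with the actual table name when the statement runs.
    * Required: no
    * Default: none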
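The commitSize-driven buffering described in sections 1 and 2 can be sketched as follows. This is a hypothetical illustration, not the plugin's code: the real writer delegates buffering and COPY to the `adb4pgclient` SDK (via `Adb4pgClient.addRow()` and `commit()` in `Adb4pgClientProxy`). Here, rows are serialized as tab-separated lines in PostgreSQL COPY text format, with `\N` for NULL and no escaping of tabs or newlines inside values, and the payload is handed to a callback once the buffered size reaches `commitSize` bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of byte-size-driven buffering before a COPY (not adb4pgclient).
 * A real implementation would stream each payload to the server, e.g. with COPY ... FROM STDIN.
 */
public class CopyBufferSketch {
    private final long commitSize;
    private final Consumer<String> flusher;
    private final StringBuilder buffer = new StringBuilder();
    private long bufferedBytes = 0;

    public CopyBufferSketch(long commitSize, Consumer<String> flusher) {
        if (commitSize <= 0) {
            throw new IllegalArgumentException("commitSize must be positive");
        }
        this.commitSize = commitSize;
        this.flusher = flusher;
    }

    /** Serializes one row as a tab-separated line; a null value becomes \N (COPY text format default). */
    public void addRow(List<?> values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                line.append('\t');
            }
            Object value = values.get(i);
            line.append(value == null ? "\\N" : value.toString());
        }
        line.append('\n');
        buffer.append(line);
        bufferedBytes += line.toString().getBytes(StandardCharsets.UTF_8).length;
        // Flush once the accumulated payload reaches the commitSize threshold.
        if (bufferedBytes >= commitSize) {
            flush();
        }
    }

    /** Hands the buffered payload to the callback and resets the buffer. */
    public void flush() {
        if (buffer.length() == 0) {
            return;
        }
        flusher.accept(buffer.toString());
        buffer.setLength(0);
        bufferedBytes = 0;
    }

    public static void main(String[] args) {
        List<String> payloads = new ArrayList<>();
        // Tiny threshold so the example flushes; the proxy's default commitSize is 10 MB.
        CopyBufferSketch copy = new CopyBufferSketch(16, payloads::add);
        copy.addRow(Arrays.asList(1, "alice"));
        copy.addRow(Arrays.asList(2, "bob"));
        copy.addRow(Arrays.asList(3, "carol"));
        copy.addRow(Arrays.asList(4, null));
        copy.flush();
        System.out.println(payloads.size()); // 2
    }
}
```

Measuring the threshold in bytes rather than rows keeps each COPY payload roughly the same size regardless of row width, which matches how the proxy reads `commitSize` (in bytes).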
### 3.3 类型转换 目前 AdbpgWriter 支持大部分 ADB PG 数据库的类型,但也存在部分没有支持的情况,请注意检查你的类型。 下面列出 AdbpgWriter 针对 ADB PG 类型转换列表: | DataX 内部类型| ADB PG 数据类型 | | -------- | ----- | | Long |bigint, bigserial, integer, smallint, serial | | Double |double precision, float, numeric, real | | String |varchar, char, text| | Date |date, time, timestamp | | Boolean |bool| ## 4 性能报告 ### 4.1 环境准备 #### 4.1.1 数据特征 建表语句: ```sql create table schematest.test_datax ( t1 int, t2 bigint, t3 bigserial, t4 float, t5 timestamp, t6 varchar )distributed by(t1); ``` #### 4.1.2 机器参数 * 执行DataX的机器参数为: 1. cpu: 24核 2. mem: 96GB * ADB PG数据库机器参数为: 1. 平均core数量:4 2. primary segment 数量: 4 3. 计算组数量:2 ### 4.2 测试报告 #### 4.2.1 单表测试报告 | 通道数| commitSize MB | DataX速度(Rec/s)| DataX流量(M/s) |--------|--------| --------|--------| |1| 10 | 54098 | 15.54 | |1| 20 | 55000 | 15.80 | |4| 10 | 183333 | 52.66 | |4| 20 | 173684 | 49.89 | |8| 10 | 330000 | 94.79 | |8| 20 | 300000 | 86.17 | |16| 10 | 412500 | 118.48 | |16| 20 | 366666 | 105.32 | |32| 10 | 366666 | 105.32 | #### 4.2.2 性能测试小结 1. `channel数对性能影响很大` 2. 
`通常不建议写入数据库时,通道个数 > 32` ================================================ FILE: adbpgwriter/src/main/java/com/alibaba/datax/plugin/writer/adbpgwriter/AdbpgWriter.java ================================================ package com.alibaba.datax.plugin.writer.adbpgwriter; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import java.util.ArrayList; import java.util.List; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil; import com.alibaba.datax.plugin.writer.adbpgwriter.copy.Adb4pgClientProxy; import com.alibaba.datax.plugin.writer.adbpgwriter.util.Adb4pgUtil; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import static com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode.*; import static com.alibaba.datax.plugin.rdbms.util.DataBaseType.PostgreSQL; /** * @author yuncheng */ public class AdbpgWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.PostgreSQL; public static class Job extends Writer.Job { private Configuration originalConfig; private CommonRdbmsWriter.Job commonRdbmsWriterMaster; private static final Logger LOG = LoggerFactory.getLogger(Writer.Job.class); @Override public void init() { this.originalConfig = super.getPluginJobConf(); LOG.info("in Job.init(), config is:[\n{}\n]", originalConfig.toJSON()); this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE); //convert to DatabaseConfig, use DatabaseConfig to check user configuration Adb4pgUtil.checkConfig(originalConfig); } @Override public void prepare() { Adb4pgUtil.prepare(originalConfig); } @Override public List split(int adviceNumber) { List splitResult = new ArrayList(); for(int i = 0; i < 
adviceNumber; i++) { splitResult.add(this.originalConfig.clone()); } return splitResult; } @Override public void post() { Adb4pgUtil.post(originalConfig); } @Override public void destroy() { } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterSlave; private Adb4pgClientProxy adb4pgClientProxy; //Adb4pgClient client; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.adb4pgClientProxy = new Adb4pgClientProxy(writerSliceConfig, super.getTaskPluginCollector()); this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE){ @Override public String calcValueHolder(String columnType){ if("serial".equalsIgnoreCase(columnType)){ return "?::int"; }else if("bit".equalsIgnoreCase(columnType)){ return "?::bit varying"; } return "?::" + columnType; } }; } @Override public void prepare() { } @Override public void startWrite(RecordReceiver recordReceiver) { this.adb4pgClientProxy.startWriteWithConnection(recordReceiver, Adb4pgUtil.getAdbpgConnect(writerSliceConfig)); } @Override public void post() { } @Override public void destroy() { } } } ================================================ FILE: adbpgwriter/src/main/java/com/alibaba/datax/plugin/writer/adbpgwriter/copy/Adb4pgClientProxy.java ================================================ package com.alibaba.datax.plugin.writer.adbpgwriter.copy; import com.alibaba.cloud.analyticdb.adb4pgclient.*; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.transport.record.DefaultRecord; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import 
com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.writer.adbpgwriter.util.Adb4pgUtil; import com.alibaba.datax.plugin.writer.adbpgwriter.util.Constant; import com.alibaba.datax.plugin.writer.adbpgwriter.util.Key; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.Types; import java.util.ArrayList; import java.util.List; /** * @author yuncheng */ public class Adb4pgClientProxy implements AdbProxy { private static final Logger LOG = LoggerFactory.getLogger(Adb4pgClientProxy.class); private Adb4pgClient adb4pgClient; private String table; private String schema; List columns; private TableInfo tableInfo; private TaskPluginCollector taskPluginCollector; private boolean useRawData[]; public Adb4pgClientProxy(Configuration configuration,TaskPluginCollector taskPluginCollector) { this.taskPluginCollector = taskPluginCollector; DatabaseConfig databaseConfig = Adb4pgUtil.convertConfiguration(configuration); // If a column value is empty, write NULL boolean emptyAsNull = configuration.getBool(Key.EMPTY_AS_NULL, false); databaseConfig.setEmptyAsNull(emptyAsNull); // Insert rows with INSERT IGNORE INTO boolean ignoreInsert = configuration.getBool(Key.IGNORE_INSERT, false); databaseConfig.setInsertIgnore(ignoreInsert); // Number of retries when writing to ADB fails during commit (default 3) int retryTimes = configuration.getInt(Key.RETRY_CONNECTION_TIME, Constant.DEFAULT_RETRY_TIMES); databaseConfig.setRetryTimes(retryTimes); // Retry interval in ms (default 1000, i.e. 1 s) int retryIntervalTime = configuration.getInt(Key.RETRY_INTERVAL_TIME, 1000); databaseConfig.setRetryIntervalTime(retryIntervalTime); // SQL length in bytes that triggers an automatic commit; default 10 MB, usually best left unchanged int commitSize = configuration.getInt("commitSize", 10 * 1024 * 1024); databaseConfig.setCommitSize(commitSize); // Number of concurrent threads writing to ADB (default 4), applied across all configured tables int parallelNumber = configuration.getInt("parallelNumber", 4); databaseConfig.setParallelNumber(parallelNumber);
// Logger used by the client; here an slf4j Logger databaseConfig.setLogger(Adb4pgClientProxy.LOG); // SDK default is true boolean shareDataSource = configuration.getBool("shareDataSource", true); databaseConfig.setShareDataSource(shareDataSource); //List columns = configuration.getList(Key.COLUMN, String.class); this.table = configuration.getString(com.alibaba.datax.plugin.rdbms.writer.Key.TABLE); this.schema = configuration.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.SCHEMA); this.adb4pgClient = new Adb4pgClient(databaseConfig); this.columns = databaseConfig.getColumns(table,schema); this.tableInfo = adb4pgClient.getTableInfo(table, schema); this.useRawData = new boolean[this.columns.size()]; List columnInfos = tableInfo.getColumns(); for (int i = 0; i < this.columns.size(); i++) { String oriEachColumn = columns.get(i); String eachColumn = oriEachColumn; // Defensively strip the quotes around quoted (reserved-word) column names if (eachColumn.startsWith(Constant.COLUMN_QUOTE_CHARACTER) && eachColumn.endsWith(Constant.COLUMN_QUOTE_CHARACTER)) { eachColumn = eachColumn.substring(1, eachColumn.length() - 1); } for (ColumnInfo eachAdsColumn : columnInfos) { if (eachColumn.equals(eachAdsColumn.getName())) { int columnSqltype = eachAdsColumn.getDataType().sqlType; switch (columnSqltype) { case Types.DATE: case Types.TIME: case Types.TIMESTAMP: this.useRawData[i] = false; break; default: this.useRawData[i] = true; break; } } } } } @Override public void startWriteWithConnection(RecordReceiver recordReceiver, Connection connection) { try { Record record; while ((record = recordReceiver.getFromReader()) != null) { Row row = new Row(); List values = new ArrayList(); this.prepareColumnTypeValue(record, values); row.setColumnValues(values); try { this.adb4pgClient.addRow(row,this.table, this.schema); } catch (Adb4pgClientException e) { if (101 == e.getCode()) { for (String each : e.getErrData()) { Record dirtyData = new DefaultRecord(); dirtyData.addColumn(new StringColumn(each)); this.taskPluginCollector.collectDirtyRecord(dirtyData,
e.getMessage()); } } else { throw e; } } } try { this.adb4pgClient.commit(); } catch (Adb4pgClientException e) { if (101 == e.getCode()) { for (String each : e.getErrData()) { Record dirtyData = new DefaultRecord(); dirtyData.addColumn(new StringColumn(each)); this.taskPluginCollector.collectDirtyRecord(dirtyData, e.getMessage()); } } else { throw e; } } }catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); }finally { DBUtil.closeDBResources(null, null, connection); } return; } private void prepareColumnTypeValue(Record record, List values) { for (int i = 0; i < this.columns.size(); i++) { Column column = record.getColumn(i); if (this.useRawData[i]) { values.add(column.getRawData()); } else { values.add(column.asString()); } } } @Override public void closeResource() { try { LOG.info("stop the adb4pgClient"); this.adb4pgClient.stop(); } catch (Exception e) { LOG.warn("stop adbClient meet a exception, ignore it: {}", e.getMessage(), e); } } } ================================================ FILE: adbpgwriter/src/main/java/com/alibaba/datax/plugin/writer/adbpgwriter/copy/AdbProxy.java ================================================ package com.alibaba.datax.plugin.writer.adbpgwriter.copy; import com.alibaba.datax.common.plugin.RecordReceiver; import java.sql.Connection; /** * @author yuncheng */ public interface AdbProxy { public abstract void startWriteWithConnection(RecordReceiver recordReceiver, Connection connection); public void closeResource(); } ================================================ FILE: adbpgwriter/src/main/java/com/alibaba/datax/plugin/writer/adbpgwriter/package-info.java ================================================ /** * Greenplum Writer. 
* * @since 0.0.1 */ package com.alibaba.datax.plugin.writer.adbpgwriter; ================================================ FILE: adbpgwriter/src/main/java/com/alibaba/datax/plugin/writer/adbpgwriter/util/Adb4pgUtil.java ================================================ package com.alibaba.datax.plugin.writer.adbpgwriter.util; import com.alibaba.cloud.analyticdb.adb4pgclient.Adb4pgClient; import com.alibaba.cloud.analyticdb.adb4pgclient.Adb4pgClientException; import com.alibaba.cloud.analyticdb.adb4pgclient.DatabaseConfig; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.spi.ErrorCode; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.*; import static com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode.COLUMN_SPLIT_ERROR; /** * @author yuncheng */ public class Adb4pgUtil { private static final Logger LOG = LoggerFactory.getLogger(Adb4pgUtil.class); private static final DataBaseType DATABASE_TYPE = DataBaseType.PostgreSQL; public static void checkConfig(Configuration originalConfig) { try { DatabaseConfig databaseConfig = convertConfiguration(originalConfig); Adb4pgClient testConfigClient = new Adb4pgClient(databaseConfig); } catch (Exception e) { throw new Adb4pgClientException(Adb4pgClientException.CONFIG_ERROR, "Check config exception: " + e.getMessage(), null); } } public static DatabaseConfig convertConfiguration(Configuration originalConfig) { originalConfig.getNecessaryValue(Key.USERNAME, 
COLUMN_SPLIT_ERROR); originalConfig.getNecessaryValue(Key.PASSWORD, COLUMN_SPLIT_ERROR); String userName = originalConfig.getString(Key.USERNAME); String passWord = originalConfig.getString(Key.PASSWORD); String tableName = originalConfig.getString(Key.TABLE); String schemaName = originalConfig.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.SCHEMA); String host = originalConfig.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.HOST); String port = originalConfig.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.PORT); String databseName = originalConfig.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.DATABASE); List columns = originalConfig.getList(Key.COLUMN, String.class); DatabaseConfig databaseConfig = new DatabaseConfig(); databaseConfig.setHost(host); databaseConfig.setPort(Integer.valueOf(port)); databaseConfig.setDatabase(databseName); databaseConfig.setUser(userName); databaseConfig.setPassword(passWord); databaseConfig.setLogger(LOG); databaseConfig.setInsertIgnore(originalConfig.getBool(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.IS_INSERTINGORE, true)); databaseConfig.addTable(Collections.singletonList(tableName), schemaName); databaseConfig.setColumns(columns, tableName, schemaName); return databaseConfig; } private static Map> splitBySchemaName(List tables) { HashMap> res = new HashMap>(16); for (String schemaNameTableName: tables) { String[] s = schemaNameTableName.split("\\."); if (!res.containsKey(s[0])) { res.put(s[0], new ArrayList()); } res.get(s[0]).add(s[1]); } return res; } public static Connection getAdbpgConnect(Configuration conf) { String userName = conf.getString(Key.USERNAME); String passWord = conf.getString(Key.PASSWORD); return DBUtil.getConnection(DataBaseType.PostgreSQL, generateJdbcUrl(conf), userName, passWord); } private static String generateJdbcUrl(Configuration configuration) { String host = 
configuration.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.HOST); String port = configuration.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.PORT); String databseName = configuration.getString(com.alibaba.datax.plugin.writer.adbpgwriter.util.Key.DATABASE); String jdbcUrl = "jdbc:postgresql://" + host + ":" + port + "/" + databseName; return jdbcUrl; } public static void prepare(Configuration originalConfig) { List preSqls = originalConfig.getList(Key.PRE_SQL, String.class); String tableName = originalConfig.getString(Key.TABLE); List renderedPreSqls = WriterUtil.renderPreOrPostSqls( preSqls, tableName); if (renderedPreSqls.size() == 0) { return; } originalConfig.remove(Key.PRE_SQL); Connection conn = getAdbpgConnect(originalConfig); WriterUtil.executeSqls(conn, renderedPreSqls, generateJdbcUrl(originalConfig), DATABASE_TYPE); DBUtil.closeDBResources(null, null, conn); } public static void post(Configuration configuration) { List postSqls = configuration.getList(Key.POST_SQL, String.class); String tableName = configuration.getString(Key.TABLE); List renderedPostSqls = WriterUtil.renderPreOrPostSqls( postSqls, tableName); if (renderedPostSqls.size() == 0) { return; } configuration.remove(Key.POST_SQL); Connection conn = getAdbpgConnect(configuration); WriterUtil.executeSqls(conn, renderedPostSqls, generateJdbcUrl(configuration), DATABASE_TYPE); DBUtil.closeDBResources(null, null, conn); } } ================================================ FILE: adbpgwriter/src/main/java/com/alibaba/datax/plugin/writer/adbpgwriter/util/Constant.java ================================================ package com.alibaba.datax.plugin.writer.adbpgwriter.util; /** * @author yuncheng */ public class Constant { public static final int DEFAULT_RETRY_TIMES = 3; public static final String COLUMN_QUOTE_CHARACTER = "\""; } ================================================ FILE: adbpgwriter/src/main/java/com/alibaba/datax/plugin/writer/adbpgwriter/util/Key.java 
================================================ package com.alibaba.datax.plugin.writer.adbpgwriter.util; /** * @author yuncheng */ public class Key { public final static String COLUMN = "column"; public final static String IS_INSERTINGORE = "insertIgnore"; public final static String HOST = "host"; public final static String PORT = "port"; public final static String DATABASE = "database"; public final static String SCHEMA = "schema"; public final static String EMPTY_AS_NULL = "emptyAsNull"; public final static String IGNORE_INSERT = "ignoreInsert"; public final static String RETRY_CONNECTION_TIME = "retryTimes"; public final static String RETRY_INTERVAL_TIME = "retryIntervalTime"; public final static String COMMIT_SIZE = "commitSize"; public final static String PARALLEL_NUMBER = "parallelNumber"; public final static String SHARED_DATASOURCE = "shareDataSource"; } ================================================ FILE: adbpgwriter/src/main/resources/plugin.json ================================================ { "name": "adbpgwriter", "class": "com.alibaba.datax.plugin.writer.adbpgwriter.AdbpgWriter", "description": "", "developer": "alibaba" } ================================================ FILE: adbpgwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "adbpgwriter", "parameter": { "username": "", "password": "", "host": "", "port": "", "database": "", "schema": "", "table": "", "column": ["*"] } } ================================================ FILE: adswriter/doc/adswriter.md ================================================
# DataX ADS Writer

---

## 1 Quick Introduction
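Welcome, ADS, to the DataX ecosystem! The ADSWriter plugin writes data from other data sources into ADS: every data source DataX currently supports can be connected to ADS seamlessly for fast data import.

ADSWriter supports two ways of writing to ADS:

* Load through an ODPS transit table. The advantage is that large volumes (more than 10 million rows) import quickly; the drawback is that ODPS is introduced as an intermediate staging store, so authentication spans three systems (DataX, ADS, and ODPS).
* Direct writes to ADS. The advantage is that small batches (fewer than 10 million rows) finish quickly; the drawback is that large imports are slow.

Note:

> If you import data from ODPS into ADS, grant the ADS Build account read permission on your source table in the source ODPS project in advance. The creator of the ODPS source table and the ADS writer must also belong to the same Alibaba Cloud account.

> If you import data from a non-ODPS source into ADS, grant the ADS Build account the Load Data permission on the destination ADS database in advance.

Contact your ADS administrator to obtain the ADS Build account referred to above.

## 2 How It Works

ADSWriter supports two implementations:

### 2.1 Load Mode

DataX first writes the data into a transit ODPS table allocated for the current import job, then notifies ADS to load that table. Because ADS is a distributed storage cluster, this path has high throughput and can handle TB-scale imports.

![Transit import](http://aligitlab.oss-cn-hangzhou-zmf.aliyuncs.com/uploads/cdp/cdp/f805dea46b/_____2015-04-10___12.06.21.png)

1. DataX obtains the plaintext `jdbc://host:port/dbname`, username, password, and table, connects to ADS with them, and runs `show grants;` as a pre-check that the user has Load Data (or higher) permission on the target ADS table. At this stage ADSWriter authenticates with the ADS username and password supplied by the user.
2. Once the check passes, ADSWriter derives an ODPS DDL from the metadata of the target ADS table and, using the ADSWriter account, creates an ODPS table in the transit ODPS project (non-partitioned, with a lifecycle of 1 to 2 days). It then calls ODPSWriter to write the source data into that ODPS table. Note that writing to the transit ODPS project uses the transit ODPS account's AccessKey.
3. When the write completes, ADSWriter connects to ADS with the transit ODPS account and issues `Load Data From 'odps://<transit_project>/<transit_table>/' [overwrite] into adsdb.adstable [partition (xx,xx=xx)];`. This command returns a Job ID, which must be recorded. Note that ADS reads the transit ODPS table with its own Build account, so the transit ODPS project must grant read permission to that Build account in advance.
4. ADSWriter connects to ADS and polls once a minute with `select state from information_schema.job_instances where job_id like '$Job ID'` to query the job state. The state record may not be available during the first minute.
5. Once the state is Success or Fail, the result is returned to the user, the transit ODPS table is dropped, and the task ends.

The flow above applies to importing from non-ODPS sources into ADS. When importing from ODPS into ADS, the following flow is used instead:

![Direct import](http://aligitlab.oss-cn-hangzhou-zmf.aliyuncs.com/uploads/cdp/cdp/b3a76459d1/_____2015-04-10___12.06.25.png)

### 2.2 Insert Mode

DataX connects directly to ADS and writes through the INSERT interface that ADS exposes. This path has low write throughput and is not suitable for bulk writes. Points to note:

* ADSWriter connects to ADS directly over JDBC and inserts data using only JDBC `Statement`. ADS does not support `PreparedStatement`, so ADSWriter can only write row by row, using multiple threads.
* ADSWriter supports selecting a subset of columns and reordering them; that is, users can list the columns to write.
* To limit load on ADS, users of Insert mode are advised to throttle throughput to at most 10,000 TPS.
* After all tasks have finished writing, the Job's post phase runs a single flush so that the data in ADS is updated as a whole.

## 3 Features

### 3.1 Sample Configurations

* Data generated in memory and imported into ADS in Load mode:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "value": "DataX",
                                "type": "string"
                            },
                            {
                                "value": "test",
                                "type": "bytes"
                            }
                        ],
                        "sliceRecordCount": 100000
                    }
                },
                "writer": {
                    "name": "adswriter",
                    "parameter": {
                        "odps": {
                            "accessId": "xxx",
                            "accessKey": "xxx",
                            "account": "xxx@aliyun.com",
                            "odpsServer": "xxx",
                            "tunnelServer": "xxx",
                            "project": "transfer_project"
                        },
                        "writeMode": "load",
                        "url": "127.0.0.1:3306",
                        "schema": "schema",
                        "table": "table",
                        "username": "username",
                        "password": "password",
                        "partition": "",
                        "lifeCycle": 2,
                        "overWrite": true
                    }
                }
            }
        ]
    }
}
```

* Data generated in memory and imported into ADS in Insert mode:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "value": "DataX",
                                "type": "string"
                            },
                            {
                                "value": "test",
                                "type": "bytes"
                            }
                        ],
                        "sliceRecordCount": 100000
                    }
                },
                "writer": {
                    "name": "adswriter",
                    "parameter": {
                        "writeMode": "insert",
                        "url": "127.0.0.1:3306",
                        "schema": "schema",
                        "table": "table",
                        "column": ["*"],
                        "username": "username",
                        "password": "password",
                        "partition": "id,ds=2015"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters (User Configuration Spec)

* **url**
    * Description: ADS connection information, in the format "ip:port".
    * Required: yes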
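    * Default: none
* **schema**
    * Description: Name of the ADS schema.
    * Required: yes
    * Default: none
* **username**
    * Description: ADS username; currently this is the accessId.
    * Required: yes
    * Default: none
* **password**
    * Description: ADS password; currently this is the accessKey.
    * Required: yes
    * Default: none
* **table**
    * Description: Name of the destination table.
    * Required: yes
    * Default: none
* **partition**
    * Description: Partition of the destination table. Required when the destination table is partitioned.
    * Required: no
    * Default: none
* **writeMode**
    * Description: Write mode, `load` or `insert` (the code also accepts `stream`). If omitted, ADSWriter logs a warning and falls back to `load` mode, which can only be used with offline tables.
    * Required: no
    * Default: `load`
* **column**
    * Description: List of destination table columns, either `["*"]` or explicit column names such as `["a", "b", "c"]`.
    * Required: yes
    * Default: none
* **overWrite**
    * Description: Whether the write overwrites the target ADS table: `true` overwrites, `false` appends. Takes effect only when writeMode is `load`.
    * Required: yes
    * Default: none
* **lifeCycle**
    * Description: Lifecycle, in days, of the temporary ODPS transit table. Takes effect only when writeMode is `load`.
    * Required: yes
    * Default: none
* **batchSize**
    * Description: Number of rows per batch submitted to ADS. Takes effect only when writeMode is `insert`.
    * Required: only used when writeMode is `insert`
    * Default: 32
* **bufferSize**
    * Description: Size of the DataX record buffer. The buffer accumulates a larger block of source records and sorts them before they are written to ADS. Sorting follows the ADS partition columns so that the data arrives in an order the ADS server handles more efficiently. Buffered records are then submitted to ADS in batches of batchSize, so if you set bufferSize, make it a multiple of batchSize. Takes effect only when writeMode is `insert`.
    * Required: only used when writeMode is `insert`
    * Default: disabled unless configured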
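
The bufferSize/batchSize interplay described above can be illustrated with a minimal, hypothetical Java sketch. It is not ADSWriter's actual implementation: the class name `SortedBatchBuffer`, the generic row type, and the `Consumer` sink standing in for one batch commit to ADS are all assumptions. Rows accumulate until bufferSize is reached, the buffer is sorted with a caller-supplied comparator (standing in for ordering by the ADS partition columns), and the sorted rows are handed on in chunks of at most batchSize.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of insert-mode buffering; not ADSWriter's real code.
class SortedBatchBuffer<T> {
    private final int bufferSize;
    private final int batchSize;
    private final Comparator<? super T> partitionOrder; // stands in for ADS partition-column ordering
    private final Consumer<List<T>> batchSink;          // stands in for one batch commit to ADS
    private final List<T> buffer = new ArrayList<>();

    SortedBatchBuffer(int bufferSize, int batchSize,
                      Comparator<? super T> partitionOrder, Consumer<List<T>> batchSink) {
        if (batchSize <= 0 || bufferSize < batchSize) {
            throw new IllegalArgumentException("require 0 < batchSize <= bufferSize");
        }
        this.bufferSize = bufferSize;
        this.batchSize = batchSize;
        this.partitionOrder = partitionOrder;
        this.batchSink = batchSink;
    }

    /** Buffers one row and flushes once the buffer holds bufferSize rows. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    /** Sorts the buffered rows, emits them in chunks of at most batchSize, then clears the buffer. */
    void flush() {
        buffer.sort(partitionOrder);
        for (int start = 0; start < buffer.size(); start += batchSize) {
            int end = Math.min(start + batchSize, buffer.size());
            // Copy the chunk so the sink never holds a view of the buffer that is about to be cleared.
            batchSink.accept(new ArrayList<>(buffer.subList(start, end)));
        }
        buffer.clear();
    }
}
```

A final `flush()` call at the end of the input drains whatever is left. Choosing bufferSize as a multiple of batchSize, as recommended above, means every flush of a full buffer produces only full batches; otherwise the last batch of each flush is short.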
### 3.3 Type Conversion

| DataX internal type | ADS data type |
| -------- | ----- |
| Long | int, tinyint, smallint, bigint |
| Double | float, double, decimal |
| String | varchar |
| Date | date |
| Boolean | bool |
| Bytes | none |

Note:

* multivalue: ADS supports a multivalue type; DataX support for it is still to be determined.

## 4 Plugin Constraints

When the reader is ODPS and ADSWriter runs in Load mode, only one ODPS partition entry may be configured, in one of the following three forms (two-level partitioning shown as an example):

```
"partition":["pt=*,ds=*"]        (read all partitions of table test)
"partition":["pt=1,ds=*"]        (read all second-level partitions under first-level partition pt=1 of table test)
"partition":["pt=1,ds=hangzhou"] (read second-level partition ds=hangzhou under first-level partition pt=1 of table test)
```

## 5 Performance Report (Measured in Production)

### 5.1 Environment Setup

### 5.2 Test Report

## 6 FAQ

================================================ FILE: adswriter/pom.xml ================================================ com.alibaba.datax datax-all 0.0.1-SNAPSHOT 4.0.0 adswriter adswriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j mysql mysql-connector-java com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.alibaba druid com.alibaba.cloud.analyticdb adbclient 1.0.2 com.alibaba druid 1.1.12 org.slf4j slf4j-api org.apache.commons commons-exec 1.3 com.alibaba.datax odpswriter ${datax-project-version} ch.qos.logback logback-classic mysql mysql-connector-java 5.1.31 commons-configuration commons-configuration 1.10 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: adswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json config.properties plugin_job_template.json plugin/writer/adswriter target/ adswriter-0.0.1-SNAPSHOT.jar plugin/writer/adswriter false plugin/writer/adswriter/libs runtime ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/AdsException.java ================================================
package com.alibaba.datax.plugin.writer.adswriter; public class AdsException extends Exception { private static final long serialVersionUID = 1080618043484079794L; public final static int ADS_CONN_URL_NOT_SET = -100; public final static int ADS_CONN_USERNAME_NOT_SET = -101; public final static int ADS_CONN_PASSWORD_NOT_SET = -102; public final static int ADS_CONN_SCHEMA_NOT_SET = -103; public final static int JOB_NOT_EXIST = -200; public final static int JOB_FAILED = -201; public final static int ADS_LOADDATA_SCHEMA_NULL = -300; public final static int ADS_LOADDATA_TABLE_NULL = -301; public final static int ADS_LOADDATA_SOURCEPATH_NULL = -302; public final static int ADS_LOADDATA_JOBID_NOT_AVAIL = -303; public final static int ADS_LOADDATA_FAILED = -304; public final static int ADS_TABLEMETA_SCHEMA_NULL = -404; public final static int ADS_TABLEMETA_TABLE_NULL = -405; public final static int OTHER = -999; private int code = OTHER; private String message; public AdsException(int code, String message, Throwable e) { super(message, e); this.code = code; this.message = message; } @Override public String getMessage() { return "Code=" + this.code + " Message=" + this.message; } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/AdsWriter.java ================================================ package com.alibaba.datax.plugin.writer.adswriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil; import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnInfo; import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo; 
import com.alibaba.datax.plugin.writer.adswriter.insert.AdsClientProxy; import com.alibaba.datax.plugin.writer.adswriter.insert.AdsInsertProxy; import com.alibaba.datax.plugin.writer.adswriter.insert.AdsInsertUtil; import com.alibaba.datax.plugin.writer.adswriter.insert.AdsProxy; import com.alibaba.datax.plugin.writer.adswriter.load.AdsHelper; import com.alibaba.datax.plugin.writer.adswriter.load.TableMetaHelper; import com.alibaba.datax.plugin.writer.adswriter.load.TransferProjectConf; import com.alibaba.datax.plugin.writer.adswriter.odps.TableMeta; import com.alibaba.datax.plugin.writer.adswriter.util.AdsUtil; import com.alibaba.datax.plugin.writer.adswriter.util.Constant; import com.alibaba.datax.plugin.writer.adswriter.util.Key; import com.alibaba.datax.plugin.writer.odpswriter.OdpsWriter; import com.aliyun.odps.Instance; import com.aliyun.odps.Odps; import com.aliyun.odps.OdpsException; import com.aliyun.odps.account.Account; import com.aliyun.odps.account.AliyunAccount; import com.aliyun.odps.task.SQLTask; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.Collections; import java.util.List; public class AdsWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Writer.Job.class); public final static String ODPS_READER = "odpsreader"; private OdpsWriter.Job odpsWriterJobProxy = new OdpsWriter.Job(); private Configuration originalConfig; private Configuration readerConfig; /** * ADS helper that holds the ADS account credentials */ private AdsHelper adsHelper; /** * ADS helper that holds the ODPS account credentials */ private AdsHelper odpsAdsHelper; /** * Transit ODPS configuration, corresponding to the parameter.odps section of the writer config */ private TransferProjectConf transProjConf; private final int ODPSOVERTIME = 120000; private String odpsTransTableName; private String writeMode; private long startTime; @Override public void init() { startTime =
System.currentTimeMillis(); this.originalConfig = super.getPluginJobConf(); this.writeMode = this.originalConfig.getString(Key.WRITE_MODE); if(null == this.writeMode) { LOG.warn("You did not specify the [writeMode] parameter; defaulting to load mode, which can only be used with offline tables"); this.writeMode = Constant.LOADMODE; this.originalConfig.set(Key.WRITE_MODE, "load"); } if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { AdsUtil.checkNecessaryConfig(this.originalConfig, this.writeMode); loadModeInit(); } else if(Constant.INSERTMODE.equalsIgnoreCase(this.writeMode) || Constant.STREAMMODE.equalsIgnoreCase(this.writeMode)) { AdsUtil.checkNecessaryConfig(this.originalConfig, this.writeMode); List allColumns = AdsInsertUtil.getAdsTableColumnNames(originalConfig); AdsInsertUtil.dealColumnConf(originalConfig, allColumns); LOG.debug("After job init(), originalConfig now is:[\n{}\n]", originalConfig.toJSON()); } else { throw DataXException.asDataXException(AdsWriterErrorCode.INVALID_CONFIG_VALUE, "writeMode must be 'load', 'insert' or 'stream'"); } } private void loadModeInit() { this.adsHelper = AdsUtil.createAdsHelper(this.originalConfig); this.odpsAdsHelper = AdsUtil.createAdsHelperWithOdpsAccount(this.originalConfig); this.transProjConf = TransferProjectConf.create(this.originalConfig); // Log the permissions that need to be granted LOG.info(String .format("%s%n%s%n%s", "If you are syncing directly from ODPS to ADS, two grants are required:", "[1] The official ADS account needs at least describe and select permissions on the table being synced, because ADS must read the structure and data of that ODPS table", "[2] The ADS access account (AK) you configured needs permission to issue load data against the target ADS database; you can add this grant in ADS")); LOG.info(String .format("%s%s%n%s%n%s", "If you are syncing from RDS (or another non-ODPS source) to ADS, the data is first loaded into a temporary ODPS table and then from that table into ADS. ", String.format("The transit ODPS project is %s and the transit project account is %s. Permissions:", this.transProjConf.getProject(), this.transProjConf.getAccount()), "[1] The official ADS account needs at least describe and select permissions on the table being synced (here, the temporary ODPS table), because ADS must read its structure and data; this grant was made at deployment time", String.format("[2] The transit ODPS account %s needs permission to issue load data against the target ADS database; you can add this grant in ADS", this.transProjConf.getAccount()))); /** * When importing from ODPS into ADS, run load data directly and then call System.exit() */ if
(super.getPeerPluginName().equals(ODPS_READER)) { transferFromOdpsAndExit(); } Account odpsAccount; odpsAccount = new AliyunAccount(transProjConf.getAccessId(), transProjConf.getAccessKey()); Odps odps = new Odps(odpsAccount); odps.setEndpoint(transProjConf.getOdpsServer()); odps.setDefaultProject(transProjConf.getProject()); TableMeta tableMeta; try { String adsTable = this.originalConfig.getString(Key.ADS_TABLE); TableInfo tableInfo = adsHelper.getTableInfo(adsTable); int lifeCycle = this.originalConfig.getInt(Key.Life_CYCLE); tableMeta = TableMetaHelper.createTempODPSTable(tableInfo, lifeCycle); this.odpsTransTableName = tableMeta.getTableName(); String sql = tableMeta.toDDL(); LOG.info("Creating temporary ODPS table: "+sql); Instance instance = SQLTask.run(odps, transProjConf.getProject(), sql, null, null); boolean terminated = false; int time = 0; while (!terminated && time < ODPSOVERTIME) { Thread.sleep(1000); terminated = instance.isTerminated(); time += 1000; } LOG.info("Temporary ODPS table created"); } catch (AdsException e) { throw DataXException.asDataXException(AdsWriterErrorCode.ODPS_CREATETABLE_FAILED, e); }catch (OdpsException e) { throw DataXException.asDataXException(AdsWriterErrorCode.ODPS_CREATETABLE_FAILED,e); } catch (InterruptedException e) { throw DataXException.asDataXException(AdsWriterErrorCode.ODPS_CREATETABLE_FAILED,e); } Configuration newConf = AdsUtil.generateConf(this.originalConfig, this.odpsTransTableName, tableMeta, this.transProjConf); odpsWriterJobProxy.setPluginJobConf(newConf); odpsWriterJobProxy.init(); } /** * When the reader is ODPS, call the ADS load interface directly and exit when it finishes. * In this case only some of the parameters set in the ODPS reader take effect; * accessId and accessKey are ignored. */ private void transferFromOdpsAndExit() { this.readerConfig = super.getPeerPluginJobConf(); String odpsTableName = this.readerConfig.getString(Key.ODPSTABLENAME); List userConfiguredPartitions = this.readerConfig.getList(Key.PARTITION, String.class); if (userConfiguredPartitions == null) { userConfiguredPartitions = Collections.emptyList();
} if(userConfiguredPartitions.size() > 1) { throw DataXException.asDataXException(AdsWriterErrorCode.ODPS_PARTITION_FAILED, ""); } if(userConfiguredPartitions.size() == 0) { loadAdsData(adsHelper, odpsTableName,null); }else { loadAdsData(adsHelper, odpsTableName,userConfiguredPartitions.get(0)); } System.exit(0); } // In general, preSql execution should be deferred to the tasks (the single-table case is an exception) @Override public void prepare() { if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { //Load mode: prepare writing the data into the ODPS table this.odpsWriterJobProxy.prepare(); } else { // Real-time table mode, no database/table sharding String adsTable = this.originalConfig.getString(Key.ADS_TABLE); List preSqls = this.originalConfig.getList(Key.PRE_SQL, String.class); List renderedPreSqls = WriterUtil.renderPreOrPostSqls( preSqls, adsTable); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { // preSql is configured, so remove it from the config here this.originalConfig.remove(Key.PRE_SQL); Connection preConn = AdsUtil.getAdsConnect(this.originalConfig); LOG.info("Begin to execute preSqls:[{}]. context info:{}.", StringUtils.join(renderedPreSqls, ";"), this.originalConfig.getString(Key.ADS_URL)); WriterUtil.executeSqls(preConn, renderedPreSqls, this.originalConfig.getString(Key.ADS_URL), DataBaseType.ADS); DBUtil.closeDBResources(null, null, preConn); } } } @Override public List split(int mandatoryNumber) { if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { return this.odpsWriterJobProxy.split(mandatoryNumber); } else { List splitResult = new ArrayList(); for(int i = 0; i < mandatoryNumber; i++) { splitResult.add(this.originalConfig.clone()); } return splitResult; } } // In general, postSql execution should be deferred to the tasks (the single-table case is an exception) @Override public void post() { if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { loadAdsData(odpsAdsHelper, this.odpsTransTableName, null); this.odpsWriterJobProxy.post(); } else { // Real-time table mode, no database/table sharding String adsTable = this.originalConfig.getString(Key.ADS_TABLE); List postSqls = this.originalConfig.getList( Key.POST_SQL, String.class); List renderedPostSqls = WriterUtil.renderPreOrPostSqls( postSqls,
adsTable); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { // postSql is configured, so remove it from the config here this.originalConfig.remove(Key.POST_SQL); Connection postConn = AdsUtil.getAdsConnect(this.originalConfig); LOG.info( "Begin to execute postSqls:[{}]. context info:{}.", StringUtils.join(renderedPostSqls, ";"), this.originalConfig.getString(Key.ADS_URL)); WriterUtil.executeSqls(postConn, renderedPostSqls, this.originalConfig.getString(Key.ADS_URL), DataBaseType.ADS); DBUtil.closeDBResources(null, null, postConn); } } } @Override public void destroy() { if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { this.odpsWriterJobProxy.destroy(); } else { //insert mode: nothing to do } } private void loadAdsData(AdsHelper helper, String odpsTableName, String odpsPartition) { String table = this.originalConfig.getString(Key.ADS_TABLE); String project; if (super.getPeerPluginName().equals(ODPS_READER)) { project = this.readerConfig.getString(Key.PROJECT); } else { project = this.transProjConf.getProject(); } String partition = this.originalConfig.getString(Key.PARTITION); String sourcePath = AdsUtil.generateSourcePath(project,odpsTableName,odpsPartition); /** * This value was validated earlier, so unboxing it here cannot throw an NPE */ boolean overwrite = this.originalConfig.getBool(Key.OVER_WRITE); try { String id = helper.loadData(table,partition,sourcePath,overwrite); LOG.info("ADS Load Data job submitted, job id: " + id); boolean terminated = false; int time = 0; while(!terminated) { Thread.sleep(120000); terminated = helper.checkLoadDataJobStatus(id); time += 2; LOG.info("ADS is loading data; the whole process can take more than 20 minutes, please wait. Elapsed so far: "+ time+" minutes"); } LOG.info("ADS data load succeeded"); } catch (AdsException e) { if (super.getPeerPluginName().equals(ODPS_READER)) { // TODO use the cloud account AdsWriterErrorCode.ADS_LOAD_ODPS_FAILED.setAdsAccount(helper.getUserName()); throw DataXException.asDataXException(AdsWriterErrorCode.ADS_LOAD_ODPS_FAILED,e); } else { throw DataXException.asDataXException(AdsWriterErrorCode.ADS_LOAD_TEMP_ODPS_FAILED,e); } } catch (InterruptedException e) {
throw DataXException.asDataXException(AdsWriterErrorCode.ODPS_CREATETABLE_FAILED,e); } } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Writer.Task.class); private Configuration writerSliceConfig; private OdpsWriter.Task odpsWriterTaskProxy = new OdpsWriter.Task(); private String writeMode; private String schema; private String table; private int columnNumber; // warn: only set in insert and stream modes; in load mode the table refers to the temporary ODPS table private TableInfo tableInfo; private String writeProxy; AdsProxy proxy = null; @Override public void init() { writerSliceConfig = super.getPluginJobConf(); this.writeMode = this.writerSliceConfig.getString(Key.WRITE_MODE); this.schema = writerSliceConfig.getString(Key.SCHEMA); this.table = writerSliceConfig.getString(Key.ADS_TABLE); if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { odpsWriterTaskProxy.setPluginJobConf(writerSliceConfig); odpsWriterTaskProxy.init(); } else if(Constant.INSERTMODE.equalsIgnoreCase(this.writeMode) || Constant.STREAMMODE.equalsIgnoreCase(this.writeMode)) { if (Constant.STREAMMODE.equalsIgnoreCase(this.writeMode)) { this.writeProxy = "datax"; } else { this.writeProxy = this.writerSliceConfig.getString("writeProxy", "adbClient"); } this.writerSliceConfig.set("writeProxy", this.writeProxy); try { this.tableInfo = AdsUtil.createAdsHelper(this.writerSliceConfig).getTableInfo(this.table); } catch (AdsException e) { throw DataXException.asDataXException(AdsWriterErrorCode.CREATE_ADS_HELPER_FAILED, e); } List allColumns = new ArrayList(); List columnInfo = this.tableInfo.getColumns(); for (ColumnInfo eachColumn : columnInfo) { allColumns.add(eachColumn.getName()); } LOG.info("table:[{}] all columns:[\n{}\n].", this.writerSliceConfig.get(Key.ADS_TABLE), StringUtils.join(allColumns, ",")); AdsInsertUtil.dealColumnConf(writerSliceConfig, allColumns); List userColumns = writerSliceConfig.getList(Key.COLUMN, String.class); this.columnNumber = userColumns.size(); } else { throw
DataXException.asDataXException(AdsWriterErrorCode.INVALID_CONFIG_VALUE, "writeMode must be 'load', 'insert' or 'stream'"); } } @Override public void prepare() { if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { odpsWriterTaskProxy.prepare(); } else { //do nothing } } public void startWrite(RecordReceiver recordReceiver) { // Here data is synced from a non-ODPS source into the ODPS transit table; the load itself runs in the job post phase if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { odpsWriterTaskProxy.setTaskPluginCollector(super.getTaskPluginCollector()); odpsWriterTaskProxy.startWrite(recordReceiver); } else { // insert mode List columns = writerSliceConfig.getList(Key.COLUMN, String.class); Connection connection = AdsUtil.getAdsConnect(this.writerSliceConfig); TaskPluginCollector taskPluginCollector = super.getTaskPluginCollector(); if (StringUtils.equalsIgnoreCase(this.writeProxy, "adbClient")) { this.proxy = new AdsClientProxy(table, columns, writerSliceConfig, taskPluginCollector, this.tableInfo); } else { this.proxy = new AdsInsertProxy(schema + "."
+ table, columns, writerSliceConfig, taskPluginCollector, this.tableInfo); } proxy.startWriteWithConnection(recordReceiver, connection, columnNumber); } } @Override public void post() { if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { odpsWriterTaskProxy.post(); } else { // nothing to do in insert/stream mode } } @Override public void destroy() { if(Constant.LOADMODE.equalsIgnoreCase(this.writeMode)) { odpsWriterTaskProxy.destroy(); } else { // insert/stream mode: release the proxy's resources if (null != this.proxy) { this.proxy.closeResource(); } } } } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/AdsWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.adswriter; import com.alibaba.datax.common.spi.ErrorCode; public enum AdsWriterErrorCode implements ErrorCode { REQUIRED_VALUE("AdsWriter-00", "A required parameter value is missing."), NO_ADS_TABLE("AdsWriter-01", "The ADS table does not exist."), ODPS_CREATETABLE_FAILED("AdsWriter-02", "Failed to create the ODPS temporary table. Please contact ADS technical support."), ADS_LOAD_TEMP_ODPS_FAILED("AdsWriter-03", "ADS failed to load data from the ODPS temporary table. Please contact ADS technical support."), TABLE_TRUNCATE_ERROR("AdsWriter-04", "Failed to truncate the ODPS target table."), CREATE_ADS_HELPER_FAILED("AdsWriter-05", "Failed to create the ADSHelper object. Please contact ADS technical support."), ODPS_PARTITION_FAILED("AdsWriter-06", "ODPS Reader does not allow multiple partitions to be configured. 
Only three forms are currently supported: \"partition\":[\"pt=*,ds=*\"] (read all partitions of table test); \n" + "\"partition\":[\"pt=1,ds=*\"] (read all second-level partitions under first-level partition pt=1 of table test); \n" + "\"partition\":[\"pt=1,ds=hangzhou\"] (read second-level partition ds=hangzhou under first-level partition pt=1 of table test)"), ADS_LOAD_ODPS_FAILED("AdsWriter-07", "ADS failed to load data from ODPS. Please contact ADS technical support, but first check whether the ADS account has been added to this ODPS project. ADS account:"), INVALID_CONFIG_VALUE("AdsWriter-08", "Invalid configuration value."), GET_ADS_TABLE_MEATA_FAILED("AdsWriter-11", "Failed to get ADS table metadata."); private final String code; private final String description; private String adsAccount; private AdsWriterErrorCode(String code, String description) { this.code = code; this.description = description; } public void setAdsAccount(String adsAccount) { this.adsAccount =
adsAccount; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { if (this.code.equals("AdsWriter-07")){ return String.format("Code:[%s], Description:[%s][%s]. ", this.code, this.description,adsAccount); }else{ return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/ads/ColumnDataType.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.ads; import java.math.BigDecimal; import java.sql.Date; import java.sql.Time; import java.sql.Timestamp; import java.sql.Types; import java.util.ArrayList; import java.util.HashMap; import java.util.List; /** * ADS column data type. * * @since 0.0.1 */ public class ColumnDataType { // public static final int NULL = 0; public static final int BOOLEAN = 1; public static final int BYTE = 2; public static final int SHORT = 3; public static final int INT = 4; public static final int LONG = 5; public static final int DECIMAL = 6; public static final int DOUBLE = 7; public static final int FLOAT = 8; public static final int TIME = 9; public static final int DATE = 10; public static final int TIMESTAMP = 11; public static final int STRING = 13; // public static final int STRING_IGNORECASE = 14; // public static final int STRING_FIXED = 21; public static final int MULTI_VALUE = 22; public static final int TYPE_COUNT = MULTI_VALUE + 1; /** * The list of types. An ArrayList so that Tomcat doesn't set it to null when clearing references. 
*/ private static final ArrayList<ColumnDataType> TYPES = new ArrayList<ColumnDataType>(); private static final HashMap<String, ColumnDataType> TYPES_BY_NAME = new HashMap<String, ColumnDataType>(); private static final ArrayList<ColumnDataType> TYPES_BY_VALUE_TYPE = new ArrayList<ColumnDataType>(); /** * Get the names of the given value types. * * @param dataTypes the value types * @return the type names, formatted as a list string */ public static String getNames(int[] dataTypes) { List<String> names = new ArrayList<String>(dataTypes.length); for (final int dataType : dataTypes) { names.add(ColumnDataType.getDataType(dataType).name); } return names.toString(); } public int type; public String name; public int sqlType; public String jdbc; /** * How closely the data type maps to the corresponding JDBC SQL type (low is best). */ public int sqlTypePos; static { for (int i = 0; i < TYPE_COUNT; i++) { TYPES_BY_VALUE_TYPE.add(null); } // add(NULL, Types.NULL, "Null", new String[] { "NULL" }); add(STRING, Types.VARCHAR, "String", new String[] { "VARCHAR", "VARCHAR2", "NVARCHAR", "NVARCHAR2", "VARCHAR_CASESENSITIVE", "CHARACTER VARYING", "TID" }); add(STRING, Types.LONGVARCHAR, "String", new String[] { "LONGVARCHAR", "LONGNVARCHAR" }); // add(STRING_FIXED, Types.CHAR, "String", new String[] { "CHAR", "CHARACTER", "NCHAR" }); // add(STRING_IGNORECASE, Types.VARCHAR, "String", new String[] { "VARCHAR_IGNORECASE" }); add(BOOLEAN, Types.BOOLEAN, "Boolean", new String[] { "BOOLEAN", "BIT", "BOOL" }); add(BYTE, Types.TINYINT, "Byte", new String[] { "TINYINT" }); add(SHORT, Types.SMALLINT, "Short", new String[] { "SMALLINT", "YEAR", "INT2" }); add(INT, Types.INTEGER, "Int", new String[] { "INTEGER", "INT", "MEDIUMINT", "INT4", "SIGNED" }); add(INT, Types.INTEGER, "Int", new String[] { "SERIAL" }); add(LONG, Types.BIGINT, "Long", new String[] { "BIGINT", "INT8", "LONG" }); add(LONG, Types.BIGINT, "Long", new String[] { "IDENTITY", "BIGSERIAL" }); add(DECIMAL, Types.DECIMAL, "BigDecimal", new String[] { 
"DECIMAL", "DEC" }); add(DECIMAL, Types.NUMERIC, "BigDecimal", new String[] { "NUMERIC", "NUMBER" }); add(FLOAT, Types.REAL, "Float", new String[] { "REAL", "FLOAT4" }); add(DOUBLE, Types.DOUBLE, "Double",
new String[] { "DOUBLE", "DOUBLE PRECISION" }); add(DOUBLE, Types.FLOAT, "Double", new String[] { "FLOAT", "FLOAT8" }); add(TIME, Types.TIME, "Time", new String[] { "TIME" }); add(DATE, Types.DATE, "Date", new String[] { "DATE" }); add(TIMESTAMP, Types.TIMESTAMP, "Timestamp", new String[] { "TIMESTAMP", "DATETIME", "SMALLDATETIME" }); add(MULTI_VALUE, Types.VARCHAR, "String", new String[] { "MULTIVALUE" }); } private static void add(int type, int sqlType, String jdbc, String[] names) { for (int i = 0; i < names.length; i++) { ColumnDataType dt = new ColumnDataType(); dt.type = type; dt.sqlType = sqlType; dt.jdbc = jdbc; dt.name = names[i]; for (ColumnDataType t2 : TYPES) { if (t2.sqlType == dt.sqlType) { dt.sqlTypePos++; } } TYPES_BY_NAME.put(dt.name, dt); if (TYPES_BY_VALUE_TYPE.get(type) == null) { TYPES_BY_VALUE_TYPE.set(type, dt); } TYPES.add(dt); } } /** * Get the list of data types. * * @return the list */ public static ArrayList<ColumnDataType> getTypes() { return TYPES; } /** * Get the name of the Java class for the given value type.
* * @param type the value type * @return the class name */ public static String getTypeClassName(int type) { switch (type) { case BOOLEAN: // "java.lang.Boolean"; return Boolean.class.getName(); case BYTE: // "java.lang.Byte"; return Byte.class.getName(); case SHORT: // "java.lang.Short"; return Short.class.getName(); case INT: // "java.lang.Integer"; return Integer.class.getName(); case LONG: // "java.lang.Long"; return Long.class.getName(); case DECIMAL: // "java.math.BigDecimal"; return BigDecimal.class.getName(); case TIME: // "java.sql.Time"; return Time.class.getName(); case DATE: // "java.sql.Date"; return Date.class.getName(); case TIMESTAMP: // "java.sql.Timestamp"; return Timestamp.class.getName(); case STRING: // case STRING_IGNORECASE: // case STRING_FIXED: case MULTI_VALUE: // "java.lang.String"; return String.class.getName(); case DOUBLE: // "java.lang.Double"; return Double.class.getName(); case FLOAT: // "java.lang.Float"; return Float.class.getName(); // case NULL: // return null; default: throw new IllegalArgumentException("type=" + type); } } /** * Get the data type object for the given value type. * * @param type the value type * @return the data type object */ public static ColumnDataType getDataType(int type) { if (type < 0 || type >= TYPE_COUNT) { throw new IllegalArgumentException("type=" + type); } ColumnDataType dt = TYPES_BY_VALUE_TYPE.get(type); // if (dt == null) { // dt = TYPES_BY_VALUE_TYPE.get(NULL); // } return dt; } /** * Convert a value type to a SQL type. * * @param type the value type * @return the SQL type */ public static int convertTypeToSQLType(int type) { return getDataType(type).sqlType; } /** * Convert a SQL type to a value type. 
* * @param sqlType the SQL type * @return the value type */ public static int convertSQLTypeToValueType(int sqlType) { switch (sqlType) { // case Types.CHAR: // case Types.NCHAR: // return STRING_FIXED; case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: return STRING; case Types.NUMERIC: case Types.DECIMAL: return DECIMAL; case Types.BIT: case Types.BOOLEAN: return BOOLEAN; case Types.INTEGER: return INT; case Types.SMALLINT: return SHORT; case Types.TINYINT: return BYTE; case Types.BIGINT: return LONG; case Types.REAL: return FLOAT; case Types.DOUBLE: case Types.FLOAT: return DOUBLE; case Types.DATE: return DATE; case Types.TIME: return TIME; case Types.TIMESTAMP: return TIMESTAMP; // case Types.NULL: // return NULL; default: throw new IllegalArgumentException("JDBC Type: " + sqlType); } } /** * Get the value type for the given Java class. * * @param x the Java class * @return the value type */ public static int getTypeFromClass(Class<?> x) { // if (x == null || Void.TYPE == x) { // return NULL; // } if (x.isPrimitive()) { x = getNonPrimitiveClass(x); } if (String.class == x) { return STRING; } else if (Integer.class == x) { return INT; } else if (Long.class == x) { return LONG; } else if (Boolean.class == x) { return BOOLEAN; } else if (Double.class == x) { return DOUBLE; } else if (Byte.class == x) { return BYTE; } else if (Short.class == x) { return SHORT; } else if (Float.class == x) { return FLOAT; // } else if (Void.class == x) { // return NULL; } else if (BigDecimal.class.isAssignableFrom(x)) { return DECIMAL; } else if (Date.class.isAssignableFrom(x)) { return DATE; } else if (Time.class.isAssignableFrom(x)) { return TIME; } else if (Timestamp.class.isAssignableFrom(x)) { return TIMESTAMP; } else if (java.util.Date.class.isAssignableFrom(x)) { return TIMESTAMP; } else { throw new IllegalArgumentException("class=" + x); } } /** * Convert primitive class names to java.lang.* class names.
* * @param clazz the class (for example: int) * @return the non-primitive class (for example: java.lang.Integer) */ public static Class<?> getNonPrimitiveClass(Class<?> clazz) { if (!clazz.isPrimitive()) { return clazz; } else if (clazz == boolean.class) { return Boolean.class; } else if (clazz == byte.class) { return Byte.class; } else if (clazz == char.class) { return Character.class; } else if (clazz == double.class) { return Double.class; } else if (clazz == float.class) { return Float.class; } else if (clazz == int.class) { return Integer.class; } else if (clazz == long.class) { return Long.class; } else if (clazz == short.class) { return Short.class; } else if (clazz == void.class) { return Void.class; } return clazz; } /** * Get a data type object from a type name. * * @param s the type name * @return the data type object */ public static ColumnDataType getTypeByName(String s) { return TYPES_BY_NAME.get(s); } /** * Check if the given value type is a String (VARCHAR,...). * * @param type the value type * @return true if the value type is a String type */ public static boolean isStringType(int type) { if (type == STRING /* || type == STRING_FIXED || type == STRING_IGNORECASE */ || type == MULTI_VALUE) { return true; } return false; } /** * Check if this data type supports the add operation. * * @return true if add is supported */ public boolean supportsAdd() { return supportsAdd(type); } /** * Check if the given value type supports the add operation. * * @param type the value type * @return true if add is supported */ public static boolean supportsAdd(int type) { switch (type) { case BYTE: case DECIMAL: case DOUBLE: case FLOAT: case INT: case LONG: case SHORT: return true; default: return false; } } /** * Get the data type that will not overflow when calling 'add' 2 billion times.
* * @param type the value type * @return the data type that supports adding */ public static int getAddProofType(int type) { switch (type) { case BYTE: return LONG; case FLOAT: return DOUBLE; case INT: return LONG; case LONG: return DECIMAL; case SHORT: return LONG; default: return type; } } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/ads/ColumnInfo.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.ads; /** * ADS column meta.
*
* select ordinal_position,column_name,data_type,type_name,column_comment
* from information_schema.columns
* where table_schema='db_name' and table_name='table_name'
* and is_deleted=0
* order by ordinal_position limit 1000
*
* * @since 0.0.1 */ public class ColumnInfo { private int ordinal; private String name; private ColumnDataType dataType; private boolean isDeleted; private String comment; public int getOrdinal() { return ordinal; } public void setOrdinal(int ordinal) { this.ordinal = ordinal; } public String getName() { return name; } public void setName(String name) { this.name = name; } public ColumnDataType getDataType() { return dataType; } public void setDataType(ColumnDataType dataType) { this.dataType = dataType; } public boolean isDeleted() { return isDeleted; } public void setDeleted(boolean isDeleted) { this.isDeleted = isDeleted; } public String getComment() { return comment; } public void setComment(String comment) { this.comment = comment; } @Override public String toString() { StringBuilder builder = new StringBuilder(); builder.append("ColumnInfo [ordinal=").append(ordinal).append(", name=").append(name).append(", dataType=") .append(dataType).append(", isDeleted=").append(isDeleted).append(", comment=").append(comment) .append("]"); return builder.toString(); } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/ads/TableInfo.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.ads; import java.util.ArrayList; import java.util.List; /** * ADS table meta.
*
* select table_schema, table_name,comments
* from information_schema.tables
* where table_schema='alimama' and table_name='click_af' limit 1
*
* select ordinal_position,column_name,data_type,type_name,column_comment
* from information_schema.columns
* where table_schema='db_name' and table_name='table_name'
* and is_deleted=0
* order by ordinal_position limit 1000
*
* * @since 0.0.1 */ public class TableInfo { private String tableSchema; private String tableName; private List<ColumnInfo> columns; private String comments; private String tableType; private String updateType; private String partitionType; private String partitionColumn; private int partitionCount; private List<String> primaryKeyColumns; @Override public String toString() { StringBuilder builder = new StringBuilder(); builder.append("TableInfo [tableSchema=").append(tableSchema).append(", tableName=").append(tableName) .append(", columns=").append(columns).append(", comments=").append(comments).append(",updateType=").append(updateType) .append(",partitionType=").append(partitionType).append(",partitionColumn=").append(partitionColumn).append(",partitionCount=").append(partitionCount) .append(",primaryKeyColumns=").append(primaryKeyColumns).append("]"); return builder.toString(); } public String getTableSchema() { return tableSchema; } public void setTableSchema(String tableSchema) { this.tableSchema = tableSchema; } public String getTableName() { return tableName; } public void setTableName(String tableName) { this.tableName = tableName; } public List<ColumnInfo> getColumns() { return columns; } public List<String> getColumnsNames() { List<String> columnNames = new ArrayList<String>(); for (ColumnInfo column : this.getColumns()) { columnNames.add(column.getName()); } return columnNames; } public void setColumns(List<ColumnInfo> columns) { this.columns = columns; } public String getComments() { return comments; } public void setComments(String comments) { this.comments = comments; } public String getTableType() { return tableType; } public void setTableType(String tableType) { this.tableType = tableType; } public String getUpdateType() { return updateType; } public void setUpdateType(String updateType) { this.updateType = updateType; } public String getPartitionType() { return partitionType; } public void setPartitionType(String partitionType) { this.partitionType = partitionType; } 
public String getPartitionColumn() { return
partitionColumn; } public void setPartitionColumn(String partitionColumn) { this.partitionColumn = partitionColumn; } public int getPartitionCount() { return partitionCount; } public void setPartitionCount(int partitionCount) { this.partitionCount = partitionCount; } public List<String> getPrimaryKeyColumns() { return primaryKeyColumns; } public void setPrimaryKeyColumns(List<String> primaryKeyColumns) { this.primaryKeyColumns = primaryKeyColumns; } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/ads/package-info.java ================================================ /** * ADS meta and service. * * @since 0.0.1 */ package com.alibaba.datax.plugin.writer.adswriter.ads; ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsClientProxy.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.insert; import com.alibaba.cloud.analyticdb.adbclient.AdbClient; import com.alibaba.cloud.analyticdb.adbclient.AdbClientException; import com.alibaba.cloud.analyticdb.adbclient.DatabaseConfig; import com.alibaba.cloud.analyticdb.adbclient.Row; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.transport.record.DefaultRecord; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode; import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo; import com.alibaba.datax.plugin.writer.adswriter.util.Constant; import
com.alibaba.datax.plugin.writer.adswriter.util.Key; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.Types; import java.util.*; public class AdsClientProxy implements AdsProxy { private static final Logger LOG = LoggerFactory.getLogger(AdsClientProxy.class); private String table; private TaskPluginCollector taskPluginCollector; public Configuration configuration; // columnName -> (sqlType, typeName) private Map<String, Pair<Integer, String>> adsTableColumnsMetaData; private Map<String, Pair<Integer, String>> userConfigColumnsMetaData; private boolean[] useRawData; private AdbClient adbClient; /** * warn: a column list configured as * is not supported */ public AdsClientProxy(String table, List<String> columns, Configuration configuration, TaskPluginCollector taskPluginCollector, TableInfo tableInfo) { this.configuration = configuration; this.taskPluginCollector = taskPluginCollector; this.adsTableColumnsMetaData = AdsInsertUtil.getColumnMetaData(tableInfo, columns); this.userConfigColumnsMetaData = new HashMap<String, Pair<Integer, String>>(); List<String> adsColumnsNames = tableInfo.getColumnsNames(); // follow the column order configured by the user this.useRawData = new boolean[columns.size()]; for (int i = 0; i < columns.size(); i++) { String oriEachColumn = columns.get(i); String eachColumn = oriEachColumn; // defensively strip the quotes around reserved-word column names if (eachColumn.startsWith(Constant.ADS_QUOTE_CHARACTER) && eachColumn.endsWith(Constant.ADS_QUOTE_CHARACTER)) { eachColumn = eachColumn.substring(1, eachColumn.length() - 1); } for (String eachAdsColumn : adsColumnsNames) { if (eachColumn.equalsIgnoreCase(eachAdsColumn)) { Pair<Integer, String> eachMeta = this.adsTableColumnsMetaData.get(eachAdsColumn); this.userConfigColumnsMetaData.put(oriEachColumn, eachMeta); int columnSqltype = 
eachMeta.getLeft(); switch (columnSqltype) { case Types.DATE: case Types.TIME: case Types.TIMESTAMP: this.useRawData[i] = false; break; default: this.useRawData[i] = true; break; } } } } DatabaseConfig databaseConfig = new
DatabaseConfig(); String url = configuration.getString(Key.ADS_URL); String[] hostAndPort = StringUtils.split(url, ":"); if (hostAndPort.length != 2) { throw DataXException.asDataXException(AdsWriterErrorCode.INVALID_CONFIG_VALUE, "url should be in host:port format!"); } this.table = table.toLowerCase(); databaseConfig.setHost(hostAndPort[0]); databaseConfig.setPort(Integer.parseInt(hostAndPort[1])); databaseConfig.setUser(configuration.getString(Key.USERNAME)); databaseConfig.setPassword(configuration.getString(Key.PASSWORD)); databaseConfig.setDatabase(configuration.getString(Key.SCHEMA)); databaseConfig.setTable(Collections.singletonList(this.table)); databaseConfig.setColumns(this.table, columns); // whether to skip rows whose insert fails boolean ignoreInsertError = configuration.getBool("ignoreInsertError", false); databaseConfig.setIgnoreInsertError(ignoreInsertError); // if a column value is empty, write null boolean emptyAsNull = configuration.getBool(Key.EMPTY_AS_NULL, false); databaseConfig.setEmptyAsNull(emptyAsNull); // insert with INSERT IGNORE INTO boolean ignoreInsert = configuration.getBool(Key.IGNORE_INSERT, false); databaseConfig.setInsertIgnore(ignoreInsert); // number of retries when writing to ADB fails at commit time (3 by default) int retryTimes = configuration.getInt(Key.RETRY_CONNECTION_TIME, Constant.DEFAULT_RETRY_TIMES); databaseConfig.setRetryTimes(retryTimes); // retry interval in ms (1s by default) int retryIntervalTime = configuration.getInt(Key.RETRY_INTERVAL_TIME, 1000); databaseConfig.setRetryIntervalTime(retryIntervalTime); // SQL length in bytes that triggers an automatic commit, 32KB by default; changing it is generally not recommended int commitSize = configuration.getInt("commitSize", 32768); databaseConfig.setCommitSize(commitSize); // the SDK default is true boolean partitionBatch = configuration.getBool("partitionBatch", true); databaseConfig.setPartitionBatch(partitionBatch); // number of concurrent threads writing to ADB, 4 by default, 
shared by all configured tables int parallelNumber = configuration.getInt("parallelNumber", 4); databaseConfig.setParallelNumber(parallelNumber); // logger used by the client; an slf4j Logger here
databaseConfig.setLogger(AdsClientProxy.LOG); // whether to include column names when building the INSERT SQL, true by default boolean insertWithColumnName = configuration.getBool("insertWithColumnName", true); databaseConfig.setInsertWithColumnName(insertWithColumnName); // the SDK default is true boolean shareDataSource = configuration.getBool("shareDataSource", true); databaseConfig.setShareDataSource(shareDataSource); String password = databaseConfig.getPassword(); databaseConfig.setPassword(password.replaceAll(".", "*")); // mask the password so it is not logged in plain text LOG.info("Adb database config is : {}", JSON.toJSONString(databaseConfig)); databaseConfig.setPassword(password); // Initialize AdbClient; databaseConfig must not be modified after the instance is created this.adbClient = new AdbClient(databaseConfig); } @Override public void startWriteWithConnection(RecordReceiver recordReceiver, Connection connection, int columnNumber) { try { Record record; while ((record = recordReceiver.getFromReader()) != null) { Row row = new Row(); List<Object> values = new ArrayList<Object>(); this.prepareColumnTypeValue(record, values); row.setColumnValues(values); try { this.adbClient.addRow(this.table, row); } catch (AdbClientException e) { if (101 == e.getCode()) { for (String each : e.getErrData()) { Record dirtyData = new DefaultRecord(); dirtyData.addColumn(new StringColumn(each)); this.taskPluginCollector.collectDirtyRecord(dirtyData, e.getMessage()); } } else { throw e; } } } try { this.adbClient.commit(); } catch (AdbClientException e) { if (101 == e.getCode()) { for (String each : e.getErrData()) { Record dirtyData = new DefaultRecord(); dirtyData.addColumn(new StringColumn(each)); this.taskPluginCollector.collectDirtyRecord(dirtyData, e.getMessage()); } } else { throw e; } } } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { DBUtil.closeDBResources(null, null, connection); } } private void 
prepareColumnTypeValue(Record record, List<Object> values) { for (int i = 0; i < this.useRawData.length; i++) { Column column = record.getColumn(i); if
(this.useRawData[i]) { values.add(column.getRawData()); } else { values.add(column.asString()); } } } @Override public void closeResource() { try { LOG.info("stopping the adbClient"); this.adbClient.stop(); } catch (Exception e) { LOG.warn("stopping the adbClient threw an exception, ignoring it: {}", e.getMessage(), e); } } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsInsertProxy.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.insert; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo; import com.alibaba.datax.plugin.writer.adswriter.util.AdsUtil; import com.alibaba.datax.plugin.writer.adswriter.util.Constant; import com.alibaba.datax.plugin.writer.adswriter.util.Key; import com.mysql.jdbc.JDBC4PreparedStatement; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.*; import java.util.ArrayList; import java.util.Collections; import java.util.Comparator; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Map.Entry; import java.util.Set; import java.util.concurrent.Callable; import java.util.zip.CRC32; import java.util.zip.Checksum; public class AdsInsertProxy implements AdsProxy { private static final Logger LOG = LoggerFactory .getLogger(AdsInsertProxy.class); private static final 
boolean IS_DEBUG_ENABLE =
LOG.isDebugEnabled(); private static final int MAX_EXCEPTION_CAUSE_ITER = 100; private String table; private List<String> columns; private TaskPluginCollector taskPluginCollector; private Configuration configuration; private Boolean emptyAsNull; private String writeMode; private String insertSqlPrefix; private String deleteSqlPrefix; private int opColumnIndex; private String lastDmlMode; // columnName -> (sqlType, typeName) private Map<String, Pair<Integer, String>> adsTableColumnsMetaData; private Map<String, Pair<Integer, String>> userConfigColumnsMetaData; // columnName -> index in the user-configured column list private Map<String, Integer> primaryKeyNameIndexMap; private int retryTimeUpperLimit; private Connection currentConnection; private String partitionColumn; private int partitionColumnIndex = -1; private int partitionCount; public AdsInsertProxy(String table, List<String> columns, Configuration configuration, TaskPluginCollector taskPluginCollector, TableInfo tableInfo) { this.table = table; this.columns = columns; this.configuration = configuration; this.taskPluginCollector = taskPluginCollector; this.emptyAsNull = configuration.getBool(Key.EMPTY_AS_NULL, false); this.writeMode = configuration.getString(Key.WRITE_MODE); this.insertSqlPrefix = String.format(Constant.INSERT_TEMPLATE, this.table, StringUtils.join(columns, ",")); this.deleteSqlPrefix = String.format(Constant.DELETE_TEMPLATE, this.table); this.opColumnIndex = configuration.getInt(Key.OPIndex, 0); this.retryTimeUpperLimit = configuration.getInt( Key.RETRY_CONNECTION_TIME, Constant.DEFAULT_RETRY_TIMES); this.partitionCount = tableInfo.getPartitionCount(); this.partitionColumn = tableInfo.getPartitionColumn(); // a newly created ADS table that has no data yet cannot return column info via "select columns from table where 1=2", so the ADS data dictionary is read instead //not this: this.resultSetMetaData = DBUtil.getColumnMetaData(connection, this.table, StringUtils.join(this.columns, ",")); // no retry 
here (fetching metadata); note that the columns of a real-time table may be reordered this.adsTableColumnsMetaData = AdsInsertUtil.getColumnMetaData(tableInfo, this.columns); this.userConfigColumnsMetaData = new HashMap<String, Pair<Integer, String>>(); List<String> primaryKeyColumnName =
tableInfo.getPrimaryKeyColumns(); List<String> adsColumnsNames = tableInfo.getColumnsNames(); this.primaryKeyNameIndexMap = new HashMap<String, Integer>(); // warn: use the user-configured column order, not the column order from the ADS metadata; the old approach of reusing the load-mode column order was buggy for (int i = 0; i < this.columns.size(); i++) { String oriEachColumn = this.columns.get(i); String eachColumn = oriEachColumn; // defensively strip the quotes around reserved-word column names if (eachColumn.startsWith(Constant.ADS_QUOTE_CHARACTER) && eachColumn.endsWith(Constant.ADS_QUOTE_CHARACTER)) { eachColumn = eachColumn.substring(1, eachColumn.length() - 1); } for (String eachPrimary : primaryKeyColumnName) { if (eachColumn.equalsIgnoreCase(eachPrimary)) { this.primaryKeyNameIndexMap.put(oriEachColumn, i); } } for (String eachAdsColumn : adsColumnsNames) { if (eachColumn.equalsIgnoreCase(eachAdsColumn)) { this.userConfigColumnsMetaData.put(oriEachColumn, this.adsTableColumnsMetaData.get(eachAdsColumn)); } } // record the index of the partition column used for sorting; an ADS real-time table has a single partition level with at most 256 partitions if (eachColumn.equalsIgnoreCase(this.partitionColumn)) { this.partitionColumnIndex = i; } } } @Override public void startWriteWithConnection(RecordReceiver recordReceiver, Connection connection, int columnNumber) { this.currentConnection = connection; int batchSize = this.configuration.getInt(Key.BATCH_SIZE, Constant.DEFAULT_BATCH_SIZE); // by default bufferSize equals batchSize int bufferSize = this.configuration.getInt(Key.BUFFER_SIZE, batchSize); // insert buffer: records from several partitions are sorted, then sent to ADS as merged inserts List<Record> writeBuffer = new ArrayList<Record>(bufferSize); List<Record> deleteBuffer = null; if (this.writeMode.equalsIgnoreCase(Constant.STREAMMODE)) { // delete buffer: records from several partitions are sorted, then sent to ADS as merged deletes deleteBuffer = 
new ArrayList<Record>(bufferSize); } try { Record record; while ((record = recordReceiver.getFromReader()) != null) { if (this.writeMode.equalsIgnoreCase(Constant.INSERTMODE)) { if (record.getColumnNumber() != columnNumber) { // the source column count differs from the target column count: fail immediately throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "Column configuration error: the job reads %s columns from the source but writes %s columns to the target table; the two counts must be equal. Please check and fix your configuration.", record.getColumnNumber(), columnNumber)); }
writeBuffer.add(record); if (writeBuffer.size() >= bufferSize) { this.doBatchRecordWithPartitionSort(writeBuffer, Constant.INSERTMODE, bufferSize, batchSize); writeBuffer.clear(); } } else { if (record.getColumnNumber() != columnNumber + 1) { // in stream mode the source must have exactly one more column (the OP column) than the target; fail immediately otherwise throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "Column configuration error: the job reads %s columns from the source and writes %s columns to the target table, but the source must have exactly one extra operation-type column. Please check and fix your configuration.", record.getColumnNumber(), columnNumber)); } String optionColumnValue = record.getColumn(this.opColumnIndex).asString(); OperationType operationType = OperationType.asOperationType(optionColumnValue); if (operationType.isInsertTemplate()) { writeBuffer.add(record); if (this.lastDmlMode == null || Constant.INSERTMODE.equals(this.lastDmlMode)) { this.lastDmlMode = Constant.INSERTMODE; if (writeBuffer.size() >= bufferSize) { this.doBatchRecordWithPartitionSort(writeBuffer, Constant.INSERTMODE, bufferSize, batchSize); writeBuffer.clear(); } } else { this.lastDmlMode = Constant.INSERTMODE; // a mode switch flushes the pending deletes to ADS once, then enters insert mode this.doBatchRecordWithPartitionSort(deleteBuffer, Constant.DELETEMODE, bufferSize, batchSize); deleteBuffer.clear(); } } else if (operationType.isDeleteTemplate()) { deleteBuffer.add(record); if (this.lastDmlMode == null || Constant.DELETEMODE.equals(this.lastDmlMode)) { this.lastDmlMode = Constant.DELETEMODE; if (deleteBuffer.size() >= bufferSize) { this.doBatchRecordWithPartitionSort(deleteBuffer, Constant.DELETEMODE, bufferSize, batchSize); deleteBuffer.clear(); } } else { this.lastDmlMode = Constant.DELETEMODE; // a mode switch flushes the pending inserts to ADS once, then enters delete mode this.doBatchRecordWithPartitionSort(writeBuffer, 
Constant.INSERTMODE, bufferSize, batchSize); writeBuffer.clear(); } } else { // records with an unsupported OP type are dirty data; no retry is needed this.taskPluginCollector.collectDirtyRecord(record, String.format("Unsupported operation type: %s", optionColumnValue)); } } } if (!writeBuffer.isEmpty()) { //doOneRecord(writeBuffer,
Constant.INSERTMODE); this.doBatchRecordWithPartitionSort(writeBuffer, Constant.INSERTMODE, bufferSize, batchSize); writeBuffer.clear(); } // At most one of the two buffers is non-empty at any given time if (null != deleteBuffer && !deleteBuffer.isEmpty()) { //doOneRecord(deleteBuffer, Constant.DELETEMODE); this.doBatchRecordWithPartitionSort(deleteBuffer, Constant.DELETEMODE, bufferSize, batchSize); deleteBuffer.clear(); } } catch (Exception e) { throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { writeBuffer.clear(); DBUtil.closeDBResources(null, null, connection); } } /** * @param bufferSize number of records DataX buffers before flushing * @param batchSize number of records DataX sends to ADS in a single statement * @param buffer DataX record buffer * @param mode DML mode: Constant.INSERTMODE or Constant.DELETEMODE */ private void doBatchRecordWithPartitionSort(List<Record> buffer, String mode, int bufferSize, int batchSize) throws SQLException { //warn: sorting changes the insertion order; if the source imposes no ordering constraints, this may cause data inconsistency. Collections.sort is stable, so records that hash to the same partition keep their relative order, but records in different partitions are reordered //warn: do not sort when bufferSize is not explicitly configured or is not larger than batchSize; also skip sorting when the buffer holds no more than batchSize records (for example, the final remainder) int recordBufferedNumber = buffer.size(); if (bufferSize > batchSize && recordBufferedNumber > batchSize && this.partitionColumnIndex >= 0) { final int partitionColumnIndex = this.partitionColumnIndex; final int partitionCount = this.partitionCount; Collections.sort(buffer, new Comparator<Record>() { @Override public int compare(Record record1, Record record2) { int hashPartition1 = AdsInsertProxy.getHashPartition(record1.getColumn(partitionColumnIndex).asString(), partitionCount); int hashPartition2 = AdsInsertProxy.getHashPartition(record2.getColumn(partitionColumnIndex).asString(), partitionCount); return Integer.compare(hashPartition1, hashPartition2); } }); } // Write the buffered Records to ADS in chunks of batchSize, bounded by recordBufferedNumber for (int i = 0; i < recordBufferedNumber; i += batchSize) { int toIndex = i + batchSize; if (toIndex > recordBufferedNumber) { toIndex = recordBufferedNumber; } this.doBatchRecord(buffer.subList(i, toIndex), mode); } } private void doBatchRecord(final List<Record> buffer, final String mode) throws SQLException { List<Class<?>>
retryExceptionClasss = new ArrayList<Class<?>>(); retryExceptionClasss.add(com.mysql.jdbc.exceptions.jdbc4.CommunicationsException.class); retryExceptionClasss.add(java.net.SocketException.class); try { RetryUtil.executeWithRetry(new Callable<Boolean>() { @Override public Boolean call() throws Exception { doBatchRecordDml(buffer, mode); return true; } }, this.retryTimeUpperLimit, 2000L, true, retryExceptionClasss); } catch (SQLException e) { LOG.warn(String.format("doBatchRecord failed after up to %s retries: ", this.retryTimeUpperLimit), e); LOG.info("retrying each record individually..."); doOneRecord(buffer, mode); // below is the old way // for (Record eachRecord : buffer) { // this.taskPluginCollector.collectDirtyRecord(eachRecord, e); // } } catch (Exception e) { throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } } //warn: ADS does not support transactions, so rollback has no effect @SuppressWarnings("resource") private void doBatchRecordDml(List<Record> buffer, String mode) throws Exception { Statement statement = null; String sql = null; try { int bufferSize = buffer.size(); if (buffer.isEmpty()) { return; } StringBuilder sqlSb = new StringBuilder(); // connection.setAutoCommit(true); //mysql impl warn: if a database access error occurs or this method is called on a closed connection throw SQLException statement = this.currentConnection.createStatement(); sqlSb.append(this.generateDmlSql(this.currentConnection, buffer.get(0), mode)); for (int i = 1; i < bufferSize; i++) { Record record = buffer.get(i); this.appendDmlSqlValues(this.currentConnection, record, sqlSb, mode); } sql = sqlSb.toString(); if (IS_DEBUG_ENABLE) { LOG.debug(sql); } @SuppressWarnings("unused") int status = statement.executeUpdate(sql); sql = null; } catch (SQLException e) { LOG.warn("doBatchRecordDml encountered an exception, sql: " + sql, e); Exception eachException = e; int maxIter = 0;// guard against an infinite loop while walking the cause chain while (null != eachException && maxIter < AdsInsertProxy.MAX_EXCEPTION_CAUSE_ITER) { if (this.isRetryable(eachException)) {
LOG.warn("doBatchRecordDml encountered a retryable exception: " + e.getMessage()); this.currentConnection = AdsUtil.getAdsConnect(this.configuration); throw eachException; } else { try { Throwable causeThrowable = eachException.getCause(); eachException = causeThrowable == null ? null : (Exception) causeThrowable; } catch (Exception castException) { LOG.warn("doBatchRecordDml encountered a non-retryable exception: " + e.getMessage()); throw e; } } maxIter++; } throw e; } catch (Exception e) { LOG.error("Write failed, sql: " + sql); throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { DBUtil.closeDBResources(statement, null); } } private void doOneRecord(List<Record> buffer, final String mode) { List<Class<?>> retryExceptionClasss = new ArrayList<Class<?>>(); retryExceptionClasss.add(com.mysql.jdbc.exceptions.jdbc4.CommunicationsException.class); retryExceptionClasss.add(java.net.SocketException.class); for (final Record record : buffer) { try { RetryUtil.executeWithRetry(new Callable<Boolean>() { @Override public Boolean call() throws Exception { doOneRecordDml(record, mode); return true; } }, this.retryTimeUpperLimit, 2000L, true, retryExceptionClasss); } catch (Exception e) { // This record still cannot be written after retrying; collect it as dirty data this.taskPluginCollector.collectDirtyRecord(record, e); } } } @SuppressWarnings("resource") private void doOneRecordDml(Record record, String mode) throws Exception { Statement statement = null; String sql = null; try { // connection.setAutoCommit(true); statement = this.currentConnection.createStatement(); sql = generateDmlSql(this.currentConnection, record, mode); if (IS_DEBUG_ENABLE) { LOG.debug(sql); } @SuppressWarnings("unused") int status = statement.executeUpdate(sql); sql = null; } catch (SQLException e) { LOG.error("doOneDml encountered an exception, sql: " + sql, e); //need to retry before recording dirty data //this.taskPluginCollector.collectDirtyRecord(record, e); // refresh the current connection when the error is retryable Exception eachException = e; int maxIter = 0;// guard against an infinite loop while walking the cause chain while (null != eachException && maxIter <
AdsInsertProxy.MAX_EXCEPTION_CAUSE_ITER) { if (this.isRetryable(eachException)) { LOG.warn("doOneDml encountered a retryable exception: " + e.getMessage()); this.currentConnection = AdsUtil.getAdsConnect(this.configuration); throw eachException; } else { try { Throwable causeThrowable = eachException.getCause(); eachException = causeThrowable == null ? null : (Exception) causeThrowable; } catch (Exception castException) { LOG.warn("doOneDml encountered a non-retryable exception: " + e.getMessage()); throw e; } } maxIter++; } throw e; } catch (Exception e) { LOG.error("Write failed, sql: " + sql); throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { DBUtil.closeDBResources(statement, null); } } // exact class match: subclasses of these exceptions are not treated as retryable private boolean isRetryable(Throwable e) { Class<?> meetExceptionClass = e.getClass(); if (meetExceptionClass == com.mysql.jdbc.exceptions.jdbc4.CommunicationsException.class) { return true; } if (meetExceptionClass == java.net.SocketException.class) { return true; } return false; } private String generateDmlSql(Connection connection, Record record, String mode) throws SQLException { String sql = null; StringBuilder sqlSb = new StringBuilder(); if (mode.equalsIgnoreCase(Constant.INSERTMODE)) { sqlSb.append(this.insertSqlPrefix); sqlSb.append("("); int columnsSize = this.columns.size(); for (int i = 0; i < columnsSize; i++) { if ((i + 1) != columnsSize) { sqlSb.append("?,"); } else { sqlSb.append("?"); } } sqlSb.append(")"); //mysql impl warn: if a database access error occurs or this method is called on a closed connection PreparedStatement statement = connection.prepareStatement(sqlSb.toString()); for (int i = 0; i < this.columns.size(); i++) { int preparedParamsIndex = i; if (Constant.STREAMMODE.equalsIgnoreCase(this.writeMode)) { if (preparedParamsIndex >= this.opColumnIndex) { preparedParamsIndex = i + 1; } } String columnName = this.columns.get(i); int columnSqltype = this.userConfigColumnsMetaData.get(columnName).getLeft(); prepareColumnTypeValue(statement,
columnSqltype, record.getColumn(preparedParamsIndex), i, columnName); } sql = ((JDBC4PreparedStatement) statement).asSql(); DBUtil.closeDBResources(statement, null); } else { sqlSb.append(this.deleteSqlPrefix); sqlSb.append("("); Set<Entry<String, Integer>> primaryEntrySet = this.primaryKeyNameIndexMap.entrySet(); int entrySetSize = primaryEntrySet.size(); int i = 0; for (Entry<String, Integer> eachEntry : primaryEntrySet) { if ((i + 1) != entrySetSize) { sqlSb.append(String.format(" (%s = ?) and ", eachEntry.getKey())); } else { sqlSb.append(String.format(" (%s = ?) ", eachEntry.getKey())); } i++; } sqlSb.append(")"); //mysql impl warn: if a database access error occurs or this method is called on a closed connection PreparedStatement statement = connection.prepareStatement(sqlSb.toString()); i = 0; //ADS real-time tables allow only one level of partitioning with a long-typed partition column, but rows are deleted here by primary key for (Entry<String, Integer> each : primaryEntrySet) { String columnName = each.getKey(); int columnSqlType = this.userConfigColumnsMetaData.get(columnName).getLeft(); int primaryKeyInUserConfigIndex = this.primaryKeyNameIndexMap.get(columnName); if (primaryKeyInUserConfigIndex >= this.opColumnIndex) { primaryKeyInUserConfigIndex++; } prepareColumnTypeValue(statement, columnSqlType, record.getColumn(primaryKeyInUserConfigIndex), i, columnName); i++; } sql = ((JDBC4PreparedStatement) statement).asSql(); DBUtil.closeDBResources(statement, null); } return sql; } private void appendDmlSqlValues(Connection connection, Record record, StringBuilder sqlSb, String mode) throws SQLException { String sqlResult = this.generateDmlSql(connection, record, mode); if (mode.equalsIgnoreCase(Constant.INSERTMODE)) { sqlSb.append(","); sqlSb.append(sqlResult.substring(this.insertSqlPrefix.length())); } else { // each delete predicate is already fully parenthesized by generateDmlSql sqlSb.append(" or "); sqlSb.append(sqlResult.substring(this.deleteSqlPrefix.length())); } } private void prepareColumnTypeValue(PreparedStatement statement, int columnSqltype, Column column, int preparedPatamIndex, String columnName) throws SQLException { java.util.Date
utilDate; switch (columnSqltype) { case Types.CHAR: case Types.NCHAR: case Types.CLOB: case Types.NCLOB: case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: String strValue = column.asString(); statement.setString(preparedPatamIndex + 1, strValue); break; case Types.SMALLINT: case Types.INTEGER: case Types.BIGINT: case Types.NUMERIC: case Types.DECIMAL: case Types.REAL: // note: all of these are bound with setLong, so any fractional part of a NUMERIC/DECIMAL/REAL value is lost String numValue = column.asString(); if ((emptyAsNull && "".equals(numValue)) || numValue == null) { //statement.setObject(preparedPatamIndex + 1, null); statement.setNull(preparedPatamIndex + 1, Types.BIGINT); } else { statement.setLong(preparedPatamIndex + 1, column.asLong()); } break; case Types.FLOAT: case Types.DOUBLE: String floatValue = column.asString(); if ((emptyAsNull && "".equals(floatValue)) || floatValue == null) { //statement.setObject(preparedPatamIndex + 1, null); statement.setNull(preparedPatamIndex + 1, Types.DOUBLE); } else { statement.setDouble(preparedPatamIndex + 1, column.asDouble()); } break; //tinyint is special in some databases such as MySQL {boolean->tinyint(1)} case Types.TINYINT: Long longValue = column.asLong(); if (null == longValue) { statement.setNull(preparedPatamIndex + 1, Types.BIGINT); } else { statement.setLong(preparedPatamIndex + 1, longValue); } break; case Types.DATE: java.sql.Date sqlDate = null; try { if ("".equals(column.getRawData())) { utilDate = null; } else { utilDate = column.asDate(); } } catch (DataXException e) { throw new SQLException(String.format( "DATE type conversion error: [%s]", column)); } if (null != utilDate) { sqlDate = new java.sql.Date(utilDate.getTime()); } statement.setDate(preparedPatamIndex + 1, sqlDate); break; case Types.TIME: java.sql.Time sqlTime = null; try { if ("".equals(column.getRawData())) { utilDate = null; } else { utilDate = column.asDate(); } } catch (DataXException e) { throw new SQLException(String.format( "TIME type conversion error: [%s]", column)); } if (null != utilDate) { sqlTime = new
java.sql.Time(utilDate.getTime()); } statement.setTime(preparedPatamIndex + 1, sqlTime); break; case Types.TIMESTAMP: java.sql.Timestamp sqlTimestamp = null; try { if ("".equals(column.getRawData())) { utilDate = null; } else { utilDate = column.asDate(); } } catch (DataXException e) { throw new SQLException(String.format( "TIMESTAMP type conversion error: [%s]", column)); } if (null != utilDate) { sqlTimestamp = new java.sql.Timestamp( utilDate.getTime()); } statement.setTimestamp(preparedPatamIndex + 1, sqlTimestamp); break; case Types.BOOLEAN: //case Types.BIT: ADS has no BIT type Boolean booleanValue = column.asBoolean(); if (null == booleanValue) { statement.setNull(preparedPatamIndex + 1, Types.BOOLEAN); } else { statement.setBoolean(preparedPatamIndex + 1, booleanValue); } break; default: Pair<Integer, String> columnMetaPair = this.userConfigColumnsMetaData.get(columnName); throw DataXException .asDataXException( DBUtilErrorCode.UNSUPPORTED_TYPE, String.format( "The column configuration in your job file is invalid: DataX does not support writing this column type to the database. Column name: [%s], column type: [%s], JDBC type code: [%s]. Please change the column's type in the table or exclude this column from synchronization.", columnName, columnMetaPair.getRight(), columnMetaPair.getLeft())); } } private static int getHashPartition(String value, int totalHashPartitionNum) { long crc32 = (value == null ?
getCRC32("-1") : getCRC32(value)); return (int) (crc32 % totalHashPartitionNum); } private static long getCRC32(String value) { Checksum checksum = new CRC32(); byte[] bytes = value.getBytes(); checksum.update(bytes, 0, bytes.length); return checksum.getValue(); } @Override public void closeResource() { } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsInsertUtil.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.insert; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.ListUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.writer.adswriter.AdsException; import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode; import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnInfo; import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo; import com.alibaba.datax.plugin.writer.adswriter.load.AdsHelper; import com.alibaba.datax.plugin.writer.adswriter.util.AdsUtil; import com.alibaba.datax.plugin.writer.adswriter.util.Constant; import com.alibaba.datax.plugin.writer.adswriter.util.Key; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.ImmutablePair; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; public class AdsInsertUtil { private static final Logger LOG = LoggerFactory .getLogger(AdsInsertUtil.class); public static TableInfo getAdsTableInfo(Configuration conf) { AdsHelper adsHelper = AdsUtil.createAdsHelper(conf); TableInfo tableInfo= null; try { tableInfo = adsHelper.getTableInfo(conf.getString(Key.ADS_TABLE)); } catch (AdsException e) { throw 
DataXException.asDataXException(AdsWriterErrorCode.GET_ADS_TABLE_MEATA_FAILED, e); } return tableInfo; } /* * Returns the column names in the order they were defined when the ADS table was created * */ public static List<String> getAdsTableColumnNames(Configuration conf) { List<String> tableColumns = new ArrayList<String>(); AdsHelper adsHelper = AdsUtil.createAdsHelper(conf); TableInfo tableInfo = null; String adsTable = conf.getString(Key.ADS_TABLE); try { tableInfo = adsHelper.getTableInfo(adsTable); } catch (AdsException e) { throw DataXException.asDataXException(AdsWriterErrorCode.GET_ADS_TABLE_MEATA_FAILED, e); } List<ColumnInfo> columnInfos = tableInfo.getColumns(); for (ColumnInfo columnInfo : columnInfos) { tableColumns.add(columnInfo.getName()); } LOG.info("table:[{}] all columns:[\n{}\n].", adsTable, StringUtils.join(tableColumns, ",")); return tableColumns; } public static Map<String, Pair<Integer, String>> getColumnMetaData(Configuration configuration, List<String> userColumns) { Map<String, Pair<Integer, String>> columnMetaData = new HashMap<String, Pair<Integer, String>>(); List<ColumnInfo> columnInfoList = getAdsTableColumns(configuration); for (String column : userColumns) { if (column.startsWith(Constant.ADS_QUOTE_CHARACTER) && column.endsWith(Constant.ADS_QUOTE_CHARACTER)) { column = column.substring(1, column.length() - 1); } for (ColumnInfo columnInfo : columnInfoList) { if (column.equalsIgnoreCase(columnInfo.getName())) { Pair<Integer, String> eachPair = new ImmutablePair<Integer, String>(columnInfo.getDataType().sqlType, columnInfo.getDataType().name); columnMetaData.put(columnInfo.getName(), eachPair); } } } return columnMetaData; } public static Map<String, Pair<Integer, String>> getColumnMetaData(TableInfo tableInfo, List<String> userColumns) { Map<String, Pair<Integer, String>> columnMetaData = new HashMap<String, Pair<Integer, String>>(); List<ColumnInfo> columnInfoList = tableInfo.getColumns(); for (String column : userColumns) { if (column.startsWith(Constant.ADS_QUOTE_CHARACTER) && column.endsWith(Constant.ADS_QUOTE_CHARACTER)) { column = column.substring(1, column.length() - 1); } for (ColumnInfo columnInfo : columnInfoList) { if (column.equalsIgnoreCase(columnInfo.getName())) { Pair<Integer, String> eachPair = new ImmutablePair<Integer, String>(columnInfo.getDataType().sqlType, columnInfo.getDataType().name);
columnMetaData.put(columnInfo.getName(), eachPair); } } } return columnMetaData; } /* * Returns the columns in the order they were defined when the ADS table was created * */ public static List<ColumnInfo> getAdsTableColumns(Configuration conf) { AdsHelper adsHelper = AdsUtil.createAdsHelper(conf); TableInfo tableInfo = null; String adsTable = conf.getString(Key.ADS_TABLE); try { tableInfo = adsHelper.getTableInfo(adsTable); } catch (AdsException e) { throw DataXException.asDataXException(AdsWriterErrorCode.GET_ADS_TABLE_MEATA_FAILED, e); } List<ColumnInfo> columnInfos = tableInfo.getColumns(); return columnInfos; } public static void dealColumnConf(Configuration originalConfig, List<String> tableColumns) { List<String> userConfiguredColumns = originalConfig.getList(Key.COLUMN, String.class); if (null == userConfiguredColumns || userConfiguredColumns.isEmpty()) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, "The column configuration in your job file is invalid: no column names to write were configured, so DataX cannot obtain column information. Please check your configuration and fix it."); } else { if (1 == userConfiguredColumns.size() && "*".equals(userConfiguredColumns.get(0))) { LOG.warn("The column configuration in your job file is risky: the columns to write are configured as *, so if the number or types of the table's columns change, the job may produce incorrect results or even fail. Please check your configuration and fix it."); // Back-fill the column list; it must be passed on as Strings for later processing originalConfig.set(Key.COLUMN, tableColumns); } else if (userConfiguredColumns.size() > tableColumns.size()) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, String.format("The column configuration in your job file is invalid: the number of columns configured for writing (%s) exceeds the total number of columns in the target table (%s). 
Please check your configuration and fix it.", userConfiguredColumns.size(), tableColumns.size())); } else { // Make sure the user-configured columns contain no duplicates ListUtil.makeSureNoValueDuplicate(userConfiguredColumns, false); // Check that every configured column exists in the table (by running select column from table) // ListUtil.makeSureBInA(tableColumns, userConfiguredColumns, true); // Support keywords and reserved words by stripping quotes; ADS column names are case-insensitive List<String> removeQuotedColumns = new ArrayList<String>(); for (String each : userConfiguredColumns) { if (each.startsWith(Constant.ADS_QUOTE_CHARACTER) && each.endsWith(Constant.ADS_QUOTE_CHARACTER)) { removeQuotedColumns.add(each.substring(1, each.length() - 1)); } else { removeQuotedColumns.add(each); } } ListUtil.makeSureBInA(tableColumns, removeQuotedColumns, false); } } } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/AdsProxy.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.insert; import com.alibaba.datax.common.plugin.RecordReceiver; import java.sql.Connection; public interface AdsProxy { public abstract void startWriteWithConnection(RecordReceiver recordReceiver, Connection connection, int columnNumber); public void closeResource(); } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/insert/OperationType.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.insert; public enum OperationType { // i: insert uo:before image uu:before image un: after image d: delete // u:update I("i"), UO("uo"), UU("uu"), UN("un"), D("d"), U("u"), UNKNOWN("unknown"), ; private OperationType(String type) { this.type = type; } private String type; public String getType() { return this.type; } public static OperationType asOperationType(String type) { if ("i".equalsIgnoreCase(type)) { return I; } else if ("uo".equalsIgnoreCase(type)) { return UO; } else if ("uu".equalsIgnoreCase(type)) { return UU; } else if
("un".equalsIgnoreCase(type)) { return UN; } else if ("d".equalsIgnoreCase(type)) { return D; } else if ("u".equalsIgnoreCase(type)) { return U; } else { return UNKNOWN; } } public boolean isInsertTemplate() { switch (this) { // after merging, ideally only the I and U types should remain case I: case UO: case UU: case UN: case U: return true; case D: return false; default: return false; } } public boolean isDeleteTemplate() { switch (this) { // after merging, ideally only the I and U types should remain case I: case UO: case UU: case UN: case U: return false; case D: return true; default: return false; } } public boolean isLegal() { return this != UNKNOWN; } @Override public String toString() { return this.name(); } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/AdsHelper.java ================================================ /** * */ package com.alibaba.datax.plugin.writer.adswriter.load; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.writer.adswriter.AdsException; import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode; import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnDataType; import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnInfo; import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo; import com.alibaba.datax.plugin.writer.adswriter.util.AdsUtil; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.*; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import java.util.Properties; import java.util.concurrent.Callable; public class AdsHelper { private static final Logger LOG = LoggerFactory .getLogger(AdsHelper.class); private String adsURL; private String userName; private String password; private String schema; private Long socketTimeout; private String suffix; public
AdsHelper(String adsUrl, String userName, String password, String schema, Long socketTimeout, String suffix) { this.adsURL = adsUrl; this.userName = userName; this.password = password; this.schema = schema; this.socketTimeout = socketTimeout; this.suffix = suffix; } public String getAdsURL() { return adsURL; } public void setAdsURL(String adsURL) { this.adsURL = adsURL; } public String getUserName() { return userName; } public void setUserName(String userName) { this.userName = userName; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } public String getSchema() { return schema; } public void setSchema(String schema) { this.schema = schema; } /** * Obtain the table meta information. * * @param table The table * @return The table meta information * @throws com.alibaba.datax.plugin.writer.adswriter.AdsException */ public TableInfo getTableInfo(String table) throws AdsException { if (table == null) { throw new AdsException(AdsException.ADS_TABLEMETA_TABLE_NULL, "Table is null.", null); } if (adsURL == null) { throw new AdsException(AdsException.ADS_CONN_URL_NOT_SET, "ADS JDBC connection URL was not set.", null); } if (userName == null) { throw new AdsException(AdsException.ADS_CONN_USERNAME_NOT_SET, "ADS JDBC connection user name was not set.", null); } if (password == null) { throw new AdsException(AdsException.ADS_CONN_PASSWORD_NOT_SET, "ADS JDBC connection password was not set.", null); } if (schema == null) { throw new AdsException(AdsException.ADS_CONN_SCHEMA_NOT_SET, "ADS JDBC connection schema was not set.", null); } Connection connection = null; Statement statement = null; ResultSet rs = null; try { Class.forName("com.mysql.jdbc.Driver"); String url = AdsUtil.prepareJdbcUrl(this.adsURL, this.schema, this.socketTimeout, this.suffix); Properties connectionProps = new Properties(); connectionProps.put("user", userName); connectionProps.put("password", password); connection = 
DriverManager.getConnection(url, connectionProps); statement = connection.createStatement(); // ADS table and schema names are case-insensitive, so lower-case them for usability; the result must preserve column order String columnMetaSql = String.format("select ordinal_position,column_name,data_type,type_name,column_comment from information_schema.columns where table_schema = `'%s'` and table_name = `'%s'` order by ordinal_position", schema.toLowerCase(), table.toLowerCase()); LOG.info(String.format("Column metadata SQL: %s", columnMetaSql)); rs = statement.executeQuery(columnMetaSql); TableInfo tableInfo = new TableInfo(); List<ColumnInfo> columnInfoList = new ArrayList<ColumnInfo>(); while (DBUtil.asyncResultSetNext(rs)) { ColumnInfo columnInfo = new ColumnInfo(); columnInfo.setOrdinal(rs.getInt(1)); columnInfo.setName(rs.getString(2)); //columnInfo.setDataType(ColumnDataType.getDataType(rs.getInt(3))); //for ads version < 0.7 //columnInfo.setDataType(ColumnDataType.getTypeByName(rs.getString(3).toUpperCase())); //for ads version 0.8 columnInfo.setDataType(ColumnDataType.getTypeByName(rs.getString(4).toUpperCase())); //for ads version 0.8 & 0.7 columnInfo.setComment(rs.getString(5)); columnInfoList.add(columnInfo); } if (columnInfoList.isEmpty()) { throw DataXException.asDataXException(AdsWriterErrorCode.NO_ADS_TABLE, table + " does not exist, or its column information could not be queried. 
"); } tableInfo.setColumns(columnInfoList); tableInfo.setTableSchema(schema); tableInfo.setTableName(table); DBUtil.closeDBResources(rs, statement, null); String tableMetaSql = String.format("select update_type, partition_type, partition_column, partition_count, primary_key_columns from information_schema.tables where table_schema = `'%s'` and table_name = `'%s'`", schema.toLowerCase(), table.toLowerCase()); LOG.info(String.format("Table metadata SQL: %s", tableMetaSql)); statement = connection.createStatement(); rs = statement.executeQuery(tableMetaSql); while (DBUtil.asyncResultSetNext(rs)) { tableInfo.setUpdateType(rs.getString(1)); tableInfo.setPartitionType(rs.getString(2)); tableInfo.setPartitionColumn(rs.getString(3)); tableInfo.setPartitionCount(rs.getInt(4)); //primary_key_columns: ADS primary keys are comma-separated, and there may be more than one String primaryKeyColumns = rs.getString(5); if (StringUtils.isNotBlank(primaryKeyColumns)) { tableInfo.setPrimaryKeyColumns(Arrays.asList(StringUtils.split(primaryKeyColumns, ","))); } else { tableInfo.setPrimaryKeyColumns(null); } break; } DBUtil.closeDBResources(rs, statement, null); return tableInfo; } catch (ClassNotFoundException e) { throw new AdsException(AdsException.OTHER, e.getMessage(), e); } catch (SQLException e) { throw new AdsException(AdsException.OTHER, e.getMessage(), e); } catch (DataXException e) { throw e; } catch (Exception e) { throw new AdsException(AdsException.OTHER, e.getMessage(), e); } finally { if (rs != null) { try { rs.close(); } catch (SQLException e) { // Ignore exception } } if (statement != null) { try { statement.close(); } catch (SQLException e) { // Ignore exception } } if (connection != null) { try { connection.close(); } catch (SQLException e) { // Ignore exception } } } } /** * Submit LOAD DATA command.
* * @param table The target ADS table * @param partition The partition option in the form of "(partition_name,...)" * @param sourcePath The source path * @param overwrite Whether to add the OVERWRITE clause to the LOAD DATA statement * @return The job id returned for the submitted LOAD DATA * @throws AdsException */ public String loadData(String table, String partition, String sourcePath, boolean overwrite) throws AdsException { if (table == null) { throw new AdsException(AdsException.ADS_LOADDATA_TABLE_NULL, "ADS LOAD DATA table is null.", null); } if (sourcePath == null) { throw new AdsException(AdsException.ADS_LOADDATA_SOURCEPATH_NULL, "ADS LOAD DATA source path is null.", null); } if (adsURL == null) { throw new AdsException(AdsException.ADS_CONN_URL_NOT_SET, "ADS JDBC connection URL was not set.", null); } if (userName == null) { throw new AdsException(AdsException.ADS_CONN_USERNAME_NOT_SET, "ADS JDBC connection user name was not set.", null); } if (password == null) { throw new AdsException(AdsException.ADS_CONN_PASSWORD_NOT_SET, "ADS JDBC connection password was not set.", null); } if (schema == null) { throw new AdsException(AdsException.ADS_CONN_SCHEMA_NOT_SET, "ADS JDBC connection schema was not set.", null); } StringBuilder sb = new StringBuilder(); sb.append("LOAD DATA FROM "); if (sourcePath.startsWith("'") && sourcePath.endsWith("'")) { sb.append(sourcePath); } else { sb.append("'" + sourcePath + "'"); } if (overwrite) { sb.append(" OVERWRITE"); } sb.append(" INTO TABLE "); sb.append(schema + "."
+ table); if (partition != null && !partition.trim().equals("")) { String partitionTrim = partition.trim(); if (partitionTrim.startsWith("(") && partitionTrim.endsWith(")")) { sb.append(" PARTITION " + partition); } else { sb.append(" PARTITION " + "(" + partition + ")"); } } Connection connection = null; Statement statement = null; ResultSet rs = null; try { Class.forName("com.mysql.jdbc.Driver"); String url = AdsUtil.prepareJdbcUrl(this.adsURL, this.schema, this.socketTimeout, this.suffix); Properties connectionProps = new Properties(); connectionProps.put("user", userName); connectionProps.put("password", password); connection = DriverManager.getConnection(url, connectionProps); statement = connection.createStatement(); LOG.info("Loading data from ODPS into ADS: " + sb.toString()); LOG.info("Due to ADS limitations, loading data takes at least 20 minutes; please be patient"); rs = statement.executeQuery(sb.toString()); String jobId = null; while (DBUtil.asyncResultSetNext(rs)) { jobId = rs.getString(1); } if (jobId == null) { throw new AdsException(AdsException.ADS_LOADDATA_JOBID_NOT_AVAIL, "Job id is not available for the submitted LOAD DATA.", null); } return jobId; } catch (AdsException e) { // keep the specific error code instead of re-wrapping it as ADS_LOADDATA_FAILED below throw e; } catch (ClassNotFoundException e) { throw new AdsException(AdsException.ADS_LOADDATA_FAILED, e.getMessage(), e); } catch (SQLException e) { throw new AdsException(AdsException.ADS_LOADDATA_FAILED, e.getMessage(), e); } catch (Exception e) { throw new AdsException(AdsException.ADS_LOADDATA_FAILED, e.getMessage(), e); } finally { if (rs != null) { try { rs.close(); } catch (SQLException e) { // Ignore exception } } if (statement != null) { try { statement.close(); } catch (SQLException e) { // Ignore exception } } if (connection != null) { try { connection.close(); } catch (SQLException e) { // Ignore exception } } } } /** * Check the load data job status. * * @param jobId The id of the load data job to check * @return true if the load data job succeeded, false if it has not finished yet.
* @throws AdsException if the job does not exist, has failed, or its status cannot be queried */ public boolean checkLoadDataJobStatus(String jobId) throws AdsException { if (adsURL == null) { throw new AdsException(AdsException.ADS_CONN_URL_NOT_SET, "ADS JDBC connection URL was not set.", null); } if (userName == null) { throw new AdsException(AdsException.ADS_CONN_USERNAME_NOT_SET, "ADS JDBC connection user name was not set.", null); } if (password == null) { throw new AdsException(AdsException.ADS_CONN_PASSWORD_NOT_SET, "ADS JDBC connection password was not set.", null); } if (schema == null) { throw new AdsException(AdsException.ADS_CONN_SCHEMA_NOT_SET, "ADS JDBC connection schema was not set.", null); } try { String state = this.checkLoadDataJobStatusWithRetry(jobId); if (state == null) { throw new AdsException(AdsException.JOB_NOT_EXIST, "Target job does not exist for id: " + jobId, null); } if (state.equals("SUCCEEDED")) { return true; } else if (state.equals("FAILED")) { throw new AdsException(AdsException.JOB_FAILED, "Target job failed for id: " + jobId, null); } else { return false; } } catch (AdsException e) { // keep JOB_NOT_EXIST / JOB_FAILED instead of re-wrapping them as OTHER throw e; } catch (Exception e) { throw new AdsException(AdsException.OTHER, e.getMessage(), e); } } private String checkLoadDataJobStatusWithRetry(final String jobId) throws AdsException { try { Class.forName("com.mysql.jdbc.Driver"); final String finalAdsUrl = this.adsURL; final String finalSchema = this.schema; final Long finalSocketTimeout = this.socketTimeout; final String suffix = this.suffix; return RetryUtil.executeWithRetry(new Callable<String>() { @Override public String call() throws Exception { Connection connection = null; Statement statement = null; ResultSet rs = null; try { String url = AdsUtil.prepareJdbcUrl(finalAdsUrl, finalSchema, finalSocketTimeout, suffix); Properties connectionProps = new Properties(); connectionProps.put("user", userName); connectionProps.put("password", password); connection = DriverManager.getConnection(url, connectionProps); statement = connection.createStatement(); String sql = "select state from
information_schema.job_instances where job_id like '" + jobId + "'"; rs = statement.executeQuery(sql); String state = null; while (DBUtil.asyncResultSetNext(rs)) { state = rs.getString(1); } return state; } finally { if (rs != null) { try { rs.close(); } catch (SQLException e) { // Ignore exception } } if (statement != null) { try { statement.close(); } catch (SQLException e) { // Ignore exception } } if (connection != null) { try { connection.close(); } catch (SQLException e) { // Ignore exception } } } } }, 3, 1000L, true); } catch (Exception e) { throw new AdsException(AdsException.OTHER, e.getMessage(), e); } } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/TableMetaHelper.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.load; import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnDataType; import com.alibaba.datax.plugin.writer.adswriter.ads.ColumnInfo; import com.alibaba.datax.plugin.writer.adswriter.ads.TableInfo; import com.alibaba.datax.plugin.writer.adswriter.odps.DataType; import com.alibaba.datax.plugin.writer.adswriter.odps.FieldSchema; import com.alibaba.datax.plugin.writer.adswriter.odps.TableMeta; import java.util.ArrayList; import java.util.List; import java.util.Random; /** * Table meta helper for ADS writer. * * @since 0.0.1 */ public class TableMetaHelper { private TableMetaHelper() { } /** * Create temporary ODPS table. 
* * @param tableMeta table meta * @param lifeCycle lifecycle of the temporary table * @return ODPS temporary table meta */ public static TableMeta createTempODPSTable(TableInfo tableMeta, int lifeCycle) { TableMeta tempTable = new TableMeta(); tempTable.setComment(tableMeta.getComments()); tempTable.setLifeCycle(lifeCycle); String tableSchema = tableMeta.getTableSchema(); String tableName = tableMeta.getTableName(); tempTable.setTableName(generateTempTableName(tableSchema, tableName)); List<FieldSchema> tempColumns = new ArrayList<FieldSchema>(); List<ColumnInfo> columns = tableMeta.getColumns(); for (ColumnInfo column : columns) { FieldSchema tempColumn = new FieldSchema(); tempColumn.setName(column.getName()); tempColumn.setType(toODPSDataType(column.getDataType())); tempColumn.setComment(column.getComment()); tempColumns.add(tempColumn); } tempTable.setCols(tempColumns); tempTable.setPartitionKeys(null); return tempTable; } private static String toODPSDataType(ColumnDataType columnDataType) { int type; switch (columnDataType.type) { case ColumnDataType.BOOLEAN: type = DataType.STRING; break; case ColumnDataType.BYTE: case ColumnDataType.SHORT: case ColumnDataType.INT: case ColumnDataType.LONG: type = DataType.INTEGER; break; case ColumnDataType.DECIMAL: case ColumnDataType.DOUBLE: case ColumnDataType.FLOAT: type = DataType.DOUBLE; break; case ColumnDataType.DATE: case ColumnDataType.TIME: case ColumnDataType.TIMESTAMP: case ColumnDataType.STRING: case ColumnDataType.MULTI_VALUE: type = DataType.STRING; break; default: throw new IllegalArgumentException("columnDataType=" + columnDataType); } return DataType.toString(type); } private static String generateTempTableName(String tableSchema, String tableName) { int randNum = 1000 + new Random(System.currentTimeMillis()).nextInt(1000); return tableSchema + "__" + tableName + "_" + System.currentTimeMillis() + randNum; } } ================================================ FILE:
adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/TransferProjectConf.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.load; import com.alibaba.datax.common.util.Configuration; /** * Created by xiafei.qiuxf on 15/4/13. */ public class TransferProjectConf { public final static String KEY_ACCESS_ID = "odps.accessId"; public final static String KEY_ACCESS_KEY = "odps.accessKey"; public final static String KEY_ACCOUNT = "odps.account"; public final static String KEY_ODPS_SERVER = "odps.odpsServer"; public final static String KEY_ODPS_TUNNEL = "odps.tunnelServer"; public final static String KEY_PROJECT = "odps.project"; private String accessId; private String accessKey; private String account; private String odpsServer; private String odpsTunnel; private String project; public static TransferProjectConf create(Configuration adsWriterConf) { TransferProjectConf res = new TransferProjectConf(); res.accessId = adsWriterConf.getString(KEY_ACCESS_ID); res.accessKey = adsWriterConf.getString(KEY_ACCESS_KEY); res.account = adsWriterConf.getString(KEY_ACCOUNT); res.odpsServer = adsWriterConf.getString(KEY_ODPS_SERVER); res.odpsTunnel = adsWriterConf.getString(KEY_ODPS_TUNNEL); res.project = adsWriterConf.getString(KEY_PROJECT); return res; } public String getAccessId() { return accessId; } public String getAccessKey() { return accessKey; } public String getAccount() { return account; } public String getOdpsServer() { return odpsServer; } public String getOdpsTunnel() { return odpsTunnel; } public String getProject() { return project; } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/DataType.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.odps; /** * ODPS data types.
 * <p>
 * The following types are currently defined:
 * <ul>
 * <li>INTEGER</li>
 * <li>DOUBLE</li>
 * <li>BOOLEAN</li>
 * <li>STRING</li>
 * <li>DATETIME</li>
 * </ul>
 *
 * @since 0.0.1 */ public class DataType { public final static byte INTEGER = 0; public final static byte DOUBLE = 1; public final static byte BOOLEAN = 2; public final static byte STRING = 3; public final static byte DATETIME = 4; public static String toString(int type) { switch (type) { case INTEGER: return "bigint"; case DOUBLE: return "double"; case BOOLEAN: return "boolean"; case STRING: return "string"; case DATETIME: return "datetime"; default: throw new IllegalArgumentException("type=" + type); } } /** * Converts a data type name given as a string into the corresponding byte constant.
 * <p>
 * Conversion rules:
 * <ul>
 * <li>tinyint, int, bigint, long - {@link #INTEGER}</li>
 * <li>double, float - {@link #DOUBLE}</li>
 * <li>string - {@link #STRING}</li>
 * <li>boolean, bool - {@link #BOOLEAN}</li>
 * <li>datetime - {@link #DATETIME}</li>
 * </ul>
 *
 * @param type the data type name as a string
 * @return the matching byte constant
 * @throws IllegalArgumentException if the type name is not recognized */ public static byte convertToDataType(String type) throws IllegalArgumentException { type = type.toLowerCase().trim(); if ("string".equals(type)) { return STRING; } else if ("bigint".equals(type) || "int".equals(type) || "tinyint".equals(type) || "long".equals(type)) { return INTEGER; } else if ("boolean".equals(type) || "bool".equals(type)) { return BOOLEAN; } else if ("double".equals(type) || "float".equals(type)) { return DOUBLE; } else if ("datetime".equals(type)) { return DATETIME; } else { throw new IllegalArgumentException("unknown type: " + type); } } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/FieldSchema.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.odps; /** * ODPS column attributes: column name and type. The name and type match those shown by a SQL DESC on the table or partition. * * @since 0.0.1 */ public class FieldSchema { /** Column name */ private String name; /** Column type, e.g. string, bigint, boolean, datetime */ private String type; private String comment; public String getName() { return name; } public void setName(String name) { this.name = name; } public String getType() { return type; } public void setType(String type) { this.type = type; } public String getComment() { return comment; } public void setComment(String comment) { this.comment = comment; } @Override public String toString() { StringBuilder builder = new StringBuilder(); builder.append("FieldSchema [name=").append(name).append(", type=").append(type).append(", comment=") .append(comment).append("]"); return builder.toString(); } /** * @return "col_name data_type [COMMENT col_comment]" */ public String toDDL() { StringBuilder builder = new StringBuilder(); builder.append(name).append(" ").append(type); String comment = this.comment; if (comment != null && comment.length() > 0) { 
builder.append(" ").append("COMMENT \"" + comment + "\""); } return
builder.toString(); } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/TableMeta.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.odps; import java.util.Iterator; import java.util.List; /** * ODPS table meta. * * @since 0.0.1 */ public class TableMeta { private String tableName; private List<FieldSchema> cols; private List<FieldSchema> partitionKeys; private int lifeCycle; private String comment; public String getTableName() { return tableName; } public void setTableName(String tableName) { this.tableName = tableName; } public List<FieldSchema> getCols() { return cols; } public void setCols(List<FieldSchema> cols) { this.cols = cols; } public List<FieldSchema> getPartitionKeys() { return partitionKeys; } public void setPartitionKeys(List<FieldSchema> partitionKeys) { this.partitionKeys = partitionKeys; } public int getLifeCycle() { return lifeCycle; } public void setLifeCycle(int lifeCycle) { this.lifeCycle = lifeCycle; } public String getComment() { return comment; } public void setComment(String comment) { this.comment = comment; } @Override public String toString() { StringBuilder builder = new StringBuilder(); builder.append("TableMeta [tableName=").append(tableName).append(", cols=").append(cols) .append(", partitionKeys=").append(partitionKeys).append(", lifeCycle=").append(lifeCycle) .append(", comment=").append(comment).append("]"); return builder.toString(); } /** * @return
* "CREATE TABLE [IF NOT EXISTS] table_name
* [(col_name data_type [COMMENT col_comment], ...)]
* [COMMENT table_comment]
* [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
* [LIFECYCLE days]
* [AS select_statement] "
*/ public String toDDL() { StringBuilder builder = new StringBuilder(); builder.append("CREATE TABLE " + tableName).append(" "); List<FieldSchema> cols = this.cols; if (cols != null && cols.size() > 0) { builder.append("(").append(toDDL(cols)).append(")").append(" "); } String comment = this.comment; if (comment != null && comment.length() > 0) { builder.append("COMMENT \"" + comment + "\" "); } List<FieldSchema> partitionKeys = this.partitionKeys; if (partitionKeys != null && partitionKeys.size() > 0) { builder.append("PARTITIONED BY "); builder.append("(").append(toDDL(partitionKeys)).append(")").append(" "); } if (lifeCycle > 0) { builder.append("LIFECYCLE " + lifeCycle).append(" "); } builder.append(";"); return builder.toString(); } private String toDDL(List<FieldSchema> cols) { StringBuilder builder = new StringBuilder(); Iterator<FieldSchema> iter = cols.iterator(); builder.append(iter.next().toDDL()); while (iter.hasNext()) { builder.append(", ").append(iter.next().toDDL()); } return builder.toString(); } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/odps/package-info.java ================================================ /** * ODPS meta. * * @since 0.0.1 */ package com.alibaba.datax.plugin.writer.adswriter.odps; ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/package-info.java ================================================ /** * ADS Writer.
* * @since 0.0.1 */ package com.alibaba.datax.plugin.writer.adswriter; ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/util/AdsUtil.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.writer.adswriter.load.AdsHelper; import com.alibaba.datax.plugin.writer.adswriter.AdsWriterErrorCode; import com.alibaba.datax.plugin.writer.adswriter.load.TransferProjectConf; import com.alibaba.datax.plugin.writer.adswriter.odps.FieldSchema; import com.alibaba.datax.plugin.writer.adswriter.odps.TableMeta; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.List; public class AdsUtil { private static final Logger LOG = LoggerFactory.getLogger(AdsUtil.class); /* Check that every required configuration item has been set. */ public static void checkNecessaryConfig(Configuration originalConfig, String writeMode) { /* Check the required ADS parameters */ originalConfig.getNecessaryValue(Key.ADS_URL, AdsWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.USERNAME, AdsWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.PASSWORD, AdsWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.SCHEMA, AdsWriterErrorCode.REQUIRED_VALUE); if(Constant.LOADMODE.equals(writeMode)) { originalConfig.getNecessaryValue(Key.Life_CYCLE, AdsWriterErrorCode.REQUIRED_VALUE); Integer lifeCycle = originalConfig.getInt(Key.Life_CYCLE); if (lifeCycle <= 0) { throw DataXException.asDataXException(AdsWriterErrorCode.INVALID_CONFIG_VALUE, "The value of configuration item [lifeCycle] must be greater than zero."); } 
originalConfig.getNecessaryValue(Key.ADS_TABLE,
AdsWriterErrorCode.REQUIRED_VALUE); Boolean overwrite = originalConfig.getBool(Key.OVER_WRITE); if (overwrite == null) { throw DataXException.asDataXException(AdsWriterErrorCode.REQUIRED_VALUE, "Configuration item [overWrite] is required."); } } if (Constant.STREAMMODE.equalsIgnoreCase(writeMode)) { originalConfig.getNecessaryValue(Key.OPIndex, AdsWriterErrorCode.REQUIRED_VALUE); } } /* Create an AdsHelper instance. */ public static AdsHelper createAdsHelper(Configuration originalConfig){ /* Read adsUrl, userName, password, schema and related settings, then create the AdsHelper instance */ String adsUrl = originalConfig.getString(Key.ADS_URL); String userName = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); String schema = originalConfig.getString(Key.SCHEMA); Long socketTimeout = originalConfig.getLong(Key.SOCKET_TIMEOUT, Constant.DEFAULT_SOCKET_TIMEOUT); String suffix = originalConfig.getString(Key.JDBC_URL_SUFFIX, ""); return new AdsHelper(adsUrl,userName,password,schema,socketTimeout,suffix); } public static AdsHelper createAdsHelperWithOdpsAccount(Configuration originalConfig) { String adsUrl = originalConfig.getString(Key.ADS_URL); String userName = originalConfig.getString(TransferProjectConf.KEY_ACCESS_ID); String password = originalConfig.getString(TransferProjectConf.KEY_ACCESS_KEY); String schema = originalConfig.getString(Key.SCHEMA); Long socketTimeout = originalConfig.getLong(Key.SOCKET_TIMEOUT, Constant.DEFAULT_SOCKET_TIMEOUT); String suffix = originalConfig.getString(Key.JDBC_URL_SUFFIX, ""); return new AdsHelper(adsUrl, userName, password, schema,socketTimeout,suffix); } /* Generate the configuration required by the ODPSWriter plugin. 
*/ public static Configuration generateConf(Configuration originalConfig, String odpsTableName, TableMeta tableMeta, TransferProjectConf transConf){ Configuration newConfig = originalConfig.clone(); newConfig.set(Key.ODPSTABLENAME, odpsTableName); newConfig.set(Key.ODPS_SERVER, transConf.getOdpsServer()); newConfig.set(Key.TUNNEL_SERVER,transConf.getOdpsTunnel());
newConfig.set(Key.ACCESS_ID,transConf.getAccessId()); newConfig.set(Key.ACCESS_KEY,transConf.getAccessKey()); newConfig.set(Key.PROJECT,transConf.getProject()); newConfig.set(Key.TRUNCATE, true); newConfig.set(Key.PARTITION,null); // newConfig.remove(Key.PARTITION); List<FieldSchema> cols = tableMeta.getCols(); List<String> allColumns = new ArrayList<String>(); if(cols != null && !cols.isEmpty()){ for(FieldSchema col:cols){ allColumns.add(col.getName()); } } newConfig.set(Key.COLUMN,allColumns); return newConfig; } /* Generate the source_path used when importing data into ADS. */ public static String generateSourcePath(String project, String tmpOdpsTableName, String odpsPartition){ StringBuilder builder = new StringBuilder(); String partition = transferOdpsPartitionToAds(odpsPartition); builder.append("odps://").append(project).append("/").append(tmpOdpsTableName); if(odpsPartition != null && !odpsPartition.isEmpty()){ builder.append("/").append(partition); } return builder.toString(); } public static String transferOdpsPartitionToAds(String odpsPartition){ if(odpsPartition == null || odpsPartition.isEmpty()) return null; String adsPartition = formatPartition(odpsPartition); String[] partitions = adsPartition.split("/"); for(int last = partitions.length; last > 0; last--){ String partitionPart = partitions[last-1]; String newPart = partitionPart.replace(".*", "*").replace("*", ".*"); if(newPart.split("=")[1].equals(".*")){ adsPartition = adsPartition.substring(0,adsPartition.length()-partitionPart.length()); }else{ break; } if(adsPartition.endsWith("/")){ adsPartition = adsPartition.substring(0,adsPartition.length()-1); } } if (adsPartition.contains("*")) throw DataXException.asDataXException(AdsWriterErrorCode.ODPS_PARTITION_FAILED, "Wildcard '*' is only allowed as the entire value of trailing partition levels: " + odpsPartition); return adsPartition; } public static String formatPartition(String partition) { return partition.trim().replaceAll(" *= *", "=") .replaceAll(" */ *", ",").replaceAll(" *, *", ",") 
.replaceAll("'", "").replaceAll(",", "/"); } public static String prepareJdbcUrl(Configuration conf) { String
adsURL = conf.getString(Key.ADS_URL); String schema = conf.getString(Key.SCHEMA); Long socketTimeout = conf.getLong(Key.SOCKET_TIMEOUT, Constant.DEFAULT_SOCKET_TIMEOUT); String suffix = conf.getString(Key.JDBC_URL_SUFFIX, ""); return AdsUtil.prepareJdbcUrl(adsURL, schema, socketTimeout, suffix); } public static String prepareJdbcUrl(String adsURL, String schema, Long socketTimeout, String suffix) { String jdbcUrl = null; // like autoReconnect=true&failOverReadOnly=false&maxReconnects=10 if (StringUtils.isNotBlank(suffix)) { jdbcUrl = String .format("jdbc:mysql://%s/%s?useUnicode=true&characterEncoding=UTF-8&socketTimeout=%s&%s", adsURL, schema, socketTimeout, suffix); } else { jdbcUrl = String .format("jdbc:mysql://%s/%s?useUnicode=true&characterEncoding=UTF-8&socketTimeout=%s", adsURL, schema, socketTimeout); } return jdbcUrl; } public static Connection getAdsConnect(Configuration conf) { String userName = conf.getString(Key.USERNAME); String passWord = conf.getString(Key.PASSWORD); String jdbcUrl = AdsUtil.prepareJdbcUrl(conf); Connection connection = DBUtil.getConnection(DataBaseType.ADS, jdbcUrl, userName, passWord); return connection; } } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/util/Constant.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.util; public class Constant { public static final String LOADMODE = "load"; public static final String INSERTMODE = "insert"; public static final String DELETEMODE = "delete"; public static final String REPLACEMODE = "replace"; public static final String STREAMMODE = "stream"; public static final int DEFAULT_BATCH_SIZE = 32; public static final long DEFAULT_SOCKET_TIMEOUT = 3600000L; public static final int DEFAULT_RETRY_TIMES = 3; public static final String INSERT_TEMPLATE = "insert into %s ( %s ) values "; public static final String DELETE_TEMPLATE = "delete from %s where "; public static 
final String ADS_TABLE_INFO = "adsTableInfo"; public static final String ADS_QUOTE_CHARACTER = "`"; } ================================================ FILE: adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/util/Key.java ================================================ package com.alibaba.datax.plugin.writer.adswriter.util; public final class Key { public final static String ADS_URL = "url"; public final static String USERNAME = "username"; public final static String PASSWORD = "password"; public final static String SCHEMA = "schema"; public final static String ADS_TABLE = "table"; public final static String Life_CYCLE = "lifeCycle"; public final static String OVER_WRITE = "overWrite"; public final static String WRITE_MODE = "writeMode"; public final static String COLUMN = "column"; public final static String OPIndex = "opIndex"; public final static String EMPTY_AS_NULL = "emptyAsNull"; public final static String BATCH_SIZE = "batchSize"; public final static String BUFFER_SIZE = "bufferSize"; public final static String IGNORE_INSERT = "ignoreInsert"; public final static String PRE_SQL = "preSql"; public final static String POST_SQL = "postSql"; public final static String SOCKET_TIMEOUT = "socketTimeout"; public final static String RETRY_CONNECTION_TIME = "retryTimes"; public final static String RETRY_INTERVAL_TIME = "retryIntervalTime"; public final static String JDBC_URL_SUFFIX = "urlSuffix"; /** * The keys below are used by the ODPS writer. */ public final static String PARTITION = "partition"; public final static String ODPSTABLENAME = "table"; public final static String ODPS_SERVER = "odpsServer"; public final static String TUNNEL_SERVER = "tunnelServer"; public final static String ACCESS_ID = "accessId"; public final static String ACCESS_KEY = "accessKey"; public final static String PROJECT = "project"; public final static String TRUNCATE = "truncate"; } ================================================ FILE: adswriter/src/main/resources/plugin.json
================================================ { "name": "adswriter", "class": "com.alibaba.datax.plugin.writer.adswriter.AdsWriter", "description": "", "developer": "alibaba" } ================================================ FILE: adswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "adswriter", "parameter": { "url": "", "username": "", "password": "", "schema": "", "table": "", "partition": "", "overWrite": "", "lifeCycle": 2 } } ================================================ FILE: cassandrareader/doc/cassandrareader.md ================================================

# CassandraReader Plugin Documentation

___

## 1 Introduction

The CassandraReader plugin reads data from Cassandra. Under the hood, CassandraReader connects to a Cassandra instance through the DataStax Java driver and runs the corresponding CQL statements to SELECT data out of Cassandra.

## 2 How It Works

In short, CassandraReader connects to a Cassandra instance through the Java driver, generates a SELECT CQL query from the user's configuration, and sends it to Cassandra. It then assembles the query results into an abstract dataset using DataX's own data types and passes it to the downstream Writer.

CassandraReader builds the CQL statement it sends to Cassandra by concatenating the user-configured table and column settings.

## 3 Features

### 3.1 Sample Configuration

* A job that extracts data from Cassandra to the local machine:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            }
        },
        "content": [
            {
                "reader": {
                    "name": "cassandrareader",
                    "parameter": {
                        "host": "localhost",
                        "port": 9042,
                        "useSSL": false,
                        "keyspace": "test",
                        "table": "datax_src",
                        "column": [
                            "textCol",
                            "blobCol",
                            "writetime(blobCol)",
                            "boolCol",
                            "smallintCol",
                            "tinyintCol",
                            "intCol",
                            "bigintCol",
                            "varintCol",
                            "floatCol",
                            "doubleCol",
                            "decimalCol",
                            "dateCol",
                            "timeCol",
                            "timeStampCol",
                            "uuidCol",
                            "inetCol",
                            "durationCol",
                            "listCol",
                            "mapCol",
                            "setCol",
                            "tupleCol",
                            "udtCol"
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **host**

    * Description: domain name or IP address of the Cassandra contact points; separate multiple nodes with commas.
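    * Required: yes
    * Default: none

* **port**

    * Description: Cassandra port.
    * Required: yes
    * Default: 9042

* **username**

    * Description: user name for the data source.
    * Required: no
    * Default: none

* **password**

    * Description: password for the specified user name.
    * Required: no
    * Default: none

* **useSSL**

    * Description: whether to connect over SSL.
    * Required: no
    * Default: false

* **keyspace**

    * Description: keyspace that contains the table to sync.
    * Required: yes
    * Default: none

* **table**

    * Description: the table to sync.
    * Required: yes
    * Default: none

* **column**

    * Description: the set of columns to sync from the configured table. Each element can be a column name or writetime(column_name); the latter reads the write timestamp of column_name instead of its data.
    * Required: yes
    * Default: none

* **where**

    * Description: a CQL expression used to filter data, for example:

      ```
      "where":"textcol='a'"
      ```

    * Required: no
    * Default: none

* **allowFiltering**

    * Description: whether to filter data on the server side. See the description of the ALLOW FILTERING keyword in the Cassandra documentation.
    * Required: no
    * Default: none

* **consistancyLevel**

    * Description: data consistency level. One of ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY|TWO|THREE|LOCAL_ONE.
    * Required: no
    * Default: LOCAL_QUORUM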
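
To illustrate how the `table`, `column`, `where`, and `allowFiltering` parameters above could combine into the SELECT statement described in section 2, here is a minimal Java sketch. The class `CqlQuerySketch` and method `buildSelect` are hypothetical names for illustration only; the plugin's real query construction (`CassandraReaderHelper.getQueryString`) is not shown in this document and may differ in its details.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: combines reader parameters into a CQL SELECT string.
 * This is NOT the plugin's CassandraReaderHelper.getQueryString.
 */
public class CqlQuerySketch {

    /**
     * @param table          value of the "table" parameter
     * @param columns        value of the "column" parameter; entries such as
     *                       writetime(col) are passed through verbatim
     * @param where          optional "where" parameter, a raw CQL expression (may be null)
     * @param allowFiltering optional "allowFiltering" parameter
     */
    static String buildSelect(String table, List<String> columns,
                              String where, boolean allowFiltering) {
        StringBuilder cql = new StringBuilder("SELECT ");
        cql.append(String.join(",", columns));
        cql.append(" FROM ").append(table);
        // Only add a WHERE clause when a non-blank filter expression is configured.
        if (where != null && !where.trim().isEmpty()) {
            cql.append(" WHERE ").append(where.trim());
        }
        // ALLOW FILTERING lets Cassandra run queries that need server-side filtering.
        if (allowFiltering) {
            cql.append(" ALLOW FILTERING");
        }
        return cql.append(";").toString();
    }

    public static void main(String[] args) {
        String cql = buildSelect("datax_src",
                Arrays.asList("textCol", "writetime(blobCol)"),
                "textcol='a'", true);
        System.out.println(cql);
    }
}
```

Running `main` prints `SELECT textCol,writetime(blobCol) FROM datax_src WHERE textcol='a' ALLOW FILTERING;`. Because `where` is spliced in verbatim, it must already be valid CQL, which matches how the parameter is documented above.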
### 3.3 Type Conversion

CassandraReader currently supports all Cassandra types except counter and custom types.

The table below lists how CassandraReader converts Cassandra types:

| DataX internal type | Cassandra data type |
| -------- | ----- |
| Long | int, tinyint, smallint, varint, bigint, time |
| Double | float, double, decimal |
| String | ascii, varchar, text, uuid, timeuuid, duration, list, map, set, tuple, udt, inet |
| Date | date, timestamp |
| Boolean | bool |
| Bytes | blob |

Note:

* counter and custom types are not currently supported.

## 4 Performance Report

Omitted.

## 5 Constraints and Limitations

### 5.1 Data recovery with primary/standby synchronization

Omitted.

## 6 FAQ

================================================ FILE: cassandrareader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT cassandrareader cassandrareader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.datastax.cassandra cassandra-driver-core 3.7.2 shaded com.google.guava guava com.google.guava guava 16.0.1 commons-codec commons-codec 1.9 junit junit test com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-service-face org.apache.hadoop hadoop-common org.apache.hive hive-exec org.apache.hive hive-serde javolution javolution test org.mockito mockito-all 1.9.5 test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: cassandrareader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/cassandrareader target/ cassandrareader-0.0.1-SNAPSHOT.jar plugin/reader/cassandrareader false plugin/reader/cassandrareader/libs runtime ================================================ FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/CassandraReader.java ================================================ package com.alibaba.datax.plugin.reader.cassandrareader; import
com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.datastax.driver.core.Cluster; import com.datastax.driver.core.ConsistencyLevel; import com.datastax.driver.core.ResultSet; import com.datastax.driver.core.Row; import com.datastax.driver.core.Session; import com.datastax.driver.core.SimpleStatement; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public class CassandraReader extends Reader { private static final Logger LOG = LoggerFactory .getLogger(CassandraReader.class); public static class Job extends Reader.Job { private Configuration jobConfig = null; private Cluster cluster = null; @Override public void init() { this.jobConfig = super.getPluginJobConf(); String username = jobConfig.getString(Key.USERNAME); String password = jobConfig.getString(Key.PASSWORD); String hosts = jobConfig.getString(Key.HOST); Integer port = jobConfig.getInt(Key.PORT,9042); /* useSSL is optional: default to false instead of unboxing a null Boolean */ boolean useSSL = jobConfig.getBool(Key.USESSL, false); Cluster.Builder clusterBuilder = Cluster.builder().withPort(port).addContactPoints(hosts.split(",")); if ((username != null) && !username.isEmpty()) { clusterBuilder = clusterBuilder.withCredentials(username, password); } /* Apply SSL independently of whether credentials are configured */ if (useSSL) { clusterBuilder = clusterBuilder.withSSL(); } cluster = clusterBuilder.build(); CassandraReaderHelper.checkConfig(jobConfig,cluster); } @Override public void destroy() { if (cluster != null) { cluster.close(); } } @Override public List<Configuration> split(int adviceNumber) { List<Configuration> splittedConfigs = CassandraReaderHelper.splitJob(adviceNumber,jobConfig,cluster); return splittedConfigs; } } public static class Task extends Reader.Task { private Configuration taskConfig; private Cluster cluster = null; private Session session = null; 
private String queryString = null; private
ConsistencyLevel consistencyLevel; private int columnNumber = 0; private List<String> columnMeta = null; @Override public void init() { this.taskConfig = super.getPluginJobConf(); String username = taskConfig.getString(Key.USERNAME); String password = taskConfig.getString(Key.PASSWORD); String hosts = taskConfig.getString(Key.HOST); Integer port = taskConfig.getInt(Key.PORT, 9042); boolean useSSL = taskConfig.getBool(Key.USESSL, false); String keyspace = taskConfig.getString(Key.KEYSPACE); this.columnMeta = taskConfig.getList(Key.COLUMN,String.class); columnNumber = columnMeta.size(); Cluster.Builder clusterBuilder = Cluster.builder().withPort(port).addContactPoints(hosts.split(",")); if ((username != null) && !username.isEmpty()) { clusterBuilder = clusterBuilder.withCredentials(username, password); } if (useSSL) { clusterBuilder = clusterBuilder.withSSL(); } cluster = clusterBuilder.build(); session = cluster.connect(keyspace); String cl = taskConfig.getString(Key.CONSITANCY_LEVEL); if( cl != null && !cl.isEmpty() ) { consistencyLevel = ConsistencyLevel.valueOf(cl); } else { consistencyLevel = ConsistencyLevel.LOCAL_QUORUM; } queryString = CassandraReaderHelper.getQueryString(taskConfig,cluster); LOG.info("query = " + queryString); } @Override public void startRead(RecordSender recordSender) { ResultSet r = session.execute(new SimpleStatement(queryString).setConsistencyLevel(consistencyLevel)); for (Row row : r ) { Record record = recordSender.createRecord(); record = CassandraReaderHelper.buildRecord(record,row,r.getColumnDefinitions(),columnNumber, super.getTaskPluginCollector()); if( record != null ) recordSender.sendToWriter(record); } } @Override public void destroy() { /* Release the per-task session and cluster connections */ if (session != null) { session.close(); } if (cluster != null) { cluster.close(); } } } } ================================================ FILE: 
cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/CassandraReaderErrorCode.java
================================================ package com.alibaba.datax.plugin.reader.cassandrareader; import com.alibaba.datax.common.spi.ErrorCode; public enum CassandraReaderErrorCode implements ErrorCode { CONF_ERROR("CassandraReader-00", "Configuration error."), ; private final String code; private final String description; private CassandraReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/CassandraReaderHelper.java ================================================ package com.alibaba.datax.plugin.reader.cassandrareader; import java.math.BigDecimal; import java.math.BigInteger; import java.net.InetAddress; import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.Arrays; import java.util.Date; import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set; import com.alibaba.datax.common.element.BoolColumn; import com.alibaba.datax.common.element.BytesColumn; import com.alibaba.datax.common.element.DateColumn; import com.alibaba.datax.common.element.DoubleColumn; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import com.datastax.driver.core.Cluster; import com.datastax.driver.core.CodecRegistry; import com.datastax.driver.core.ColumnDefinitions; import
com.datastax.driver.core.ColumnMetadata; import com.datastax.driver.core.DataType; import com.datastax.driver.core.Duration; import com.datastax.driver.core.LocalDate; import com.datastax.driver.core.Row; import com.datastax.driver.core.TableMetadata; import com.datastax.driver.core.TupleType; import com.datastax.driver.core.TupleValue; import com.datastax.driver.core.UDTValue; import com.datastax.driver.core.UserType; import com.google.common.reflect.TypeToken; import org.apache.commons.codec.binary.Base64; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** * Created by mazhenlin on 2019/8/21. */ public class CassandraReaderHelper { static CodecRegistry registry = new CodecRegistry(); private static final Logger LOG = LoggerFactory .getLogger(CassandraReader.class); static class TypeNotSupported extends Exception{} static String toJSonString(Object o, DataType type ) throws Exception{ if( o == null ) return JSON.toJSONString(null); switch (type.getName()) { case LIST: case MAP: case SET: case TUPLE: case UDT: return JSON.toJSONString(transferObjectForJson(o,type)); default: return JSON.toJSONString(o); } } static Object transferObjectForJson(Object o,DataType type) throws TypeNotSupported{ if( o == null ) return o; switch (type.getName()) { case ASCII: case TEXT: case VARCHAR: case BOOLEAN: case SMALLINT: case TINYINT: case INT: case BIGINT: case VARINT: case FLOAT: case DOUBLE: case DECIMAL: case UUID: case TIMEUUID: case TIME: return o; case BLOB: /* Copy the remaining bytes via a duplicate: works for read-only or offset-backed buffers, unlike array() with position/limit */ ByteBuffer byteBuffer = ((ByteBuffer)o).duplicate(); byte[] blobBytes = new byte[byteBuffer.remaining()]; byteBuffer.get(blobBytes); String s = Base64.encodeBase64String(blobBytes); return s; case DATE: return ((LocalDate)o).getMillisSinceEpoch(); case TIMESTAMP: return ((Date)o).getTime(); case DURATION: return o.toString(); case INET: return ((InetAddress)o).getHostAddress(); case LIST: { 
return transferListForJson((List)o,type.getTypeArguments().get(0)); } case MAP: { DataType keyType = type.getTypeArguments().get(0); DataType
valType = type.getTypeArguments().get(1); return transferMapForJson((Map)o,keyType,valType); } case SET: { return transferSetForJson((Set)o, type.getTypeArguments().get(0)); } case TUPLE: { return transferTupleForJson((TupleValue)o,((TupleType)type).getComponentTypes()); } case UDT: { return transferUDTForJson((UDTValue)o); } default: throw new TypeNotSupported(); } } static List transferListForJson(List clist, DataType eleType) throws TypeNotSupported { List result = new ArrayList(); switch (eleType.getName()) { case ASCII: case TEXT: case VARCHAR: case BOOLEAN: case SMALLINT: case TINYINT: case INT: case BIGINT: case VARINT: case FLOAT: case DOUBLE: case DECIMAL: case TIME: case UUID: case TIMEUUID: return clist; case BLOB: case DATE: case TIMESTAMP: case DURATION: case INET: case LIST: case MAP: case SET: case TUPLE: case UDT: for (Object item : clist) { Object newItem = transferObjectForJson(item, eleType); result.add(newItem); } break; default: throw new TypeNotSupported(); } return result; } static Set transferSetForJson(Set cset,DataType eleType) throws TypeNotSupported{ Set result = new HashSet(); switch (eleType.getName()) { case ASCII: case TEXT: case VARCHAR: case BOOLEAN: case SMALLINT: case TINYINT: case INT: case BIGINT: case VARINT: case FLOAT: case DOUBLE: case DECIMAL: case TIME: case UUID: case TIMEUUID: return cset; case BLOB: case DATE: case TIMESTAMP: case DURATION: case INET: case LIST: case MAP: case SET: case TUPLE: case UDT: for (Object item : cset) { Object newItem = transferObjectForJson(item,eleType); result.add(newItem); } break; default: throw new TypeNotSupported(); } return result; } static Map transferMapForJson(Map cmap,DataType keyType,DataType valueType) throws TypeNotSupported { Map newMap = new HashMap(); for( Object e : cmap.entrySet() ) { Object k = ((Map.Entry)e).getKey(); Object v = ((Map.Entry)e).getValue(); Object newKey = transferObjectForJson(k,keyType); Object newValue = transferObjectForJson(v,valueType); if( !(newKey 
instanceof String) ) { newKey = JSON.toJSONString(newKey); } newMap.put(newKey,newValue); } return newMap; } static List transferTupleForJson(TupleValue tupleValue,List componentTypes) throws TypeNotSupported { List l = new ArrayList(); for (int j = 0; j < componentTypes.size(); j++ ) { DataType dataType = componentTypes.get(j); TypeToken eltClass = registry.codecFor(dataType).getJavaType(); Object ele = tupleValue.get(j,eltClass); l.add(transferObjectForJson(ele,dataType)); } return l; } static Map transferUDTForJson(UDTValue udtValue) throws TypeNotSupported { Map newMap = new HashMap(); int j = 0; for (UserType.Field f : udtValue.getType()) { DataType dataType = f.getType(); TypeToken eltClass = registry.codecFor(dataType).getJavaType(); Object ele = udtValue.get(j, eltClass); newMap.put(f.getName(),transferObjectForJson(ele,dataType)); j++; } return newMap; } static Record buildRecord(Record record, Row rs, ColumnDefinitions metaData, int columnNumber, TaskPluginCollector taskPluginCollector) { try { for (int i = 0; i < columnNumber; i++) try { if (rs.isNull(i)) { record.addColumn(new StringColumn()); continue; } switch (metaData.getType(i).getName()) { case ASCII: case TEXT: case VARCHAR: record.addColumn(new StringColumn(rs.getString(i))); break; case BLOB: record.addColumn(new BytesColumn(rs.getBytes(i).array())); break; case BOOLEAN: record.addColumn(new BoolColumn(rs.getBool(i))); break; case SMALLINT: record.addColumn(new LongColumn((int)rs.getShort(i))); break; case TINYINT: record.addColumn(new LongColumn((int)rs.getByte(i))); break; case INT: record.addColumn(new LongColumn(rs.getInt(i))); break; case COUNTER: case BIGINT: record.addColumn(new LongColumn(rs.getLong(i))); break; case VARINT: record.addColumn(new LongColumn(rs.getVarint(i))); break; case FLOAT: record.addColumn(new DoubleColumn(rs.getFloat(i))); break; case DOUBLE: record.addColumn(new DoubleColumn(rs.getDouble(i))); break; case DECIMAL: record.addColumn(new 
DoubleColumn(rs.getDecimal(i))); break; case DATE: record.addColumn(new DateColumn(rs.getDate(i).getMillisSinceEpoch())); break; case TIME: record.addColumn(new LongColumn(rs.getTime(i))); break; case TIMESTAMP: record.addColumn(new DateColumn(rs.getTimestamp(i))); break; case UUID: case TIMEUUID: record.addColumn(new StringColumn(rs.getUUID(i).toString())); break; case INET: record.addColumn(new StringColumn(rs.getInet(i).getHostAddress())); break; case DURATION: record.addColumn(new StringColumn(rs.get(i,Duration.class).toString())); break; case LIST: { TypeToken listEltClass = registry.codecFor(metaData.getType(i).getTypeArguments().get(0)).getJavaType(); List l = rs.getList(i, listEltClass); record.addColumn(new StringColumn(toJSonString(l,metaData.getType(i)))); } break; case MAP: { DataType keyType = metaData.getType(i).getTypeArguments().get(0); DataType valType = metaData.getType(i).getTypeArguments().get(1); TypeToken keyEltClass = registry.codecFor(keyType).getJavaType(); TypeToken valEltClass = registry.codecFor(valType).getJavaType(); Map m = rs.getMap(i, keyEltClass, valEltClass); record.addColumn(new StringColumn(toJSonString(m,metaData.getType(i)))); } break; case SET: { TypeToken setEltClass = registry.codecFor(metaData.getType(i).getTypeArguments().get(0)) .getJavaType(); Set set = rs.getSet(i, setEltClass); record.addColumn(new StringColumn(toJSonString(set,metaData.getType(i)))); } break; case TUPLE: { TupleValue t = rs.getTupleValue(i); record.addColumn(new StringColumn(toJSonString(t,metaData.getType(i)))); } break; case UDT: { UDTValue t = rs.getUDTValue(i); record.addColumn(new StringColumn(toJSonString(t,metaData.getType(i)))); } break; default: throw DataXException .asDataXException( CassandraReaderErrorCode.CONF_ERROR, String.format( "您的配置文件中的列配置信息有误. 因为DataX 不支持数据库读取这种字段类型. 字段名:[%s], " + "字段类型:[%s]. 
", metaData.getName(i), metaData.getType(i))); } } catch (TypeNotSupported t) { throw DataXException .asDataXException( CassandraReaderErrorCode.CONF_ERROR, String.format( "您的配置文件中的列配置信息有误. 因为DataX 不支持数据库读取这种字段类型. 字段名:[%s], " + "字段类型:[%s]. ", metaData.getName(i), metaData.getType(i))); } } catch (Exception e) { //TODO 这里识别为脏数据靠谱吗? taskPluginCollector.collectDirtyRecord(record, e); if (e instanceof DataXException) { throw (DataXException) e; } return null; } return record; } public static List splitJob(int adviceNumber,Configuration jobConfig,Cluster cluster) { List splittedConfigs = new ArrayList(); if( adviceNumber <= 1 ) { splittedConfigs.add(jobConfig); return splittedConfigs; } String where = jobConfig.getString(Key.WHERE); if(where != null && where.toLowerCase().contains("token(")) { splittedConfigs.add(jobConfig); return splittedConfigs; } String partitioner = cluster.getMetadata().getPartitioner(); if( partitioner.endsWith("RandomPartitioner")) { BigDecimal minToken = BigDecimal.valueOf(-1); BigDecimal maxToken = new BigDecimal(new BigInteger("2").pow(127)); BigDecimal step = maxToken.subtract(minToken) .divide(BigDecimal.valueOf(adviceNumber),2, BigDecimal.ROUND_HALF_EVEN); for ( int i = 0; i < adviceNumber; i++ ) { BigInteger l = minToken.add(step.multiply(BigDecimal.valueOf(i))).toBigInteger(); BigInteger r = minToken.add(step.multiply(BigDecimal.valueOf(i+1))).toBigInteger(); if( i == adviceNumber - 1 ) { r = maxToken.toBigInteger(); } Configuration taskConfig = jobConfig.clone(); taskConfig.set(Key.MIN_TOKEN,l.toString()); taskConfig.set(Key.MAX_TOKEN,r.toString()); splittedConfigs.add(taskConfig); } } else if( partitioner.endsWith("Murmur3Partitioner") ) { BigDecimal minToken = BigDecimal.valueOf(Long.MIN_VALUE); BigDecimal maxToken = BigDecimal.valueOf(Long.MAX_VALUE); BigDecimal step = maxToken.subtract(minToken) .divide(BigDecimal.valueOf(adviceNumber),2, BigDecimal.ROUND_HALF_EVEN); for ( int i = 0; i < adviceNumber; i++ ) { long l = 
minToken.add(step.multiply(BigDecimal.valueOf(i))).longValue(); long r = minToken.add(step.multiply(BigDecimal.valueOf(i+1))).longValue(); if( i == adviceNumber - 1 ) { r = maxToken.longValue(); } Configuration taskConfig = jobConfig.clone(); taskConfig.set(Key.MIN_TOKEN,String.valueOf(l)); taskConfig.set(Key.MAX_TOKEN,String.valueOf(r)); splittedConfigs.add(taskConfig); } } else { splittedConfigs.add(jobConfig); } return splittedConfigs; } public static String getQueryString(Configuration taskConfig,Cluster cluster) { List columnMeta = taskConfig.getList(Key.COLUMN,String.class); String keyspace = taskConfig.getString(Key.KEYSPACE); String table = taskConfig.getString(Key.TABLE); StringBuilder columns = new StringBuilder(); for( String column : columnMeta ) { if(columns.length() > 0 ) { columns.append(","); } columns.append(column); } StringBuilder where = new StringBuilder(); String whereString = taskConfig.getString(Key.WHERE); if( whereString != null && !whereString.isEmpty() ) { where.append(whereString); } String minToken = taskConfig.getString(Key.MIN_TOKEN); String maxToken = taskConfig.getString(Key.MAX_TOKEN); if( minToken !=null || maxToken !=null ) { LOG.info("range:" + minToken + "~" + maxToken); List pks = cluster.getMetadata().getKeyspace(keyspace).getTable(table).getPartitionKey(); StringBuilder sb = new StringBuilder(); for( ColumnMetadata pk : pks ) { if( sb.length() > 0 ) { sb.append(","); } sb.append(pk.getName()); } String s = sb.toString(); if (minToken != null && !minToken.isEmpty()) { if( where.length() > 0 ){ where.append(" AND "); } where.append("token(").append(s).append(")").append(" > ").append(minToken); } if (maxToken != null && !maxToken.isEmpty()) { if( where.length() > 0 ){ where.append(" AND "); } where.append("token(").append(s).append(")").append(" <= ").append(maxToken); } } boolean allowFiltering = taskConfig.getBool(Key.ALLOW_FILTERING,false); StringBuilder select = new StringBuilder(); select.append("SELECT 
").append(columns.toString()).append(" FROM ").append(table); if( where.length() > 0 ){ select.append(" where ").append(where.toString()); } if( allowFiltering ) { select.append(" ALLOW FILTERING"); } select.append(";"); return select.toString(); } public static void checkConfig(Configuration jobConfig,Cluster cluster) { ensureStringExists(jobConfig,Key.HOST); ensureStringExists(jobConfig,Key.KEYSPACE); ensureStringExists(jobConfig,Key.TABLE); ensureExists(jobConfig,Key.COLUMN); ///keyspace,table是否存在 String keyspace = jobConfig.getString(Key.KEYSPACE); if( cluster.getMetadata().getKeyspace(keyspace) == null ) { throw DataXException .asDataXException( CassandraReaderErrorCode.CONF_ERROR, String.format( "配置信息有错误.keyspace'%s'不存在 .", keyspace)); } String table = jobConfig.getString(Key.TABLE); TableMetadata tableMetadata = cluster.getMetadata().getKeyspace(keyspace).getTable(table); if( tableMetadata == null ) { throw DataXException .asDataXException( CassandraReaderErrorCode.CONF_ERROR, String.format( "配置信息有错误.表'%s'不存在 .", table)); } List columns = jobConfig.getList(Key.COLUMN,String.class); for( String name : columns ) { if( name == null || name.isEmpty() ) { throw DataXException .asDataXException( CassandraReaderErrorCode.CONF_ERROR, String.format( "配置信息有错误.列信息中需要包含'%s'字段 .",Key.COLUMN_NAME)); } } } static void ensureExists(Configuration jobConfig,String keyword) { if( jobConfig.get(keyword) == null ) { throw DataXException .asDataXException( CassandraReaderErrorCode.CONF_ERROR, String.format( "配置信息有错误.参数'%s'为必填项 .", keyword)); } } static void ensureStringExists(Configuration jobConfig,String keyword) { ensureExists(jobConfig,keyword); if( jobConfig.getString(keyword).isEmpty() ) { throw DataXException .asDataXException( CassandraReaderErrorCode.CONF_ERROR, String.format( "配置信息有错误.参数'%s'不能为空 .", keyword)); } } } ================================================ FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/Key.java 
================================================
package com.alibaba.datax.plugin.reader.cassandrareader;

/**
 * Created by mazhenlin on 2019/8/19.
 */
public class Key {
    public final static String USERNAME = "username";
    public final static String PASSWORD = "password";
    public final static String HOST = "host";
    public final static String PORT = "port";
    public final static String USESSL = "useSSL";
    public final static String KEYSPACE = "keyspace";
    public final static String TABLE = "table";
    public final static String COLUMN = "column";
    public final static String WHERE = "where";
    public final static String ALLOW_FILTERING = "allowFiltering";
    public final static String CONSITANCY_LEVEL = "consistancyLevel";
    public final static String MIN_TOKEN = "minToken";
    public final static String MAX_TOKEN = "maxToken";

    /**
     * Name of each column
     */
    public static final String COLUMN_NAME = "name";
    /**
     * Column separator
     */
    public static final String COLUMN_SPLITTER = "format";
    public static final String WRITE_TIME = "writetime(";
    public static final String ELEMENT_SPLITTER = "splitter";
    public static final String ENTRY_SPLITTER = "entrySplitter";
    public static final String KV_SPLITTER = "kvSplitter";
    public static final String ELEMENT_CONFIG = "element";
    public static final String TUPLE_CONNECTOR = "_";
    public static final String KEY_CONFIG = "key";
    public static final String VALUE_CONFIG = "value";
}

================================================
FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/LocalStrings.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF

================================================
FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/LocalStrings_en_US.properties
================================================

================================================
FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/LocalStrings_ja_JP.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF

================================================
FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/LocalStrings_zh_CN.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF

================================================
FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/LocalStrings_zh_HK.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF

================================================
FILE: cassandrareader/src/main/java/com/alibaba/datax/plugin/reader/cassandrareader/LocalStrings_zh_TW.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF

================================================
FILE: cassandrareader/src/main/resources/plugin.json
================================================
{
    "name": "cassandrareader",
    "class": "com.alibaba.datax.plugin.reader.cassandrareader.CassandraReader",
    "description": "useScene: prod. mechanism: execute select cql, retrieve data from the ResultSet. 
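warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba"
}

================================================
FILE: cassandrareader/src/main/resources/plugin_job_template.json
================================================
{
    "name": "cassandrareader",
    "parameter": {
        "username": "",
        "password": "",
        "host": "",
        "port": "",
        "useSSL": false,
        "keyspace": "",
        "table": "",
        "column": [
            "c1","c2","c3"
        ]
    }
}

================================================
FILE: cassandrawriter/doc/cassandrawriter.md
================================================
# CassandraWriter Plugin Documentation

___

## 1 Quick Introduction

The CassandraWriter plugin writes data into Cassandra. Under the hood, CassandraWriter connects to a Cassandra instance through the DataStax Java driver and executes the corresponding CQL statements to write the data into Cassandra.

## 2 How It Works

In short, CassandraWriter connects to a Cassandra instance through the Java driver, builds an INSERT CQL statement from the user's configuration, and sends it to Cassandra. The statement is prepared once and then bound for every record.

CassandraWriter assembles that CQL statement from the Table and Column settings configured by the user.

## 3 Features

### 3.1 Sample Configuration

* A job that generates data in memory and imports it into Cassandra:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {"value":"name","type": "string"},
                            {"value":"false","type":"bool"},
                            {"value":"1988-08-08 08:08:08","type":"date"},
                            {"value":"addr","type":"bytes"},
                            {"value":1.234,"type":"double"},
                            {"value":12345678,"type":"long"},
                            {"value":2.345,"type":"double"},
                            {"value":3456789,"type":"long"},
                            {"value":"4a0ef8c0-4d97-11d0-db82-ebecdb03ffa5","type":"string"},
                            {"value":"value","type":"bytes"},
                            {"value":"-838383838,37377373,-383883838,27272772,393993939,-38383883,83883838,-1350403181,817650816,1630642337,251398784,-622020148","type":"string"}
                        ],
                        "sliceRecordCount": 10000000
                    }
                },
                "writer": {
                    "name": "cassandrawriter",
                    "parameter": {
                        "host": "localhost",
                        "port": 9042,
                        "useSSL": false,
                        "keyspace": "stresscql",
                        "table": "dst",
                        "batchSize": 10,
                        "column": [
                            "name",
                            "choice",
                            "date",
                            "address",
                            "dbl",
                            "lval",
                            "fval",
                            "ival",
                            "uid",
                            "value",
                            "listval"
                        ]
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **host**
  * Description: domain name or IP address of the Cassandra contact point(s). Separate multiple nodes with commas.
  * Required: yes
  * Default: none
* **port**
  * Description: Cassandra port.
  * Required: no (the writer falls back to 9042 when it is not set)
  * Default: 9042
* **username**
  * Description: user name for the data source.
  * Required: no
  * Default: none
* **password**
  * Description: password for the specified user name.
  * Required: no
  * Default: none
* **useSSL**
  * Description: whether to connect over SSL. In the current implementation, SSL is only enabled when `username` is also set.
  * Required: no
  * Default: false
* **connectionsPerHost**
  * Description: client connection pool setting: number of connections opened to each server node.
  * Required: no
  * Default: 8
* **maxPendingPerConnection**
  * Description: client connection pool setting: maximum number of requests per connection.
  * Required: no
  * Default: 128
* **keyspace**
  * Description: keyspace that contains the table to synchronize.
  * Required: yes
  * Default: none
* **table**
  * Description: table to synchronize.
  * Required: yes
  * Default: none
* **column**
  * Description: the columns of the table to synchronize. Each entry is either a column name or `writetime()`. An entry configured as `writetime()` is not written as a column. Instead, its value is used as the write timestamp of the row. At most one `writetime()` entry is allowed.
  * Required: yes
  * Default: none
* **consistancyLevel** (the key is spelled this way in the plugin)
  * Description: consistency level. One of ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY|TWO|THREE|LOCAL_ONE.
  * Required: no
  * Default: LOCAL_QUORUM
* **batchSize**
  * Description: number of records committed in one batch (UNLOGGED BATCH). If a batch commit fails, every record in that batch is written again one at a time. Batches are subject to the following limits:
    1. The value must not exceed 65535.
    2. The content size of a batch is limited by the server-side setting `batch_size_fail_threshold_in_kb`.
    3. If a batch's content exceeds `batch_size_warn_threshold_in_kb`, a warning is logged but the write is not affected. The warning can be ignored.
  * Required: no
  * Default: 1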
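The `batchSize` behaviour described above (buffer records, commit them as one batch, and replay them one by one if the batch fails) can be sketched in plain Java. This is a minimal, hypothetical sketch, not the plugin's code. Statements are plain `String`s, and `BatchWriteSketch`, `batchExecutor` and `singleExecutor` are assumed stand-ins for executing an UNLOGGED `BatchStatement` and a single bound statement through the DataStax driver. The stand-ins let the control flow run without a Cassandra cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of the batchSize behaviour: buffer statements, submit them
 * as one batch when the buffer is full, and retry one by one if the batch fails.
 * Strings stand in for bound statements; the executors stand in for driver calls.
 */
public class BatchWriteSketch {
    private final int batchSize;
    private final Consumer<List<String>> batchExecutor; // assumed stand-in for executing an UNLOGGED BatchStatement
    private final Consumer<String> singleExecutor;      // assumed stand-in for executing one bound statement
    private final List<String> buffer = new ArrayList<>();

    public BatchWriteSketch(int batchSize,
                            Consumer<List<String>> batchExecutor,
                            Consumer<String> singleExecutor) {
        this.batchSize = batchSize;
        this.batchExecutor = batchExecutor;
        this.singleExecutor = singleExecutor;
    }

    /** Writes one statement, batching it when batchSize > 1. */
    public void write(String statement) {
        if (batchSize <= 1) {
            // batchSize <= 1: no batching, execute each record immediately
            singleExecutor.accept(statement);
            return;
        }
        buffer.add(statement);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Submits buffered statements as one batch; on failure, replays them individually. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        try {
            batchExecutor.accept(new ArrayList<>(buffer));
        } catch (RuntimeException e) {
            // e.g. the server rejected the batch (batch_size_fail_threshold_in_kb exceeded)
            for (String statement : buffer) {
                singleExecutor.accept(statement);
            }
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        BatchWriteSketch writer = new BatchWriteSketch(3,
                batch -> {
                    // simulate a batch the server refuses
                    if (batch.contains("bad")) {
                        throw new IllegalStateException("batch rejected");
                    }
                    log.add("batch" + batch);
                },
                statement -> log.add("single:" + statement));
        for (String s : new String[] {"a", "b", "c", "d", "bad", "e", "f"}) {
            writer.write(s);
        }
        writer.flush(); // flush the final partial batch
        System.out.println(log);
    }
}
```

Running `main` prints `[batch[a, b, c], single:d, single:bad, single:e, batch[f]]`. The second batch is rejected and replayed record by record. This sketch applies the one-by-one fallback to the final partial batch as well. In the plugin's `CassandraWriter.Task.startWrite`, by contrast, the end-of-task flush executes the remaining batch without that fallback. The `asyncWrite` mode is not modelled here.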
### 3.3 Type Conversion

CassandraWriter currently supports all Cassandra types except counter and custom types.

The type conversions CassandraWriter applies are listed below:

| DataX internal type | Cassandra data type |
| -------- | ----- |
| Long | int, tinyint, smallint, varint, bigint, time |
| Double | float, double, decimal |
| String | ascii, varchar, text, uuid, timeuuid, duration, list, map, set, tuple, udt, inet |
| Date | date, timestamp |
| Boolean | boolean |
| Bytes | blob |

For list, map, set, tuple and udt columns, the String value is parsed as JSON before it is bound. For example, `[1,2,3]` is the value for a `list<int>` column.

Note:

* counter and custom types are not currently supported.

## 4 Performance Report

Omitted.

## 5 Constraints and Limitations

### 5.1 Data recovery with primary/standby replication

Omitted.

## 6 FAQ

================================================
FILE: cassandrawriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 cassandrawriter cassandrawriter 0.0.1-SNAPSHOT jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.datastax.cassandra cassandra-driver-core 3.7.2 commons-codec commons-codec 1.9 junit junit test com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-service-face org.apache.hadoop hadoop-common org.apache.hive hive-exec org.apache.hive hive-serde javolution javolution test org.mockito mockito-all 1.9.5 test src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: cassandrawriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/cassandrawriter target/ cassandrawriter-0.0.1-SNAPSHOT.jar plugin/writer/cassandrawriter false plugin/writer/cassandrawriter/libs runtime

================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/CassandraWriter.java
================================================
package com.alibaba.datax.plugin.writer.cassandrawriter; import java.util.ArrayList; import java.util.List; import
java.util.concurrent.TimeUnit; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.datastax.driver.core.BatchStatement; import com.datastax.driver.core.BatchStatement.Type; import com.datastax.driver.core.BoundStatement; import com.datastax.driver.core.Cluster; import com.datastax.driver.core.ColumnMetadata; import com.datastax.driver.core.ConsistencyLevel; import com.datastax.driver.core.DataType; import com.datastax.driver.core.HostDistance; import com.datastax.driver.core.PoolingOptions; import com.datastax.driver.core.PreparedStatement; import com.datastax.driver.core.ResultSetFuture; import com.datastax.driver.core.Session; import com.datastax.driver.core.TableMetadata; import com.datastax.driver.core.querybuilder.Insert; import com.datastax.driver.core.querybuilder.QueryBuilder; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import static com.datastax.driver.core.querybuilder.QueryBuilder.timestamp; /** * Created by mazhenlin on 2019/8/19. 
*/ public class CassandraWriter extends Writer { private static final Logger LOG = LoggerFactory .getLogger(CassandraWriter.class); public static class Job extends Writer.Job { private Configuration originalConfig = null; @Override public List split(int mandatoryNumber) { List splitResultConfigs = new ArrayList(); for (int j = 0; j < mandatoryNumber; j++) { splitResultConfigs.add(originalConfig.clone()); } return splitResultConfigs; } @Override public void init() { originalConfig = getPluginJobConf(); } @Override public void destroy() { } } public static class Task extends Writer.Task { private Configuration taskConfig; private Cluster cluster = null; private Session session = null; private PreparedStatement statement = null; private int columnNumber = 0; private List columnTypes; private List columnMeta = null; private int writeTimeCol = -1; private boolean asyncWrite = false; private long batchSize = 1; private List unConfirmedWrite; private List bufferedWrite; @Override public void startWrite(RecordReceiver lineReceiver) { try { Record record; while ((record = lineReceiver.getFromReader()) != null) { if (record.getColumnNumber() != columnNumber) { // 源头读取字段列数与目的表字段写入列数不相等,直接报错 throw DataXException .asDataXException( CassandraWriterErrorCode.CONF_ERROR, String.format( "列配置信息有错误. 因为您配置的任务中,源头读取字段数:%s 与 目的表要写入的字段数:%s 不相等. 
请检查您的配置并作出修改.", record.getColumnNumber(), this.columnNumber)); } BoundStatement boundStmt = statement.bind(); for (int i = 0; i < columnNumber; i++) { if( writeTimeCol != -1 && i == writeTimeCol ) { continue; } Column col = record.getColumn(i); int pos = i; if( writeTimeCol != -1 && pos > writeTimeCol ) { pos = i - 1; } CassandraWriterHelper.setupColumn(boundStmt,pos,columnTypes.get(pos),col); } if(writeTimeCol != -1) { Column col = record.getColumn(writeTimeCol ); boundStmt.setLong(columnNumber - 1,col.asLong()); } if( batchSize <= 1 ) { session.execute(boundStmt); } else { if( asyncWrite ) { unConfirmedWrite.add(session.executeAsync(boundStmt)); if (unConfirmedWrite.size() >= batchSize) { for (ResultSetFuture write : unConfirmedWrite) { write.getUninterruptibly(10000, TimeUnit.MILLISECONDS); } unConfirmedWrite.clear(); } } else { bufferedWrite.add(boundStmt); if( bufferedWrite.size() >= batchSize ) { BatchStatement batchStatement = new BatchStatement(Type.UNLOGGED); batchStatement.addAll(bufferedWrite); try { session.execute(batchStatement); } catch (Exception e ) { LOG.error("batch写入失败,尝试逐条写入.",e); for( BoundStatement stmt: bufferedWrite ) { session.execute(stmt); } } ///LOG.info("batch finished. 
size = " + bufferedWrite.size()); bufferedWrite.clear(); } } } } if( unConfirmedWrite != null && unConfirmedWrite.size() > 0 ) { for( ResultSetFuture write : unConfirmedWrite ) { write.getUninterruptibly(10000, TimeUnit.MILLISECONDS); } unConfirmedWrite.clear(); } if( bufferedWrite !=null && bufferedWrite.size() > 0 ) { BatchStatement batchStatement = new BatchStatement(Type.UNLOGGED); batchStatement.addAll(bufferedWrite); session.execute(batchStatement); bufferedWrite.clear(); } } catch (Exception e) { throw DataXException.asDataXException( CassandraWriterErrorCode.WRITE_DATA_ERROR, e); } } @Override public void init() { this.taskConfig = super.getPluginJobConf(); String username = taskConfig.getString(Key.USERNAME); String password = taskConfig.getString(Key.PASSWORD); String hosts = taskConfig.getString(Key.HOST); Integer port = taskConfig.getInt(Key.PORT,9042); boolean useSSL = taskConfig.getBool(Key.USESSL); String keyspace = taskConfig.getString(Key.KEYSPACE); String table = taskConfig.getString(Key.TABLE); batchSize = taskConfig.getLong(Key.BATCH_SIZE,1); this.columnMeta = taskConfig.getList(Key.COLUMN,String.class); columnTypes = new ArrayList(columnMeta.size()); columnNumber = columnMeta.size(); asyncWrite = taskConfig.getBool(Key.ASYNC_WRITE,false); int connectionsPerHost = taskConfig.getInt(Key.CONNECTIONS_PER_HOST,8); int maxPendingPerConnection = taskConfig.getInt(Key.MAX_PENDING_CONNECTION,128); PoolingOptions poolingOpts = new PoolingOptions() .setConnectionsPerHost(HostDistance.LOCAL, connectionsPerHost, connectionsPerHost) .setMaxRequestsPerConnection(HostDistance.LOCAL, maxPendingPerConnection) .setNewConnectionThreshold(HostDistance.LOCAL, 100); Cluster.Builder clusterBuilder = Cluster.builder().withPoolingOptions(poolingOpts); if ((username != null) && !username.isEmpty()) { clusterBuilder = clusterBuilder.withCredentials(username, password) .withPort(Integer.valueOf(port)).addContactPoints(hosts.split(",")); if (useSSL) { clusterBuilder = 
clusterBuilder.withSSL(); } } else { clusterBuilder = clusterBuilder.withPort(Integer.valueOf(port)) .addContactPoints(hosts.split(",")); } cluster = clusterBuilder.build(); session = cluster.connect(keyspace); TableMetadata meta = cluster.getMetadata().getKeyspace(keyspace).getTable(table); Insert insertStmt = QueryBuilder.insertInto(table); for( String colunmnName : columnMeta ) { if( colunmnName.toLowerCase().equals(Key.WRITE_TIME) ) { if( writeTimeCol != -1 ) { throw DataXException .asDataXException( CassandraWriterErrorCode.CONF_ERROR, "列配置信息有错误. 只能有一个时间戳列(writetime())"); } writeTimeCol = columnTypes.size(); continue; } insertStmt.value(colunmnName,QueryBuilder.bindMarker()); ColumnMetadata col = meta.getColumn(colunmnName); if( col == null ) { throw DataXException .asDataXException( CassandraWriterErrorCode.CONF_ERROR, String.format( "列配置信息有错误. 表中未找到列名 '%s' .", colunmnName)); } columnTypes.add(col.getType()); } if(writeTimeCol != -1) { insertStmt.using(timestamp(QueryBuilder.bindMarker())); } String cl = taskConfig.getString(Key.CONSITANCY_LEVEL); if( cl != null && !cl.isEmpty() ) { insertStmt.setConsistencyLevel(ConsistencyLevel.valueOf(cl)); } else { insertStmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM); } statement = session.prepare(insertStmt); if( batchSize > 1 ) { if( asyncWrite ) { unConfirmedWrite = new ArrayList(); } else { bufferedWrite = new ArrayList(); } } } @Override public void destroy() { } } } ================================================ FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/CassandraWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.cassandrawriter; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by mazhenlin on 2019/8/19. 
*/ public enum CassandraWriterErrorCode implements ErrorCode { CONF_ERROR("CassandraWriter-00", "配置错误."), WRITE_DATA_ERROR("CassandraWriter-01", "写入数据时失败."), ; private final String code; private final String description; private CassandraWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/CassandraWriterHelper.java ================================================ package com.alibaba.datax.plugin.writer.cassandrawriter; import java.math.BigDecimal; import java.math.BigInteger; import java.net.InetAddress; import java.nio.ByteBuffer; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Arrays; import java.util.Date; import java.util.HashMap; import java.util.HashSet; import java.util.Iterator; import java.util.List; import java.util.Map; import java.util.Set; import java.util.UUID; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONException; import com.alibaba.fastjson2.JSONObject; import com.datastax.driver.core.BoundStatement; import com.datastax.driver.core.CodecRegistry; import com.datastax.driver.core.DataType; import com.datastax.driver.core.DataType.Name; import com.datastax.driver.core.Duration; import com.datastax.driver.core.LocalDate; import com.datastax.driver.core.TupleType; import com.datastax.driver.core.TupleValue; import com.datastax.driver.core.UDTValue; import com.datastax.driver.core.UserType; import 
com.datastax.driver.core.UserType.Field;
import com.google.common.base.Splitter;
import org.apache.commons.codec.binary.Base64;

/**
 * Created by mazhenlin on 2019/8/21.
 */
public class CassandraWriterHelper {
    static CodecRegistry registry = new CodecRegistry();

    public static Object parseFromString(String s, DataType sqlType) throws Exception {
        if (s == null || s.isEmpty()) {
            if (sqlType.getName() == Name.ASCII || sqlType.getName() == Name.TEXT
                    || sqlType.getName() == Name.VARCHAR) {
                return s;
            } else {
                return null;
            }
        }
        switch (sqlType.getName()) {
            case ASCII:
            case TEXT:
            case VARCHAR:
                return s;
            case BLOB:
                if (s.length() == 0) {
                    return new byte[0];
                }
                // Hex string: every two characters become one byte
                byte[] byteArray = new byte[s.length() / 2];
                for (int i = 0; i < byteArray.length; i++) {
                    String subStr = s.substring(2 * i, 2 * i + 2);
                    byteArray[i] = ((byte) Integer.parseInt(subStr, 16));
                }
                return ByteBuffer.wrap(byteArray);
            case BOOLEAN:
                return Boolean.valueOf(s);
            case TINYINT:
                return Byte.valueOf(s);
            case SMALLINT:
                return Short.valueOf(s);
            case INT:
                return Integer.valueOf(s);
            case BIGINT:
                return Long.valueOf(s);
            case VARINT:
                return new BigInteger(s, 10);
            case FLOAT:
                return Float.valueOf(s);
            case DOUBLE:
                return Double.valueOf(s);
            case DECIMAL:
                return new BigDecimal(s);
            case DATE: {
                String[] a = s.split("-");
                if (a.length != 3) {
                    throw new Exception(String.format(
                            "Invalid DATE value '%s': the format must be yyyy-mm-dd", s));
                }
                return LocalDate.fromYearMonthDay(Integer.valueOf(a[0]), Integer.valueOf(a[1]),
                        Integer.valueOf(a[2]));
            }
            case TIME:
                return Long.valueOf(s);
            case TIMESTAMP:
                return new Date(Long.valueOf(s));
            case UUID:
            case TIMEUUID:
                return UUID.fromString(s);
            case INET:
                String[] b = s.split("/");
                if (b.length < 2) {
                    return InetAddress.getByName(s);
                }
                byte[] addr = InetAddress.getByName(b[1]).getAddress();
                return InetAddress.getByAddress(b[0], addr);
            case DURATION:
                return Duration.from(s);
            case LIST:
            case MAP:
            case SET:
            case TUPLE:
            case UDT:
                Object jsonObject = JSON.parse(s);
                return parseFromJson(jsonObject, sqlType);
            default:
                throw DataXException.asDataXException(CassandraWriterErrorCode.CONF_ERROR,
                        "Unsupported column type: " + sqlType
                                + ". Please check your configuration or contact the administrator.");
        } // end switch
    }

    public static Object parseFromJson(Object jsonObject, DataType type) throws Exception {
        if (jsonObject == null) return null;
        switch (type.getName()) {
            case ASCII:
            case TEXT:
            case VARCHAR:
            case BOOLEAN:
            case TIME:
                return jsonObject;
            case TINYINT:
                return ((Number) jsonObject).byteValue();
            case SMALLINT:
                return ((Number) jsonObject).shortValue();
            case INT:
                return ((Number) jsonObject).intValue();
            case BIGINT:
                return ((Number) jsonObject).longValue();
            case VARINT:
                return new BigInteger(jsonObject.toString());
            case FLOAT:
                return ((Number) jsonObject).floatValue();
            case DOUBLE:
                return ((Number) jsonObject).doubleValue();
            case DECIMAL:
                return new BigDecimal(jsonObject.toString());
            case BLOB:
                return ByteBuffer.wrap(Base64.decodeBase64((String) jsonObject));
            case DATE:
                return LocalDate.fromMillisSinceEpoch(((Number) jsonObject).longValue());
            case TIMESTAMP:
                return new Date(((Number) jsonObject).longValue());
            case DURATION:
                return Duration.from(jsonObject.toString());
            case UUID:
            case TIMEUUID:
                return UUID.fromString(jsonObject.toString());
            case INET:
                return InetAddress.getByName((String) jsonObject);
            case LIST:
                List l = new ArrayList();
                for (Object o : (JSONArray) jsonObject) {
                    l.add(parseFromJson(o, type.getTypeArguments().get(0)));
                }
                return l;
            case MAP: {
                Map m = new HashMap();
                for (Map.Entry e : ((JSONObject) jsonObject).entrySet()) {
                    Object k = parseFromString((String) e.getKey(), type.getTypeArguments().get(0));
                    Object v = parseFromJson(e.getValue(), type.getTypeArguments().get(1));
                    m.put(k, v);
                }
                return m;
            }
            case SET:
                Set s = new HashSet();
                for (Object o : (JSONArray) jsonObject) {
                    s.add(parseFromJson(o, type.getTypeArguments().get(0)));
                }
                return s;
            case TUPLE: {
                TupleValue t = ((TupleType) type).newValue();
                int j = 0;
                for (Object e : (JSONArray) jsonObject) {
                    DataType eleType = ((TupleType) type).getComponentTypes().get(j);
                    t.set(j, parseFromJson(e, eleType), registry.codecFor(eleType).getJavaType());
                    j++;
                }
                return t;
            }
            case UDT: {
                UDTValue t = ((UserType) type).newValue();
                UserType userType = t.getType();
                for (Map.Entry e : ((JSONObject) jsonObject).entrySet()) {
                    DataType eleType = userType.getFieldType((String) e.getKey());
                    t.set((String) e.getKey(), parseFromJson(e.getValue(), eleType),
                            registry.codecFor(eleType).getJavaType());
                }
                return t;
            }
        }
        return null;
    }

    public static void setupColumn(BoundStatement ps, int pos, DataType sqlType, Column col) throws Exception {
        if (col.getRawData() != null) {
            switch (sqlType.getName()) {
                case ASCII:
                case TEXT:
                case VARCHAR:
                    ps.setString(pos, col.asString());
                    break;
                case BLOB:
                    ps.setBytes(pos, ByteBuffer.wrap(col.asBytes()));
                    break;
                case BOOLEAN:
                    ps.setBool(pos, col.asBoolean());
                    break;
                case TINYINT:
                    ps.setByte(pos, col.asLong().byteValue());
                    break;
                case SMALLINT:
                    ps.setShort(pos, col.asLong().shortValue());
                    break;
                case INT:
                    ps.setInt(pos, col.asLong().intValue());
                    break;
                case BIGINT:
                    ps.setLong(pos, col.asLong());
                    break;
                case VARINT:
                    ps.setVarint(pos, col.asBigInteger());
                    break;
                case FLOAT:
                    ps.setFloat(pos, col.asDouble().floatValue());
                    break;
                case DOUBLE:
                    ps.setDouble(pos, col.asDouble());
                    break;
                case DECIMAL:
                    ps.setDecimal(pos, col.asBigDecimal());
                    break;
                case DATE:
                    ps.setDate(pos, LocalDate.fromMillisSinceEpoch(col.asDate().getTime()));
                    break;
                case TIME:
                    ps.setTime(pos, col.asLong());
                    break;
                case TIMESTAMP:
                    ps.setTimestamp(pos, col.asDate());
                    break;
                case UUID:
                case TIMEUUID:
                    ps.setUUID(pos, UUID.fromString(col.asString()));
                    break;
                case INET:
                    ps.setInet(pos, InetAddress.getByName(col.asString()));
                    break;
                case DURATION:
                    ps.set(pos, Duration.from(col.asString()), Duration.class);
                    break;
                case LIST:
                    ps.setList(pos, (List) parseFromString(col.asString(), sqlType));
                    break;
                case MAP:
                    ps.setMap(pos, (Map) parseFromString(col.asString(), sqlType));
                    break;
                case SET:
                    ps.setSet(pos, (Set) parseFromString(col.asString(), sqlType));
                    break;
                case TUPLE:
                    ps.setTupleValue(pos, (TupleValue) parseFromString(col.asString(), sqlType));
                    break;
                case UDT:
                    ps.setUDTValue(pos, (UDTValue) parseFromString(col.asString(), sqlType));
                    break;
                default:
                    throw DataXException.asDataXException(CassandraWriterErrorCode.CONF_ERROR,
                            "Unsupported column type: " + sqlType
                                    + ". Please check your configuration or contact the administrator.");
            } // end switch
        } else {
            ps.setToNull(pos);
        }
    }
}

================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.cassandrawriter;

/**
 * Created by mazhenlin on 2019/8/19.
 */
public class Key {
    public final static String USERNAME = "username";
    public final static String PASSWORD = "password";

    public final static String HOST = "host";
    public final static String PORT = "port";
    public final static String USESSL = "useSSL";

    public final static String KEYSPACE = "keyspace";
    public final static String TABLE = "table";
    public final static String COLUMN = "column";
    public final static String WRITE_TIME = "writetime()";
    public final static String ASYNC_WRITE = "asyncWrite";
    public final static String CONSITANCY_LEVEL = "consistancyLevel";
    public final static String CONNECTIONS_PER_HOST = "connectionsPerHost";
    public final static String MAX_PENDING_CONNECTION = "maxPendingPerConnection";
    /**
     * Batch size for asynchronous writes; defaults to 1 (no asynchronous writing).
     */
    public final static String BATCH_SIZE = "batchSize";

    /**
     * Name of each column.
     */
    public static final String COLUMN_NAME = "name";
    /**
     * Column separator.
     */
    public static final String COLUMN_SPLITTER = "format";
    public static final String ELEMENT_SPLITTER = "splitter";
    public static final String ENTRY_SPLITTER = "entrySplitter";
    public static final String KV_SPLITTER = "kvSplitter";
    public static final String ELEMENT_CONFIG = "element";
    public static final String TUPLE_CONNECTOR = "_";
    public static final String KEY_CONFIG = "key";
    public static final String VALUE_CONFIG = "value";

}
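`CassandraWriterHelper.parseFromString` expects BLOB values as a hex string (two hex digits per byte) and DATE values as `yyyy-mm-dd`. The sketch below reproduces only those two parsing rules with JDK types, so they can be tried without the DataStax driver. The class name `CqlLiteralSketch` is hypothetical, and `java.time.LocalDate` stands in for the driver's `com.datastax.driver.core.LocalDate`.

```java
import java.nio.ByteBuffer;
import java.time.LocalDate;

// Hypothetical helper mirroring the BLOB and DATE branches of
// CassandraWriterHelper.parseFromString, using JDK types only.
public class CqlLiteralSketch {

    // BLOB: every two hex characters become one byte ("0aff" -> {0x0a, (byte) 0xff}).
    // Like the original loop, a trailing odd character is silently ignored.
    public static ByteBuffer parseHexBlob(String s) {
        byte[] bytes = new byte[s.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return ByteBuffer.wrap(bytes);
    }

    // DATE: "yyyy-mm-dd" is split on '-' into year, month and day.
    public static LocalDate parseDate(String s) {
        String[] parts = s.split("-");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    String.format("Invalid DATE value '%s': expected yyyy-mm-dd", s));
        }
        return LocalDate.of(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        ByteBuffer blob = parseHexBlob("0aff");
        System.out.println(blob.remaining() + " bytes, second = " + (blob.get(1) & 0xff));
        System.out.println(parseDate("2019-08-21"));
    }
}
```

Running `main` prints `2 bytes, second = 255` and `2019-08-21`. Because both rules split strings by position or by `-`, a malformed value fails with a `NumberFormatException` or the explicit `IllegalArgumentException` rather than being silently coerced.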
================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/LocalStrings.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF.
errorcode.write_failed_exception=\u5199\u5165\u6570\u636E\u65F6\u5931\u8D25

================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/LocalStrings_en_US.properties
================================================
errorcode.config_invalid_exception=Error in parameter configuration.
errorcode.write_failed_exception=Failed to write data.

================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/LocalStrings_ja_JP.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF.
errorcode.write_failed_exception=\u5199\u5165\u6570\u636E\u65F6\u5931\u8D25

================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/LocalStrings_zh_CN.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF.
errorcode.write_failed_exception=\u5199\u5165\u6570\u636E\u65F6\u5931\u8D25

================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/LocalStrings_zh_HK.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF.
errorcode.write_failed_exception=\u5199\u5165\u6570\u636E\u65F6\u5931\u8D25

================================================
FILE: cassandrawriter/src/main/java/com/alibaba/datax/plugin/writer/cassandrawriter/LocalStrings_zh_TW.properties
================================================
errorcode.config_invalid_exception=\u914D\u7F6E\u9519\u8BEF.
errorcode.write_failed_exception=\u5199\u5165\u6570\u636E\u65F6\u5931\u8D25

================================================
FILE: cassandrawriter/src/main/resources/plugin.json
================================================
{
    "name": "cassandrawriter",
    "class": "com.alibaba.datax.plugin.writer.cassandrawriter.CassandraWriter",
    "description": "useScene: prod. mechanism: use datax driver, execute insert sql.",
    "developer": "alibaba"
}

================================================
FILE: cassandrawriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "cassandrawriter",
    "parameter": {
        "username": "",
        "password": "",
        "host": "",
        "port": "",
        "useSSL": false,
        "keyspace": "",
        "table": "",
        "column": [
            "c1",
            "c2",
            "c3"
        ]
    }
}

================================================
FILE: clickhousereader/doc/clickhousereader.md
================================================
# ClickhouseReader Plugin Documentation

___

## 1 Quick Introduction

The ClickhouseReader plugin reads data from ClickHouse. Under the hood, ClickhouseReader connects to a remote ClickHouse database over JDBC and executes the corresponding SQL statements to SELECT data out of it.

## 2 How It Works

In short, ClickhouseReader connects to the remote ClickHouse database through a JDBC connector, generates a SELECT statement from the user's configuration, and sends it to the database. It then assembles the returned result set into an abstract dataset using DataX's own data types and passes it to the downstream Writer.

When the user configures `table`, `column`, and `where`, ClickhouseReader concatenates them into a SQL statement and sends it to ClickHouse. When the user configures `querySql`, ClickhouseReader sends that statement to ClickHouse unchanged.

## 3 Features

### 3.1 Sample Configurations

* A job that extracts data from a ClickHouse database and prints it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed in byte/s; DataX tries to approach this speed without exceeding it.
                // channel is the number of channels and byte is the per-channel speed; for a single channel at 1MB/s, set byte to 1048576.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Record-count limit (checked first)
                "record": 0,
                // Percentage limit; 1 means 100%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "clickhousereader",
                    "parameter": {
                        // Database username
                        "username": "root",
                        // Database password
                        "password": "root",
                        "column": [
                            "id", "name"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:clickhouse://[HOST_NAME]:PORT/[DATABASE_NAME]"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // Writer type
                    "name": "streamwriter",
                    // Whether to print the records
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that syncs the result of a custom SQL query and outputs it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "clickhousereader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10"
                                ],
                                "jdbcUrl": [
                                    "jdbc:clickhouse://1.1.1.1:8123/default"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "visible": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the remote database, given as a JSON array; several connection addresses may be listed for one database. A JSON array is used because Alibaba Group internally supports probing multiple IPs: if several are configured, ClickhouseReader checks the connectivity of each in turn until it finds a usable one, and reports an error if all connections fail. Note that jdbcUrl must be placed inside the connection block. Outside Alibaba Group, a single JDBC URL in the array is sufficient. The jdbcUrl follows the official ClickHouse conventions and may carry additional connection control parameters; see the [ClickHouse official documentation](https://clickhouse.com/docs/en/engines/table-engines/integrations/jdbc) for details.
    * Required: yes
    * Default: none
* **username**
    * Description: Username for the data source.
    * Required: yes
    * Default: none
* **password**
    * Description: Password for the specified username.
    * Required: yes
    * Default: none
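* **table**
    * Description: The tables to synchronize, given as a JSON array, so several tables can be extracted at once. When several tables are configured, the user must ensure they share the same schema; ClickhouseReader does not check whether they form one logical table. Note that table must be placed inside the connection block.
    * Required: yes
    * Default: none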
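* **column**
    * Description: The column names to synchronize from the configured table, given as a JSON array. Use \* to select all columns, e.g. ['\*']. Column pruning is supported, so you can export only a subset of columns. Column reordering is supported, so columns need not follow the table schema order. Constants are supported as well, using JSON such as ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"], where id is an ordinary column name, \`table\` is a column name that is a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean. column must be specified explicitly and must not be empty!
    * Required: yes
    * Default: none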
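* **splitPk**
    * Description: If splitPk is specified, ClickhouseReader shards the data by the splitPk column and DataX launches concurrent tasks to synchronize it, which can greatly improve throughput. We recommend using the table's primary key as splitPk, because primary keys are usually evenly distributed, so the resulting shards are less likely to create hot spots. Currently splitPk only supports integer columns; `floating-point, date, and other types are not supported`. If another type is specified, ClickhouseReader reports an error. If splitPk is omitted, the table is not split and ClickhouseReader synchronizes all data over a single channel.
    * Required: no
    * Default: none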
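* **where**
    * Description: Filter condition. ClickhouseReader builds the SQL from the configured column, table, and where, and extracts data with it. In practice, a common choice is to sync only the current day's data by setting where to gmt_create > $bizdate. Note: where cannot be set to limit 10, because limit is not a valid part of a SQL WHERE clause. The where condition makes incremental synchronization of business data straightforward.
    * Required: no
    * Default: none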
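* **querySql**
    * Description: In some scenarios the where option is not expressive enough to describe the filter, and this option lets the user supply a custom filtering SQL instead. Once it is set, DataX ignores table and column and filters data using this SQL directly. For example, to sync data after a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id. `When querySql is configured, ClickhouseReader ignores the table, column, and where settings`.
    * Required: no
    * Default: none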
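* **fetchSize**
    * Description: Number of records fetched from the database server per batch. It determines the number of network round trips between DataX and the server and can significantly improve extraction performance. `Note: a value that is too large (>2048) may cause the DataX process to run out of memory (OOM).`
    * Required: no
    * Default: 1000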
* **session**
    * Description: Session settings such as time formats and time zone. If the table contains time columns, configure this option to state the time format explicitly. Typical parameters are NLS_DATE_FORMAT and NLS_TIME_FORMAT (the example statements use Oracle-style `alter session` syntax). The value is a JSON array, for example:

      ```
      "session": [
          "alter session set NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'",
          "alter session set NLS_TIMESTAMP_FORMAT='yyyy-mm-dd hh24:mi:ss'",
          "alter session set NLS_TIMESTAMP_TZ_FORMAT='yyyy-mm-dd hh24:mi:ss'",
          "alter session set TIME_ZONE='US/Pacific'"
      ]
      ```

    * Required: no
    * Default: none
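To make the `splitPk` and `where` descriptions above concrete: with an integer `splitPk`, each concurrent task reads one contiguous key range, and a user `where` condition is ANDed with that range. The sketch below is a simplified, hypothetical illustration. `SplitPkSketch`, `buildSplitQueries`, and the equal-width range strategy are assumptions, and the real splitter in plugin-rdbms-util may compute its ranges differently. It assumes the column's min and max are already known, and rows whose splitPk is NULL are not covered by these ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split the integer range [min, max] of splitPk into
// roughly equal half-open ranges and build one SELECT per range.
public class SplitPkSketch {

    public static List<String> buildSplitQueries(String table, String columns, String splitPk,
                                                 String where, long min, long max, int parts) {
        if (max < min) {
            throw new IllegalArgumentException("max must be >= min");
        }
        List<String> queries = new ArrayList<>();
        long span = max - min + 1;                          // number of key values, inclusive
        int n = (int) Math.max(1, Math.min(parts, span));   // never more ranges than keys
        long step = span / n;
        long remainder = span % n;
        long lower = min;
        for (int i = 0; i < n; i++) {
            // Spread the remainder over the first ranges so sizes differ by at most one.
            long size = step + (i < remainder ? 1 : 0);
            long upper = lower + size;                      // exclusive upper bound
            String range = String.format("%s >= %d AND %s < %d", splitPk, lower, splitPk, upper);
            String condition = (where == null || where.trim().isEmpty())
                    ? range
                    : "(" + where + ") AND (" + range + ")";
            queries.add(String.format("SELECT %s FROM %s WHERE %s", columns, table, condition));
            lower = upper;
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : buildSplitQueries("t", "id,name", "id", "gmt_create > 20240101", 1, 10, 3)) {
            System.out.println(q);
        }
    }
}
```

With keys 1 to 10 and three tasks, the ranges are `[1, 5)`, `[5, 8)` and `[8, 11)`, so a skewed key distribution produces skewed tasks. This is why a uniformly distributed primary key is the recommended splitPk. Note that the `where` text is pasted into the SQL verbatim, just as the plugin does.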
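### 3.3 Type Conversion

ClickhouseReader currently supports most ClickHouse types, but a few individual types are not supported, so please check your types.

The type mapping ClickhouseReader uses for ClickHouse is listed below:

| DataX internal type | ClickHouse data type |
| -------- |--------------------------------------------------------------------------------------------|
| Long | UInt8, UInt16, UInt32, UInt64, UInt128, UInt256, Int8, Int16, Int32, Int64, Int128, Int256 |
| Double | Float32, Float64, Decimal |
| String | String, FixedString |
| Date | DATE, Date32, DateTime, DateTime64 |
| Boolean | Boolean |
| Bytes | BLOB,BFILE,RAW,LONG RAW |

Please note:

* `Field types other than those listed above are not supported`.

## 4 Performance Report

### 4.1 Environment Setup

#### 4.1.1 Data Characteristics

To simulate real production data, we designed two ClickHouse tables:

#### 4.1.2 Machine Specifications

* Machine running DataX:

* ClickHouse database machine:

### 4.2 Test Report

#### 4.2.1 Table 1 Test Report

| Concurrent tasks | DataX speed (Rec/s) | DataX traffic | NIC traffic | DataX load | DB load |
|--------| --------|--------|--------|--------|--------|
|1| DataX measured speed (Rec/s)|DataX measured traffic|NIC traffic|DataX load|DB load|

## 5 Constraints and Limitations

### 5.1 Data Recovery in Primary/Standby Replication

The primary/standby issue arises when ClickHouse uses primary/standby disaster recovery and the standby continuously restores data from the primary through the binlog. Because replication lags behind the primary, and the lag can grow large under conditions such as network latency, the data restored on the standby may differ significantly from the primary, so data synchronized from the standby is not a complete image as of the current time.

To address this, we provide a preSql feature; its documentation is still to be completed.

### 5.2 Consistency Constraints

In terms of data storage, ClickHouse is classified here as an RDBMS that can provide a strongly consistent query interface. For example, if other writers write to the database while a sync task is running, ClickhouseReader will not see the newly written data at all, because of the database's snapshot behavior. For more on database snapshots, see [MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control).

The above describes consistency under ClickhouseReader's single-threaded model. Because ClickhouseReader can extract data concurrently according to the user's configuration, it cannot strictly guarantee consistency: after ClickhouseReader splits the data by splitPk, it launches several concurrent tasks one after another. These tasks do not belong to the same read transaction, and there are time gaps between them. The resulting data is therefore not a `complete`, `consistent` snapshot.

A consistent snapshot across multiple threads cannot currently be achieved technically; it can only be addressed through engineering measures, each with trade-offs. We offer a few approaches for users to choose from:

1. Use single-threaded synchronization, i.e. do not split the data. This is slower, but preserves consistency well.

2. Stop all other writers so the data is static, e.g. by locking tables or pausing standby replication. The drawback is a possible impact on online services.

### 5.3 Database Encoding

ClickhouseReader extracts data through JDBC, which natively handles various encodings and converts them internally. ClickhouseReader therefore does not require the user to specify an encoding; it detects the encoding and converts it automatically.

If the encoding ClickHouse actually wrote differs from its configured encoding, ClickhouseReader cannot detect this and offers no solution; in such cases `the exported data may be garbled`.

### 5.4 Incremental Data Synchronization

ClickhouseReader extracts data with JDBC SELECT statements, so SELECT...WHERE... can be used for incremental extraction in several ways:

* If the online application fills a modify column with the change timestamp on every write, covering inserts, updates, and (logical) deletes, ClickhouseReader only needs a WHERE condition on the timestamp of the previous sync run.
* For append-only (journal-style) data, ClickhouseReader can use a WHERE condition on the largest auto-increment ID from the previous run.

If the business data has no column that distinguishes new or modified rows, ClickhouseReader cannot perform incremental synchronization and can only sync the full data.

### 5.5 SQL Security

ClickhouseReader lets users write their own SELECT statement through querySql and performs no security validation on it. Ensuring its safety is the DataX user's responsibility.

## 6 FAQ

***

**Q: ClickhouseReader fails with error message XXX**

A: This is a network or permission problem. Test the connection with the ClickHouse command-line client; if that also fails, the problem lies in the environment, so please contact your DBA.

**Q: ClickhouseReader extraction is slow. What can I do?**

A: The main factors affecting extraction time are roughly the following (from professional DBA Wei Wan, 卫绾):

1. An abnormal SQL plan can make extraction slow; when extracting, prefer full table scans over index scans where possible;

2. Set a reasonable SQL concurrency to reduce extraction time;

3. Keep the extraction SQL simple and avoid functions such as replace where possible; they are very CPU-intensive and severely slow down extraction;

================================================
FILE: clickhousereader/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 clickhousereader clickhousereader jar ru.yandex.clickhouse clickhouse-jdbc 0.2.4 com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-common ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: clickhousereader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/clickhousereader target/ clickhousereader-0.0.1-SNAPSHOT.jar plugin/reader/clickhousereader false plugin/reader/clickhousereader/libs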
runtime

================================================
FILE: clickhousereader/src/main/java/com/alibaba/datax/plugin/reader/clickhousereader/ClickhouseReader.java
================================================
package com.alibaba.datax.plugin.reader.clickhousereader;

import java.sql.Array;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.List;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.MessageSource;
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.fastjson2.JSON;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ClickhouseReader extends Reader {

    private static final DataBaseType DATABASE_TYPE = DataBaseType.ClickHouse;
    private static final Logger LOG = LoggerFactory.getLogger(ClickhouseReader.class);

    public static class Job extends Reader.Job {
        private Configuration jobConfig = null;
        private CommonRdbmsReader.Job commonRdbmsReaderMaster;

        @Override
        public void init() {
            this.jobConfig = super.getPluginJobConf();
            this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE);
            this.commonRdbmsReaderMaster.init(this.jobConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsReaderMaster.split(this.jobConfig, mandatoryNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderMaster.post(this.jobConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderMaster.destroy(this.jobConfig);
        }
    }

    public static class Task extends Reader.Task {

        private Configuration jobConfig;
        private CommonRdbmsReader.Task commonRdbmsReaderSlave;

        @Override
        public void init() {
            this.jobConfig = super.getPluginJobConf();
            this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId());
            this.commonRdbmsReaderSlave.init(this.jobConfig);
        }

        @Override
        public void startRead(RecordSender recordSender) {
            int fetchSize = this.jobConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, 1000);

            this.commonRdbmsReaderSlave.startRead(this.jobConfig, recordSender, super.getTaskPluginCollector(), fetchSize);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderSlave.post(this.jobConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderSlave.destroy(this.jobConfig);
        }
    }
}

================================================
FILE: clickhousereader/src/main/resources/plugin.json
================================================
{
    "name": "clickhousereader",
    "class": "com.alibaba.datax.plugin.reader.clickhousereader.ClickhouseReader",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql.",
    "developer": "alibaba"
}

================================================
FILE: clickhousereader/src/main/resources/plugin_job_template.json
================================================
{
    "name": "clickhousereader",
    "parameter": {
        "username": "username",
        "password": "password",
        "column": ["col1", "col2", "col3"],
        "connection": [
            {
                "jdbcUrl": "jdbc:clickhouse://:[/]",
                "table": ["table1", "table2"]
            }
        ],
        "preSql": [],
        "postSql": []
    }
}

================================================
FILE: clickhousereader/src/test/resources/basic1.json
================================================
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "clickhousereader",
                    "parameter": {
                        "username": "XXXX",
                        "password": "XXXX",
                        "column": [
                            "uint8_col",
                            "uint16_col",
                            "uint32_col",
                            "uint64_col",
                            "int8_col",
                            "int16_col",
                            "int32_col",
                            "int64_col",
                            "float32_col",
                            "float64_col",
                            "bool_col",
                            "str_col",
                            "fixedstr_col",
                            "uuid_col",
                            "date_col",
                            "datetime_col",
                            "enum_col",
                            "ary_uint8_col",
                            "ary_str_col",
                            "tuple_col",
                            "nullable_col",
                            "nested_col.nested_id",
                            "nested_col.nested_str",
                            "ipv4_col",
                            "ipv6_col",
                            "decimal_col"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "all_type_tbl"
                                ],
                                "jdbcUrl": ["jdbc:clickhouse://XXXX:8123/default"]
                            }
                        ]
                    }
                },
                "writer": {}
            }
        ]
    }
}

================================================
FILE: clickhousereader/src/test/resources/basic1.sql
================================================
CREATE TABLE IF NOT EXISTS default.all_type_tbl
(
    `uint8_col` UInt8,
    `uint16_col` UInt16,
    uint32_col UInt32,
    uint64_col UInt64,
    int8_col Int8,
    int16_col Int16,
    int32_col Int32,
    int64_col Int64,
    float32_col Float32,
    float64_col Float64,
    bool_col UInt8,
    str_col String,
    fixedstr_col FixedString(3),
    uuid_col UUID,
    date_col Date,
    datetime_col DateTime,
    enum_col Enum('hello' = 1, 'world' = 2),
    ary_uint8_col Array(UInt8),
    ary_str_col Array(String),
    tuple_col Tuple(UInt8, String),
    nullable_col Nullable(UInt8),
    nested_col Nested
    (
        nested_id UInt32,
        nested_str String
    ),
    ipv4_col IPv4,
    ipv6_col IPv6,
    decimal_col Decimal(5,3)
)
ENGINE = MergeTree()
ORDER BY (uint8_col);

================================================
FILE: clickhousewriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 clickhousewriter clickhousewriter jar ru.yandex.clickhouse clickhouse-jdbc 0.2.4 com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-common ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: clickhousewriter/src/main/assembly/package.xml
================================================
dir false
src/main/resources plugin.json plugin_job_template.json plugin/writer/clickhousewriter target/ clickhousewriter-0.0.1-SNAPSHOT.jar plugin/writer/clickhousewriter false plugin/writer/clickhousewriter/libs runtime

================================================
FILE: clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriter.java
================================================
package com.alibaba.datax.plugin.writer.clickhousewriter;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONArray;

import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.util.List;
import java.util.regex.Pattern;

public class ClickhouseWriter extends Writer {
    private static final DataBaseType DATABASE_TYPE = DataBaseType.ClickHouse;

    public static class Job extends Writer.Job {
        private Configuration originalConfig = null;
        private CommonRdbmsWriter.Job commonRdbmsWriterMaster;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE);
            this.commonRdbmsWriterMaster.init(this.originalConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterMaster.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsWriterMaster.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterMaster.destroy(this.originalConfig);
        }
    }

    public static class Task extends Writer.Task {
        private Configuration writerSliceConfig;
        private CommonRdbmsWriter.Task commonRdbmsWriterSlave;

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();

            this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE) {
                @Override
                protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement,
                        int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException {
                    try {
                        if (column.getRawData() == null) {
                            preparedStatement.setNull(columnIndex + 1, columnSqltype);
                            return preparedStatement;
                        }

                        java.util.Date utilDate;
                        switch (columnSqltype) {
                            case Types.CHAR:
                            case Types.NCHAR:
                            case Types.CLOB:
                            case Types.NCLOB:
                            case Types.VARCHAR:
                            case Types.LONGVARCHAR:
                            case Types.NVARCHAR:
                            case Types.LONGNVARCHAR:
                                preparedStatement.setString(columnIndex + 1, column.asString());
                                break;

                            case Types.TINYINT:
                            case Types.SMALLINT:
                            case Types.INTEGER:
                            case Types.BIGINT:
                            case Types.DECIMAL:
                            case Types.FLOAT:
                            case Types.REAL:
                            case Types.DOUBLE:
                                String strValue = column.asString();
                                if (emptyAsNull && "".equals(strValue)) {
                                    preparedStatement.setNull(columnIndex + 1, columnSqltype);
                                } else {
                                    switch (columnSqltype) {
                                        case Types.TINYINT:
                                        case Types.SMALLINT:
                                        case Types.INTEGER:
                                            preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
                                            break;
                                        case Types.BIGINT:
                                            preparedStatement.setLong(columnIndex + 1, column.asLong());
                                            break;
                                        case Types.DECIMAL:
                                            preparedStatement.setBigDecimal(columnIndex + 1, column.asBigDecimal());
                                            break;
                                        case Types.REAL:
                                        case Types.FLOAT:
                                            preparedStatement.setFloat(columnIndex + 1, column.asDouble().floatValue());
                                            break;
                                        case Types.DOUBLE:
                                            preparedStatement.setDouble(columnIndex + 1, column.asDouble());
                                            break;
                                    }
                                }
                                break;

                            case Types.DATE:
                                if (this.resultSetMetaData.getRight().get(columnIndex)
                                        .equalsIgnoreCase("year")) {
                                    if (column.asBigInteger() == null) {
                                        preparedStatement.setString(columnIndex + 1, null);
                                    } else {
                                        preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
                                    }
                                } else {
                                    java.sql.Date sqlDate = null;
                                    try {
                                        utilDate = column.asDate();
                                    } catch (DataXException e) {
                                        throw new SQLException(String.format(
                                                "Date type conversion error: [%s]", column));
                                    }

                                    if (null != utilDate) {
                                        sqlDate = new java.sql.Date(utilDate.getTime());
                                    }
                                    preparedStatement.setDate(columnIndex + 1, sqlDate);
                                }
                                break;

                            case Types.TIME:
                                java.sql.Time sqlTime = null;
                                try {
                                    utilDate = column.asDate();
                                } catch (DataXException e) {
                                    throw new SQLException(String.format(
                                            "Date type conversion error: [%s]", column));
                                }

                                if (null != utilDate) {
                                    sqlTime = new java.sql.Time(utilDate.getTime());
                                }
                                preparedStatement.setTime(columnIndex + 1, sqlTime);
                                break;

                            case Types.TIMESTAMP:
                                Timestamp sqlTimestamp = null;
                                if (column instanceof StringColumn && column.asString() != null) {
                                    String timeStampStr = column.asString();
                                    // Timestamp.valueOf expects input such as "2017-07-12 14:39:00.123566";
                                    // the fraction separator is escaped so that only a literal '.' matches
                                    String pattern = "^\\d+-\\d+-\\d+ \\d+:\\d+:\\d+\\.\\d+";
                                    boolean isMatch = Pattern.matches(pattern, timeStampStr);
                                    if (isMatch) {
                                        sqlTimestamp = Timestamp.valueOf(timeStampStr);
                                        preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
                                        break;
                                    }
                                }
                                try {
                                    utilDate = column.asDate();
                                } catch (DataXException e) {
                                    throw new SQLException(String.format(
                                            "Date type conversion error: [%s]", column));
                                }

                                if (null != utilDate) {
                                    sqlTimestamp = new Timestamp(utilDate.getTime());
                                }
                                preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp);
                                break;

                            case Types.BINARY:
                            case Types.VARBINARY:
                            case Types.BLOB:
                            case Types.LONGVARBINARY:
                                preparedStatement.setBytes(columnIndex + 1, column.asBytes());
                                break;

                            case Types.BOOLEAN:
                                preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue());
                                break;

                            // warn: bit(1) -> Types.BIT can use setBoolean
                            // warn: bit(>1) -> Types.VARBINARY can use setBytes
                            case Types.BIT:
                                if (this.dataBaseType == DataBaseType.MySql) {
                                    Boolean asBoolean = column.asBoolean();
                                    if (asBoolean != null) {
                                        preparedStatement.setBoolean(columnIndex + 1, asBoolean);
                                    } else {
                                        preparedStatement.setNull(columnIndex + 1, Types.BIT);
                                    }
                                } else {
                                    preparedStatement.setString(columnIndex + 1, column.asString());
                                }
                                break;

                            default:
                                boolean isHandled = fillPreparedStatementColumnType4CustomType(preparedStatement,
                                        columnIndex, columnSqltype, column);
                                if (isHandled) {
                                    break;
                                }
                                throw DataXException
                                        .asDataXException(
                                                DBUtilErrorCode.UNSUPPORTED_TYPE,
                                                String.format(
                                                        "Invalid column configuration: DataX does not support writing this field type to the database. Field name: [%s], field type: [%d], field Java type: [%s]. Please change the field's type in the table or do not synchronize this field.",
                                                        this.resultSetMetaData.getLeft()
                                                                .get(columnIndex),
                                                        this.resultSetMetaData.getMiddle()
                                                                .get(columnIndex),
                                                        this.resultSetMetaData.getRight()
                                                                .get(columnIndex)));
                        }
                        return preparedStatement;
                    } catch (DataXException e) {
                        // On type-conversion or overflow failures, report which column failed
                        if (e.getErrorCode() == CommonErrorCode.CONVERT_NOT_SUPPORT ||
                                e.getErrorCode() == CommonErrorCode.CONVERT_OVER_FLOW) {
                            throw DataXException
                                    .asDataXException(
                                            e.getErrorCode(),
                                            String.format(
                                                    "Type conversion error. Field name: [%s], field type: [%d], field Java type: [%s]. Please change the field's type in the table or do not synchronize this field.",
                                                    this.resultSetMetaData.getLeft()
                                                            .get(columnIndex),
                                                    this.resultSetMetaData.getMiddle()
                                                            .get(columnIndex),
                                                    this.resultSetMetaData.getRight()
                                                            .get(columnIndex)));
                        } else {
                            throw e;
                        }
                    }
                }

                private Object toJavaArray(Object val) {
                    if (null == val) {
                        return null;
                    } else if (val instanceof JSONArray) {
                        Object[] valArray = ((JSONArray) val).toArray();
                        for (int i = 0; i < valArray.length; i++) {
                            valArray[i] = this.toJavaArray(valArray[i]);
                        }
                        return valArray;
                    } else {
                        return val;
                    }
                }

                boolean fillPreparedStatementColumnType4CustomType(PreparedStatement ps,
                        int columnIndex, int columnSqltype, Column column) throws SQLException {
                    switch (columnSqltype) {
                        case Types.OTHER:
                            if (this.resultSetMetaData.getRight().get(columnIndex).startsWith("Tuple")) {
                                throw DataXException
                                        .asDataXException(ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR,
                                                ClickhouseWriterErrorCode.TUPLE_NOT_SUPPORTED_ERROR.getDescription());
                            } else {
                                ps.setString(columnIndex + 1, column.asString());
                            }
                            return true;

                        case Types.ARRAY:
                            Connection conn = ps.getConnection();
                            List<Object> values = JSON.parseArray(column.asString(), Object.class);
                            for (int i = 0; i < values.size(); i++) {
                                values.set(i, this.toJavaArray(values.get(i)));
                            }
                            Array array = conn.createArrayOf("String", values.toArray());
                            ps.setArray(columnIndex + 1, array);
                            return true;

                        default:
                            break;
                    }

                    return false;
                }
            };

            this.commonRdbmsWriterSlave.init(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig);
        }

        @Override
        public void startWrite(RecordReceiver recordReceiver) {
            this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector());
        }

        @Override
        public void post() {
            this.commonRdbmsWriterSlave.post(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig);
        }
    }
}

================================================
FILE: clickhousewriter/src/main/java/com/alibaba/datax/plugin/writer/clickhousewriter/ClickhouseWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.clickhousewriter;

import com.alibaba.datax.common.spi.ErrorCode;

public enum ClickhouseWriterErrorCode implements ErrorCode {
    TUPLE_NOT_SUPPORTED_ERROR("ClickhouseWriter-00", "Importing the TUPLE type is not supported."),
    ;

    private final String code;
    private final String description;

    private ClickhouseWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}

================================================
FILE: clickhousewriter/src/main/resources/plugin.json
================================================
{
    "name": "clickhousewriter",
    "class": "com.alibaba.datax.plugin.writer.clickhousewriter.ClickhouseWriter",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql.",
    "developer": "alibaba"
}

================================================
FILE: clickhousewriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "clickhousewriter",
    "parameter": {
        "username": "username",
        "password": "password",
        "column": ["col1", "col2", "col3"],
        "connection": [
            {
                "jdbcUrl": "jdbc:clickhouse://:[/]",
                "table": ["table1", "table2"]
            }
        ],
        "preSql": [],
        "postSql": [],
        "batchSize": 65536,
        "batchByteSize": 134217728,
        "dryRun": false,
        "writeMode": "insert"
    }
}

================================================
FILE: common/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT datax-common datax-common jar org.apache.commons commons-lang3 com.alibaba.fastjson2 fastjson2 commons-io commons-io junit junit test org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.httpcomponents httpclient 4.4 test org.apache.httpcomponents fluent-hc 4.4 test org.apache.commons commons-math3 3.1.1 src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/base/BaseObject.java
================================================
package com.alibaba.datax.common.base;

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;
import org.apache.commons.lang3.builder.ToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;

public class BaseObject {

    @Override
    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode(this, false);
    }

    @Override
    public boolean equals(Object object) {
        return EqualsBuilder.reflectionEquals(this, object, false);
    }

    @Override
    public String toString() {
        return ToStringBuilder.reflectionToString(this, ToStringStyle.MULTI_LINE_STYLE);
    }
}
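The clickhousewriter `plugin_job_template.json` above sets both `batchSize` (65536 records) and `batchByteSize` (134217728 bytes, i.e. 128 MiB). A common way to honor two such limits is to flush the buffer as soon as either one is reached. The sketch below shows that policy with a hypothetical `BatchBuffer` class, plain strings as records, and a `Consumer` standing in for the JDBC batch insert. It is an illustrative assumption; check `CommonRdbmsWriter` in plugin-rdbms-util for the writer's actual flush logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer records and flush when either the record-count
// limit (batchSize) or the accumulated byte size (batchByteSize) is reached.
public class BatchBuffer {
    private final int batchSize;
    private final long batchByteSize;
    private final Consumer<List<String>> sink;   // stands in for a JDBC batch insert
    private final List<String> buffer = new ArrayList<>();
    private long bufferBytes = 0;

    public BatchBuffer(int batchSize, long batchByteSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.batchByteSize = batchByteSize;
        this.sink = sink;
    }

    public void add(String record) {
        buffer.add(record);
        bufferBytes += record.getBytes(StandardCharsets.UTF_8).length;
        if (buffer.size() >= batchSize || bufferBytes >= batchByteSize) {
            flush();
        }
    }

    // Call once more after the last record so a partially filled batch is not lost.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));    // hand over a copy, then reset the buffer
        buffer.clear();
        bufferBytes = 0;
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchBuffer writer = new BatchBuffer(3, 1024, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            writer.add("row-" + i);
        }
        writer.flush();
        System.out.println(batchSizes); // [3, 3, 1]
    }
}
```

The byte limit acts as a memory guard for wide rows: with small rows the count limit triggers first, while a few very large rows trigger a flush long before `batchSize` records have accumulated.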
================================================
FILE: common/src/main/java/com/alibaba/datax/common/constant/CommonConstant.java
================================================
package com.alibaba.datax.common.constant;

public final class CommonConstant {
    /**
     * Lets a plugin tag each task produced by its own split with the resource that task uses,
     * so that when core pairs up the tasks produced by the reader/writer splits, it can
     * shuffle them more meaningfully according to these resource tags.
     */
    public static final String LOAD_BALANCE_RESOURCE_MARK = "loadBalanceResourceMark";
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/constant/PluginType.java
================================================
package com.alibaba.datax.common.constant;

/**
 * Created by jingxing on 14-8-31.
 */
public enum PluginType {
    // pluginType also names the resource directory, so it is hard to extend; extend it only
    // when truly necessary. HANDLER is added tentatively for now (it is effectively the same
    // as transformer); to be discussed further.
    READER("reader"), TRANSFORMER("transformer"), WRITER("writer"), HANDLER("handler");

    private String pluginType;

    private PluginType(String pluginType) {
        this.pluginType = pluginType;
    }

    @Override
    public String toString() {
        return this.pluginType;
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/BoolColumn.java
================================================
package com.alibaba.datax.common.element;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;

/**
 * Created by jingxing on 14-8-24.
 */
public class BoolColumn extends Column {

    public BoolColumn(Boolean bool) {
        super(bool, Column.Type.BOOL, 1);
    }

    public BoolColumn(final String data) {
        this(true);
        this.validate(data);
        if (null == data) {
            this.setRawData(null);
            this.setByteSize(0);
        } else {
            this.setRawData(Boolean.valueOf(data));
            this.setByteSize(1);
        }
    }

    public BoolColumn() {
        super(null, Column.Type.BOOL, 1);
    }

    @Override
    public Boolean asBoolean() {
        if (null == super.getRawData()) {
            return null;
        }
        return (Boolean) super.getRawData();
    }

    @Override
    public Long asLong() {
        if (null == this.getRawData()) {
            return null;
        }
        return this.asBoolean() ? 1L : 0L;
    }

    @Override
    public Double asDouble() {
        if (null == this.getRawData()) {
            return null;
        }
        return this.asBoolean() ? 1.0d : 0.0d;
    }

    @Override
    public String asString() {
        if (null == super.getRawData()) {
            return null;
        }
        return this.asBoolean() ? "true" : "false";
    }

    @Override
    public BigInteger asBigInteger() {
        if (null == this.getRawData()) {
            return null;
        }
        return BigInteger.valueOf(this.asLong());
    }

    @Override
    public BigDecimal asBigDecimal() {
        if (null == this.getRawData()) {
            return null;
        }
        return BigDecimal.valueOf(this.asLong());
    }

    @Override
    public Date asDate() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bool cannot be converted to Date.");
    }

    @Override
    public Date asDate(String dateFormat) {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bool cannot be converted to Date.");
    }

    @Override
    public byte[] asBytes() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Boolean cannot be converted to Bytes.");
    }

    private void validate(final String data) {
        if (null == data) {
            return;
        }

        if ("true".equalsIgnoreCase(data) || "false".equalsIgnoreCase(data)) {
            return;
        }

        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT,
                String.format("String[%s] cannot be converted to Bool.", data));
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/BytesColumn.java
================================================
package com.alibaba.datax.common.element;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang3.ArrayUtils;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;

/**
 * Created by jingxing on 14-8-24.
 */
public class BytesColumn extends Column {

    public BytesColumn() {
        this(null);
    }

    public BytesColumn(byte[] bytes) {
        super(ArrayUtils.clone(bytes), Column.Type.BYTES, null == bytes ? 0
                : bytes.length);
    }

    @Override
    public byte[] asBytes() {
        if (null == this.getRawData()) {
            return null;
        }
        return (byte[]) this.getRawData();
    }

    @Override
    public String asString() {
        if (null == this.getRawData()) {
            return null;
        }

        try {
            return ColumnCast.bytes2String(this);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("Bytes[%s] cannot be converted to String.", this.toString()));
        }
    }

    @Override
    public Long asLong() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes cannot be converted to Long.");
    }

    @Override
    public BigDecimal asBigDecimal() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes cannot be converted to BigDecimal.");
    }

    @Override
    public BigInteger asBigInteger() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes cannot be converted to BigInteger.");
    }

    @Override
    public Double asDouble() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes cannot be converted to Double.");
    }

    @Override
    public Date asDate() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes cannot be converted to Date.");
    }

    @Override
    public Date asDate(String dateFormat) {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes cannot be converted to Date.");
    }

    @Override
    public Boolean asBoolean() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Bytes cannot be converted to Boolean.");
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/Column.java
================================================ FILE: common/src/main/java/com/alibaba/datax/common/element/Column.java ================================================ package com.alibaba.datax.common.element; import com.alibaba.fastjson2.JSON; import java.math.BigDecimal; import java.math.BigInteger; import java.util.Date; /** * Created by jingxing on 14-8-24. *

 */
public abstract class Column {

    private Type type;

    private Object rawData;

    private int byteSize;

    public Column(final Object object, final Type type, int byteSize) {
        this.rawData = object;
        this.type = type;
        this.byteSize = byteSize;
    }

    public Object getRawData() {
        return this.rawData;
    }

    public Type getType() {
        return this.type;
    }

    public int getByteSize() {
        return this.byteSize;
    }

    protected void setType(Type type) {
        this.type = type;
    }

    protected void setRawData(Object rawData) {
        this.rawData = rawData;
    }

    protected void setByteSize(int byteSize) {
        this.byteSize = byteSize;
    }

    public abstract Long asLong();

    public abstract Double asDouble();

    public abstract String asString();

    public abstract Date asDate();

    public abstract Date asDate(String dateFormat);

    public abstract byte[] asBytes();

    public abstract Boolean asBoolean();

    public abstract BigDecimal asBigDecimal();

    public abstract BigInteger asBigInteger();

    @Override
    public String toString() {
        return JSON.toJSONString(this);
    }

    public enum Type {
        BAD, NULL, INT, LONG, DOUBLE, STRING, BOOL, DATE, BYTES
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/ColumnCast.java
================================================
package com.alibaba.datax.common.element;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import org.apache.commons.lang3.time.DateFormatUtils;
import org.apache.commons.lang3.time.FastDateFormat;

import java.io.UnsupportedEncodingException;
import java.text.ParseException;
import java.util.*;

public final class ColumnCast {

    public static void bind(final Configuration configuration) {
        StringCast.init(configuration);
        DateCast.init(configuration);
        BytesCast.init(configuration);
    }

    public static Date string2Date(final StringColumn column)
            throws ParseException {
        return StringCast.asDate(column);
    }

    public static Date string2Date(final StringColumn column, String dateFormat)
            throws ParseException {
        return StringCast.asDate(column, dateFormat);
    }

    public static byte[] string2Bytes(final StringColumn column)
            throws UnsupportedEncodingException {
        return StringCast.asBytes(column);
    }

    public static String date2String(final DateColumn column) {
        return DateCast.asString(column);
    }

    public static String bytes2String(final BytesColumn column)
            throws UnsupportedEncodingException {
        return BytesCast.asString(column);
    }
}

class StringCast {
    static String datetimeFormat = "yyyy-MM-dd HH:mm:ss";

    static String dateFormat = "yyyy-MM-dd";

    static String timeFormat = "HH:mm:ss";

    static List<String> extraFormats = Collections.emptyList();

    static String timeZone = "GMT+8";

    static FastDateFormat dateFormatter;

    static FastDateFormat timeFormatter;

    static FastDateFormat datetimeFormatter;

    static TimeZone timeZoner;

    static String encoding = "UTF-8";

    static void init(final Configuration configuration) {
        StringCast.datetimeFormat = configuration.getString(
                "common.column.datetimeFormat", StringCast.datetimeFormat);
        StringCast.dateFormat = configuration.getString(
                "common.column.dateFormat", StringCast.dateFormat);
        StringCast.timeFormat = configuration.getString(
                "common.column.timeFormat", StringCast.timeFormat);
        StringCast.extraFormats = configuration.getList(
                "common.column.extraFormats", Collections.<String>emptyList(), String.class);

        StringCast.timeZone = configuration.getString("common.column.timeZone",
                StringCast.timeZone);
        StringCast.timeZoner = TimeZone.getTimeZone(StringCast.timeZone);

        StringCast.datetimeFormatter = FastDateFormat.getInstance(
                StringCast.datetimeFormat, StringCast.timeZoner);
        StringCast.dateFormatter = FastDateFormat.getInstance(
                StringCast.dateFormat, StringCast.timeZoner);
        StringCast.timeFormatter = FastDateFormat.getInstance(
                StringCast.timeFormat, StringCast.timeZoner);

        StringCast.encoding = configuration.getString("common.column.encoding",
                StringCast.encoding);
    }

    static Date asDate(final StringColumn column)
            throws ParseException {
        if (null == column.asString()) {
            return null;
        }

        // Try the configured formats in order: datetime, date, time, then any extra formats.
        try {
            return StringCast.datetimeFormatter.parse(column.asString());
        } catch (ParseException ignored) {
        }

        try {
            return StringCast.dateFormatter.parse(column.asString());
        } catch (ParseException ignored) {
        }

        ParseException e;
        try {
            return StringCast.timeFormatter.parse(column.asString());
        } catch (ParseException ignored) {
            e = ignored;
        }

        for (String format : StringCast.extraFormats) {
            try {
                return FastDateFormat.getInstance(format, StringCast.timeZoner).parse(column.asString());
            } catch (ParseException ignored) {
                e = ignored;
            }
        }
        // Every format failed: rethrow the last parse failure.
        throw e;
    }

    static Date asDate(final StringColumn column, String dateFormat)
            throws ParseException {
        return FastDateFormat.getInstance(dateFormat, StringCast.timeZoner).parse(column.asString());
    }

    static byte[] asBytes(final StringColumn column)
            throws UnsupportedEncodingException {
        if (null == column.asString()) {
            return null;
        }

        return column.asString().getBytes(StringCast.encoding);
    }
}

/**
 * For maintainability, consider using Apache's DateFormatUtils directly.
 *
 * 迟南 has since fixed this issue, but for maintainability we still use Apache's built-in
 * functions directly.
 */
class DateCast {

    static String datetimeFormat = "yyyy-MM-dd HH:mm:ss";

    static String dateFormat = "yyyy-MM-dd";

    static String timeFormat = "HH:mm:ss";

    static String timeZone = "GMT+8";

    static TimeZone timeZoner = TimeZone.getTimeZone(DateCast.timeZone);

    static void init(final Configuration configuration) {
        DateCast.datetimeFormat = configuration.getString(
                "common.column.datetimeFormat", datetimeFormat);
        DateCast.timeFormat = configuration.getString(
                "common.column.timeFormat", timeFormat);
        DateCast.dateFormat = configuration.getString(
                "common.column.dateFormat", dateFormat);
        DateCast.timeZone = configuration.getString("common.column.timeZone", DateCast.timeZone);
        DateCast.timeZoner = TimeZone.getTimeZone(DateCast.timeZone);
    }

    static String asString(final DateColumn column) {
        if (null == column.asDate()) {
            return null;
        }

        switch (column.getSubType()) {
            case DATE:
                return DateFormatUtils.format(column.asDate(), DateCast.dateFormat,
                        DateCast.timeZoner);
            case TIME:
                return DateFormatUtils.format(column.asDate(), DateCast.timeFormat,
                        DateCast.timeZoner);
            case DATETIME:
                return DateFormatUtils.format(column.asDate(),
                        DateCast.datetimeFormat, DateCast.timeZoner);
            default:
                throw DataXException
                        .asDataXException(CommonErrorCode.CONVERT_NOT_SUPPORT,
                                "Unsupported date subtype; only DATE/TIME/DATETIME are currently supported. This is a programming error, please report it to the DataX development team.");
        }
    }
}

class BytesCast {
    static String encoding = "utf-8";

    static void init(final Configuration configuration) {
        BytesCast.encoding = configuration.getString("common.column.encoding",
                BytesCast.encoding);
    }

    static String asString(final BytesColumn column)
            throws UnsupportedEncodingException {
        if (null == column.asBytes()) {
            return null;
        }

        return new String(column.asBytes(), encoding);
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/DateColumn.java
================================================
package com.alibaba.datax.common.element;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.sql.Time;
import java.util.Date;

/**
 * Created by jingxing on 14-8-24.
 */
public class DateColumn extends Column {

    private DateType subType = DateType.DATETIME;

    private int nanos = 0;

    private int precision = -1;

    public static enum DateType {
        DATE, TIME, DATETIME
    }

    /**
     * Builds a DateColumn holding time (java.sql.Time), with Date subtype TIME (time only,
     * no date). Also records nanos and a precision derived from jdbcPrecision.
     */
    public DateColumn(Time time, int nanos, int jdbcPrecision) {
        this(time);
        if (time != null) {
            setNanos(nanos);
        }
        if (jdbcPrecision == 10) {
            setPrecision(0);
        }
        if (jdbcPrecision >= 12 && jdbcPrecision <= 17) {
            setPrecision(jdbcPrecision - 11);
        }
    }

    public long getNanos() {
        return nanos;
    }

    public void setNanos(int nanos) {
        this.nanos = nanos;
    }

    public int getPrecision() {
        return precision;
    }

    public void setPrecision(int precision) {
        this.precision = precision;
    }

    /**
     * Builds a null-valued DateColumn, with Date subtype DATETIME.
     */
    public DateColumn() {
        this((Long) null);
    }

    /**
     * Builds a DateColumn holding stamp (a Unix timestamp), with Date subtype DATETIME.
     * The value is stored internally as a long of milliseconds rather than a Date, to save space.
     */
    public DateColumn(final Long stamp) {
        super(stamp, Column.Type.DATE, (null == stamp ? 0 : 8));
    }

    /**
     * Builds a DateColumn holding date (java.util.Date), with Date subtype DATETIME.
     */
    public DateColumn(final Date date) {
        this(date == null ? null : date.getTime());
    }

    /**
     * Builds a DateColumn holding date (java.sql.Date), with Date subtype DATE (date only, no time).
     */
    public DateColumn(final java.sql.Date date) {
        this(date == null ? null : date.getTime());
        this.setSubType(DateType.DATE);
    }

    /**
     * Builds a DateColumn holding time (java.sql.Time), with Date subtype TIME (time only, no date).
     */
    public DateColumn(final java.sql.Time time) {
        this(time == null ? null : time.getTime());
        this.setSubType(DateType.TIME);
    }

    /**
     * Builds a DateColumn holding ts (java.sql.Timestamp), with Date subtype DATETIME.
     */
    public DateColumn(final java.sql.Timestamp ts) {
        this(ts == null ?
                null : ts.getTime());
        this.setSubType(DateType.DATETIME);
    }

    @Override
    public Long asLong() {
        return (Long) this.getRawData();
    }

    @Override
    public String asString() {
        try {
            return ColumnCast.date2String(this);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("Date[%s] cannot be converted to String.", this.toString()));
        }
    }

    @Override
    public Date asDate() {
        if (null == this.getRawData()) {
            return null;
        }

        return new Date((Long) this.getRawData());
    }

    @Override
    public Date asDate(String dateFormat) {
        return asDate();
    }

    @Override
    public byte[] asBytes() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Date cannot be converted to Bytes.");
    }

    @Override
    public Boolean asBoolean() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Date cannot be converted to Boolean.");
    }

    @Override
    public Double asDouble() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Date cannot be converted to Double.");
    }

    @Override
    public BigInteger asBigInteger() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Date cannot be converted to BigInteger.");
    }

    @Override
    public BigDecimal asBigDecimal() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Date cannot be converted to BigDecimal.");
    }

    public DateType getSubType() {
        return subType;
    }

    public void setSubType(DateType subType) {
        this.subType = subType;
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/DoubleColumn.java
================================================
package com.alibaba.datax.common.element;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;

public class DoubleColumn extends Column {

    public DoubleColumn(final String data) {
        this(data, null == data ?
                0 : data.length());
        this.validate(data);
    }

    public DoubleColumn(Long data) {
        this(data == null ? (String) null : String.valueOf(data));
    }

    public DoubleColumn(Integer data) {
        this(data == null ? (String) null : String.valueOf(data));
    }

    /**
     * Double cannot represent decimal values exactly, so we do not recommend storing Double
     * data through this constructor; prefer passing a String instead.
     */
    public DoubleColumn(final Double data) {
        // NaN and +/-Infinity have no BigDecimal form, so keep their string spelling.
        this(data == null ? (String) null
                : (data.isNaN() || data.isInfinite()) ? String.valueOf(data)
                : new BigDecimal(String.valueOf(data)).toPlainString());
    }

    /**
     * Float cannot represent decimal values exactly, so we do not recommend storing Float
     * data through this constructor; prefer passing a String instead.
     */
    public DoubleColumn(final Float data) {
        // NaN and +/-Infinity have no BigDecimal form, so keep their string spelling.
        this(data == null ? (String) null
                : (data.isNaN() || data.isInfinite()) ? String.valueOf(data)
                : new BigDecimal(String.valueOf(data)).toPlainString());
    }

    public DoubleColumn(final BigDecimal data) {
        this(null == data ? (String) null : data.toPlainString());
    }

    public DoubleColumn(final BigInteger data) {
        this(null == data ? (String) null : data.toString());
    }

    public DoubleColumn() {
        this((String) null);
    }

    private DoubleColumn(final String data, int byteSize) {
        super(data, Column.Type.DOUBLE, byteSize);
    }

    @Override
    public BigDecimal asBigDecimal() {
        if (null == this.getRawData()) {
            return null;
        }

        try {
            return new BigDecimal((String) this.getRawData());
        } catch (NumberFormatException e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[%s] cannot be converted to Double.",
                            (String) this.getRawData()));
        }
    }

    @Override
    public Double asDouble() {
        if (null == this.getRawData()) {
            return null;
        }

        String string = (String) this.getRawData();

        // validate() accepts NaN, Infinity and -Infinity case-insensitively, so map them the
        // same way here; Double.valueOf() only accepts the exact-case spellings.
        if ("NaN".equalsIgnoreCase(string)) {
            return Double.NaN;
        }
        if ("Infinity".equalsIgnoreCase(string) || "+Infinity".equalsIgnoreCase(string)) {
            return Double.POSITIVE_INFINITY;
        }
        if ("-Infinity".equalsIgnoreCase(string)) {
            return Double.NEGATIVE_INFINITY;
        }

        BigDecimal result = this.asBigDecimal();
        OverFlowUtil.validateDoubleNotOverFlow(result);

        return result.doubleValue();
    }

    @Override
    public Long asLong() {
        if (null == this.getRawData()) {
            return null;
        }

        BigDecimal result = this.asBigDecimal();
        OverFlowUtil.validateLongNotOverFlow(result.toBigInteger());

        return result.longValue();
    }

    @Override
    public BigInteger asBigInteger() {
        if (null == this.getRawData()) {
            return null;
        }

        return this.asBigDecimal().toBigInteger();
    }

    @Override
    public String asString() {
        if (null == this.getRawData()) {
            return null;
        }
        return (String) this.getRawData();
    }

    @Override
    public Boolean asBoolean() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Double cannot be converted to Bool.");
    }

    @Override
    public Date asDate() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Double cannot be converted to Date.");
    }

    @Override
    public Date asDate(String dateFormat) {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Double cannot be converted to Date.");
    }

    @Override
    public byte[] asBytes() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Double cannot be converted to Bytes.");
    }

    private void validate(final String data) {
        if (null == data) {
            return;
        }

        if (data.equalsIgnoreCase("NaN") || data.equalsIgnoreCase("-Infinity")
                || data.equalsIgnoreCase("Infinity")) {
            return;
        }

        try {
            new BigDecimal(data);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[%s] cannot be converted to Double.", data));
        }
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/LongColumn.java
================================================
package com.alibaba.datax.common.element;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang3.math.NumberUtils;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;

public class LongColumn extends Column {

    /**
     * Converts the string representation of an integer to a LongColumn; Java scientific
     * notation is supported.
     *
     * NOTE:
     * If data is the string representation of a floating-point number, the value will lose
     * precision; use DoubleColumn for floating-point strings.
     */
    public LongColumn(final String data) {
        super(null, Column.Type.LONG, 0);
        if (null == data) {
            return;
        }

        try {
            BigInteger rawData = NumberUtils.createBigDecimal(data)
                    .toBigInteger();
            super.setRawData(rawData);

            // When rawData is in [0, 127], rawData.bitLength() < 8, so bitLength() / 8 would
            // give a byteSize of 0; for simplicity, data.length() is used as the size instead.
            // super.setByteSize(rawData.bitLength() / 8);
            super.setByteSize(data.length());
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[%s] cannot be converted to Long.", data));
        }
    }

    public LongColumn(Long data) {
        this(null == data ? (BigInteger) null : BigInteger.valueOf(data));
    }

    public LongColumn(Integer data) {
        this(null == data ? (BigInteger) null : BigInteger.valueOf(data));
    }

    public LongColumn(BigInteger data) {
        this(data, null == data ? 0 : 8);
    }

    private LongColumn(BigInteger data, int byteSize) {
        super(data, Column.Type.LONG, byteSize);
    }

    public LongColumn() {
        this((BigInteger) null);
    }

    @Override
    public BigInteger asBigInteger() {
        if (null == this.getRawData()) {
            return null;
        }

        return (BigInteger) this.getRawData();
    }

    @Override
    public Long asLong() {
        BigInteger rawData = (BigInteger) this.getRawData();
        if (null == rawData) {
            return null;
        }

        OverFlowUtil.validateLongNotOverFlow(rawData);

        return rawData.longValue();
    }

    @Override
    public Double asDouble() {
        if (null == this.getRawData()) {
            return null;
        }

        BigDecimal decimal = this.asBigDecimal();
        OverFlowUtil.validateDoubleNotOverFlow(decimal);

        return decimal.doubleValue();
    }

    @Override
    public Boolean asBoolean() {
        if (null == this.getRawData()) {
            return null;
        }

        return this.asBigInteger().compareTo(BigInteger.ZERO) != 0;
    }

    @Override
    public BigDecimal asBigDecimal() {
        if (null == this.getRawData()) {
            return null;
        }

        return new BigDecimal(this.asBigInteger());
    }

    @Override
    public String asString() {
        if (null == this.getRawData()) {
            return null;
        }
        return ((BigInteger) this.getRawData()).toString();
    }

    @Override
    public Date asDate() {
        if (null == this.getRawData()) {
            return null;
        }
        return new Date(this.asLong());
    }

    @Override
    public Date asDate(String dateFormat) {
        return this.asDate();
    }

    @Override
    public byte[] asBytes() {
        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT, "Long cannot be converted to Bytes.");
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/OverFlowUtil.java
================================================
package com.alibaba.datax.common.element;

import java.math.BigDecimal;
import java.math.BigInteger;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;

public final class OverFlowUtil {
    public static final BigInteger MAX_LONG = BigInteger
            .valueOf(Long.MAX_VALUE);

    public static final BigInteger MIN_LONG = BigInteger
            .valueOf(Long.MIN_VALUE);

    public static final BigDecimal MIN_DOUBLE_POSITIVE = new BigDecimal(
            String.valueOf(Double.MIN_VALUE));

    public static final BigDecimal MAX_DOUBLE_POSITIVE = new BigDecimal(
            String.valueOf(Double.MAX_VALUE));

    public static boolean isLongOverflow(final BigInteger integer) {
        return (integer.compareTo(OverFlowUtil.MAX_LONG) > 0 || integer
                .compareTo(OverFlowUtil.MIN_LONG) < 0);
    }

    public static void validateLongNotOverFlow(final BigInteger integer) {
        boolean isOverFlow = OverFlowUtil.isLongOverflow(integer);

        if (isOverFlow) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_OVER_FLOW,
                    String.format("[%s] overflows when converted to Long.", integer.toString()));
        }
    }

    public static boolean isDoubleOverFlow(final BigDecimal decimal) {
        if (decimal.signum() == 0) {
            return false;
        }

        BigDecimal newDecimal = decimal;
        boolean isPositive = decimal.signum() == 1;
        if (!isPositive) {
            newDecimal = decimal.negate();
        }

        return (newDecimal.compareTo(MIN_DOUBLE_POSITIVE) < 0 || newDecimal
                .compareTo(MAX_DOUBLE_POSITIVE) > 0);
    }

    public static void validateDoubleNotOverFlow(final BigDecimal decimal) {
        boolean isOverFlow = OverFlowUtil.isDoubleOverFlow(decimal);
        if (isOverFlow) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_OVER_FLOW,
                    String.format("[%s] overflows when converted to Double.",
                            decimal.toPlainString()));
        }
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/Record.java
================================================
package com.alibaba.datax.common.element;

import java.util.Map;

/**
 * Created by jingxing on 14-8-24.
 */
public interface Record {

    public void addColumn(Column column);

    public void setColumn(int i, final Column column);

    public Column getColumn(int i);

    public String toString();

    public int getColumnNumber();

    public int getByteSize();

    public int getMemorySize();

    public void setMeta(Map<String, String> meta);

    public Map<String, String> getMeta();
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/element/StringColumn.java
================================================
package com.alibaba.datax.common.element;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Date;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;

/**
 * Created by jingxing on 14-8-24.
 */
public class StringColumn extends Column {

    public StringColumn() {
        this((String) null);
    }

    public StringColumn(final String rawData) {
        super(rawData, Column.Type.STRING, (null == rawData ? 0 : rawData
                .length()));
    }

    @Override
    public String asString() {
        if (null == this.getRawData()) {
            return null;
        }

        return (String) this.getRawData();
    }

    private void validateDoubleSpecific(final String data) {
        if ("NaN".equals(data) || "Infinity".equals(data)
                || "-Infinity".equals(data)) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[\"%s\"] is a special Double value and cannot be converted to other types.", data));
        }
    }

    @Override
    public BigInteger asBigInteger() {
        if (null == this.getRawData()) {
            return null;
        }

        this.validateDoubleSpecific((String) this.getRawData());

        try {
            return this.asBigDecimal().toBigInteger();
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT, String.format(
                            "String[\"%s\"] cannot be converted to BigInteger.", this.asString()));
        }
    }

    @Override
    public Long asLong() {
        if (null == this.getRawData()) {
            return null;
        }

        this.validateDoubleSpecific((String) this.getRawData());

        try {
            BigInteger integer = this.asBigInteger();
            OverFlowUtil.validateLongNotOverFlow(integer);
            return integer.longValue();
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[\"%s\"] cannot be converted to Long.", this.asString()));
        }
    }

    @Override
    public BigDecimal asBigDecimal() {
        if (null == this.getRawData()) {
            return null;
        }

        this.validateDoubleSpecific((String) this.getRawData());

        try {
            return new BigDecimal(this.asString());
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT, String.format(
                            "String[\"%s\"] cannot be converted to BigDecimal.", this.asString()));
        }
    }

    @Override
    public Double asDouble() {
        if (null == this.getRawData()) {
            return null;
        }

        String data = (String) this.getRawData();
        if ("NaN".equals(data)) {
            return Double.NaN;
        }

        if ("Infinity".equals(data)) {
            return Double.POSITIVE_INFINITY;
        }

        if ("-Infinity".equals(data)) {
            return Double.NEGATIVE_INFINITY;
        }

        BigDecimal decimal = this.asBigDecimal();
        OverFlowUtil.validateDoubleNotOverFlow(decimal);

        return decimal.doubleValue();
    }

    @Override
    public Boolean asBoolean() {
        if (null == this.getRawData()) {
            return null;
        }

        if ("true".equalsIgnoreCase(this.asString())) {
            return true;
        }

        if ("false".equalsIgnoreCase(this.asString())) {
            return false;
        }

        throw DataXException.asDataXException(
                CommonErrorCode.CONVERT_NOT_SUPPORT,
                String.format("String[\"%s\"] cannot be converted to Bool.", this.asString()));
    }

    @Override
    public Date asDate() {
        try {
            return ColumnCast.string2Date(this);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[\"%s\"] cannot be converted to Date.", this.asString()));
        }
    }

    @Override
    public Date asDate(String dateFormat) {
        try {
            return ColumnCast.string2Date(this, dateFormat);
        } catch (Exception e) {
            throw DataXException.asDataXException(CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[\"%s\"] cannot be converted to Date.", this.asString()));
        }
    }

    @Override
    public byte[] asBytes() {
        try {
            return ColumnCast.string2Bytes(this);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    CommonErrorCode.CONVERT_NOT_SUPPORT,
                    String.format("String[\"%s\"] cannot be converted to Bytes.", this.asString()));
        }
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/exception/CommonErrorCode.java
================================================
package com.alibaba.datax.common.exception;

import com.alibaba.datax.common.spi.ErrorCode;

public enum CommonErrorCode implements ErrorCode {

    CONFIG_ERROR("Common-00", "The configuration file you provided contains errors; please check your job configuration."),
    CONVERT_NOT_SUPPORT("Common-01", "Dirty business data encountered during sync: data type conversion error."),
    CONVERT_OVER_FLOW("Common-02", "Dirty business data encountered during sync: data type conversion overflow."),
    RETRY_FAIL("Common-10", "The method call still failed after multiple retries."),
    RUNTIME_ERROR("Common-11", "Internal runtime call error."),
    HOOK_INTERNAL_ERROR("Common-12", "Hook execution error."),
    SHUT_DOWN_TASK("Common-20", "The task received a shutdown command and is preparing for failover."),
    WAIT_TIME_EXCEED("Common-21", "The wait time exceeded the allowed range."),
    TASK_HUNG_EXPIRED("Common-22", "The task hung and expired.");

    private final String code;

    private final String describe;

    private CommonErrorCode(String code, String describe) {
        this.code = code;
        this.describe = describe;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.describe;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Describe:[%s]", this.code,
                this.describe);
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/exception/DataXException.java
================================================
package com.alibaba.datax.common.exception;

import com.alibaba.datax.common.spi.ErrorCode;

import java.io.PrintWriter;
import java.io.StringWriter;

public class DataXException extends RuntimeException {

    private static final long serialVersionUID = 1L;

    private ErrorCode errorCode;

    public DataXException(ErrorCode errorCode, String errorMessage) {
        super(errorCode.toString() + " - " + errorMessage);
        this.errorCode = errorCode;
    }

    public DataXException(String errorMessage) {
        super(errorMessage);
    }

    private DataXException(ErrorCode errorCode, String errorMessage, Throwable cause) {
        super(errorCode.toString() + " - " + getMessage(errorMessage) + " - " + getMessage(cause), cause);

        this.errorCode = errorCode;
    }

    public static DataXException asDataXException(ErrorCode errorCode, String message) {
        return new DataXException(errorCode, message);
    }

    public static DataXException asDataXException(String message) {
        return new DataXException(message);
    }

    public static DataXException asDataXException(ErrorCode errorCode, String message, Throwable cause) {
        if (cause instanceof DataXException) {
            return (DataXException) cause;
        }
        return new DataXException(errorCode, message, cause);
    }

    public static DataXException asDataXException(ErrorCode errorCode, Throwable cause) {
        if (cause instanceof DataXException) {
            return (DataXException) cause;
        }
        return new DataXException(errorCode, getMessage(cause), cause);
    }

    public ErrorCode getErrorCode() {
        return this.errorCode;
    }

    private static String getMessage(Object obj) {
        if (obj == null) {
            return "";
        }

        if (obj instanceof Throwable) {
            StringWriter str = new StringWriter();
            PrintWriter pw = new PrintWriter(str);
            ((Throwable) obj).printStackTrace(pw);
            return str.toString();
            // return ((Throwable) obj).getMessage();
        } else {
            return obj.toString();
        }
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/exception/ExceptionTracker.java
================================================
package com.alibaba.datax.common.exception;

import java.io.PrintWriter;
import java.io.StringWriter;

public final class ExceptionTracker {
    public static final int STRING_BUFFER = 1024;

    public static String trace(Throwable ex) {
        StringWriter sw = new StringWriter(STRING_BUFFER);
        PrintWriter pw = new PrintWriter(sw);
        ex.printStackTrace(pw);
        return sw.toString();
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/plugin/AbstractJobPlugin.java
================================================
package com.alibaba.datax.common.plugin;

/**
 * Created by jingxing on 14-8-24.
*/ public abstract class AbstractJobPlugin extends AbstractPlugin { /** * @return the jobPluginCollector */ public JobPluginCollector getJobPluginCollector() { return jobPluginCollector; } /** * @param jobPluginCollector * the jobPluginCollector to set */ public void setJobPluginCollector( JobPluginCollector jobPluginCollector) { this.jobPluginCollector = jobPluginCollector; } private JobPluginCollector jobPluginCollector; } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/AbstractPlugin.java ================================================ package com.alibaba.datax.common.plugin; import com.alibaba.datax.common.base.BaseObject; import com.alibaba.datax.common.util.Configuration; import java.util.List; public abstract class AbstractPlugin extends BaseObject implements Pluginable { /* the job's configuration */ private Configuration pluginJobConf; /* the plugin's own configuration */ private Configuration pluginConf; /* by qiangsi.lq: changed to hold the peer plugin's job configuration */ private Configuration peerPluginJobConf; private String peerPluginName; private List<Configuration> readerPluginSplitConf; @Override public String getPluginName() { assert null != this.pluginConf; return this.pluginConf.getString("name"); } @Override public String getDeveloper() { assert null != this.pluginConf; return this.pluginConf.getString("developer"); } @Override public String getDescription() { assert null != this.pluginConf; return this.pluginConf.getString("description"); } @Override public Configuration getPluginJobConf() { return pluginJobConf; } @Override public void setPluginJobConf(Configuration pluginJobConf) { this.pluginJobConf = pluginJobConf; } @Override public void setPluginConf(Configuration pluginConf) { this.pluginConf = pluginConf; } @Override public Configuration getPeerPluginJobConf() { return peerPluginJobConf; } @Override public void setPeerPluginJobConf(Configuration peerPluginJobConf) { this.peerPluginJobConf = peerPluginJobConf; } @Override public String getPeerPluginName() {
return peerPluginName; } @Override public void setPeerPluginName(String peerPluginName) { this.peerPluginName = peerPluginName; } public void preCheck() { } public void prepare() { } public void post() { } public void preHandler(Configuration jobConfiguration){ } public void postHandler(Configuration jobConfiguration){ } public List<Configuration> getReaderPluginSplitConf(){ return this.readerPluginSplitConf; } public void setReaderPluginSplitConf(List<Configuration> readerPluginSplitConf){ this.readerPluginSplitConf = readerPluginSplitConf; } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/AbstractTaskPlugin.java ================================================ package com.alibaba.datax.common.plugin; /** * Created by jingxing on 14-8-24. */ public abstract class AbstractTaskPlugin extends AbstractPlugin { /* a TaskPlugin should carry a taskId */ private int taskGroupId; private int taskId; private TaskPluginCollector taskPluginCollector; public TaskPluginCollector getTaskPluginCollector() { return taskPluginCollector; } public void setTaskPluginCollector( TaskPluginCollector taskPluginCollector) { this.taskPluginCollector = taskPluginCollector; } public int getTaskId() { return taskId; } public void setTaskId(int taskId) { this.taskId = taskId; } public int getTaskGroupId() { return taskGroupId; } public void setTaskGroupId(int taskGroupId) { this.taskGroupId = taskGroupId; } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/JobPluginCollector.java ================================================ package com.alibaba.datax.common.plugin; import java.util.List; import java.util.Map; /** * Created by jingxing on 14-9-9.
*/ public interface JobPluginCollector extends PluginCollector { /** * Gets the custom information collected from Tasks. * * */ Map<String, List<String>> getMessage(); /** * Gets the custom information collected from Tasks for one key. * * */ List<String> getMessage(String key); } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/PluginCollector.java ================================================ package com.alibaba.datax.common.plugin; /** * This is only a marker interface. * */ public interface PluginCollector { } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/Pluginable.java ================================================ package com.alibaba.datax.common.plugin; import com.alibaba.datax.common.util.Configuration; public interface Pluginable { String getDeveloper(); String getDescription(); void setPluginConf(Configuration pluginConf); void init(); void destroy(); String getPluginName(); Configuration getPluginJobConf(); Configuration getPeerPluginJobConf(); public String getPeerPluginName(); void setPluginJobConf(Configuration jobConf); void setPeerPluginJobConf(Configuration peerPluginJobConf); public void setPeerPluginName(String peerPluginName); } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/RecordReceiver.java ================================================ /** * (C) 2010-2013 Alibaba Group Holding Limited. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License.
*/ package com.alibaba.datax.common.plugin; import com.alibaba.datax.common.element.Record; public interface RecordReceiver { public Record getFromReader(); public void shutdown(); } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/RecordSender.java ================================================ /** * (C) 2010-2013 Alibaba Group Holding Limited. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package com.alibaba.datax.common.plugin; import com.alibaba.datax.common.element.Record; public interface RecordSender { public Record createRecord(); public void sendToWriter(Record record); public void flush(); public void terminate(); public void shutdown(); } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/plugin/TaskPluginCollector.java ================================================ package com.alibaba.datax.common.plugin; import com.alibaba.datax.common.element.Record; /** * * This interface lets Task plugins record dirty data and custom information.
* * 1. Dirty-data recording: TaskPluginCollector adapts to several ways of recording dirty data, including local output and centralized reporting.
* 2. Custom information: every task plugin can collect information through TaskPluginCollector while it runs,
* and Job plugins retrieve it through getMessage() during the post phase. */ public abstract class TaskPluginCollector implements PluginCollector { /** * Collects a dirty record. * * @param dirtyRecord * the dirty record * @param t * the exception * @param errorMessage * the error message */ public abstract void collectDirtyRecord(final Record dirtyRecord, final Throwable t, final String errorMessage); /** * Collects a dirty record. * * @param dirtyRecord * the dirty record * @param errorMessage * the error message */ public void collectDirtyRecord(final Record dirtyRecord, final String errorMessage) { this.collectDirtyRecord(dirtyRecord, null, errorMessage); } /** * Collects a dirty record. * * @param dirtyRecord * the dirty record * @param t * the exception */ public void collectDirtyRecord(final Record dirtyRecord, final Throwable t) { this.collectDirtyRecord(dirtyRecord, t, ""); } /** * Collects custom information; Job plugins can retrieve it through getMessage().
* If the same key is collected more than once, a List is used internally to keep all of that key's values.
* */ public abstract void collectMessage(final String key, final String value); } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/spi/ErrorCode.java ================================================ package com.alibaba.datax.common.spi; /** * Note in particular: implementations should provide toString(). For example: * *

 * 
 * @Override
 * public String toString() {
 * 	return String.format("Code:[%s], Description:[%s]. ", this.code, this.describe);
 * }
 * 
* */ public interface ErrorCode { /* error code identifier */ String getCode(); /* error code description */ String getDescription(); /** An implementation of toString() must be provided. * *
	 * @Override
	 * public String toString() {
	 * 	return String.format("Code:[%s], Description:[%s]. ", this.code, this.describe);
	 * }
	 * 
* */ String toString(); } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/spi/Hook.java ================================================ package com.alibaba.datax.common.spi; import com.alibaba.datax.common.util.Configuration; import java.util.Map; /** * Created by xiafei.qiuxf on 14/12/17. */ public interface Hook { /** * Returns the name of this hook. * * @return the hook name */ public String getName(); /** * TODO: documentation * * @param jobConf * @param msg */ public void invoke(Configuration jobConf, Map msg); } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/spi/Reader.java ================================================ package com.alibaba.datax.common.spi; import java.util.List; import com.alibaba.datax.common.base.BaseObject; import com.alibaba.datax.common.plugin.AbstractJobPlugin; import com.alibaba.datax.common.plugin.AbstractTaskPlugin; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.plugin.RecordSender; /** * Each Reader plugin implements two inner classes, Job and Task. * * * */ public abstract class Reader extends BaseObject { /** * Every Reader plugin must implement the Job inner class. * * */ public static abstract class Job extends AbstractJobPlugin { /** * Splits the job into tasks. * * @param adviceNumber * * Note in particular: adviceNumber is the number of tasks the framework suggests the plugin split into; ideally, plugin developers should produce at least * adviceNumber tasks.
*
* The reason for this suggestion is to give users the best result. For example, the framework may calculate that the user's data store can support 100 concurrent connections, * and the user wants a concurrency of 100. If the plugin developer then splits according to the rule above and produces at least 100 pieces of connection information, * DataX can start 100 Channels at once, giving the user the best throughput.
* For example, if a user syncs a single MySQL table and expects a concurrency of 10, the plugin developer should split that table, e.g. by primary-key range, * and if the final number of tasks is at least 10, we can give the user maximum throughput.
*
* Of course, this is only a suggested value, and a Reader plugin may split according to its own rules; we do, however, recommend splitting according to the value the framework suggests.
*
* For ODPS-to-OTS transfers, for instance, pre-sorting and pre-splitting constraints may mean the data can only be split by partition and not at any finer granularity; * in such cases the split can only follow the physical layout of the source.
*
* * * */ public abstract List<Configuration> split(int adviceNumber); } public static abstract class Task extends AbstractTaskPlugin { public abstract void startRead(RecordSender recordSender); } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/spi/Writer.java ================================================ package com.alibaba.datax.common.spi; import com.alibaba.datax.common.base.BaseObject; import com.alibaba.datax.common.plugin.AbstractJobPlugin; import com.alibaba.datax.common.plugin.AbstractTaskPlugin; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.plugin.RecordReceiver; import java.util.List; /** * Each Writer plugin must extend the Writer class and implement two inner classes, Job and Task. * * * */ public abstract class Writer extends BaseObject { /** * Every Writer plugin must implement the Job inner class. */ public abstract static class Job extends AbstractJobPlugin { /** * Splits the job into tasks.
* * @param mandatoryNumber * To keep the Reader and Writer task counts equal, the Writer plugin must split into exactly the number of tasks produced on the source side; otherwise the framework reports an error. * * */ public abstract List<Configuration> split(int mandatoryNumber); } /** * Every Writer plugin must implement the Task inner class. */ public abstract static class Task extends AbstractTaskPlugin { public abstract void startWrite(RecordReceiver lineReceiver); public boolean supportFailOver(){return false;} } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/statistics/PerfRecord.java ================================================ package com.alibaba.datax.common.statistics; import com.alibaba.datax.common.util.HostUtils; import org.apache.commons.lang3.time.DateFormatUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.Date; /** * Created by liqiang on 15/8/23. */ @SuppressWarnings("NullableProblems") public class PerfRecord implements Comparable<PerfRecord> { private static Logger perf = LoggerFactory.getLogger(PerfRecord.class); private static String datetimeFormat = "yyyy-MM-dd HH:mm:ss"; public enum PHASE { /** * TASK_TOTAL: total task run time. Phases 0 to 10 are collected by the framework; the later ones are plugin-specific statistics. */ TASK_TOTAL(0), READ_TASK_INIT(1), READ_TASK_PREPARE(2), READ_TASK_DATA(3), READ_TASK_POST(4), READ_TASK_DESTROY(5), WRITE_TASK_INIT(6), WRITE_TASK_PREPARE(7), WRITE_TASK_DATA(8), WRITE_TASK_POST(9), WRITE_TASK_DESTROY(10), /** * SQL_QUERY: the SQL query phase, a statistic specific to some readers */ SQL_QUERY(100), /** * all data has been read from the SQL result */ RESULT_NEXT_ALL(101), /** * only odps block close */ ODPS_BLOCK_CLOSE(102), WAIT_READ_TIME(103), WAIT_WRITE_TIME(104), TRANSFORMER_TIME(201); private int val; PHASE(int val) { this.val = val; } public int toInt(){ return val; } } public enum ACTION{ start, end } private final int taskGroupId; private final int taskId; private final PHASE phase; private volatile ACTION action; private volatile Date startTime; private volatile long elapsedTimeInNs = -1; private volatile long count = 0; private volatile long size = 0; private volatile long startTimeInNs; private volatile boolean isReport = false; public
PerfRecord(int taskGroupId, int taskId, PHASE phase) { this.taskGroupId = taskGroupId; this.taskId = taskId; this.phase = phase; } public static void addPerfRecord(int taskGroupId, int taskId, PHASE phase, long startTime,long elapsedTimeInNs) { if(PerfTrace.getInstance().isEnable()) { PerfRecord perfRecord = new PerfRecord(taskGroupId, taskId, phase); perfRecord.elapsedTimeInNs = elapsedTimeInNs; perfRecord.action = ACTION.end; perfRecord.startTime = new Date(startTime); /* register with PerfTrace */ PerfTrace.getInstance().tracePerfRecord(perfRecord); perf.info(perfRecord.toString()); } } public void start() { if(PerfTrace.getInstance().isEnable()) { this.startTime = new Date(); this.startTimeInNs = System.nanoTime(); this.action = ACTION.start; /* register with PerfTrace */ PerfTrace.getInstance().tracePerfRecord(this); perf.info(toString()); } } public void addCount(long count) { this.count += count; } public void addSize(long size) { this.size += size; } public void end() { if(PerfTrace.getInstance().isEnable()) { this.elapsedTimeInNs = System.nanoTime() - startTimeInNs; this.action = ACTION.end; PerfTrace.getInstance().tracePerfRecord(this); perf.info(toString()); } } public void end(long elapsedTimeInNs) { if(PerfTrace.getInstance().isEnable()) { this.elapsedTimeInNs = elapsedTimeInNs; this.action = ACTION.end; PerfTrace.getInstance().tracePerfRecord(this); perf.info(toString()); } } public String toString() { return String.format("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s" , getInstId(), taskGroupId, taskId, phase, action, DateFormatUtils.format(startTime, datetimeFormat), elapsedTimeInNs, count, size,getHostIP()); } @Override public int compareTo(PerfRecord o) { if (o == null) { return 1; } return this.elapsedTimeInNs > o.elapsedTimeInNs ? 1 : this.elapsedTimeInNs == o.elapsedTimeInNs ?
0 : -1; } @Override public int hashCode() { long jobId = getInstId(); int result = (int) (jobId ^ (jobId >>> 32)); result = 31 * result + taskGroupId; result = 31 * result + taskId; result = 31 * result + phase.toInt(); result = 31 * result + (startTime != null ? startTime.hashCode() : 0); return result; } @Override public boolean equals(Object o) { if (this == o) return true; if(!(o instanceof PerfRecord)){ return false; } PerfRecord dst = (PerfRecord)o; if (this.getInstId() != dst.getInstId()) return false; if (this.taskGroupId != dst.taskGroupId) return false; if (this.taskId != dst.taskId) return false; if (phase != null ? !phase.equals(dst.phase) : dst.phase != null) return false; if (startTime != null ? !startTime.equals(dst.startTime) : dst.startTime != null) return false; return true; } public PerfRecord copy() { PerfRecord copy = new PerfRecord(this.taskGroupId, this.getTaskId(), this.phase); copy.action = this.action; copy.startTime = this.startTime; copy.elapsedTimeInNs = this.elapsedTimeInNs; copy.count = this.count; copy.size = this.size; return copy; } public int getTaskGroupId() { return taskGroupId; } public int getTaskId() { return taskId; } public PHASE getPhase() { return phase; } public ACTION getAction() { return action; } public long getElapsedTimeInNs() { return elapsedTimeInNs; } public long getCount() { return count; } public long getSize() { return size; } public long getInstId(){ return PerfTrace.getInstance().getInstId(); } public String getHostIP(){ return HostUtils.IP; } public String getHostName(){ return HostUtils.HOSTNAME; } public Date getStartTime() { return startTime; } public long getStartTimeInMs() { return startTime.getTime(); } public long getStartTimeInNs() { return startTimeInNs; } public String getDatetime(){ if(startTime == null){ return "null time"; } return DateFormatUtils.format(startTime, datetimeFormat); } public boolean isReport() { return isReport; } public void setIsReport(boolean isReport) { this.isReport = 
isReport; } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/statistics/PerfTrace.java ================================================ package com.alibaba.datax.common.statistics; import com.alibaba.datax.common.statistics.PerfRecord.PHASE; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.HostUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.*; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.TimeUnit; /** * PerfTrace records either a job (local mode) or a taskGroup (distribute mode). Each of these runs in its own JVM, so a single JVM only ever needs one PerfTrace. */ public class PerfTrace { private static Logger LOG = LoggerFactory.getLogger(PerfTrace.class); /* volatile is required for the double-checked locking in getInstance() to publish the instance safely */ private static volatile PerfTrace instance; private static final Object lock = new Object(); private String perfTraceId; private volatile boolean enable; private volatile boolean isJob; private long instId; private long jobId; private long jobVersion; private int taskGroupId; private int channelNumber; private int batchSize = 500; private volatile boolean perfReportEnable = true; /* jobid_jobversion,instanceid,taskid, src_mark, dst_mark, */ private Map<Integer, String> taskDetails = new ConcurrentHashMap<Integer, String>(); /* PHASE => SumPerfRecord4Print */ private ConcurrentHashMap<PHASE, SumPerfRecord4Print> perfRecordMaps4print = new ConcurrentHashMap<PHASE, SumPerfRecord4Print>(); /* job_phase => SumPerf4Report */ private SumPerf4Report sumPerf4Report = new SumPerf4Report(); private SumPerf4Report sumPerf4Report4NotEnd; private Configuration jobInfo; private final Set<PerfRecord> needReportPool4NotEnd = new HashSet<PerfRecord>(); private final List<PerfRecord> totalEndReport = new ArrayList<PerfRecord>(); /** * Singleton accessor; creates the shared instance on the first call. * * @param isJob * @param jobId * @param taskGroupId * @param enable * @return */ public static PerfTrace getInstance(boolean isJob, long jobId, int taskGroupId, boolean enable) { if (instance == null) { synchronized (lock) { if (instance == null) { instance = new PerfTrace(isJob,
jobId, taskGroupId, enable); } } } return instance; } /** * A JVM holds only one instance, so once getInstance(isJob, jobId, taskGroupId, enable) has created it, later callers can use this method to get that instance directly. * * @return the shared PerfTrace instance */ public static PerfTrace getInstance() { if (instance == null) { LOG.error("PerfTrace instance not be init! must have some error! "); synchronized (lock) { if (instance == null) { instance = new PerfTrace(false, -1111, -1111, false); } } } return instance; } private PerfTrace(boolean isJob, long jobId, int taskGroupId, boolean enable) { try { this.perfTraceId = isJob ? "job_" + jobId : String.format("taskGroup_%s_%s", jobId, taskGroupId); this.enable = enable; this.isJob = isJob; this.taskGroupId = taskGroupId; this.instId = jobId; LOG.info(String.format("PerfTrace traceId=%s, isEnable=%s", this.perfTraceId, this.enable)); } catch (Exception e) { /* do nothing */ this.enable = false; } } public void addTaskDetails(int taskId, String detail) { if (enable) { String before = ""; int index = detail.indexOf("?"); String current = detail.substring(0, index == -1 ?
detail.length() : index); if (current.indexOf("[") >= 0) { current += "]"; } if (taskDetails.containsKey(taskId)) { before = taskDetails.get(taskId).trim(); } if (StringUtils.isEmpty(before)) { before = ""; } else { before += ","; } this.taskDetails.put(taskId, before + current); } } public void tracePerfRecord(PerfRecord perfRecord) { try { if (enable) { long curNanoTime = System.nanoTime(); /* ArrayList is not thread-safe, so access to it is synchronized */ switch (perfRecord.getAction()) { case end: synchronized (totalEndReport) { totalEndReport.add(perfRecord); if (totalEndReport.size() > batchSize * 10) { sumPerf4EndPrint(totalEndReport); } } if (perfReportEnable && needReport(perfRecord)) { synchronized (needReportPool4NotEnd) { sumPerf4Report.add(curNanoTime,perfRecord); needReportPool4NotEnd.remove(perfRecord); } } break; case start: if (perfReportEnable && needReport(perfRecord)) { synchronized (needReportPool4NotEnd) { needReportPool4NotEnd.add(perfRecord); } } break; } } } catch (Exception e) { /* do nothing */ } } private boolean needReport(PerfRecord perfRecord) { switch (perfRecord.getPhase()) { case TASK_TOTAL: case SQL_QUERY: case RESULT_NEXT_ALL: case ODPS_BLOCK_CLOSE: return true; } return false; } public String summarizeNoException() { String res; try { res = summarize(); } catch (Exception e) { res = "PerfTrace summarize has Exception " + e.getMessage(); } return res; } /* when the task finishes, build an overall summary of the current perf statistics */ private synchronized String summarize() { if (!enable) { return "PerfTrace not enable!"; } if (totalEndReport.size() > 0) { sumPerf4EndPrint(totalEndReport); } StringBuilder info = new StringBuilder(); info.append("\n === total summarize info === \n"); info.append("\n 1.
all phase average time info and max time task info: \n\n"); info.append(String.format("%-20s | %18s | %18s | %18s | %18s | %-100s\n", "PHASE", "AVERAGE USED TIME", "ALL TASK NUM", "MAX USED TIME", "MAX TASK ID", "MAX TASK INFO")); List<PHASE> keys = new ArrayList<PHASE>(perfRecordMaps4print.keySet()); Collections.sort(keys, new Comparator<PHASE>() { @Override public int compare(PHASE o1, PHASE o2) { return o1.toInt() - o2.toInt(); } }); for (PHASE phase : keys) { SumPerfRecord4Print sumPerfRecord = perfRecordMaps4print.get(phase); if (sumPerfRecord == null) { continue; } long averageTime = sumPerfRecord.getAverageTime(); long maxTime = sumPerfRecord.getMaxTime(); int maxTaskId = sumPerfRecord.maxTaskId; int maxTaskGroupId = sumPerfRecord.getMaxTaskGroupId(); info.append(String.format("%-20s | %18s | %18s | %18s | %18s | %-100s\n", phase, unitTime(averageTime), sumPerfRecord.totalCount, unitTime(maxTime), jobId + "-" + maxTaskGroupId + "-" + maxTaskId, taskDetails.get(maxTaskId))); } /* SumPerfRecord4Print countSumPerf = Optional.fromNullable(perfRecordMaps4print.get(PHASE.READ_TASK_DATA)).or(new SumPerfRecord4Print()); */ SumPerfRecord4Print countSumPerf = perfRecordMaps4print.get(PHASE.READ_TASK_DATA); if(countSumPerf == null){ countSumPerf = new SumPerfRecord4Print(); } long averageRecords = countSumPerf.getAverageRecords(); long averageBytes = countSumPerf.getAverageBytes(); long maxRecord = countSumPerf.getMaxRecord(); long maxByte = countSumPerf.getMaxByte(); int maxTaskId4Records = countSumPerf.getMaxTaskId4Records(); int maxTGID4Records = countSumPerf.getMaxTGID4Records(); info.append("\n\n 2.
record average count and max count task info :\n\n"); info.append(String.format("%-20s | %18s | %18s | %18s | %18s | %18s | %-100s\n", "PHASE", "AVERAGE RECORDS", "AVERAGE BYTES", "MAX RECORDS", "MAX RECORD`S BYTES", "MAX TASK ID", "MAX TASK INFO")); if (maxTaskId4Records > -1) { info.append(String.format("%-20s | %18s | %18s | %18s | %18s | %18s | %-100s\n" , PHASE.READ_TASK_DATA, averageRecords, unitSize(averageBytes), maxRecord, unitSize(maxByte), jobId + "-" + maxTGID4Records + "-" + maxTaskId4Records, taskDetails.get(maxTaskId4Records))); } return info.toString(); } /* the input time is in nanoseconds by default */ public static String unitTime(long time) { return unitTime(time, TimeUnit.NANOSECONDS); } public static String unitTime(long time, TimeUnit timeUnit) { return String.format("%,.3fs", ((float) timeUnit.toNanos(time)) / 1000000000); } public static String unitSize(long size) { if (size > 1000000000) { return String.format("%,.2fG", (float) size / 1000000000); } else if (size > 1000000) { return String.format("%,.2fM", (float) size / 1000000); } else if (size > 1000) { return String.format("%,.2fK", (float) size / 1000); } else { return size + "B"; } } public synchronized ConcurrentHashMap<PHASE, SumPerfRecord4Print> getPerfRecordMaps4print() { if (totalEndReport.size() > 0) { sumPerf4EndPrint(totalEndReport); } return perfRecordMaps4print; } public SumPerf4Report getSumPerf4Report() { return sumPerf4Report; } public Set<PerfRecord> getNeedReportPool4NotEnd() { return needReportPool4NotEnd; } public List<PerfRecord> getTotalEndReport() { return totalEndReport; } public Map<Integer, String> getTaskDetails() { return taskDetails; } public boolean isEnable() { return enable; } public boolean isJob() { return isJob; } private String cluster; private String jobDomain; private String srcType; private String dstType; private String srcGuid; private String dstGuid; private Date windowStart; private Date windowEnd; private Date jobStartTime; public void setJobInfo(Configuration jobInfo, boolean perfReportEnable, int channelNumber) { try { this.jobInfo = jobInfo;
if (jobInfo != null && perfReportEnable) { cluster = jobInfo.getString("cluster"); String srcDomain = jobInfo.getString("srcDomain", "null"); String dstDomain = jobInfo.getString("dstDomain", "null"); jobDomain = srcDomain + "|" + dstDomain; srcType = jobInfo.getString("srcType"); dstType = jobInfo.getString("dstType"); srcGuid = jobInfo.getString("srcGuid"); dstGuid = jobInfo.getString("dstGuid"); windowStart = getWindow(jobInfo.getString("windowStart"), true); windowEnd = getWindow(jobInfo.getString("windowEnd"), false); String jobIdStr = jobInfo.getString("jobId"); jobId = StringUtils.isEmpty(jobIdStr) ? (long) -5 : Long.parseLong(jobIdStr); String jobVersionStr = jobInfo.getString("jobVersion"); jobVersion = StringUtils.isEmpty(jobVersionStr) ? (long) -4 : Long.parseLong(jobVersionStr); jobStartTime = new Date(); } this.perfReportEnable = perfReportEnable; this.channelNumber = channelNumber; } catch (Exception e) { this.perfReportEnable = false; } } private Date getWindow(String windowStr, boolean startWindow) { SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd 00:00:00"); if (StringUtils.isNotEmpty(windowStr)) { try { return sdf1.parse(windowStr); } catch (ParseException e) { /* do nothing */ } } if (startWindow) { try { return sdf2.parse(sdf2.format(new Date())); } catch (ParseException e1) { /* do nothing */ } } return null; } public long getInstId() { return instId; } public Configuration getJobInfo() { return jobInfo; } public void setBatchSize(int batchSize) { this.batchSize = batchSize; } public synchronized JobStatisticsDto2 getReports(String mode) { try { if (!enable || !perfReportEnable) { return null; } if (("job".equalsIgnoreCase(mode) && !isJob) || "tg".equalsIgnoreCase(mode) && isJob) { return null; } /* reset the statistics of unfinished tasks on every call */ sumPerf4Report4NotEnd = new SumPerf4Report(); Set<PerfRecord> needReportPool4NotEndTmp = null; synchronized (needReportPool4NotEnd) { needReportPool4NotEndTmp = new
HashSet<PerfRecord>(needReportPool4NotEnd); } long curNanoTime = System.nanoTime(); for (PerfRecord perfRecord : needReportPool4NotEndTmp) { sumPerf4Report4NotEnd.add(curNanoTime, perfRecord); } JobStatisticsDto2 jdo = new JobStatisticsDto2(); jdo.setInstId(this.instId); if (isJob) { jdo.setTaskGroupId(-6); } else { jdo.setTaskGroupId(this.taskGroupId); } jdo.setJobId(this.jobId); jdo.setJobVersion(this.jobVersion); jdo.setWindowStart(this.windowStart); jdo.setWindowEnd(this.windowEnd); jdo.setJobStartTime(jobStartTime); jdo.setJobRunTimeMs(System.currentTimeMillis() - jobStartTime.getTime()); jdo.setChannelNum(this.channelNumber); jdo.setCluster(this.cluster); jdo.setJobDomain(this.jobDomain); jdo.setSrcType(this.srcType); jdo.setDstType(this.dstType); jdo.setSrcGuid(this.srcGuid); jdo.setDstGuid(this.dstGuid); jdo.setHostAddress(HostUtils.IP); /* sum */ jdo.setTaskTotalTimeMs(sumPerf4Report4NotEnd.totalTaskRunTimeInMs + sumPerf4Report.totalTaskRunTimeInMs); jdo.setOdpsBlockCloseTimeMs(sumPerf4Report4NotEnd.odpsCloseTimeInMs + sumPerf4Report.odpsCloseTimeInMs); jdo.setSqlQueryTimeMs(sumPerf4Report4NotEnd.sqlQueryTimeInMs + sumPerf4Report.sqlQueryTimeInMs); jdo.setResultNextTimeMs(sumPerf4Report4NotEnd.resultNextTimeInMs + sumPerf4Report.resultNextTimeInMs); return jdo; } catch (Exception e) { /* do nothing */ } return null; } private void sumPerf4EndPrint(List<PerfRecord> totalEndReport) { if (!enable || totalEndReport == null) { return; } for (PerfRecord perfRecord : totalEndReport) { perfRecordMaps4print.putIfAbsent(perfRecord.getPhase(), new SumPerfRecord4Print()); perfRecordMaps4print.get(perfRecord.getPhase()).add(perfRecord); } totalEndReport.clear(); } public void setChannelNumber(int needChannelNumber) { this.channelNumber = needChannelNumber; } public static class SumPerf4Report { long totalTaskRunTimeInMs = 0L; long odpsCloseTimeInMs = 0L; long sqlQueryTimeInMs = 0L; long resultNextTimeInMs = 0L; public void add(long curNanoTime,PerfRecord perfRecord) { try { long runTimeEndInMs; if
(perfRecord.getElapsedTimeInNs() == -1) { runTimeEndInMs = (curNanoTime - perfRecord.getStartTimeInNs()) / 1000000; } else { runTimeEndInMs = perfRecord.getElapsedTimeInNs() / 1000000; } switch (perfRecord.getPhase()) { case TASK_TOTAL: totalTaskRunTimeInMs += runTimeEndInMs; break; case SQL_QUERY: sqlQueryTimeInMs += runTimeEndInMs; break; case RESULT_NEXT_ALL: resultNextTimeInMs += runTimeEndInMs; break; case ODPS_BLOCK_CLOSE: odpsCloseTimeInMs += runTimeEndInMs; break; } }catch (Exception e){ /* do nothing */ } } public long getTotalTaskRunTimeInMs() { return totalTaskRunTimeInMs; } public long getOdpsCloseTimeInMs() { return odpsCloseTimeInMs; } public long getSqlQueryTimeInMs() { return sqlQueryTimeInMs; } public long getResultNextTimeInMs() { return resultNextTimeInMs; } } public static class SumPerfRecord4Print { private long perfTimeTotal = 0; private long averageTime = 0; private long maxTime = 0; private int maxTaskId = -1; private int maxTaskGroupId = -1; private int totalCount = 0; private long recordsTotal = 0; private long sizesTotal = 0; private long averageRecords = 0; private long averageBytes = 0; private long maxRecord = 0; private long maxByte = 0; private int maxTaskId4Records = -1; private int maxTGID4Records = -1; public void add(PerfRecord perfRecord) { if (perfRecord == null) { return; } perfTimeTotal += perfRecord.getElapsedTimeInNs(); if (perfRecord.getElapsedTimeInNs() >= maxTime) { maxTime = perfRecord.getElapsedTimeInNs(); maxTaskId = perfRecord.getTaskId(); maxTaskGroupId = perfRecord.getTaskGroupId(); } recordsTotal += perfRecord.getCount(); sizesTotal += perfRecord.getSize(); if (perfRecord.getCount() >= maxRecord) { maxRecord = perfRecord.getCount(); maxByte = perfRecord.getSize(); maxTaskId4Records = perfRecord.getTaskId(); maxTGID4Records = perfRecord.getTaskGroupId(); } totalCount++; } public long getPerfTimeTotal() { return perfTimeTotal; } public long getAverageTime() { if (totalCount > 0) { averageTime = perfTimeTotal /
totalCount; } return averageTime; } public long getMaxTime() { return maxTime; } public int getMaxTaskId() { return maxTaskId; } public int getMaxTaskGroupId() { return maxTaskGroupId; } public long getRecordsTotal() { return recordsTotal; } public long getSizesTotal() { return sizesTotal; } public long getAverageRecords() { if (totalCount > 0) { averageRecords = recordsTotal / totalCount; } return averageRecords; } public long getAverageBytes() { if (totalCount > 0) { averageBytes = sizesTotal / totalCount; } return averageBytes; } public long getMaxRecord() { return maxRecord; } public long getMaxByte() { return maxByte; } public int getMaxTaskId4Records() { return maxTaskId4Records; } public int getMaxTGID4Records() { return maxTGID4Records; } public int getTotalCount() { return totalCount; } } class JobStatisticsDto2 { private Long id; private Date gmtCreate; private Date gmtModified; private Long instId; private Long jobId; private Long jobVersion; private Integer taskGroupId; private Date windowStart; private Date windowEnd; private Date jobStartTime; private Date jobEndTime; private Long jobRunTimeMs; private Integer channelNum; private String cluster; private String jobDomain; private String srcType; private String dstType; private String srcGuid; private String dstGuid; private Long records; private Long bytes; private Long speedRecord; private Long speedByte; private String stagePercent; private Long errorRecord; private Long errorBytes; private Long waitReadTimeMs; private Long waitWriteTimeMs; private Long odpsBlockCloseTimeMs; private Long sqlQueryTimeMs; private Long resultNextTimeMs; private Long taskTotalTimeMs; private String hostAddress; public Long getId() { return id; } public Date getGmtCreate() { return gmtCreate; } public Date getGmtModified() { return gmtModified; } public Long getInstId() { return instId; } public Long getJobId() { return jobId; } public Long getJobVersion() { return jobVersion; } public Integer getTaskGroupId() { return 
taskGroupId; } public Date getWindowStart() { return windowStart; } public Date getWindowEnd() { return windowEnd; } public Date getJobStartTime() { return jobStartTime; } public Date getJobEndTime() { return jobEndTime; } public Long getJobRunTimeMs() { return jobRunTimeMs; } public Integer getChannelNum() { return channelNum; } public String getCluster() { return cluster; } public String getJobDomain() { return jobDomain; } public String getSrcType() { return srcType; } public String getDstType() { return dstType; } public String getSrcGuid() { return srcGuid; } public String getDstGuid() { return dstGuid; } public Long getRecords() { return records; } public Long getBytes() { return bytes; } public Long getSpeedRecord() { return speedRecord; } public Long getSpeedByte() { return speedByte; } public String getStagePercent() { return stagePercent; } public Long getErrorRecord() { return errorRecord; } public Long getErrorBytes() { return errorBytes; } public Long getWaitReadTimeMs() { return waitReadTimeMs; } public Long getWaitWriteTimeMs() { return waitWriteTimeMs; } public Long getOdpsBlockCloseTimeMs() { return odpsBlockCloseTimeMs; } public Long getSqlQueryTimeMs() { return sqlQueryTimeMs; } public Long getResultNextTimeMs() { return resultNextTimeMs; } public Long getTaskTotalTimeMs() { return taskTotalTimeMs; } public String getHostAddress() { return hostAddress; } public void setId(Long id) { this.id = id; } public void setGmtCreate(Date gmtCreate) { this.gmtCreate = gmtCreate; } public void setGmtModified(Date gmtModified) { this.gmtModified = gmtModified; } public void setInstId(Long instId) { this.instId = instId; } public void setJobId(Long jobId) { this.jobId = jobId; } public void setJobVersion(Long jobVersion) { this.jobVersion = jobVersion; } public void setTaskGroupId(Integer taskGroupId) { this.taskGroupId = taskGroupId; } public void setWindowStart(Date windowStart) { this.windowStart = windowStart; } public void setWindowEnd(Date windowEnd) { 
this.windowEnd = windowEnd; } public void setJobStartTime(Date jobStartTime) { this.jobStartTime = jobStartTime; } public void setJobEndTime(Date jobEndTime) { this.jobEndTime = jobEndTime; } public void setJobRunTimeMs(Long jobRunTimeMs) { this.jobRunTimeMs = jobRunTimeMs; } public void setChannelNum(Integer channelNum) { this.channelNum = channelNum; } public void setCluster(String cluster) { this.cluster = cluster; } public void setJobDomain(String jobDomain) { this.jobDomain = jobDomain; } public void setSrcType(String srcType) { this.srcType = srcType; } public void setDstType(String dstType) { this.dstType = dstType; } public void setSrcGuid(String srcGuid) { this.srcGuid = srcGuid; } public void setDstGuid(String dstGuid) { this.dstGuid = dstGuid; } public void setRecords(Long records) { this.records = records; } public void setBytes(Long bytes) { this.bytes = bytes; } public void setSpeedRecord(Long speedRecord) { this.speedRecord = speedRecord; } public void setSpeedByte(Long speedByte) { this.speedByte = speedByte; } public void setStagePercent(String stagePercent) { this.stagePercent = stagePercent; } public void setErrorRecord(Long errorRecord) { this.errorRecord = errorRecord; } public void setErrorBytes(Long errorBytes) { this.errorBytes = errorBytes; } public void setWaitReadTimeMs(Long waitReadTimeMs) { this.waitReadTimeMs = waitReadTimeMs; } public void setWaitWriteTimeMs(Long waitWriteTimeMs) { this.waitWriteTimeMs = waitWriteTimeMs; } public void setOdpsBlockCloseTimeMs(Long odpsBlockCloseTimeMs) { this.odpsBlockCloseTimeMs = odpsBlockCloseTimeMs; } public void setSqlQueryTimeMs(Long sqlQueryTimeMs) { this.sqlQueryTimeMs = sqlQueryTimeMs; } public void setResultNextTimeMs(Long resultNextTimeMs) { this.resultNextTimeMs = resultNextTimeMs; } public void setTaskTotalTimeMs(Long taskTotalTimeMs) { this.taskTotalTimeMs = taskTotalTimeMs; } public void setHostAddress(String hostAddress) { this.hostAddress = hostAddress; } } } 
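The statistics code above contains two small pieces of arithmetic worth checking in isolation: the elapsed-time fallback (a `getElapsedTimeInNs()` of `-1` means the phase is still running, so time is measured up to `curNanoTime`), and the running totals, truncated averages and `>=` maximum tracking in `SumPerfRecord4Print.add`. The following is a minimal standalone sketch, not DataX code: `SumPerfSketch`, `Sample` (a hypothetical stand-in for `PerfRecord` that carries only the fields `add` reads) and `runTimeMs` are names introduced here for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SumPerfSketch {
    /** Hypothetical stand-in for PerfRecord: only the fields SumPerfRecord4Print.add() reads. */
    static final class Sample {
        final int taskGroupId;
        final int taskId;
        final long elapsedNs;
        final long count;
        final long size;

        Sample(int taskGroupId, int taskId, long elapsedNs, long count, long size) {
            this.taskGroupId = taskGroupId;
            this.taskId = taskId;
            this.elapsedNs = elapsedNs;
            this.count = count;
            this.size = size;
        }
    }

    long perfTimeTotal = 0, maxTime = 0, recordsTotal = 0, sizesTotal = 0, maxRecord = 0, maxByte = 0;
    int maxTaskId = -1, maxTaskGroupId = -1, maxTaskId4Records = -1, maxTGID4Records = -1;
    int totalCount = 0;

    /** Elapsed time in ms; elapsedNs == -1 means "still running", so measure up to nowNs. */
    static long runTimeMs(long startNs, long elapsedNs, long nowNs) {
        return elapsedNs == -1 ? (nowNs - startNs) / 1_000_000 : elapsedNs / 1_000_000;
    }

    void add(Sample s) {
        if (s == null) {
            return; // null samples are skipped and not counted
        }
        perfTimeTotal += s.elapsedNs;
        if (s.elapsedNs >= maxTime) { // ">=": a later sample wins a tie
            maxTime = s.elapsedNs;
            maxTaskId = s.taskId;
            maxTaskGroupId = s.taskGroupId;
        }
        recordsTotal += s.count;
        sizesTotal += s.size;
        if (s.count >= maxRecord) {
            maxRecord = s.count;
            maxByte = s.size;
            maxTaskId4Records = s.taskId;
            maxTGID4Records = s.taskGroupId;
        }
        totalCount++;
    }

    // Integer division: averages are truncated, and 0 is returned when nothing was added.
    long averageTime() { return totalCount > 0 ? perfTimeTotal / totalCount : 0; }
    long averageRecords() { return totalCount > 0 ? recordsTotal / totalCount : 0; }
    long averageBytes() { return totalCount > 0 ? sizesTotal / totalCount : 0; }

    static SumPerfSketch demo() {
        SumPerfSketch sum = new SumPerfSketch();
        List<Sample> samples = Arrays.asList(
                new Sample(0, 1, 300, 10, 100),
                new Sample(0, 2, 500, 40, 400),
                new Sample(1, 3, 500, 25, 250),
                null);
        for (Sample s : samples) {
            sum.add(s);
        }
        return sum;
    }

    public static void main(String[] args) {
        SumPerfSketch sum = demo();
        // prints "avgTime=433 slowest=1-3 mostRecords=0-2"
        System.out.println("avgTime=" + sum.averageTime()
                + " slowest=" + sum.maxTaskGroupId + "-" + sum.maxTaskId
                + " mostRecords=" + sum.maxTGID4Records + "-" + sum.maxTaskId4Records);
        // prints "runTimeMs=5"
        System.out.println("runTimeMs=" + runTimeMs(1_000_000, -1, 6_000_000));
    }
}
```

Because both comparisons use `>=`, a later record that ties the current maximum replaces it, which is why task 3 rather than task 2 is reported as the slowest in this example, and the `null` entry is neither summed nor counted.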
================================================ FILE: common/src/main/java/com/alibaba/datax/common/statistics/VMInfo.java ================================================ package com.alibaba.datax.common.statistics; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.lang.management.GarbageCollectorMXBean; import java.lang.management.MemoryPoolMXBean; import java.lang.management.OperatingSystemMXBean; import java.lang.management.RuntimeMXBean; import java.lang.reflect.Method; import java.util.HashMap; import java.util.List; import java.util.Map; /** * Created by liqiang on 15/11/12. */ public class VMInfo { private static final Logger LOG = LoggerFactory.getLogger(VMInfo.class); static final long MB = 1024 * 1024; static final long GB = 1024 * 1024 * 1024; public static Object lock = new Object(); /* volatile is required for the double-checked locking in getVmInfo() to publish the instance safely */ private static volatile VMInfo vmInfo; /** * @return the shared VMInfo, or null if initialization failed; the job does not depend on it, so a null result can be ignored. */ public static VMInfo getVmInfo() { if (vmInfo == null) { synchronized (lock) { if (vmInfo == null) { try { vmInfo = new VMInfo(); } catch (Exception e) { LOG.warn("VMInfo init failed (ignored, the job is unaffected): " + e.getMessage(), e); } } } } return vmInfo; } /* MXBeans that supply the data */ private final OperatingSystemMXBean osMXBean; private final RuntimeMXBean runtimeMXBean; private final List<GarbageCollectorMXBean> garbageCollectorMXBeanList; private final List<MemoryPoolMXBean> memoryPoolMXBeanList; /** * Static information */ private final String osInfo; private final String jvmInfo; /** * Number of available processors */ private final int totalProcessorCount; /** * Machine and process status, used for periodic logging and statistics reporting */ private final PhyOSStatus startPhyOSStatus; private final ProcessCpuStatus processCpuStatus = new ProcessCpuStatus(); private final ProcessGCStatus processGCStatus = new ProcessGCStatus(); private final ProcessMemoryStatus processMomoryStatus = new ProcessMemoryStatus(); /* ms */ private long lastUpTime = 0; /* ns */ private long lastProcessCpuTime = 0; private VMInfo() { /* initialize static information */ osMXBean =
java.lang.management.ManagementFactory.getOperatingSystemMXBean(); runtimeMXBean = java.lang.management.ManagementFactory.getRuntimeMXBean(); garbageCollectorMXBeanList = java.lang.management.ManagementFactory.getGarbageCollectorMXBeans(); memoryPoolMXBeanList = java.lang.management.ManagementFactory.getMemoryPoolMXBeans(); jvmInfo = runtimeMXBean.getVmVendor() + " " + runtimeMXBean.getSpecVersion() + " " + runtimeMXBean.getVmVersion(); osInfo = osMXBean.getName() + " " + osMXBean.getArch() + " " + osMXBean.getVersion(); totalProcessorCount = osMXBean.getAvailableProcessors(); /* build startPhyOSStatus */ startPhyOSStatus = new PhyOSStatus(); LOG.info("VMInfo# operatingSystem class => " + osMXBean.getClass().getName()); if (VMInfo.isSunOsMBean(osMXBean)) { startPhyOSStatus.totalPhysicalMemory = VMInfo.getLongFromOperatingSystem(osMXBean, "getTotalPhysicalMemorySize"); startPhyOSStatus.freePhysicalMemory = VMInfo.getLongFromOperatingSystem(osMXBean, "getFreePhysicalMemorySize"); startPhyOSStatus.maxFileDescriptorCount = VMInfo.getLongFromOperatingSystem(osMXBean, "getMaxFileDescriptorCount"); startPhyOSStatus.currentOpenFileDescriptorCount = VMInfo.getLongFromOperatingSystem(osMXBean, "getOpenFileDescriptorCount"); } /* initialize processGCStatus */ for (GarbageCollectorMXBean garbage : garbageCollectorMXBeanList) { GCStatus gcStatus = new GCStatus(); gcStatus.name = garbage.getName(); processGCStatus.gcStatusMap.put(garbage.getName(), gcStatus); } /* initialize processMemoryStatus */ if (memoryPoolMXBeanList != null && !memoryPoolMXBeanList.isEmpty()) { for (MemoryPoolMXBean pool : memoryPoolMXBeanList) { MemoryStatus memoryStatus = new MemoryStatus(); memoryStatus.name = pool.getName(); memoryStatus.initSize = pool.getUsage().getInit(); memoryStatus.maxSize = pool.getUsage().getMax(); processMomoryStatus.memoryStatusMap.put(pool.getName(), memoryStatus); } } } public String toString() { return "the machine info => \n\n" + "\tosInfo:\t" + osInfo + "\n" + "\tjvmInfo:\t" + jvmInfo + "\n" +
"\tcpu num:\t" + totalProcessorCount + "\n\n" + startPhyOSStatus.toString() + "\n" + processGCStatus.toString() + "\n" + processMomoryStatus.toString() + "\n"; } public String totalString() { return (processCpuStatus.getTotalString() + processGCStatus.getTotalString()); } public void getDelta() { getDelta(true); } public synchronized void getDelta(boolean print) { try { if (VMInfo.isSunOsMBean(osMXBean)) { long curUptime = runtimeMXBean.getUptime(); long curProcessTime = getLongFromOperatingSystem(osMXBean, "getProcessCpuTime"); /* CPU usage as a percentage: uptime is in ms and process CPU time in ns, so (deltaNs / 1,000,000) / (deltaMs * cpus) * 100 reduces to deltaNs / (deltaMs * cpus * 10000) */ if ((curUptime > lastUpTime) && (curProcessTime >= lastProcessCpuTime)) { float curDeltaCpu = (float) (curProcessTime - lastProcessCpuTime) / ((curUptime - lastUpTime) * totalProcessorCount * 10000); processCpuStatus.setMaxMinCpu(curDeltaCpu); processCpuStatus.averageCpu = (float) curProcessTime / (curUptime * totalProcessorCount * 10000); lastUpTime = curUptime; lastProcessCpuTime = curProcessTime; } } for (GarbageCollectorMXBean garbage : garbageCollectorMXBeanList) { GCStatus gcStatus = processGCStatus.gcStatusMap.get(garbage.getName()); if (gcStatus == null) { gcStatus = new GCStatus(); gcStatus.name = garbage.getName(); processGCStatus.gcStatusMap.put(garbage.getName(), gcStatus); } long curTotalGcCount = garbage.getCollectionCount(); gcStatus.setCurTotalGcCount(curTotalGcCount); long curtotalGcTime = garbage.getCollectionTime(); gcStatus.setCurTotalGcTime(curtotalGcTime); } if (memoryPoolMXBeanList != null && !memoryPoolMXBeanList.isEmpty()) { for (MemoryPoolMXBean pool : memoryPoolMXBeanList) { MemoryStatus memoryStatus = processMomoryStatus.memoryStatusMap.get(pool.getName()); if (memoryStatus == null) { memoryStatus = new MemoryStatus(); memoryStatus.name = pool.getName(); processMomoryStatus.memoryStatusMap.put(pool.getName(), memoryStatus); } memoryStatus.commitedSize = pool.getUsage().getCommitted(); memoryStatus.setMaxMinUsedSize(pool.getUsage().getUsed()); long maxMemory =
memoryStatus.commitedSize > 0 ? memoryStatus.commitedSize : memoryStatus.maxSize; memoryStatus.setMaxMinPercent(maxMemory > 0 ? (float) 100 * memoryStatus.usedSize / maxMemory : -1); } } if (print) { LOG.info(processCpuStatus.getDeltaString() + processMomoryStatus.getDeltaString() + processGCStatus.getDeltaString()); } } catch (Exception e) { LOG.warn("VMInfo getDelta failed (ignored, the job is unaffected): " + e.getMessage(), e); } } public static boolean isSunOsMBean(OperatingSystemMXBean operatingSystem) { final String className = operatingSystem.getClass().getName(); return "com.sun.management.UnixOperatingSystem".equals(className); } public static long getLongFromOperatingSystem(OperatingSystemMXBean operatingSystem, String methodName) { try { final Method method = operatingSystem.getClass().getMethod(methodName, (Class[]) null); method.setAccessible(true); return (Long) method.invoke(operatingSystem, (Object[]) null); } catch (final Exception e) { LOG.info(String.format("OperatingSystemMXBean %s failed, Exception = %s ", methodName, e.getMessage())); } return -1; } private class PhyOSStatus { long totalPhysicalMemory = -1; long freePhysicalMemory = -1; long maxFileDescriptorCount = -1; long currentOpenFileDescriptorCount = -1; public String toString() { return String.format("\ttotalPhysicalMemory:\t%,.2fG\n" + "\tfreePhysicalMemory:\t%,.2fG\n" + "\tmaxFileDescriptorCount:\t%s\n" + "\tcurrentOpenFileDescriptorCount:\t%s\n", (float) totalPhysicalMemory / GB, (float) freePhysicalMemory / GB, maxFileDescriptorCount, currentOpenFileDescriptorCount); } } private class ProcessCpuStatus { /* percentage values, e.g. 30.0 means 30.0% */ float maxDeltaCpu = -1; float minDeltaCpu = -1; float curDeltaCpu = -1; float averageCpu = -1; public void setMaxMinCpu(float curCpu) { this.curDeltaCpu = curCpu; if (maxDeltaCpu < curCpu) { maxDeltaCpu = curCpu; } if (minDeltaCpu == -1 || minDeltaCpu > curCpu) { minDeltaCpu = curCpu; } } public String getDeltaString() { StringBuilder sb = new
StringBuilder(); sb.append("\n\t [delta cpu info] => \n"); sb.append("\t\t"); sb.append(String.format("%-30s | %-30s | %-30s | %-30s \n", "curDeltaCpu", "averageCpu", "maxDeltaCpu", "minDeltaCpu")); sb.append("\t\t"); sb.append(String.format("%-30s | %-30s | %-30s | %-30s \n", String.format("%,.2f%%", processCpuStatus.curDeltaCpu), String.format("%,.2f%%", processCpuStatus.averageCpu), String.format("%,.2f%%", processCpuStatus.maxDeltaCpu), String.format("%,.2f%%", processCpuStatus.minDeltaCpu))); return sb.toString(); } public String getTotalString() { StringBuilder sb = new StringBuilder(); sb.append("\n\t [total cpu info] => \n"); sb.append("\t\t"); sb.append(String.format("%-30s | %-30s | %-30s \n", "averageCpu", "maxDeltaCpu", "minDeltaCpu")); sb.append("\t\t"); sb.append(String.format("%-30s | %-30s | %-30s \n", String.format("%,.2f%%", processCpuStatus.averageCpu), String.format("%,.2f%%", processCpuStatus.maxDeltaCpu), String.format("%,.2f%%", processCpuStatus.minDeltaCpu))); return sb.toString(); } } private class ProcessGCStatus { final Map<String, GCStatus> gcStatusMap = new HashMap<String, GCStatus>(); public String toString() { return "\tGC Names\t" + gcStatusMap.keySet() + "\n"; } public String getDeltaString() { StringBuilder sb = new StringBuilder(); sb.append("\n\t [delta gc info] => \n"); sb.append("\t\t "); sb.append(String.format("%-20s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s \n", "NAME", "curDeltaGCCount", "totalGCCount", "maxDeltaGCCount", "minDeltaGCCount", "curDeltaGCTime", "totalGCTime", "maxDeltaGCTime", "minDeltaGCTime")); for (GCStatus gc : gcStatusMap.values()) { sb.append("\t\t "); sb.append(String.format("%-20s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s \n", gc.name, gc.curDeltaGCCount, gc.totalGCCount, gc.maxDeltaGCCount, gc.minDeltaGCCount, String.format("%,.3fs",(float)gc.curDeltaGCTime/1000), String.format("%,.3fs",(float)gc.totalGCTime/1000), String.format("%,.3fs",(float)gc.maxDeltaGCTime/1000),
String.format("%,.3fs",(float)gc.minDeltaGCTime/1000))); } return sb.toString(); } public String getTotalString() { StringBuilder sb = new StringBuilder(); sb.append("\n\t [total gc info] => \n"); sb.append("\t\t "); sb.append(String.format("%-20s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s \n", "NAME", "totalGCCount", "maxDeltaGCCount", "minDeltaGCCount", "totalGCTime", "maxDeltaGCTime", "minDeltaGCTime")); for (GCStatus gc : gcStatusMap.values()) { sb.append("\t\t "); sb.append(String.format("%-20s | %-18s | %-18s | %-18s | %-18s | %-18s | %-18s \n", gc.name, gc.totalGCCount, gc.maxDeltaGCCount, gc.minDeltaGCCount, String.format("%,.3fs",(float)gc.totalGCTime/1000), String.format("%,.3fs",(float)gc.maxDeltaGCTime/1000), String.format("%,.3fs",(float)gc.minDeltaGCTime/1000))); } return sb.toString(); } } private class ProcessMemoryStatus { final Map<String, MemoryStatus> memoryStatusMap = new HashMap<String, MemoryStatus>(); public String toString() { StringBuilder sb = new StringBuilder(); sb.append("\t"); sb.append(String.format("%-30s | %-30s | %-30s \n", "MEMORY_NAME", "allocation_size", "init_size")); for (MemoryStatus ms : memoryStatusMap.values()) { sb.append("\t"); sb.append(String.format("%-30s | %-30s | %-30s \n", ms.name, String.format("%,.2fMB", (float) ms.maxSize / MB), String.format("%,.2fMB", (float) ms.initSize / MB))); } return sb.toString(); } public String getDeltaString() { StringBuilder sb = new StringBuilder(); sb.append("\n\t [delta memory info] => \n"); sb.append("\t\t "); sb.append(String.format("%-30s | %-30s | %-30s | %-30s | %-30s \n", "NAME", "used_size", "used_percent", "max_used_size", "max_percent")); for (MemoryStatus ms : memoryStatusMap.values()) { sb.append("\t\t "); sb.append(String.format("%-30s | %-30s | %-30s | %-30s | %-30s \n", ms.name, String.format("%,.2f", (float) ms.usedSize / MB) + "MB", String.format("%,.2f", (float) ms.percent) + "%", String.format("%,.2f", (float) ms.maxUsedSize / MB) + "MB", String.format("%,.2f", (float) ms.maxpercent) + "%")); }
return sb.toString(); } } private class GCStatus { String name; long maxDeltaGCCount = -1; long minDeltaGCCount = -1; long curDeltaGCCount; long totalGCCount = 0; long maxDeltaGCTime = -1; long minDeltaGCTime = -1; long curDeltaGCTime; long totalGCTime = 0; public void setCurTotalGcCount(long curTotalGcCount) { this.curDeltaGCCount = curTotalGcCount - totalGCCount; this.totalGCCount = curTotalGcCount; if (maxDeltaGCCount < curDeltaGCCount) { maxDeltaGCCount = curDeltaGCCount; } if (minDeltaGCCount == -1 || minDeltaGCCount > curDeltaGCCount) { minDeltaGCCount = curDeltaGCCount; } } public void setCurTotalGcTime(long curTotalGcTime) { this.curDeltaGCTime = curTotalGcTime - totalGCTime; this.totalGCTime = curTotalGcTime; if (maxDeltaGCTime < curDeltaGCTime) { maxDeltaGCTime = curDeltaGCTime; } if (minDeltaGCTime == -1 || minDeltaGCTime > curDeltaGCTime) { minDeltaGCTime = curDeltaGCTime; } } } private class MemoryStatus { String name; long initSize; long maxSize; long commitedSize; long usedSize; float percent; long maxUsedSize = -1; float maxpercent = 0; void setMaxMinUsedSize(long curUsedSize) { if (maxUsedSize < curUsedSize) { maxUsedSize = curUsedSize; } this.usedSize = curUsedSize; } void setMaxMinPercent(float curPercent) { if (maxpercent < curPercent) { maxpercent = curPercent; } this.percent = curPercent; } } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/util/Configuration.java ================================================ package com.alibaba.datax.common.util; import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.spi.ErrorCode; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONWriter; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.CharUtils; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.builder.ToStringBuilder; import java.io.*; import 
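java.util.*; /**
 * Configuration provides lossless storage of multi-level JSON configuration.
 *
 * Example code:
 *
 * Load the job configuration:
 * Configuration configuration = Configuration.from(new File("Config.json"));
 * String jobContainerClass = configuration.getString("core.container.job.class");
 *
 * Set a multi-level List:
 * configuration.set("job.reader.parameter.jdbcUrl", Arrays.asList(new String[] {"jdbc", "jdbc"}));
 *
 * Merge another Configuration (true: overwrite conflicting keys):
 * configuration.merge(another, true);
 *
 * There are two reasonable ways to implement Configuration:
 * 1. Flatten every key in the JSON into a cascaded "a.b.c" string and keep all values in one internal Map.
 * 2. Store the JSON object directly as a structured tree.
 *
 * The second approach is used. The problems with the first one are:
 * 1. Inserting new objects is awkward. For example, given a.b.c="bazhen", inserting a="bazhen" means
 *    the whole subtree under "a" must be discarded and replaced by the value "bazhen"; with string keys
 *    this is hard to handle.
 * 2. Returning a subtree: for a.b.c.d = "bazhen", returning everything under "a" really means returning
 *    a Map, which has to be reassembled from the flattened keys.
 * 3. Emitting JSON: the multi-level keys of the Map must first be converted back into a tree before
 *    the object can be serialized.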
*/ public class Configuration { /** * Key paths that hold secrets are recorded here so that, in distributed mode, their values can be encrypted before being sent to DataXServer. */ private Set<String> secretKeyPathSet = new HashSet<String>(); private Object root = null; /** * Creates an empty Configuration. */ public static Configuration newDefault() { return Configuration.from("{}"); } /** * Loads a Configuration from a JSON string. */ public static Configuration from(String json) { json = StrUtil.replaceVariable(json); checkJSON(json); try { return new Configuration(json); } catch (Exception e) { throw DataXException.asDataXException(CommonErrorCode.CONFIG_ERROR, e); } } /** * Loads a Configuration from a File containing JSON. The stream is closed after reading. */ public static Configuration from(File file) { try (InputStream in = new FileInputStream(file)) { return Configuration.from(IOUtils.toString(in)); } catch (FileNotFoundException e) { throw DataXException.asDataXException(CommonErrorCode.CONFIG_ERROR, String.format("Configuration error: the configuration file [%s] you provided does not exist. Please check your configuration file.", file.getAbsolutePath())); } catch (IOException e) { throw DataXException.asDataXException( CommonErrorCode.CONFIG_ERROR, String.format("Configuration error: failed to read the configuration file [%s] you provided, cause: %s. Please check the file's permission settings.", file.getAbsolutePath(), e)); } } /** * Loads a Configuration from an InputStream containing JSON. */ public static Configuration from(InputStream is) { try { return Configuration.from(IOUtils.toString(is)); } catch (IOException e) { throw DataXException.asDataXException(CommonErrorCode.CONFIG_ERROR, String.format("Please check your configuration file: failed to read the configuration file you provided, cause: %s. Please check the file's permission settings.", e)); } } /** * Loads a Configuration from a Map. */ public static Configuration from(final Map<String, Object> object) { return Configuration.from(Configuration.toJSONString(object)); } /** * Loads a Configuration from a List. */ public static Configuration from(final List<Object> object) { return Configuration.from(Configuration.toJSONString(object)); } public String getNecessaryValue(String key, ErrorCode errorCode) { String value = this.getString(key, null); if (StringUtils.isBlank(value)) { throw DataXException.asDataXException(errorCode, String.format("Invalid configuration: [%s] is a required parameter and must not be empty or blank.", key)); } return value; } public String getUnnecessaryValue(String key,String defaultValue,ErrorCode errorCode) { String value = this.getString(key, defaultValue); if (StringUtils.isBlank(value)) { value = defaultValue; } return value; } public Boolean getNecessaryBool(String key, ErrorCode errorCode) { Boolean value = this.getBool(key); if (value == null) { throw DataXException.asDataXException(errorCode, String.format("Invalid configuration: [%s] is a required parameter and must not be empty or blank.", key)); } return value; } /**
 * Looks up an object by the user-supplied JSON path.
 *
 * NOTE: only Map keys and List indices are currently supported. For example, given the JSON
 *
 * {"a": {"b": {"c": [0,1,2,3]}}}
 *
 * config.get("") returns the whole Map
 * config.get("a") returns the Map under a
 * config.get("a.b.c") returns the List bound to c
 * config.get("a.b.c[0]") returns the number 0
 *
 * @return the JSON value as a Java object, or null if the path or the object does not exist.
 */ public Object get(final String path) { this.checkPath(path); try { return this.findObject(path); } catch (Exception e) { return null; } } /** * Returns the subset of this Configuration under the given path.
 *
 * Returns null if the path or the object does not exist. */ public Configuration getConfiguration(final String path) { Object object = this.get(path); if (null == object) { return null; } return Configuration.from(Configuration.toJSONString(object)); } /** * Looks up a String by the user-supplied JSON path. * * @return the String, or null if the path or the String does not exist */ public String getString(final String path) { Object string = this.get(path); if (null == string) { return null; } return String.valueOf(string); } /** * Looks up a String by the user-supplied JSON path, returning a default if it does not exist. * * @return the String, or defaultValue if the path or the String does not exist */ public String getString(final String path, final String defaultValue) { String result = this.getString(path); if (null == result) { return defaultValue; } return result; } /** * Looks up a Character by the user-supplied JSON path. * * @return the Character, or null if the path or the Character does not exist */ public Character getChar(final String path) { String result = this.getString(path); if (null == result) { return null; } try { return CharUtils.toChar(result); } catch (Exception e) { throw DataXException.asDataXException( CommonErrorCode.CONFIG_ERROR, String.format("Error reading the job configuration: the value at path [%s] is invalid, a character was expected: %s. Please check and correct your configuration.", path, e.getMessage())); } } /** * Looks up a Character by the user-supplied JSON path, returning a default if it does not exist. * * @return the Character, or defaultValue if the path or the Character does not exist */ public Character getChar(final String path, char defaultValue) { Character result = this.getChar(path); if (null == result) { return defaultValue; } return result; } /** * Looks up a Boolean by the user-supplied JSON path. * * @return the Boolean value. Throws if the value is neither true nor false. Note: returns null when the path does not exist. */ public Boolean getBool(final String path) { String result = this.getString(path); if (null == result) { return null; } else if ("true".equalsIgnoreCase(result)) { return Boolean.TRUE; } else if ("false".equalsIgnoreCase(result)) { return Boolean.FALSE; } else { throw DataXException.asDataXException(CommonErrorCode.CONFIG_ERROR, String.format("Invalid configuration: the value read from [%s], [%s], cannot be converted to a boolean. Please check the configuration and correct it.", path, result)); } } /** * Looks up a Boolean by the user-supplied JSON path, returning a default if it does not exist. * * @return the Boolean, or defaultValue if the path or the Boolean does not exist */ public Boolean getBool(final String path, boolean defaultValue) { Boolean result = this.getBool(path); if (null == result) { return defaultValue; } return result; } /** * Looks up an Integer by the user-supplied JSON path. * * @return the Integer, or null if the path or the Integer does not exist */ public Integer getInt(final String path) { String result = this.getString(path); if (null == result) { return null; } try { return Integer.valueOf(result); } catch (Exception e) { throw DataXException.asDataXException( CommonErrorCode.CONFIG_ERROR, String.format("Error reading the job configuration: the value at path [%s] is invalid, an integer was expected: %s. Please check and correct your configuration.", path, e.getMessage())); } } /** * Looks up an Integer by the user-supplied JSON path, returning a default if it does not exist. * * @return the Integer, or defaultValue if the path or the Integer does not exist */ public Integer getInt(final String path, int defaultValue) { Integer object = this.getInt(path); if (null == object) { return defaultValue; } return object; } /** * Looks up a Long by the user-supplied JSON path. * * @return the Long, or null if the path or the Long does not exist */ public Long getLong(final String path) { String result = this.getString(path); if (StringUtils.isBlank(result)) { return null; } try { return Long.valueOf(result); } catch (Exception e) { throw DataXException.asDataXException( CommonErrorCode.CONFIG_ERROR, String.format("Error reading the job configuration: the value at path [%s] is invalid, an integer was expected: %s. Please check and correct your configuration.", path, e.getMessage())); } } /** * Looks up a Long by the user-supplied JSON path, returning a default if it does not exist. * * @return the Long, or defaultValue if the path or the Long does not exist */ public Long getLong(final String path, long defaultValue) { Long result = this.getLong(path); if (null == result) { return defaultValue; } return result; } /** * Looks up a Double by the user-supplied JSON path. * * @return the Double, or null if the path or the Double does not exist */ public Double getDouble(final String path) { String result = this.getString(path); if (StringUtils.isBlank(result)) { return null; } try { return Double.valueOf(result); } catch (Exception e) { throw DataXException.asDataXException( CommonErrorCode.CONFIG_ERROR, String.format("Error reading the job configuration: the value at path [%s] is invalid, a floating-point number was expected: %s. Please check and correct your configuration.", path, e.getMessage())); } } /** * Looks up a Double by the user-supplied JSON path, returning a default if it does not exist. * * @return the Double, or defaultValue if the path or the Double does not exist */ public Double getDouble(final String path, double defaultValue) { Double result = this.getDouble(path); if (null == result) { return defaultValue; } return result; } /** * Looks up a List by the user-supplied JSON path; returns null if it does not exist. */ @SuppressWarnings("unchecked") public List<Object> getList(final String path) { List<Object> list = this.get(path, List.class); if (null == list) { return null; } return list; } public <T> List<T> getListWithJson(final String path, Class<T> t) { Object object = this.get(path, List.class); if (null == object) { return null; } return JSON.parseArray(JSON.toJSONString(object),t); } /** * Looks up a List by the user-supplied JSON path and types its elements as T (unchecked cast); returns null if it does not exist. */ @SuppressWarnings("unchecked") public <T> List<T> getList(final String path, Class<T> t) { Object object = this.get(path, List.class); if (null == object) { return null; } List<T> result = new ArrayList<T>(); List<Object> origin = (List<Object>) object; for (final Object each : origin) { result.add((T) each); } return result; } /** * Looks up a List by the user-supplied JSON path; returns defaultList if it does not exist. */ @SuppressWarnings("unchecked") public List<Object> getList(final String path, final List<Object> defaultList) { Object object = this.getList(path); if (null == object) {
return defaultList; } return (List<Object>) object; } /** * Looks up a List by the user-supplied JSON path; returns defaultList if it does not exist. */ public <T> List<T> getList(final String path, final List<T> defaultList, Class<T> t) { List<T> list = this.getList(path, t); if (null == list) { return defaultList; } return list; } /** * Looks up a List of Configurations by the user-supplied JSON path; returns null if it does not exist. */ public List<Configuration> getListConfiguration(final String path) { List<Object> lists = getList(path); if (lists == null) { return null; } List<Configuration> result = new ArrayList<Configuration>(); for (final Object object : lists) { result.add(Configuration.from(Configuration.toJSONString(object))); } return result; } /** * Looks up a Map by the user-supplied JSON path; returns null if it does not exist. */ @SuppressWarnings("unchecked") public Map<String, Object> getMap(final String path) { Map<String, Object> result = this.get(path, Map.class); if (null == result) { return null; } return result; } /** * Looks up a Map by the user-supplied JSON path and types its values as T (unchecked cast); returns null if it does not exist. */ @SuppressWarnings("unchecked") public <T> Map<String, T> getMap(final String path, Class<T> t) { Map<String, Object> map = this.get(path, Map.class); if (null == map) { return null; } Map<String, T> result = new HashMap<String, T>(); for (final String key : map.keySet()) { result.put(key, (T) map.get(key)); } return result; } /** * Looks up a Map by the user-supplied JSON path; returns defaultMap if it does not exist. */ @SuppressWarnings("unchecked") public Map<String, Object> getMap(final String path, final Map<String, Object> defaultMap) { Object object = this.getMap(path); if (null == object) { return defaultMap; } return (Map<String, Object>) object; } /** * Looks up a Map by the user-supplied JSON path; returns defaultMap if it does not exist. */ public <T> Map<String, T> getMap(final String path, final Map<String, T> defaultMap, Class<T> t) { Map<String, T> result = getMap(path, t); if (null == result) { return defaultMap; } return result; } /** * Looks up a Map of Configurations by the user-supplied JSON path; returns null if it does not exist. */ @SuppressWarnings("unchecked") public Map<String, Configuration> getMapConfiguration(final String path) { Map<String, Object> map = this.get(path, Map.class); if (null == map) { return null; } Map<String, Configuration> result = new HashMap<String, Configuration>(); for (final String key : map.keySet()) { result.put(key, Configuration.from(Configuration.toJSONString(map .get(key)))); } return result; } /**
 * Looks up an object by the user-supplied JSON path and returns it as the requested type.
 *
 * NOTE: only Map keys and List indices are currently supported. For example, given the JSON
 *
 * {"a": {"b": {"c": [0,1,2,3]}}}
 *
 * config.get("") returns the whole Map
 * config.get("a") returns the Map under a
 * config.get("a.b.c") returns the List bound to c
 * config.get("a.b.c[0]") returns the number 0
 *
 * @return the JSON value as a Java object of the requested type. The cast is unchecked, so a type mismatch surfaces as a ClassCastException where the result is used.
 */ @SuppressWarnings("unchecked") public <T> T get(final String path, Class<T> clazz) { this.checkPath(path); return (T) this.get(path); } /** * Pretty-prints this Configuration as JSON. */ public String beautify() { return JSON.toJSONString(this.getInternal(), JSONWriter.Feature.PrettyFormat); } /** * Sets the object at the user-supplied JSON path and returns the previous value, if any.
 *
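 * Currently only "." and array-index addressing are supported, for example:
 *
 * config.set("a.b.c[3]", object);
 *
 * Configuration places no restriction on the inserted object, but make sure it is a simple object (such as a Map or List). Do not use custom objects; otherwise later operations such as JSON serialization have undefined behavior.
 *
 * @param path
 *            the JSON path
 * @param object
 *            the object to insert
 * @return the value previously stored at the path, or null if there was none
 */ public Object set(final String path, final Object object) { checkPath(path); Object result = this.get(path); setObject(path, extractConfiguration(object)); return result; } /** * Returns the keys of all leaf nodes in this Configuration.
 *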
 * For
 *
 * {"a": {"b": {"c": [0,1,2,3]}}, "x": "y"}
 *
 * the keys are: a.b.c[0], a.b.c[1], a.b.c[2], a.b.c[3], x
 */ public Set<String> getKeys() { Set<String> collect = new HashSet<String>(); this.getKeysRecursive(this.getInternal(), "", collect); return collect; } /** * Removes the value at the given path; throws if the path does not exist. */ public Object remove(final String path) { final Object result = this.get(path); if (null == result) { throw DataXException.asDataXException( CommonErrorCode.RUNTIME_ERROR, String.format("Key [%s] does not exist in the configuration. This indicates a programming error; please contact the DataX team.", path)); } this.set(path, null); return result; } /** * Merges another Configuration into this one, resolving conflicting key/value pairs. * * @param another * the Configuration to merge in * @param updateWhenConflict * on a key/value conflict, true overwrites the current value and false keeps it * @return this Configuration after the merge */ public Configuration merge(final Configuration another, boolean updateWhenConflict) { Set<String> keys = another.getKeys(); for (final String key : keys) { /* update strategy: every key present in another overwrites the current value */ if (updateWhenConflict) { this.set(key, another.get(key)); continue; } /* ignore strategy: only keys present in another but absent from this Configuration are copied */ boolean isCurrentExists = this.get(key) != null; if (isCurrentExists) { continue; } this.set(key, another.get(key)); } return this; } @Override public String toString() { return this.toJSON(); } /** * Serializes this Configuration as JSON. */ public String toJSON() { return Configuration.toJSONString(this.getInternal()); } /** * Copies this Configuration. Note: this is a deep copy made via a JSON round-trip, so the copy shares no state with the original. */ public Configuration clone() { Configuration config = Configuration .from(Configuration.toJSONString(this.getInternal())); config.addSecretKeyPath(this.secretKeyPathSet); return config; } /** * Records a secret key path, written in Configuration path format, * for example: * a.b.c * a.b[2].c * @param path */ public void addSecretKeyPath(String path) { if(StringUtils.isNotBlank(path)) { this.secretKeyPathSet.add(path); } } public void addSecretKeyPath(Set<String> pathSet) { if(pathSet != null) { this.secretKeyPathSet.addAll(pathSet); } } public void setSecretKeyPathSet(Set<String> keyPathSet) { if(keyPathSet != null) { this.secretKeyPathSet = keyPathSet; } } public boolean isSecretPath(String path) { return
this.secretKeyPathSet.contains(path); } @SuppressWarnings("unchecked") void getKeysRecursive(final Object current, String path, Set collect) { boolean isRegularElement = !(current instanceof Map || current instanceof List); if (isRegularElement) { collect.add(path); return; } boolean isMap = current instanceof Map; if (isMap) { Map mapping = ((Map) current); for (final String key : mapping.keySet()) { if (StringUtils.isBlank(path)) { getKeysRecursive(mapping.get(key), key.trim(), collect); } else { getKeysRecursive(mapping.get(key), path + "." + key.trim(), collect); } } return; } boolean isList = current instanceof List; if (isList) { List lists = (List) current; for (int i = 0; i < lists.size(); i++) { getKeysRecursive(lists.get(i), path + String.format("[%d]", i), collect); } return; } return; } public Object getInternal() { return this.root; } private void setObject(final String path, final Object object) { Object newRoot = setObjectRecursive(this.root, split2List(path), 0, object); if (isSuitForRoot(newRoot)) { this.root = newRoot; return; } throw DataXException.asDataXException(CommonErrorCode.RUNTIME_ERROR, String.format("值[%s]无法适配您提供[%s], 该异常代表系统编程错误, 请联系DataX开发团队!", ToStringBuilder.reflectionToString(object), path)); } @SuppressWarnings("unchecked") private Object extractConfiguration(final Object object) { if (object instanceof Configuration) { return extractFromConfiguration(object); } if (object instanceof List) { List result = new ArrayList(); for (final Object each : (List) object) { result.add(extractFromConfiguration(each)); } return result; } if (object instanceof Map) { Map result = new HashMap(); for (final String key : ((Map) object).keySet()) { result.put(key, extractFromConfiguration(((Map) object) .get(key))); } return result; } return object; } private Object extractFromConfiguration(final Object object) { if (object instanceof Configuration) { return ((Configuration) object).getInternal(); } return object; } Object buildObject(final List 
paths, final Object object) { if (null == paths) { throw DataXException.asDataXException( CommonErrorCode.RUNTIME_ERROR, "Path不能为null,该异常代表系统编程错误, 请联系DataX开发团队 !"); } if (1 == paths.size() && StringUtils.isBlank(paths.get(0))) { return object; } Object child = object; for (int i = paths.size() - 1; i >= 0; i--) { String path = paths.get(i); if (isPathMap(path)) { Map mapping = new HashMap(); mapping.put(path, child); child = mapping; continue; } if (isPathList(path)) { List lists = new ArrayList( this.getIndex(path) + 1); expand(lists, this.getIndex(path) + 1); lists.set(this.getIndex(path), child); child = lists; continue; } throw DataXException.asDataXException( CommonErrorCode.RUNTIME_ERROR, String.format( "路径[%s]出现非法值类型[%s],该异常代表系统编程错误, 请联系DataX开发团队! .", StringUtils.join(paths, "."), path)); } return child; } @SuppressWarnings("unchecked") Object setObjectRecursive(Object current, final List paths, int index, final Object value) { // 如果是已经超出path,我们就返回value即可,作为最底层叶子节点 boolean isLastIndex = index == paths.size(); if (isLastIndex) { return value; } String path = paths.get(index).trim(); boolean isNeedMap = isPathMap(path); if (isNeedMap) { Map mapping; // 当前不是map,因此全部替换为map,并返回新建的map对象 boolean isCurrentMap = current instanceof Map; if (!isCurrentMap) { mapping = new HashMap(); mapping.put( path, buildObject(paths.subList(index + 1, paths.size()), value)); return mapping; } // 当前是map,但是没有对应的key,也就是我们需要新建对象插入该map,并返回该map mapping = ((Map) current); boolean hasSameKey = mapping.containsKey(path); if (!hasSameKey) { mapping.put( path, buildObject(paths.subList(index + 1, paths.size()), value)); return mapping; } // 当前是map,而且还竟然存在这个值,好吧,继续递归遍历 current = mapping.get(path); mapping.put(path, setObjectRecursive(current, paths, index + 1, value)); return mapping; } boolean isNeedList = isPathList(path); if (isNeedList) { List lists; int listIndexer = getIndex(path); // 当前是list,直接新建并返回即可 boolean isCurrentList = current instanceof List; if (!isCurrentList) { lists = 
expand(new ArrayList(), listIndexer + 1); lists.set( listIndexer, buildObject(paths.subList(index + 1, paths.size()), value)); return lists; } // 当前是list,但是对应的indexer是没有具体的值,也就是我们新建对象然后插入到该list,并返回该List lists = (List) current; lists = expand(lists, listIndexer + 1); boolean hasSameIndex = lists.get(listIndexer) != null; if (!hasSameIndex) { lists.set( listIndexer, buildObject(paths.subList(index + 1, paths.size()), value)); return lists; } // 当前是list,并且存在对应的index,没有办法继续递归寻找 current = lists.get(listIndexer); lists.set(listIndexer, setObjectRecursive(current, paths, index + 1, value)); return lists; } throw DataXException.asDataXException(CommonErrorCode.RUNTIME_ERROR, "该异常代表系统编程错误, 请联系DataX开发团队 !"); } private Object findObject(final String path) { boolean isRootQuery = StringUtils.isBlank(path); if (isRootQuery) { return this.root; } Object target = this.root; for (final String each : split2List(path)) { if (isPathMap(each)) { target = findObjectInMap(target, each); continue; } else { target = findObjectInList(target, each); continue; } } return target; } @SuppressWarnings("unchecked") private Object findObjectInMap(final Object target, final String index) { boolean isMap = (target instanceof Map); if (!isMap) { throw new IllegalArgumentException(String.format( "您提供的配置文件有误. 路径[%s]需要配置Json格式的Map对象,但该节点发现实际类型是[%s]. 请检查您的配置并作出修改.", index, target.getClass().toString())); } Object result = ((Map) target).get(index); if (null == result) { throw new IllegalArgumentException(String.format( "您提供的配置文件有误. 路径[%s]值为null,datax无法识别该配置. 请检查您的配置并作出修改.", index)); } return result; } @SuppressWarnings({ "unchecked" }) private Object findObjectInList(final Object target, final String each) { boolean isList = (target instanceof List); if (!isList) { throw new IllegalArgumentException(String.format( "您提供的配置文件有误. 路径[%s]需要配置Json格式的Map对象,但该节点发现实际类型是[%s]. 
请检查您的配置并作出修改.", each, target.getClass().toString())); } String index = each.replace("[", "").replace("]", ""); if (!StringUtils.isNumeric(index)) { throw new IllegalArgumentException( String.format( "系统编程错误,列表下标必须为数字类型,但该节点发现实际类型是[%s] ,该异常代表系统编程错误, 请联系DataX开发团队 !", index)); } return ((List) target).get(Integer.valueOf(index)); } private List expand(List list, int size) { int expand = size - list.size(); while (expand-- > 0) { list.add(null); } return list; } private boolean isPathList(final String path) { return path.contains("[") && path.contains("]"); } private boolean isPathMap(final String path) { return StringUtils.isNotBlank(path) && !isPathList(path); } private int getIndex(final String index) { return Integer.valueOf(index.replace("[", "").replace("]", "")); } private boolean isSuitForRoot(final Object object) { if (null != object && (object instanceof List || object instanceof Map)) { return true; } return false; } private String split(final String path) { return StringUtils.replace(path, "[", ".["); } private List split2List(final String path) { return Arrays.asList(StringUtils.split(split(path), ".")); } private void checkPath(final String path) { if (null == path) { throw new IllegalArgumentException( "系统编程错误, 该异常代表系统编程错误, 请联系DataX开发团队!."); } for (final String each : StringUtils.split(path, ".")) { if (StringUtils.isBlank(each)) { throw new IllegalArgumentException(String.format( "系统编程错误, 路径[%s]不合法, 路径层次之间不能出现空白字符 .", path)); } } } @SuppressWarnings("unused") private String toJSONPath(final String path) { return (StringUtils.isBlank(path) ? "$" : "$." + path).replace("$.[", "$["); } private static void checkJSON(final String json) { if (StringUtils.isBlank(json)) { throw DataXException.asDataXException(CommonErrorCode.CONFIG_ERROR, "配置信息错误. 因为您提供的配置信息不是合法的JSON格式, JSON不能为空白. 请按照标准json格式提供配置信息. 
"); } } private Configuration(final String json) { try { this.root = JSON.parse(json); } catch (Exception e) { throw DataXException.asDataXException(CommonErrorCode.CONFIG_ERROR, String.format("配置信息错误. 您提供的配置信息不是合法的JSON格式: %s . 请按照标准json格式提供配置信息. ", e.getMessage())); } } private static String toJSONString(final Object object) { return JSON.toJSONString(object); } public Set getSecretKeyPathSet() { return secretKeyPathSet; } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/util/ConfigurationUtil.java ================================================ package com.alibaba.datax.common.util; import java.util.Arrays; import java.util.List; import java.util.Set; import org.apache.commons.lang3.StringUtils; public class ConfigurationUtil { private static final List SENSITIVE_KEYS = Arrays.asList("password", "accessKey", "securityToken", "AccessKeyId", "AccessKeySecert", "AccessKeySecret", "clientPassword"); public static Configuration filterSensitive(Configuration origin) { // shell 任务configuration metric 可能为null。 if (origin == null) { return origin; } // 确保不影响入参的对象 Configuration configuration = origin.clone(); Set keys = configuration.getKeys(); for (final String key : keys) { boolean isSensitive = false; for (String sensitiveKey : SENSITIVE_KEYS) { if (StringUtils.endsWithIgnoreCase(key, sensitiveKey)) { isSensitive = true; break; } } if (isSensitive && configuration.get(key) instanceof String) { configuration.set(key, configuration.getString(key).replaceAll(".", "*")); } } return configuration; } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/util/DESCipher.java ================================================ /** * (C) 2010-2022 Alibaba Group Holding Limited. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
 *  You may obtain a copy of the License at
 *
 *       http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */

package com.alibaba.datax.common.util;

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.security.SecureRandom;

/**
 * DES encryption and decryption, interoperable with Delphi (string encoding must be UTF-8 on both sides).
 * Extracted into common so it can be reused.
 */
public class DESCipher {

    private static Logger LOGGER = LoggerFactory.getLogger(DESCipher.class);

    /**
     * Default key.
     */
    // Empty in this source; DESKeySpec rejects keys shorter than 8 bytes,
    // so the overloads that fall back to KEY fail as written
    public static final String KEY = "";

    private final static String DES = "DES";

    /**
     * Encrypts data.
     * @param src plaintext bytes
     * @param key key; must be at least 8 bytes (DESKeySpec uses only the first 8)
     * @return ciphertext bytes
     * @throws Exception
     */
    public static byte[] encrypt(byte[] src, byte[] key) throws Exception {
        // DES requires a trusted source of randomness
        SecureRandom sr = new SecureRandom();
        // Create a DESKeySpec from the raw key bytes
        DESKeySpec dks = new DESKeySpec(key);
        // Use a key factory to convert the DESKeySpec into a SecretKey
        SecretKeyFactory keyFactory = SecretKeyFactory.getInstance(DES);
        SecretKey securekey = keyFactory.generateSecret(dks);
        // The Cipher object performs the actual encryption
        Cipher cipher = Cipher.getInstance(DES);
        // Initialize the Cipher with the key
        cipher.init(Cipher.ENCRYPT_MODE, securekey, sr);
        // Encrypt the data
        return cipher.doFinal(src);
    }

    /**
     * Decrypts data.
     * @param src ciphertext bytes
     * @param key key; must be at least 8 bytes (DESKeySpec uses only the first 8)
     * @return plaintext bytes
     * @throws Exception
     */
    public static byte[] decrypt(byte[] src, byte[] key) throws Exception {
        // DES requires a trusted source of randomness
        SecureRandom sr = new SecureRandom();
        // Create a DESKeySpec from the raw key bytes
        DESKeySpec dks = new DESKeySpec(key);
        // Use a key factory to convert the DESKeySpec into a SecretKey
        SecretKeyFactory keyFactory = SecretKeyFactory.getInstance(DES);
        SecretKey securekey = keyFactory.generateSecret(dks);
        // The Cipher object performs the actual decryption
        Cipher cipher = Cipher.getInstance(DES);
        // Initialize the Cipher with the key
        cipher.init(Cipher.DECRYPT_MODE, securekey, sr);
        // Decrypt the data
        return cipher.doFinal(src);
    }

    /**
     * Encrypts with the default KEY.
     * @param src plaintext bytes
     * @return ciphertext bytes
     * @throws Exception
     */
    public static byte[] encrypt(byte[] src) throws Exception {
        return encrypt(src, KEY.getBytes());
    }

    /**
     * Decrypts with the default KEY.
     * @param src ciphertext bytes
     * @return plaintext bytes
     * @throws Exception
     */
    public static byte[] decrypt(byte[] src) throws Exception {
        return decrypt(src, KEY.getBytes());
    }

    /**
     * Encrypts with the default KEY.
     * @param src plaintext string
     * @return ciphertext as an uppercase hex string, or null on failure
     */
    public final static String encrypt(String src) {
        try {
            return byte2hex(encrypt(src.getBytes(), KEY.getBytes()));
        } catch (Exception e) {
            LOGGER.warn(e.getMessage(), e);
        }
        return null;
    }

    /**
     * Encrypts with the given key.
     * @param src plaintext string
     * @param encryptKey key used for encryption
     * @return ciphertext as an uppercase hex string, or null on failure
     */
    public final static String encrypt(String src, String encryptKey) {
        try {
            return byte2hex(encrypt(src.getBytes(), encryptKey.getBytes()));
        } catch (Exception e) {
            LOGGER.warn(e.getMessage(), e);
        }
        return null;
    }

    /**
     * Decrypts with the default KEY.
     * @param src ciphertext as a hex string
     * @return plaintext string, or null on failure
     */
    public final static String decrypt(String src) {
        try {
            return new String(decrypt(hex2byte(src.getBytes()), KEY.getBytes()));
        } catch (Exception e) {
            LOGGER.warn(e.getMessage(), e);
        }
        return null;
    }

    /**
     * Decrypts with the given key.
     * @param src ciphertext as a hex string
     * @param decryptKey key used for decryption
     * @return plaintext string, or null on failure
     */
    public final static String decrypt(String src, String decryptKey) {
        try {
            return new String(decrypt(hex2byte(src.getBytes()), decryptKey.getBytes()));
        } catch (Exception e) {
            LOGGER.warn(e.getMessage(), e);
        }
        return null;
    }

    /**
     * Encrypts with the default KEY.
     * @param src plaintext bytes
     * @return ciphertext as an uppercase hex string
     * @throws Exception
     */
    public static String encryptToString(byte[] src) throws Exception {
        return encrypt(new String(src));
    }

    /**
     * Decrypts with the default KEY.
     * @param src ciphertext bytes (hex-encoded)
     * @return plaintext string
     * @throws Exception
     */
    public static String decryptToString(byte[] src) throws Exception {
        return decrypt(new String(src));
    }

    /** Converts bytes to an uppercase hex string, two digits per byte. */
    public static String byte2hex(byte[] b) {
        String hs = "";
        String stmp = "";
        for (int n = 0; n < b.length; n++) {
            stmp = (Integer.toHexString(b[n] & 0XFF));
            if (stmp.length() == 1) {
                hs = hs + "0" + stmp;
            } else {
                hs = hs + stmp;
            }
        }
        return hs.toUpperCase();
    }

    /** Converts hex digits (as bytes) back to raw bytes; the input length must be even. */
    public static byte[] hex2byte(byte[] b) {
        if ((b.length % 2) != 0) {
            throw new IllegalArgumentException("The length is not an even number");
        }
        byte[] b2 = new byte[b.length / 2];
        for (int n = 0; n < b.length; n += 2) {
            String item = new String(b, n, 2);
            b2[n / 2] = (byte) Integer.parseInt(item, 16);
        }
        return b2;
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/DataXCaseEnvUtil.java
================================================
package com.alibaba.datax.common.util;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DataXCaseEnvUtil {

    private static final Logger LOGGER = LoggerFactory.getLogger(DataXCaseEnvUtil.class);

    // Speeds up DataX regression tests: when set, these environment variables override the retry settings
    private static String DATAX_AUTOTEST_RETRY_TIME = System.getenv("DATAX_AUTOTEST_RETRY_TIME");
    private static String DATAX_AUTOTEST_RETRY_INTERVAL = System.getenv("DATAX_AUTOTEST_RETRY_INTERVAL");
    private static String DATAX_AUTOTEST_RETRY_EXPONENTIAL = System.getenv("DATAX_AUTOTEST_RETRY_EXPONENTIAL");

    public static int getRetryTimes(int retryTimes) {
        int actualRetryTimes = DATAX_AUTOTEST_RETRY_TIME != null ? Integer.valueOf(DATAX_AUTOTEST_RETRY_TIME) : retryTimes;
        // LOGGER.info("The actualRetryTimes is {}", actualRetryTimes);
        return actualRetryTimes;
    }

    public static long getRetryInterval(long retryInterval) {
        long actualRetryInterval = DATAX_AUTOTEST_RETRY_INTERVAL != null ?
                Long.valueOf(DATAX_AUTOTEST_RETRY_INTERVAL) : retryInterval;
        // LOGGER.info("The actualRetryInterval is {}", actualRetryInterval);
        return actualRetryInterval;
    }

    public static boolean getRetryExponential(boolean retryExponential) {
        boolean actualRetryExponential = DATAX_AUTOTEST_RETRY_EXPONENTIAL != null ? Boolean.valueOf(DATAX_AUTOTEST_RETRY_EXPONENTIAL) : retryExponential;
        // LOGGER.info("The actualRetryExponential is {}", actualRetryExponential);
        return actualRetryExponential;
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/FilterUtil.java
================================================
package com.alibaba.datax.common.util;

import java.util.*;
import java.util.regex.Pattern;

/**
 * Generic utility that filters a List by regular expression (results are de-duplicated).
 * Used, for example, for odpsreader partition filtering and hdfsreader/txtfilereader path filtering.
 */
public final class FilterUtil {

    // Results are de-duplicated
    public static List<String> filterByRegular(List<String> allStrs,
                                               String regular) {
        List<String> matchedValues = new ArrayList<String>();

        // Accept the shorthand pt=* as well (the proper regex is pt=.*)
        String newRegular = regular.replace(".*", "*").replace("*", ".*");

        Pattern p = Pattern.compile(newRegular);

        for (String partition : allStrs) {
            if (p.matcher(partition).matches()) {
                if (!matchedValues.contains(partition)) {
                    matchedValues.add(partition);
                }
            }
        }

        return matchedValues;
    }

    // Results are de-duplicated
    public static List<String> filterByRegulars(List<String> allStrs,
                                                List<String> regulars) {
        List<String> matchedValues = new ArrayList<String>();

        List<String> tempMatched = null;
        for (String regular : regulars) {
            tempMatched = filterByRegular(allStrs, regular);
            if (null != tempMatched && !tempMatched.isEmpty()) {
                for (String temp : tempMatched) {
                    if (!matchedValues.contains(temp)) {
                        matchedValues.add(temp);
                    }
                }
            }
        }

        return matchedValues;
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/HostUtils.java
================================================
package com.alibaba.datax.common.util;

import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Created by liqiang on 15/8/25.
 */
public class HostUtils {

    public static final String IP;
    public static final String HOSTNAME;
    private static final Logger log = LoggerFactory.getLogger(HostUtils.class);

    static {
        String ip;
        String hostname;
        try {
            InetAddress addr = InetAddress.getLocalHost();
            ip = addr.getHostAddress();
            hostname = addr.getHostName();
        } catch (UnknownHostException e) {
            log.error("Can't find out address: " + e.getMessage());
            ip = "UNKNOWN";
            hostname = "UNKNOWN";
        }
        // Fall back to the hostname command when only a loopback or unknown address was found
        if (ip.equals("127.0.0.1") || ip.equals("::1") || ip.equals("UNKNOWN")) {
            try {
                Process process = Runtime.getRuntime().exec("hostname -i");
                if (process.waitFor() == 0) {
                    // trim() drops the trailing newline printed by the command
                    ip = new String(IOUtils.toByteArray(process.getInputStream()), "UTF8").trim();
                }
                process = Runtime.getRuntime().exec("hostname");
                if (process.waitFor() == 0) {
                    hostname = (new String(IOUtils.toByteArray(process.getInputStream()), "UTF8")).trim();
                }
            } catch (Exception e) {
                log.warn("get hostname failed {}", e.getMessage());
            }
        }
        IP = ip;
        HOSTNAME = hostname;
        log.info("IP {} HOSTNAME {}", IP, HOSTNAME);
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/LimitLogger.java
================================================
package com.alibaba.datax.common.util;

import org.apache.commons.lang3.StringUtils;

import java.util.HashMap;
import java.util.Map;

/**
 * @author jitongchen
 * @date 2023/9/7 9:47 AM
 */
public class LimitLogger {

    // Last time (ms) each named logger printed
    private static Map<String, Long> lastPrintTime = new HashMap<>();

    /**
     * Runs function at most once per limit milliseconds for the given name;
     * a non-positive limit disables throttling.
     */
    public static void limit(String name, long limit, LoggerFunction function) {
        if (StringUtils.isBlank(name)) {
            name = "__all__";
        }
        if (limit <= 0) {
            function.apply();
        } else {
            if (!lastPrintTime.containsKey(name)) {
                lastPrintTime.put(name, System.currentTimeMillis());
                function.apply();
            } else {
                if (System.currentTimeMillis() > lastPrintTime.get(name) + limit) {
                    lastPrintTime.put(name, System.currentTimeMillis());
                    function.apply();
                }
            }
        }
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/ListUtil.java
================================================
package com.alibaba.datax.common.util;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang3.StringUtils;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

/**
 * Common helpers for the Lists used throughout DataX. For example, checkIfValueDuplicate
 * checks that the columns a user configures for a writer are not duplicated;
 * makeSureNoValueDuplicate does the same but throws on a duplicate.
 */
public final class ListUtil {

    public static boolean checkIfValueDuplicate(List<String> aList,
                                                boolean caseSensitive) {
        if (null == aList || aList.isEmpty()) {
            throw DataXException.asDataXException(CommonErrorCode.CONFIG_ERROR,
                    "The job configuration you provided is invalid: the list must not be empty.");
        }

        try {
            makeSureNoValueDuplicate(aList, caseSensitive);
        } catch (Exception e) {
            return true;
        }
        return false;
    }

    public static void makeSureNoValueDuplicate(List<String> aList,
                                                boolean caseSensitive) {
        if (null == aList || aList.isEmpty()) {
            throw new IllegalArgumentException("The job configuration you provided is invalid: the list must not be empty.");
        }

        if (1 == aList.size()) {
            return;
        } else {
            List<String> list = null;
            if (!caseSensitive) {
                list = valueToLowerCase(aList);
            } else {
                list = new ArrayList<String>(aList);
            }

            Collections.sort(list);

            for (int i = 0, len = list.size() - 1; i < len; i++) {
                if (list.get(i).equals(list.get(i + 1))) {
                    throw DataXException
                            .asDataXException(
                                    CommonErrorCode.CONFIG_ERROR,
                                    String.format(
                                            "The job configuration you provided is invalid: String [%s] must not appear more than once in the list [%s].",
                                            list.get(i),
                                            StringUtils.join(aList, ",")));
                }
            }
        }
    }

    public static boolean checkIfBInA(List<String> aList, List<String> bList,
                                      boolean caseSensitive) {
        if (null == aList || aList.isEmpty() || null == bList
                || bList.isEmpty()) {
            throw new IllegalArgumentException("The job configuration you provided is invalid: the list must not be empty.");
        }

        try {
            makeSureBInA(aList, bList, caseSensitive);
        } catch (Exception e) {
            return false;
        }

        return true;
    }

    public static void makeSureBInA(List<String> aList, List<String> bList,
                                    boolean caseSensitive) {
        if (null == aList || aList.isEmpty() || null == bList
                || bList.isEmpty()) {
            throw new IllegalArgumentException("The job configuration you provided is invalid: the list must not be empty.");
        }

        List<String> all = null;
        List<String> part = null;

        if (!caseSensitive) {
            all = valueToLowerCase(aList);
            part = valueToLowerCase(bList);
        } else {
            all = new ArrayList<String>(aList);
            part = new ArrayList<String>(bList);
        }

        for (String oneValue : part) {
            if (!all.contains(oneValue)) {
                throw DataXException
                        .asDataXException(
                                CommonErrorCode.CONFIG_ERROR,
                                String.format(
                                        "The job configuration you provided is invalid: String [%s] is not in the list [%s].",
                                        oneValue, StringUtils.join(aList, ",")));
            }
        }

    }

    public static boolean checkIfValueSame(List<Boolean> aList) {
        if (null == aList || aList.isEmpty()) {
            throw new IllegalArgumentException("The job configuration you provided is invalid: the list must not be empty.");
        }

        if (1 == aList.size()) {
            return true;
        } else {
            Boolean firstValue = aList.get(0);
            for (int i = 1, len = aList.size(); i < len; i++) {
                if (firstValue.booleanValue() != aList.get(i).booleanValue()) {
                    return false;
                }
            }
            return true;
        }
    }

    public static List<String> valueToLowerCase(List<String> aList) {
        if (null == aList || aList.isEmpty()) {
            throw new IllegalArgumentException("The job configuration you provided is invalid: the list must not be empty.");
        }
        List<String> result = new ArrayList<String>(aList.size());
        for (String oneValue : aList) {
            result.add(null != oneValue ?
oneValue.toLowerCase() : null); } return result; } public static Boolean checkIfHasSameValue(List listA, List listB) { if (null == listA || listA.isEmpty() || null == listB || listB.isEmpty()) { return false; } for (String oneValue : listA) { if (listB.contains(oneValue)) { return true; } } return false; } public static boolean checkIfAllSameValue(List listA, List listB) { if (null == listA || listA.isEmpty() || null == listB || listB.isEmpty() || listA.size() != listB.size()) { return false; } return new HashSet<>(listA).containsAll(new HashSet<>(listB)); } } ================================================ FILE: common/src/main/java/com/alibaba/datax/common/util/LocalStrings.properties ================================================ very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 configuration.1=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef\uff0c\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6[{0}]\u4e0d\u5b58\u5728. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. configuration.2=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6[{0}]\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.3=\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {0}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.4=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.5=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.6=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. 
\u56e0\u4e3a\u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5\uff0c\u671f\u671b\u662f\u5b57\u7b26\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.7=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u6709\u8bef\uff0c\u56e0\u4e3a\u4ece[{0}]\u83b7\u53d6\u7684\u503c[{1}]\u65e0\u6cd5\u8f6c\u6362\u4e3abool\u7c7b\u578b. \u8bf7\u68c0\u67e5\u6e90\u8868\u7684\u914d\u7f6e\u5e76\u4e14\u505a\u51fa\u76f8\u5e94\u7684\u4fee\u6539. configuration.8=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.9=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.10=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6d6e\u70b9\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.11=\u914d\u7f6e\u6587\u4ef6\u5bf9\u5e94Key[{0}]\u5e76\u4e0d\u5b58\u5728\uff0c\u8be5\u60c5\u51b5\u662f\u4ee3\u7801\u7f16\u7a0b\u9519\u8bef. \u8bf7\u8054\u7cfbDataX\u56e2\u961f\u7684\u540c\u5b66. configuration.12=\u503c[{0}]\u65e0\u6cd5\u9002\u914d\u60a8\u63d0\u4f9b[{1}]\uff0c \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! configuration.13=Path\u4e0d\u80fd\u4e3anull\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! 
configuration.14=\u8def\u5f84[{0}]\u51fa\u73b0\u975e\u6cd5\u503c\u7c7b\u578b[{1}]\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! . configuration.15=\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.16=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.17=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u503c\u4e3anull\uff0cdatax\u65e0\u6cd5\u8bc6\u522b\u8be5\u914d\u7f6e. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.18=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.19=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef\uff0c\u5217\u8868\u4e0b\u6807\u5fc5\u987b\u4e3a\u6570\u5b57\u7c7b\u578b\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{0}] \uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.20=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f!. configuration.21=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8def\u5f84[{0}]\u4e0d\u5408\u6cd5, \u8def\u5f84\u5c42\u6b21\u4e4b\u95f4\u4e0d\u80fd\u51fa\u73b0\u7a7a\u767d\u5b57\u7b26 . configuration.22=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. 
\u56e0\u4e3a\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f, JSON\u4e0d\u80fd\u4e3a\u7a7a\u767d. \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. configuration.23=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f: {0} . \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. listutil.1=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef\uff0cList\u4e0d\u80fd\u4e3a\u7a7a. listutil.2=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.3=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5141\u8bb8\u91cd\u590d\u51fa\u73b0\u5728\u5217\u8868\u4e2d: [{1}]. listutil.4=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.5=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.6=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5b58\u5728\u4e8e\u5217\u8868\u4e2d:[{1}]. listutil.7=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.8=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.1=\u5207\u5206\u4efd\u6570\u4e0d\u80fd\u5c0f\u4e8e1. \u6b64\u5904:expectSliceNumber=[{0}]. rangesplitutil.2=\u5bf9 BigInteger \u8fdb\u884c\u5207\u5206\u65f6\uff0c\u5176\u5de6\u53f3\u533a\u95f4\u4e0d\u80fd\u4e3a null. \u6b64\u5904:left=[{0}],right=[{1}]. rangesplitutil.3=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.4=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. 
rangesplitutil.5=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.6=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. retryutil.1=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2callable\u4e0d\u80fd\u4e3a\u7a7a ! retryutil.2=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2retrytime[%d]\u4e0d\u80fd\u5c0f\u4e8e1 ! retryutil.3=Exception when calling callable, \u5f02\u5e38Msg:{0} retryutil.4=Exception when calling callable, \u5373\u5c06\u5c1d\u8bd5\u6267\u884c\u7b2c{0}\u6b21\u91cd\u8bd5,\u5171\u8ba1\u91cd\u8bd5{1}\u6b21.\u672c\u6b21\u91cd\u8bd5\u8ba1\u5212\u7b49\u5f85[{2}]ms,\u5b9e\u9645\u7b49\u5f85[{3}]ms, \u5f02\u5e38Msg:[{4}] httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1}, STATUS CODE = {2}, Response Entity: {3} httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5
================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/LocalStrings_en_US.properties
================================================
very_like_yixiao=1{0}2{1}3
configuration.1=Configuration error. The configuration file [{0}] you provided does not exist. Please check your configuration file.
configuration.2=Configuration error. Failed to read the configuration file [{0}] you provided. Cause: {1}. Please check the permission settings of your configuration file.
configuration.3=Please check your configuration file. Failed to read the configuration file you provided. Cause: {0}. Please check the permission settings of your configuration file.
configuration.4=The configuration file you provided contains errors. [{0}] is a required parameter and cannot be empty or blank.
configuration.5=The configuration file you provided contains errors. [{0}] is a required parameter and cannot be empty or blank.
configuration.6=Error reading the task configuration. The value at configuration path [{0}] is invalid; a string value was expected: {1}. Please check your configuration and correct it.
configuration.7=The configuration you provided contains errors. The value [{1}] obtained from [{0}] cannot be converted to a boolean. Please check the source table configuration and correct it.
configuration.8=Error reading the task configuration. The value at configuration path [{0}] is invalid; an integer value was expected: {1}. Please check your configuration and correct it.
configuration.9=Error reading the task configuration. The value at configuration path [{0}] is invalid; an integer value was expected: {1}. Please check your configuration and correct it.
configuration.10=Error reading the task configuration. The value at configuration path [{0}] is invalid; a floating-point value was expected: {1}. Please check your configuration and correct it.
configuration.11=The configuration key [{0}] does not exist. This indicates a programming error in the code. Please contact the DataX team.
configuration.12=The value [{0}] is not compatible with the [{1}] you provided. This exception indicates a system programming error. Please contact the DataX developer team.
configuration.13=The path cannot be null. This exception indicates a system programming error. Please contact the DataX developer team.
configuration.14=The path [{0}] has an invalid value type [{1}]. This exception indicates a system programming error. Please contact the DataX developer team.
configuration.15=This exception indicates a system programming error. Please contact the DataX developer team.
configuration.16=The configuration file you provided contains errors. The path [{0}] requires a Map object in JSON format, but the actual type found at this node is [{1}]. Please check your configuration and correct it.
configuration.17=The configuration file you provided contains errors. The value at path [{0}] is null, and DataX cannot recognize this configuration. Please check your configuration and correct it.
configuration.18=The configuration file you provided contains errors. The path [{0}] requires a Map object in JSON format, but the actual type found at this node is [{1}]. Please check your configuration and correct it.
configuration.19=System programming error. A list index must be numeric, but the actual type found at this node is [{0}]. This exception indicates a system programming error. Please contact the DataX developer team.
configuration.20=System programming error. This exception indicates a system programming error. Please contact the DataX developer team.
configuration.21=System programming error. The path [{0}] is invalid: whitespace is not allowed between path levels.
configuration.22=Configuration error. The configuration you provided is not valid JSON: the JSON content cannot be blank. Please provide the configuration in standard JSON format.
configuration.23=Configuration error. The configuration you provided is not valid JSON: {0}. Please provide the configuration in standard JSON format.
listutil.1=The job configuration you provided contains errors. The list cannot be empty.
listutil.2=The job configuration you provided contains errors. The list cannot be empty.
listutil.3=The job configuration you provided contains errors. The string [{0}] must not appear more than once in the list [{1}].
listutil.4=The job configuration you provided contains errors. The list cannot be empty.
listutil.5=The job configuration you provided contains errors. The list cannot be empty.
listutil.6=The job configuration you provided contains errors. The string [{0}] does not exist in the list [{1}].
listutil.7=The job configuration you provided contains errors. The list cannot be empty.
listutil.8=The job configuration you provided contains errors. The list cannot be empty.
rangesplitutil.1=The number of slices cannot be less than 1. Here: expectSliceNumber=[{0}].
rangesplitutil.2=When slicing a BigInteger range, neither the left nor the right bound can be null. Here: left=[{0}], right=[{1}].
rangesplitutil.3=The bigInteger parameter cannot be null.
rangesplitutil.4=Only ASCII strings are supported when slicing by string, but the string [{0}] is not ASCII.
rangesplitutil.5=The bigInteger parameter cannot be null.
rangesplitutil.6=Only ASCII strings are supported when slicing by string, but the string [{0}] is not ASCII.
retryutil.1=System programming error. The input parameter callable cannot be null.
retryutil.2=System programming error. The input parameter retrytime[%d] cannot be less than 1.
retryutil.3=Exception when calling callable. Exception Msg: {0}
retryutil.4=Exception when calling callable. About to perform retry attempt {0} of {1}. This attempt was planned to wait [{2}]ms and actually waited [{3}]ms. Exception Msg: [{4}]
httpclientutil.1=Request address: {0}. Request method: {1}. STATUS CODE = {2}, Response Entity: {3}
httpclientutil.2=The remote interface returned -1. Retrying.
================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/LocalStrings_ja_JP.properties
================================================
very_like_yixiao=1{0}2{1}3 configuration.1=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef\uff0c\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6[{0}]\u4e0d\u5b58\u5728. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. configuration.2=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef.
\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6[{0}]\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.3=\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {0}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.4=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.5=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.6=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u56e0\u4e3a\u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5\uff0c\u671f\u671b\u662f\u5b57\u7b26\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.7=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u6709\u8bef\uff0c\u56e0\u4e3a\u4ece[{0}]\u83b7\u53d6\u7684\u503c[{1}]\u65e0\u6cd5\u8f6c\u6362\u4e3abool\u7c7b\u578b. \u8bf7\u68c0\u67e5\u6e90\u8868\u7684\u914d\u7f6e\u5e76\u4e14\u505a\u51fa\u76f8\u5e94\u7684\u4fee\u6539. configuration.8=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.9=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.10=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. 
\u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6d6e\u70b9\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.11=\u914d\u7f6e\u6587\u4ef6\u5bf9\u5e94Key[{0}]\u5e76\u4e0d\u5b58\u5728\uff0c\u8be5\u60c5\u51b5\u662f\u4ee3\u7801\u7f16\u7a0b\u9519\u8bef. \u8bf7\u8054\u7cfbDataX\u56e2\u961f\u7684\u540c\u5b66. configuration.12=\u503c[{0}]\u65e0\u6cd5\u9002\u914d\u60a8\u63d0\u4f9b[{1}]\uff0c \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! configuration.13=Path\u4e0d\u80fd\u4e3anull\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.14=\u8def\u5f84[{0}]\u51fa\u73b0\u975e\u6cd5\u503c\u7c7b\u578b[{1}]\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! . configuration.15=\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.16=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.17=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u503c\u4e3anull\uff0cdatax\u65e0\u6cd5\u8bc6\u522b\u8be5\u914d\u7f6e. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.18=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. 
configuration.19=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef\uff0c\u5217\u8868\u4e0b\u6807\u5fc5\u987b\u4e3a\u6570\u5b57\u7c7b\u578b\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{0}] \uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.20=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f!. configuration.21=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8def\u5f84[{0}]\u4e0d\u5408\u6cd5, \u8def\u5f84\u5c42\u6b21\u4e4b\u95f4\u4e0d\u80fd\u51fa\u73b0\u7a7a\u767d\u5b57\u7b26 . configuration.22=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u56e0\u4e3a\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f, JSON\u4e0d\u80fd\u4e3a\u7a7a\u767d. \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. configuration.23=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f: {0} . \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. listutil.1=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef\uff0cList\u4e0d\u80fd\u4e3a\u7a7a. listutil.2=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.3=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5141\u8bb8\u91cd\u590d\u51fa\u73b0\u5728\u5217\u8868\u4e2d: [{1}]. listutil.4=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.5=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.6=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5b58\u5728\u4e8e\u5217\u8868\u4e2d:[{1}]. 
listutil.7=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.8=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.1=\u5207\u5206\u4efd\u6570\u4e0d\u80fd\u5c0f\u4e8e1. \u6b64\u5904:expectSliceNumber=[{0}]. rangesplitutil.2=\u5bf9 BigInteger \u8fdb\u884c\u5207\u5206\u65f6\uff0c\u5176\u5de6\u53f3\u533a\u95f4\u4e0d\u80fd\u4e3a null. \u6b64\u5904:left=[{0}],right=[{1}]. rangesplitutil.3=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.4=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. rangesplitutil.5=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.6=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. retryutil.1=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2callable\u4e0d\u80fd\u4e3a\u7a7a ! retryutil.2=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2retrytime[%d]\u4e0d\u80fd\u5c0f\u4e8e1 ! 
retryutil.3=Exception when calling callable, \u5f02\u5e38Msg:{0} retryutil.4=Exception when calling callable, \u5373\u5c06\u5c1d\u8bd5\u6267\u884c\u7b2c{0}\u6b21\u91cd\u8bd5,\u5171\u8ba1\u91cd\u8bd5{1}\u6b21.\u672c\u6b21\u91cd\u8bd5\u8ba1\u5212\u7b49\u5f85[{2}]ms,\u5b9e\u9645\u7b49\u5f85[{3}]ms, \u5f02\u5e38Msg:[{4}] httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3} httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5
================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/LocalStrings_zh_CN.properties
================================================
very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 configuration.1=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef\uff0c\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6[{0}]\u4e0d\u5b58\u5728. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. configuration.2=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6[{0}]\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.3=\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {0}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.4=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.5=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.6=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519.
\u56e0\u4e3a\u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5\uff0c\u671f\u671b\u662f\u5b57\u7b26\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.7=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u6709\u8bef\uff0c\u56e0\u4e3a\u4ece[{0}]\u83b7\u53d6\u7684\u503c[{1}]\u65e0\u6cd5\u8f6c\u6362\u4e3abool\u7c7b\u578b. \u8bf7\u68c0\u67e5\u6e90\u8868\u7684\u914d\u7f6e\u5e76\u4e14\u505a\u51fa\u76f8\u5e94\u7684\u4fee\u6539. configuration.8=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.9=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.10=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6d6e\u70b9\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.11=\u914d\u7f6e\u6587\u4ef6\u5bf9\u5e94Key[{0}]\u5e76\u4e0d\u5b58\u5728\uff0c\u8be5\u60c5\u51b5\u662f\u4ee3\u7801\u7f16\u7a0b\u9519\u8bef. \u8bf7\u8054\u7cfbDataX\u56e2\u961f\u7684\u540c\u5b66. configuration.12=\u503c[{0}]\u65e0\u6cd5\u9002\u914d\u60a8\u63d0\u4f9b[{1}]\uff0c \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! configuration.13=Path\u4e0d\u80fd\u4e3anull\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! 
configuration.14=\u8def\u5f84[{0}]\u51fa\u73b0\u975e\u6cd5\u503c\u7c7b\u578b[{1}]\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! . configuration.15=\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.16=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.17=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u503c\u4e3anull\uff0cdatax\u65e0\u6cd5\u8bc6\u522b\u8be5\u914d\u7f6e. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.18=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.19=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef\uff0c\u5217\u8868\u4e0b\u6807\u5fc5\u987b\u4e3a\u6570\u5b57\u7c7b\u578b\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{0}] \uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.20=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f!. configuration.21=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8def\u5f84[{0}]\u4e0d\u5408\u6cd5, \u8def\u5f84\u5c42\u6b21\u4e4b\u95f4\u4e0d\u80fd\u51fa\u73b0\u7a7a\u767d\u5b57\u7b26 . configuration.22=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. 
\u56e0\u4e3a\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f, JSON\u4e0d\u80fd\u4e3a\u7a7a\u767d. \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. configuration.23=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f: {0} . \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. listutil.1=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef\uff0cList\u4e0d\u80fd\u4e3a\u7a7a. listutil.2=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.3=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5141\u8bb8\u91cd\u590d\u51fa\u73b0\u5728\u5217\u8868\u4e2d: [{1}]. listutil.4=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.5=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.6=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5b58\u5728\u4e8e\u5217\u8868\u4e2d:[{1}]. listutil.7=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.8=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.1=\u5207\u5206\u4efd\u6570\u4e0d\u80fd\u5c0f\u4e8e1. \u6b64\u5904:expectSliceNumber=[{0}]. rangesplitutil.2=\u5bf9 BigInteger \u8fdb\u884c\u5207\u5206\u65f6\uff0c\u5176\u5de6\u53f3\u533a\u95f4\u4e0d\u80fd\u4e3a null. \u6b64\u5904:left=[{0}],right=[{1}]. rangesplitutil.3=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.4=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. 
rangesplitutil.5=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.6=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. retryutil.1=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2callable\u4e0d\u80fd\u4e3a\u7a7a ! retryutil.2=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2retrytime[%d]\u4e0d\u80fd\u5c0f\u4e8e1 ! retryutil.3=Exception when calling callable, \u5f02\u5e38Msg:{0} retryutil.4=Exception when calling callable, \u5373\u5c06\u5c1d\u8bd5\u6267\u884c\u7b2c{0}\u6b21\u91cd\u8bd5,\u5171\u8ba1\u91cd\u8bd5{1}\u6b21.\u672c\u6b21\u91cd\u8bd5\u8ba1\u5212\u7b49\u5f85[{2}]ms,\u5b9e\u9645\u7b49\u5f85[{3}]ms, \u5f02\u5e38Msg:[{4}] httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3} httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5
================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/LocalStrings_zh_HK.properties
================================================
very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 configuration.1=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef\uff0c\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6[{0}]\u4e0d\u5b58\u5728. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. configuration.2=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6[{0}]\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.3=\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {0}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e.
configuration.4=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.5=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.6=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u56e0\u4e3a\u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5\uff0c\u671f\u671b\u662f\u5b57\u7b26\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.7=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u6709\u8bef\uff0c\u56e0\u4e3a\u4ece[{0}]\u83b7\u53d6\u7684\u503c[{1}]\u65e0\u6cd5\u8f6c\u6362\u4e3abool\u7c7b\u578b. \u8bf7\u68c0\u67e5\u6e90\u8868\u7684\u914d\u7f6e\u5e76\u4e14\u505a\u51fa\u76f8\u5e94\u7684\u4fee\u6539. configuration.8=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.9=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.10=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6d6e\u70b9\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.11=\u914d\u7f6e\u6587\u4ef6\u5bf9\u5e94Key[{0}]\u5e76\u4e0d\u5b58\u5728\uff0c\u8be5\u60c5\u51b5\u662f\u4ee3\u7801\u7f16\u7a0b\u9519\u8bef. \u8bf7\u8054\u7cfbDataX\u56e2\u961f\u7684\u540c\u5b66. 
configuration.12=\u503c[{0}]\u65e0\u6cd5\u9002\u914d\u60a8\u63d0\u4f9b[{1}]\uff0c \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! configuration.13=Path\u4e0d\u80fd\u4e3anull\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.14=\u8def\u5f84[{0}]\u51fa\u73b0\u975e\u6cd5\u503c\u7c7b\u578b[{1}]\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! . configuration.15=\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.16=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.17=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u503c\u4e3anull\uff0cdatax\u65e0\u6cd5\u8bc6\u522b\u8be5\u914d\u7f6e. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.18=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.19=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef\uff0c\u5217\u8868\u4e0b\u6807\u5fc5\u987b\u4e3a\u6570\u5b57\u7c7b\u578b\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{0}] \uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! 
configuration.20=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f!. configuration.21=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8def\u5f84[{0}]\u4e0d\u5408\u6cd5, \u8def\u5f84\u5c42\u6b21\u4e4b\u95f4\u4e0d\u80fd\u51fa\u73b0\u7a7a\u767d\u5b57\u7b26 . configuration.22=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u56e0\u4e3a\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f, JSON\u4e0d\u80fd\u4e3a\u7a7a\u767d. \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. configuration.23=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f: {0} . \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. listutil.1=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef\uff0cList\u4e0d\u80fd\u4e3a\u7a7a. listutil.2=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.3=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5141\u8bb8\u91cd\u590d\u51fa\u73b0\u5728\u5217\u8868\u4e2d: [{1}]. listutil.4=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.5=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.6=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5b58\u5728\u4e8e\u5217\u8868\u4e2d:[{1}]. listutil.7=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.8=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.1=\u5207\u5206\u4efd\u6570\u4e0d\u80fd\u5c0f\u4e8e1. \u6b64\u5904:expectSliceNumber=[{0}]. 
rangesplitutil.2=\u5bf9 BigInteger \u8fdb\u884c\u5207\u5206\u65f6\uff0c\u5176\u5de6\u53f3\u533a\u95f4\u4e0d\u80fd\u4e3a null. \u6b64\u5904:left=[{0}],right=[{1}]. rangesplitutil.3=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.4=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. rangesplitutil.5=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.6=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. retryutil.1=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2callable\u4e0d\u80fd\u4e3a\u7a7a ! retryutil.2=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2retrytime[%d]\u4e0d\u80fd\u5c0f\u4e8e1 ! retryutil.3=Exception when calling callable, \u5f02\u5e38Msg:{0} retryutil.4=Exception when calling callable, \u5373\u5c06\u5c1d\u8bd5\u6267\u884c\u7b2c{0}\u6b21\u91cd\u8bd5,\u5171\u8ba1\u91cd\u8bd5{1}\u6b21.\u672c\u6b21\u91cd\u8bd5\u8ba1\u5212\u7b49\u5f85[{2}]ms,\u5b9e\u9645\u7b49\u5f85[{3}]ms, \u5f02\u5e38Msg:[{4}] very_like_yixiao=一{0}二{1}三 configuration.1=配置資訊錯誤,您提供的配置檔案[{0}]不存在. 請檢查您的配置檔案. configuration.2=配置資訊錯誤. 您提供配置檔案[{0}]讀取失敗,錯誤原因: {1}. 請檢查您的配置檔案的權限設定. configuration.3=請檢查您的配置檔案. 您提供的配置檔案讀取失敗,錯誤原因: {0}. 請檢查您的配置檔案的權限設定. configuration.4=您提供配置檔案有誤,[{0}]是必填參數,不允許為空或者留白 . configuration.5=您提供配置檔案有誤,[{0}]是必填參數,不允許為空或者留白 . configuration.6=任務讀取配置檔案出錯. 因為配置檔案路徑[{0}] 值不合法,期望是字符類型: {1}. 請檢查您的配置並作出修改. configuration.7=您提供的配置資訊有誤,因為從[{0}]獲取的值[{1}]無法轉換為bool類型. 請檢查源表的配置並且做出相應的修改. configuration.8=任務讀取配置檔案出錯. 配置檔案路徑[{0}] 值不合法, 期望是整數類型: {1}. 請檢查您的配置並作出修改. configuration.9=任務讀取配置檔案出錯. 配置檔案路徑[{0}] 值不合法, 期望是整數類型: {1}. 請檢查您的配置並作出修改. configuration.10=任務讀取配置檔案出錯. 配置檔案路徑[{0}] 值不合法, 期望是浮點類型: {1}. 請檢查您的配置並作出修改. configuration.11=配置檔案對應Key[{0}]並不存在,該情況是代碼編程錯誤. 請聯絡DataX團隊的同學. 
configuration.12=值[{0}]無法適配您提供[{1}], 該異常代表系統編程錯誤, 請聯絡DataX開發團隊! configuration.13=Path不能為null,該異常代表系統編程錯誤, 請聯絡DataX開發團隊 ! configuration.14=路徑[{0}]出現不合法值類型[{1}],該異常代表系統編程錯誤, 請聯絡DataX開發團隊! . configuration.15=該異常代表系統編程錯誤, 請聯絡DataX開發團隊 ! configuration.16=您提供的配置檔案有誤. 路徑[{0}]需要配置Json格式的Map對象,但該節點發現實際類型是[{1}]. 請檢查您的配置並作出修改. configuration.17=您提供的配置檔案有誤. 路徑[{0}]值為null,datax無法識別該配置. 請檢查您的配置並作出修改. configuration.18=您提供的配置檔案有誤. 路徑[{0}]需要配置Json格式的Map對象,但該節點發現實際類型是[{1}]. 請檢查您的配置並作出修改. configuration.19=系統編程錯誤,清單下標必須為數字類型,但該節點發現實際類型是[{0}] ,該異常代表系統編程錯誤, 請聯絡DataX開發團隊 ! configuration.20=系統編程錯誤, 該異常代表系統編程錯誤, 請聯絡DataX開發團隊!. configuration.21=系統編程錯誤, 路徑[{0}]不合法, 路徑層次之間不能出現空白字符 . configuration.22=配置資訊錯誤. 因為您提供的配置資訊不是合法的JSON格式, JSON不能為空白. 請按照標準json格式提供配置資訊. configuration.23=配置資訊錯誤. 您提供的配置資訊不是合法的JSON格式: {0}. 請按照標準json格式提供配置資訊. listutil.1=您提供的作業配置有誤,List不能為空. listutil.2=您提供的作業配置有誤, List不能為空. listutil.3=您提供的作業配置資訊有誤, String:[{0}]不允許重複出現在清單中: [{1}]. listutil.4=您提供的作業配置有誤, List不能為空. listutil.5=您提供的作業配置有誤, List不能為空. listutil.6=您提供的作業配置資訊有誤, String:[{0}]不存在於清單中:[{1}]. listutil.7=您提供的作業配置有誤, List不能為空. listutil.8=您提供的作業配置有誤, List不能為空. rangesplitutil.1=切分份數不能小於1. 此處:expectSliceNumber=[{0}]. rangesplitutil.2=對 BigInteger 進行切分時,其左右區間不能為 null. 此處:left=[{0}],right=[{1}]. rangesplitutil.3=參數 bigInteger 不能為空. rangesplitutil.4=根據字符串進行切分時僅支援 ASCII 字符串,而字符串:[{0}]非 ASCII 字符串. rangesplitutil.5=參數 bigInteger 不能為空. rangesplitutil.6=根據字符串進行切分時僅支援 ASCII 字符串,而字符串:[{0}]非 ASCII 字符串. retryutil.1=系統編程錯誤, 入參callable不能為空 ! retryutil.2=系統編程錯誤, 入參retrytime[%d]不能小於1 ! 
retryutil.3=Exception when calling callable, 異常Msg:{0} retryutil.4=Exception when calling callable, 即將嘗試執行第{0}次重試,共計重試{1}次.本次重試計劃等待[{2}]ms,實際等待[{3}]ms, 異常Msg:[{4}] httpclientutil.1=\u8ACB\u6C42\u5730\u5740\uFF1A{0}, \u8ACB\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3} httpclientutil.2=\u9060\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C07\u91CD\u8A66
================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/LocalStrings_zh_TW.properties
================================================
very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 configuration.1=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef\uff0c\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6[{0}]\u4e0d\u5b58\u5728. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. configuration.2=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6[{0}]\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.3=\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u8bfb\u53d6\u5931\u8d25\uff0c\u9519\u8bef\u539f\u56e0: {0}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u6587\u4ef6\u7684\u6743\u9650\u8bbe\u7f6e. configuration.4=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.5=\u60a8\u63d0\u4f9b\u914d\u7f6e\u6587\u4ef6\u6709\u8bef\uff0c[{0}]\u662f\u5fc5\u586b\u53c2\u6570\uff0c\u4e0d\u5141\u8bb8\u4e3a\u7a7a\u6216\u8005\u7559\u767d . configuration.6=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u56e0\u4e3a\u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5\uff0c\u671f\u671b\u662f\u5b57\u7b26\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539.
configuration.7=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u6709\u8bef\uff0c\u56e0\u4e3a\u4ece[{0}]\u83b7\u53d6\u7684\u503c[{1}]\u65e0\u6cd5\u8f6c\u6362\u4e3abool\u7c7b\u578b. \u8bf7\u68c0\u67e5\u6e90\u8868\u7684\u914d\u7f6e\u5e76\u4e14\u505a\u51fa\u76f8\u5e94\u7684\u4fee\u6539. configuration.8=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.9=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6574\u6570\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.10=\u4efb\u52a1\u8bfb\u53d6\u914d\u7f6e\u6587\u4ef6\u51fa\u9519. \u914d\u7f6e\u6587\u4ef6\u8def\u5f84[{0}] \u503c\u975e\u6cd5, \u671f\u671b\u662f\u6d6e\u70b9\u7c7b\u578b: {1}. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.11=\u914d\u7f6e\u6587\u4ef6\u5bf9\u5e94Key[{0}]\u5e76\u4e0d\u5b58\u5728\uff0c\u8be5\u60c5\u51b5\u662f\u4ee3\u7801\u7f16\u7a0b\u9519\u8bef. \u8bf7\u8054\u7cfbDataX\u56e2\u961f\u7684\u540c\u5b66. configuration.12=\u503c[{0}]\u65e0\u6cd5\u9002\u914d\u60a8\u63d0\u4f9b[{1}]\uff0c \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! configuration.13=Path\u4e0d\u80fd\u4e3anull\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.14=\u8def\u5f84[{0}]\u51fa\u73b0\u975e\u6cd5\u503c\u7c7b\u578b[{1}]\uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f! . configuration.15=\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! 
configuration.16=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.17=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u503c\u4e3anull\uff0cdatax\u65e0\u6cd5\u8bc6\u522b\u8be5\u914d\u7f6e. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.18=\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u6587\u4ef6\u6709\u8bef. \u8def\u5f84[{0}]\u9700\u8981\u914d\u7f6eJson\u683c\u5f0f\u7684Map\u5bf9\u8c61\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{1}]. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. configuration.19=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef\uff0c\u5217\u8868\u4e0b\u6807\u5fc5\u987b\u4e3a\u6570\u5b57\u7c7b\u578b\uff0c\u4f46\u8be5\u8282\u70b9\u53d1\u73b0\u5b9e\u9645\u7c7b\u578b\u662f[{0}] \uff0c\u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f ! configuration.20=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8be5\u5f02\u5e38\u4ee3\u8868\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8bf7\u8054\u7cfbDataX\u5f00\u53d1\u56e2\u961f!. configuration.21=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u8def\u5f84[{0}]\u4e0d\u5408\u6cd5, \u8def\u5f84\u5c42\u6b21\u4e4b\u95f4\u4e0d\u80fd\u51fa\u73b0\u7a7a\u767d\u5b57\u7b26 . configuration.22=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u56e0\u4e3a\u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f, JSON\u4e0d\u80fd\u4e3a\u7a7a\u767d. \u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. configuration.23=\u914d\u7f6e\u4fe1\u606f\u9519\u8bef. \u60a8\u63d0\u4f9b\u7684\u914d\u7f6e\u4fe1\u606f\u4e0d\u662f\u5408\u6cd5\u7684JSON\u683c\u5f0f: {0} . 
\u8bf7\u6309\u7167\u6807\u51c6json\u683c\u5f0f\u63d0\u4f9b\u914d\u7f6e\u4fe1\u606f. listutil.1=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef\uff0cList\u4e0d\u80fd\u4e3a\u7a7a. listutil.2=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.3=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5141\u8bb8\u91cd\u590d\u51fa\u73b0\u5728\u5217\u8868\u4e2d: [{1}]. listutil.4=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.5=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.6=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u4fe1\u606f\u6709\u8bef, String:[{0}] \u4e0d\u5b58\u5728\u4e8e\u5217\u8868\u4e2d:[{1}]. listutil.7=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. listutil.8=\u60a8\u63d0\u4f9b\u7684\u4f5c\u4e1a\u914d\u7f6e\u6709\u8bef, List\u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.1=\u5207\u5206\u4efd\u6570\u4e0d\u80fd\u5c0f\u4e8e1. \u6b64\u5904:expectSliceNumber=[{0}]. rangesplitutil.2=\u5bf9 BigInteger \u8fdb\u884c\u5207\u5206\u65f6\uff0c\u5176\u5de6\u53f3\u533a\u95f4\u4e0d\u80fd\u4e3a null. \u6b64\u5904:left=[{0}],right=[{1}]. rangesplitutil.3=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.4=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. rangesplitutil.5=\u53c2\u6570 bigInteger \u4e0d\u80fd\u4e3a\u7a7a. rangesplitutil.6=\u6839\u636e\u5b57\u7b26\u4e32\u8fdb\u884c\u5207\u5206\u65f6\u4ec5\u652f\u6301 ASCII \u5b57\u7b26\u4e32\uff0c\u800c\u5b57\u7b26\u4e32:[{0}]\u975e ASCII \u5b57\u7b26\u4e32. retryutil.1=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2callable\u4e0d\u80fd\u4e3a\u7a7a ! 
retryutil.2=\u7cfb\u7edf\u7f16\u7a0b\u9519\u8bef, \u5165\u53c2retrytime[%d]\u4e0d\u80fd\u5c0f\u4e8e1 !
retryutil.3=Exception when calling callable, \u5f02\u5e38Msg:{0}
retryutil.4=Exception when calling callable, \u5373\u5c06\u5c1d\u8bd5\u6267\u884c\u7b2c{0}\u6b21\u91cd\u8bd5,\u5171\u8ba1\u91cd\u8bd5{1}\u6b21.\u672c\u6b21\u91cd\u8bd5\u8ba1\u5212\u7b49\u5f85[{2}]ms,\u5b9e\u9645\u7b49\u5f85[{3}]ms, \u5f02\u5e38Msg:[{4}]
very_like_yixiao=一{0}二{1}三
configuration.1=配置資訊錯誤,您提供的配置檔案[{0}]不存在. 請檢查您的配置檔案.
configuration.2=配置資訊錯誤. 您提供配置檔案[{0}]讀取失敗,錯誤原因: {1}. 請檢查您的配置檔案的權限設定.
configuration.3=請檢查您的配置檔案. 您提供的配置檔案讀取失敗,錯誤原因: {0}. 請檢查您的配置檔案的權限設定.
configuration.4=您提供配置檔案有誤,[{0}]是必填參數,不允許為空或者留白 .
configuration.5=您提供配置檔案有誤,[{0}]是必填參數,不允許為空或者留白 .
configuration.6=任務讀取配置檔案出錯. 因為配置檔案路徑[{0}] 值不合法,期望是字符類型: {1}. 請檢查您的配置並作出修改.
configuration.7=您提供的配置資訊有誤,因為從[{0}]獲取的值[{1}]無法轉換為bool類型. 請檢查源表的配置並且做出相應的修改.
configuration.8=任務讀取配置檔案出錯. 配置檔案路徑[{0}] 值不合法, 期望是整數類型: {1}. 請檢查您的配置並作出修改.
configuration.9=任務讀取配置檔案出錯. 配置檔案路徑[{0}] 值不合法, 期望是整數類型: {1}. 請檢查您的配置並作出修改.
configuration.10=任務讀取配置檔案出錯. 配置檔案路徑[{0}] 值不合法, 期望是浮點類型: {1}. 請檢查您的配置並作出修改.
configuration.11=配置檔案對應Key[{0}]並不存在,該情況是代碼編程錯誤. 請聯絡DataX團隊的同學.
configuration.12=值[{0}]無法適配您提供[{1}], 該異常代表系統編程錯誤, 請聯絡DataX開發團隊!
configuration.13=Path不能為null,該異常代表系統編程錯誤, 請聯絡DataX開發團隊 !
configuration.14=路徑[{0}]出現不合法值類型[{1}],該異常代表系統編程錯誤, 請聯絡DataX開發團隊! .
configuration.15=該異常代表系統編程錯誤, 請聯絡DataX開發團隊 !
configuration.16=您提供的配置檔案有誤. 路徑[{0}]需要配置Json格式的Map對象,但該節點發現實際類型是[{1}]. 請檢查您的配置並作出修改.
configuration.17=您提供的配置檔案有誤. 路徑[{0}]值為null,datax無法識別該配置. 請檢查您的配置並作出修改.
configuration.18=您提供的配置檔案有誤. 路徑[{0}]需要配置Json格式的Map對象,但該節點發現實際類型是[{1}]. 請檢查您的配置並作出修改.
configuration.19=系統編程錯誤,清單下標必須為數字類型,但該節點發現實際類型是[{0}] ,該異常代表系統編程錯誤, 請聯絡DataX開發團隊 !
configuration.20=系統編程錯誤, 該異常代表系統編程錯誤, 請聯絡DataX開發團隊!.
configuration.21=系統編程錯誤, 路徑[{0}]不合法, 路徑層次之間不能出現空白字符 .
configuration.22=配置資訊錯誤. 因為您提供的配置資訊不是合法的JSON格式, JSON不能為空白. 請按照標準json格式提供配置資訊.
configuration.23=配置資訊錯誤. 您提供的配置資訊不是合法的JSON格式: {0}. 請按照標準json格式提供配置資訊.
listutil.1=您提供的作業配置有誤,List不能為空.
listutil.2=您提供的作業配置有誤, List不能為空.
listutil.3=您提供的作業配置資訊有誤, String:[{0}]不允許重複出現在清單中: [{1}].
listutil.4=您提供的作業配置有誤, List不能為空.
listutil.5=您提供的作業配置有誤, List不能為空.
listutil.6=您提供的作業配置資訊有誤, String:[{0}]不存在於清單中:[{1}].
listutil.7=您提供的作業配置有誤, List不能為空.
listutil.8=您提供的作業配置有誤, List不能為空.
rangesplitutil.1=切分份數不能小於1. 此處:expectSliceNumber=[{0}].
rangesplitutil.2=對 BigInteger 進行切分時,其左右區間不能為 null. 此處:left=[{0}],right=[{1}].
rangesplitutil.3=參數 bigInteger 不能為空.
rangesplitutil.4=根據字符串進行切分時僅支援 ASCII 字符串,而字符串:[{0}]非 ASCII 字符串.
rangesplitutil.5=參數 bigInteger 不能為空.
rangesplitutil.6=根據字符串進行切分時僅支援 ASCII 字符串,而字符串:[{0}]非 ASCII 字符串.
retryutil.1=系統編程錯誤, 入參callable不能為空 !
retryutil.2=系統編程錯誤, 入參retrytime[%d]不能小於1 !
retryutil.3=Exception when calling callable, 異常Msg:{0}
retryutil.4=Exception when calling callable, 即將嘗試執行第{0}次重試,共計重試{1}次.本次重試計劃等待[{2}]ms,實際等待[{3}]ms, 異常Msg:[{4}]
httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3}
httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/LoggerFunction.java
================================================
package com.alibaba.datax.common.util;

/**
 * @author molin.lxd
 * @date 2021-05-09
 */
public interface LoggerFunction {

    void apply();
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/MessageSource.java
================================================
package com.alibaba.datax.common.util;

import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.ResourceBundle;
import java.util.TimeZone;

import org.apache.commons.lang3.LocaleUtils;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MessageSource {
    private static final Logger LOG = LoggerFactory.getLogger(MessageSource.class);
    private static Map<String, ResourceBundle> resourceBundleCache = new HashMap<String, ResourceBundle>();
    public static Locale locale = null;
    public static TimeZone timeZone = null;
    private ResourceBundle resourceBundle = null;

    private MessageSource(ResourceBundle resourceBundle) {
        this.resourceBundle = resourceBundle;
    }

    /**
     * @param baseName
     *            demo: javax.servlet.http.LocalStrings
     *
     * @throws MissingResourceException
     *             - if no resource bundle for the specified base name can be
     *             found
     */
    public static MessageSource loadResourceBundle(String baseName) {
        return loadResourceBundle(baseName, MessageSource.locale,
                MessageSource.timeZone);
    }

    /**
     * @param clazz
     *            the class whose package name is used as the bundle base name
     */
    public static MessageSource loadResourceBundle(Class<?> clazz) {
        return loadResourceBundle(clazz.getPackage().getName());
    }

    /**
     * @param clazz
     *            the class whose package name is used as the bundle base name
     */
    public static MessageSource loadResourceBundle(Class<?> clazz,
            Locale locale, TimeZone timeZone) {
        return loadResourceBundle(clazz.getPackage().getName(), locale,
                timeZone);
    }

    /**
     * warn:
     * ok: ResourceBundle.getBundle("xxx.LocalStrings", Locale.getDefault(), LoadUtil.getJarLoader(PluginType.WRITER, "odpswriter"))
     * error: ResourceBundle.getBundle("xxx.LocalStrings", Locale.getDefault(), LoadUtil.getJarLoader(PluginType.WRITER, "odpswriter"))
     *
     * @param baseName
     *            demo: javax.servlet.http.LocalStrings
     *
     * @throws MissingResourceException
     *             - if no resource bundle for the specified base name can be
     *             found
     */
    public static MessageSource loadResourceBundle(String baseName,
            Locale locale, TimeZone timeZone) {
        ResourceBundle resourceBundle = null;
        if (null == locale) {
            locale = LocaleUtils.toLocale("en_US");
        }
        if (null == timeZone) {
            timeZone = TimeZone.getDefault();
        }
        String resourceBaseName = String.format("%s.LocalStrings", baseName);
        LOG.debug(
                "initEnvironment MessageSource.locale[{}], MessageSource.timeZone[{}]",
                MessageSource.locale, MessageSource.timeZone);
        LOG.debug(
                "loadResourceBundle with locale[{}], timeZone[{}], baseName[{}]",
                locale, timeZone, resourceBaseName);
        // warn: should this cache also account for Locale? Currently it does not.
        if (!MessageSource.resourceBundleCache.containsKey(resourceBaseName)) {
            ClassLoader clazzLoader = Thread.currentThread()
                    .getContextClassLoader();
            LOG.debug("loadResourceBundle classLoader:{}", clazzLoader);
            resourceBundle = ResourceBundle.getBundle(resourceBaseName, locale,
                    clazzLoader);
            MessageSource.resourceBundleCache.put(resourceBaseName,
                    resourceBundle);
        } else {
            resourceBundle = MessageSource.resourceBundleCache
                    .get(resourceBaseName);
        }

        return new MessageSource(resourceBundle);
    }

    public static boolean unloadResourceBundle(Class<?> clazz) {
        String baseName = clazz.getPackage().getName();
        String resourceBaseName = String.format("%s.LocalStrings", baseName);
        if (!MessageSource.resourceBundleCache.containsKey(resourceBaseName)) {
            return false;
        } else {
            MessageSource.resourceBundleCache.remove(resourceBaseName);
            return true;
        }
    }

    public static MessageSource reloadResourceBundle(Class<?> clazz) {
        MessageSource.unloadResourceBundle(clazz);
        return MessageSource.loadResourceBundle(clazz);
    }

    public static void setEnvironment(Locale locale, TimeZone timeZone) {
        // warn: set the JVM-wide defaults? @2018.03.21 these calls were un-commented;
        // otherwise internationalization breaks across multiple time zones
        Locale.setDefault(locale);
        TimeZone.setDefault(timeZone);
        MessageSource.locale = locale;
        MessageSource.timeZone = timeZone;
        LOG.info("use Locale: {} timeZone: {}", locale, timeZone);
    }

    public static void init(final Configuration configuration) {
        Locale locale2Set = Locale.getDefault();
        // the key defaults to "zh_CN"; the OS locale is kept only if the configured value is blank
        String localeStr = configuration.getString("common.column.locale", "zh_CN");
        if (StringUtils.isNotBlank(localeStr)) {
            try {
                locale2Set = LocaleUtils.toLocale(localeStr);
            } catch (Exception e) {
                LOG.warn("ignored locale parse exception: {}", e.getMessage());
            }
        }

        TimeZone timeZone2Set = TimeZone.getDefault();
        // defaults to the operating system's time zone
        String timeZoneStr = configuration.getString("common.column.timeZone");
        if (StringUtils.isNotBlank(timeZoneStr)) {
            try {
                timeZone2Set = TimeZone.getTimeZone(timeZoneStr);
            } catch (Exception e) {
                LOG.warn("ignored timezone parse exception: {}", e.getMessage());
            }
        }

        LOG.info("JVM TimeZone: {}, Locale: {}", timeZone2Set.getID(), locale2Set);
        MessageSource.setEnvironment(locale2Set, timeZone2Set);
    }

    public static void clearCache() {
        MessageSource.resourceBundleCache.clear();
    }

    public String message(String code) {
        return this.messageWithDefaultMessage(code, null);
    }

    public String message(String code, String args1) {
        return this.messageWithDefaultMessage(code, null,
                new Object[] { args1 });
    }

    public String message(String code, String args1, String args2) {
        return this.messageWithDefaultMessage(code, null, new Object[] {
                args1, args2 });
    }

    public String message(String code, String args1, String args2,
            String args3) {
        return this.messageWithDefaultMessage(code, null, new Object[] {
                args1, args2, args3 });
    }

    // The overloads above cover most cases; avoiding this varargs version performs better
    public String message(String code, Object... args) {
        return this.messageWithDefaultMessage(code, null, args);
    }

    public String messageWithDefaultMessage(String code, String defaultMessage) {
        return this.messageWithDefaultMessage(code, defaultMessage,
                new Object[] {});
    }

    /**
     * @param args
     *            MessageFormat calls toString() on each argument in turn
     */
    public String messageWithDefaultMessage(String code,
            String defaultMessage, Object... args) {
        String messageStr = null;
        try {
            messageStr = this.resourceBundle.getString(code);
        } catch (MissingResourceException e) {
            messageStr = defaultMessage;
        }
        if (null != messageStr && null != args && args.length > 0) {
            // warn: see loadResourceBundle set default locale
            return MessageFormat.format(messageStr, args);
        } else {
            return messageStr;
        }
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/RangeSplitUtil.java
================================================
package com.alibaba.datax.common.util;

import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;

import java.math.BigInteger;
import java.util.*;

/**
 * Provides general-purpose splitting by numeric range, string range, and so on.
 */
public final class RangeSplitUtil {

    public static String[] doAsciiStringSplit(String left, String right, int expectSliceNumber) {
        int radix = 128;

        BigInteger[] tempResult = doBigIntegerSplit(stringToBigInteger(left, radix),
                stringToBigInteger(right, radix), expectSliceNumber);
        String[] result = new String[tempResult.length];

        // Keep the first (and last) string as-is: after converting to a number and back, if the
        // leading characters happen to be 'basic', there is no way to know how many to restore
        result[0] = left;
        result[tempResult.length - 1] = right;

        for (int i = 1, len = tempResult.length - 1; i < len; i++) {
            result[i] = bigIntegerToString(tempResult[i], radix);
        }

        return result;
    }

    public static long[] doLongSplit(long left, long right, int expectSliceNumber) {
        BigInteger[] result = doBigIntegerSplit(BigInteger.valueOf(left),
                BigInteger.valueOf(right), expectSliceNumber);
        long[] returnResult = new long[result.length];
        for (int i = 0, len = result.length; i < len; i++) {
            returnResult[i] = result[i].longValue();
        }
        return returnResult;
    }

    public static BigInteger[] doBigIntegerSplit(BigInteger left, BigInteger right, int expectSliceNumber) {
        if (expectSliceNumber < 1) {
            throw new IllegalArgumentException(String.format(
                    "切分份数不能小于1. 此处:expectSliceNumber=[%s].", expectSliceNumber));
        }

        if (null == left || null == right) {
            throw new IllegalArgumentException(String.format(
                    "对 BigInteger 进行切分时,其左右区间不能为 null. 此处:left=[%s],right=[%s].", left, right));
        }

        if (left.compareTo(right) == 0) {
            return new BigInteger[]{left, right};
        } else {
            // Swap if necessary so that left < right
            if (left.compareTo(right) > 0) {
                BigInteger temp = left;
                left = right;
                right = temp;
            }

            // left < right
            BigInteger endAndStartGap = right.subtract(left);

            BigInteger step = endAndStartGap.divide(BigInteger.valueOf(expectSliceNumber));
            BigInteger remainder = endAndStartGap.remainder(BigInteger.valueOf(expectSliceNumber));

            // remainder cannot exceed expectSliceNumber, so there is no need to check that it fits in an int
            // Do not test step.intValue() == 0 here: intValue() may overflow
            if (step.compareTo(BigInteger.ZERO) == 0) {
                expectSliceNumber = remainder.intValue();
            }

            BigInteger[] result = new BigInteger[expectSliceNumber + 1];
            result[0] = left;
            result[expectSliceNumber] = right;

            BigInteger lowerBound;
            BigInteger upperBound = left;
            for (int i = 1; i < expectSliceNumber; i++) {
                lowerBound = upperBound;
                upperBound = lowerBound.add(step);
                upperBound = upperBound.add((remainder.compareTo(BigInteger.valueOf(i)) >= 0)
                        ? BigInteger.ONE : BigInteger.ZERO);
                result[i] = upperBound;
            }

            return result;
        }
    }

    private static void checkIfBetweenRange(int value, int left, int right) {
        if (value < left || value > right) {
            throw new IllegalArgumentException(String.format("parameter can not <[%s] or >[%s].",
                    left, right));
        }
    }

    /**
     * Only ASCII characters are supported, so radix must be in [1, 128].
     */
    public static BigInteger stringToBigInteger(String aString, int radix) {
        if (null == aString) {
            throw new IllegalArgumentException("参数 bigInteger 不能为空.");
        }

        checkIfBetweenRange(radix, 1, 128);

        BigInteger result = BigInteger.ZERO;
        BigInteger radixBigInteger = BigInteger.valueOf(radix);

        int tempChar;
        int k = 0;

        for (int i = aString.length() - 1; i >= 0; i--) {
            tempChar = aString.charAt(i);
            if (tempChar >= 128) {
                throw new IllegalArgumentException(String.format("根据字符串进行切分时仅支持 ASCII 字符串,而字符串:[%s]非 ASCII 字符串.", aString));
            }
            result = result.add(BigInteger.valueOf(tempChar).multiply(radixBigInteger.pow(k)));
            k++;
        }

        return result;
    }

    /**
     * Converts a BigInteger to a String. Note: radix and basic must both be in [1, 128],
     * and radix + basic must also be in [1, 128].
     */
    private static String bigIntegerToString(BigInteger bigInteger, int radix) {
        if (null == bigInteger) {
            throw new IllegalArgumentException("参数 bigInteger 不能为空.");
        }

        checkIfBetweenRange(radix, 1, 128);

        StringBuilder resultStringBuilder = new StringBuilder();

        List<Integer> list = new ArrayList<Integer>();
        BigInteger radixBigInteger = BigInteger.valueOf(radix);
        BigInteger currentValue = bigInteger;

        BigInteger quotient = currentValue.divide(radixBigInteger);
        while (quotient.compareTo(BigInteger.ZERO) > 0) {
            list.add(currentValue.remainder(radixBigInteger).intValue());
            currentValue = currentValue.divide(radixBigInteger);
            quotient = currentValue;
        }
        Collections.reverse(list);

        if (list.isEmpty()) {
            list.add(0, bigInteger.remainder(radixBigInteger).intValue());
        }

        Map<Integer, Character> map = new HashMap<Integer, Character>();
        for (int i = 0; i < radix; i++) {
            map.put(i, (char) (i));
        }

//        String msg = String.format("%s 转为 %s 进制,结果为:%s", bigInteger.longValue(), radix, list);
//        System.out.println(msg);

        for (Integer aList : list) {
            resultStringBuilder.append(map.get(aList));
        }

        return resultStringBuilder.toString();
    }

    /**
     * Returns the minimum and maximum characters in the string (compared by ASCII value).
     * The string must be non-empty and pure ASCII.
     * In the returned Pair, left = minimum character and right = maximum character.
     */
    public static Pair<Character, Character> getMinAndMaxCharacter(String aString) {
        if (!isPureAscii(aString)) {
            throw new IllegalArgumentException(String.format("根据字符串进行切分时仅支持 ASCII 字符串,而字符串:[%s]非 ASCII 字符串.", aString));
        }

        char min = aString.charAt(0);
        char max = min;

        char temp;
        for (int i = 1, len = aString.length(); i < len; i++) {
            temp = aString.charAt(i);
            min = min < temp ? min : temp;
            max = max > temp ? max : temp;
        }

        return new ImmutablePair<Character, Character>(min, max);
    }

    private static boolean isPureAscii(String aString) {
        if (null == aString) {
            return false;
        }

        for (int i = 0, len = aString.length(); i < len; i++) {
            char ch = aString.charAt(i);
            if (ch >= 127 || ch < 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Utility for splitting a List; mainly used by the split logic of reader plugins.
     */
    public static List<List<?>> doListSplit(List<?> objects, int adviceNumber) {
        List<List<?>> splitLists = new ArrayList<List<?>>();
        if (null == objects) {
            return splitLists;
        }
        long[] splitPoint = RangeSplitUtil.doLongSplit(0, objects.size(), adviceNumber);
        for (int startIndex = 0; startIndex < splitPoint.length - 1; startIndex++) {
            List<Object> objectsForTask = new ArrayList<Object>();
            int endIndex = startIndex + 1;
            for (long i = splitPoint[startIndex]; i < splitPoint[endIndex]; i++) {
                objectsForTask.add(objects.get((int) i));
            }
            if (!objectsForTask.isEmpty()) {
                splitLists.add(objectsForTask);
            }
        }
        return splitLists;
    }
}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/RetryUtil.java
================================================
package com.alibaba.datax.common.util;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.concurrent.*;

public final class RetryUtil {

    private static final Logger LOG = LoggerFactory.getLogger(RetryUtil.class);

    private static final long MAX_SLEEP_MILLISECOND = 256 * 1000;

    /**
     * Executes a callable, retrying on failure.
     *
     * @param callable               the actual logic
     * @param retryTimes             maximum number of attempts (must be >= 1)
     * @param sleepTimeInMilliSecond time to sleep after a failure before retrying
     * @param exponential            whether the sleep time grows exponentially
     * @param <T>                    return type
     * @return the result of the (possibly retried) callable
     */
    public static <T> T executeWithRetry(Callable<T> callable,
                                         int retryTimes,
                                         long sleepTimeInMilliSecond,
                                         boolean exponential) throws Exception {
        Retry retry = new Retry();
        return retry.doRetry(callable, retryTimes, sleepTimeInMilliSecond, exponential, null);
    }

    /**
     * Executes a callable, retrying on failure.
     *
     * @param callable               the actual logic
     * @param retryTimes             maximum number of attempts (must be >= 1)
     * @param sleepTimeInMilliSecond time to sleep after a failure before retrying
     * @param exponential            whether the sleep time grows exponentially
     * @param <T>                    return type
     * @param retryExceptionClasss   retry only when one of these exception types is thrown
     * @return the result of the (possibly retried) callable
     */
    public static <T> T executeWithRetry(Callable<T> callable,
                                         int retryTimes,
                                         long sleepTimeInMilliSecond,
                                         boolean exponential,
                                         List<Class<?>> retryExceptionClasss) throws Exception {
        Retry retry = new Retry();
        return retry.doRetry(callable, retryTimes, sleepTimeInMilliSecond, exponential, retryExceptionClasss);
    }

    /**
     * Executes the callable in a separate thread, with retries. Each attempt must finish within
     * timeoutMs, otherwise it is treated as a failure.
     * The thread pool used for asynchronous execution is passed in by the caller, which controls
     * how widely it is shared; for example, HttpClientUtil shares a single pool.
     * <p>
     * Limitation: the worker thread can only be interrupted while it is blocked.
     *
     * @param callable               the actual logic
     * @param retryTimes             maximum number of attempts (must be >= 1)
     * @param sleepTimeInMilliSecond time to sleep after a failure before retrying
     * @param exponential            whether the sleep time grows exponentially
     * @param timeoutMs              timeout for each execution of the callable, in milliseconds
     * @param executor               thread pool used for asynchronous execution
     * @param <T>                    return type
     * @return the result of the (possibly retried) callable
     */
    public static <T> T asyncExecuteWithRetry(Callable<T> callable,
                                              int retryTimes,
                                              long sleepTimeInMilliSecond,
                                              boolean exponential,
                                              long timeoutMs,
                                              ThreadPoolExecutor executor) throws Exception {
        Retry retry = new AsyncRetry(timeoutMs, executor);
        return retry.doRetry(callable, retryTimes, sleepTimeInMilliSecond, exponential, null);
    }

    /**
     * Creates a thread pool for asynchronous execution, with these properties:
     * core size 0, so there are no threads (and no cost) initially;
     * max size 5, so there are at most five threads;
     * a 60-second keep-alive, so threads idle for more than 60 seconds are reclaimed;
     * a SynchronousQueue, so tasks are never queued: a submission succeeds only if a thread is
     * available, otherwise RejectedExecutionException is thrown.
     *
     * @return the thread pool
     */
    public static ThreadPoolExecutor createThreadPoolExecutor() {
        return new ThreadPoolExecutor(0, 5,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }


    private static class Retry {

        public <T> T doRetry(Callable<T> callable, int retryTimes, long sleepTimeInMilliSecond, boolean exponential,
                             List<Class<?>> retryExceptionClasss) throws Exception {

            if (null == callable) {
                throw new IllegalArgumentException("系统编程错误, 入参callable不能为空 ! ");
            }

            if (retryTimes < 1) {
                throw new IllegalArgumentException(String.format(
                        "系统编程错误, 入参retrytime[%d]不能小于1 !", retryTimes));
            }

            Exception saveException = null;
            for (int i = 0; i < retryTimes; i++) {
                try {
                    return call(callable);
                } catch (Exception e) {
                    saveException = e;
                    if (i == 0) {
                        LOG.error(String.format("Exception when calling callable, 异常Msg:%s", saveException.getMessage()), saveException);
                    }

                    if (null != retryExceptionClasss && !retryExceptionClasss.isEmpty()) {
                        boolean needRetry = false;
                        for (Class<?> eachExceptionClass : retryExceptionClasss) {
                            if (eachExceptionClass == e.getClass()) {
                                needRetry = true;
                                break;
                            }
                        }
                        if (!needRetry) {
                            throw saveException;
                        }
                    }

                    if (i + 1 < retryTimes && sleepTimeInMilliSecond > 0) {
                        long startTime = System.currentTimeMillis();

                        long timeToSleep;
                        if (exponential) {
                            timeToSleep = sleepTimeInMilliSecond * (long) Math.pow(2, i);
                            if (timeToSleep >= MAX_SLEEP_MILLISECOND) {
                                timeToSleep = MAX_SLEEP_MILLISECOND;
                            }
                        } else {
                            timeToSleep = sleepTimeInMilliSecond;
                            if (timeToSleep >= MAX_SLEEP_MILLISECOND) {
                                timeToSleep = MAX_SLEEP_MILLISECOND;
                            }
                        }

                        try {
                            Thread.sleep(timeToSleep);
                        } catch (InterruptedException ignored) {
                        }

                        long realTimeSleep = System.currentTimeMillis() - startTime;

                        LOG.error(String.format("Exception when calling callable, 即将尝试执行第%s次重试.本次重试计划等待[%s]ms,实际等待[%s]ms, 异常Msg:[%s]",
                                i + 1, timeToSleep, realTimeSleep, e.getMessage()));

                    }
                }
            }
            throw saveException;
        }

        protected <T> T call(Callable<T> callable) throws Exception {
            return callable.call();
        }
    }

    private static class AsyncRetry extends Retry {

        private long timeoutMs;
        private ThreadPoolExecutor executor;

        public AsyncRetry(long timeoutMs, ThreadPoolExecutor executor) {
            this.timeoutMs = timeoutMs;
            this.executor = executor;
        }

        /**
         * Runs the task asynchronously on the supplied thread pool and waits for it.
         *
         * future.get() waits for the given number of milliseconds; if the task finishes within the
         * timeout, its result is returned normally.
         * If an exception is thrown (timeout, execution failure, or cancellation/interruption by
         * another thread), it is logged and rethrown to the caller.
         * In both the normal and the exceptional case, a task that has not finished is cancelled;
         * cancel(true) interrupts the thread even if the task is still running.
         *
         * @param callable
         * @param <T>
         * @return
         * @throws Exception
         */
        @Override
        protected <T> T call(Callable<T> callable) throws Exception {
            Future<T> future = executor.submit(callable);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (Exception e) {
                LOG.warn("Try once failed", e);
                throw e;
            } finally {
                if (!future.isDone()) {
                    future.cancel(true);
                    LOG.warn("Try once task not done, cancel it, active count: " + executor.getActiveCount());
                }
            }
        }
    }

}

================================================
FILE: common/src/main/java/com/alibaba/datax/common/util/StrUtil.java
================================================
package com.alibaba.datax.common.util;

import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.DecimalFormat;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrUtil {

    private final static long KB_IN_BYTES = 1024;

    private final static long MB_IN_BYTES = 1024 * KB_IN_BYTES;

    private final static long GB_IN_BYTES = 1024 * MB_IN_BYTES;

    private final static long TB_IN_BYTES = 1024 * GB_IN_BYTES;

    private final static DecimalFormat df = new DecimalFormat("0.00");

    private static final Pattern VARIABLE_PATTERN = Pattern
            .compile("(\\$)\\{?(\\w+)\\}?");

    private static String SYSTEM_ENCODING = System.getProperty("file.encoding");

    static {
        if (SYSTEM_ENCODING == null) {
            SYSTEM_ENCODING = "UTF-8";
        }
    }

    private StrUtil() {
    }

    public static String stringify(long byteNumber) {
        if (byteNumber / TB_IN_BYTES > 0) {
            return df.format((double) byteNumber / (double) TB_IN_BYTES) + "TB";
        } else if (byteNumber / GB_IN_BYTES > 0) {
            return df.format((double) byteNumber / (double) GB_IN_BYTES) + "GB";
        } else if (byteNumber / MB_IN_BYTES > 0) {
            return df.format((double) byteNumber / (double) MB_IN_BYTES) + "MB";
        } else if (byteNumber / KB_IN_BYTES > 0) {
            return df.format((double) byteNumber / (double) KB_IN_BYTES) + "KB";
        } else {
            return String.valueOf(byteNumber) + "B";
        }
    }


    public static String replaceVariable(final String param) {
        Map<String, String> mapping = new HashMap<String, String>();

        Matcher matcher = VARIABLE_PATTERN.matcher(param);
        while (matcher.find()) {
            String variable = matcher.group(2);
            String value = System.getProperty(variable);
            if (StringUtils.isBlank(value)) {
                value = matcher.group();
            }
            mapping.put(matcher.group(), value);
        }

        String retString = param;
        for (final String key : mapping.keySet()) {
            retString = retString.replace(key, mapping.get(key));
        }

        return retString;
    }

    public static String compressMiddle(String s, int headLength, int tailLength) {
        Validate.notNull(s, "Input string must not be null");
        Validate.isTrue(headLength > 0, "Head length must be larger than 0");
        Validate.isTrue(tailLength > 0, "Tail length must be larger than 0");

        if (headLength + tailLength >= s.length()) {
            return s;
        }
        return s.substring(0, headLength) + "..." + s.substring(s.length() - tailLength);
    }

    public static String getMd5(String plainText) {
        try {
            StringBuilder builder = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(plainText.getBytes())) {
                int i = b & 0xff;
                if (i < 0x10) {
                    builder.append('0');
                }
                builder.append(Integer.toHexString(i));
            }
            return builder.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

}

================================================
FILE: core/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT datax-core datax-core jar com.alibaba.datax datax-transformer ${datax-project-version} slf4j-log4j12 org.slf4j commons-configuration commons-configuration ${commons-configuration-version} commons-cli commons-cli ${commons-cli-version} commons-beanutils commons-beanutils 1.9.2 org.apache.httpcomponents httpclient 4.5.13 org.apache.httpcomponents fluent-hc 4.5 org.slf4j slf4j-api ch.qos.logback logback-classic org.codehaus.janino janino 2.5.16 junit junit test org.mockito mockito-core 1.8.5 test org.powermock powermock-api-mockito 1.4.10 test org.powermock powermock-module-junit4 1.4.10 test org.apache.commons commons-lang3 3.3.2 org.codehaus.groovy groovy-all 2.1.9 src/main/java **/*.properties org.apache.maven.plugins maven-jar-plugin com.alibaba.datax.core.Engine maven-assembly-plugin com.alibaba.datax.core.Engine datax src/main/assembly/package.xml package single maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding}

================================================
FILE: core/src/main/assembly/package.xml
================================================
dir false src/main/bin *.* *.pyc 775 /bin src/main/script *.* 775 /script src/main/conf *.* /conf target/ datax-core-0.0.1-SNAPSHOT.jar /lib src/main/job/ *.json /job src/main/tools/ *.* /tools 777 src/main/tmp *.* /tmp false /lib runtime

================================================
FILE: core/src/main/conf/.secret.properties
================================================
#ds basicAuth config
auth.user=
auth.pass=
current.keyVersion=
current.publicKey=
current.privateKey=
current.service.username=
current.service.password=

================================================
FILE: core/src/main/conf/core.json
================================================
{
    "entry": {
        "jvm": "-Xms1G -Xmx1G",
        "environment": {}
    },
    "common": {
        "column": {
            "datetimeFormat": "yyyy-MM-dd HH:mm:ss",
            "timeFormat": "HH:mm:ss",
            "dateFormat": "yyyy-MM-dd",
            "extraFormats": ["yyyyMMdd"],
            "timeZone": "GMT+8",
            "encoding": "utf-8"
        }
    },
    "core": {
        "dataXServer": {
            "address": "http://localhost:7001/api",
            "timeout": 10000,
            "reportDataxLog": false,
            "reportPerfLog": false
        },
        "transport": {
            "channel": {
                "class": "com.alibaba.datax.core.transport.channel.memory.MemoryChannel",
                "speed": {
                    "byte": -1,
                    "record": -1
                },
                "flowControlInterval": 20,
                "capacity": 512,
                "byteCapacity": 67108864
            },
            "exchanger": {
                "class": "com.alibaba.datax.core.plugin.BufferedRecordExchanger",
                "bufferSize": 32
            }
        },
        "container": {
            "job": {
                "reportInterval": 10000
            },
            "taskGroup": {
                "channel": 5
            },
            "trace": {
                "enable": "false"
            }
        },
        "statistics": {
            "collector": {
                "plugin": {
                    "taskClass": "com.alibaba.datax.core.statistics.plugin.task.StdoutPluginCollector",
                    "maxDirtyNumber": 10
                }
            }
        }
    }
}

================================================
FILE: core/src/main/conf/logback.xml
================================================
UTF-8 %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{0} - %msg%n UTF-8 ${log.dir}/${ymd}/${log.file.name}-${byMillionSecond}.log false %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{0} - %msg%n UTF-8 ${perf.dir}/${ymd}/${log.file.name}-${byMillionSecond}.log false %msg%n

================================================
FILE: core/src/main/java/com/alibaba/datax/core/AbstractContainer.java
================================================
package com.alibaba.datax.core;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator;
import org.apache.commons.lang.Validate;

/**
 * Abstract base class for execution containers; holds the container's global configuration.
 */
public abstract class AbstractContainer {
    protected Configuration configuration;

    protected AbstractContainerCommunicator containerCommunicator;

    public AbstractContainer(Configuration configuration) {
        Validate.notNull(configuration, "Configuration can not be null.");

        this.configuration = configuration;
    }

    public Configuration getConfiguration() {
        return configuration;
    }

    public AbstractContainerCommunicator getContainerCommunicator() {
        return containerCommunicator;
    }

    public void setContainerCommunicator(AbstractContainerCommunicator containerCommunicator) {
        this.containerCommunicator = containerCommunicator;
    }

    public abstract void start();

}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/Engine.java
================================================
package com.alibaba.datax.core;

import com.alibaba.datax.common.element.ColumnCast;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.spi.ErrorCode;
import com.alibaba.datax.common.statistics.PerfTrace;
import com.alibaba.datax.common.statistics.VMInfo;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.MessageSource;
import com.alibaba.datax.core.job.JobContainer;
import com.alibaba.datax.core.taskgroup.TaskGroupContainer;
import com.alibaba.datax.core.util.ConfigParser;
import com.alibaba.datax.core.util.ConfigurationValidate;
import com.alibaba.datax.core.util.ExceptionTracker;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.core.util.container.LoadUtil;
import org.apache.commons.cli.BasicParser;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.Options;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Engine is the DataX entry class. It initializes the runtime container for a Job or Task
 * and runs the plugin's Job or Task logic.
 */
public class Engine {
    private static final Logger LOG = LoggerFactory.getLogger(Engine.class);

    private static String RUNTIME_MODE;

    /* check job model (job/task) first */
    public void start(Configuration allConf) {

        // Bind the column conversion settings
        ColumnCast.bind(allConf);

        /**
         * Initialize the PluginLoader so that each plugin's configuration can be looked up
         */
        LoadUtil.bind(allConf);

        boolean isJob = !("taskGroup".equalsIgnoreCase(allConf
                .getString(CoreConstant.DATAX_CORE_CONTAINER_MODEL)));
        // JobContainer sets and adjusts this value after scheduling
        int channelNumber = 0;
        AbstractContainer container;
        long instanceId;
        int taskGroupId = -1;
        if (isJob) {
            allConf.set(CoreConstant.DATAX_CORE_CONTAINER_JOB_MODE, RUNTIME_MODE);
            container = new JobContainer(allConf);
            instanceId = allConf.getLong(
                    CoreConstant.DATAX_CORE_CONTAINER_JOB_ID, 0);

        } else {
            container = new TaskGroupContainer(allConf);
            instanceId = allConf.getLong(
                    CoreConstant.DATAX_CORE_CONTAINER_JOB_ID);
            taskGroupId = allConf.getInt(
                    CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID);
            channelNumber = allConf.getInt(
                    CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL);
        }

        // perfTrace is enabled by default
        boolean traceEnable = allConf.getBool(CoreConstant.DATAX_CORE_CONTAINER_TRACE_ENABLE, true);
        boolean perfReportEnable = allConf.getBool(CoreConstant.DATAX_CORE_REPORT_DATAX_PERFLOG, true);

        // DataX shell jobs in standalone mode do not report
        if (instanceId == -1) {
            perfReportEnable = false;
        }

        Configuration jobInfoConfig = allConf.getConfiguration(CoreConstant.DATAX_JOB_JOBINFO);
        // Initialize PerfTrace
        PerfTrace perfTrace = PerfTrace.getInstance(isJob, instanceId, taskGroupId, traceEnable);
        perfTrace.setJobInfo(jobInfoConfig, perfReportEnable, channelNumber);
        container.start();

    }

    // Be sure to mask sensitive information
    public static String filterJobConfiguration(final Configuration
configuration) { Configuration jobConfWithSetting = configuration.getConfiguration("job").clone(); Configuration jobContent = jobConfWithSetting.getConfiguration("content"); filterSensitiveConfiguration(jobContent); jobConfWithSetting.set("content",jobContent); return jobConfWithSetting.beautify(); } public static Configuration filterSensitiveConfiguration(Configuration configuration){ Set<String> keys = configuration.getKeys(); for (final String key : keys) { boolean isSensitive = StringUtils.endsWithIgnoreCase(key, "password") || StringUtils.endsWithIgnoreCase(key, "accessKey"); if (isSensitive && configuration.get(key) instanceof String) { configuration.set(key, configuration.getString(key).replaceAll(".", "*")); } } return configuration; } public static void entry(final String[] args) throws Throwable { Options options = new Options(); options.addOption("job", true, "Job config."); options.addOption("jobid", true, "Job unique id."); options.addOption("mode", true, "Job runtime mode."); BasicParser parser = new BasicParser(); CommandLine cl = parser.parse(options, args); String jobPath = cl.getOptionValue("job"); // If the user does not specify a jobid explicitly, datax.py passes the default jobid -1 String jobIdString = cl.getOptionValue("jobid"); RUNTIME_MODE = cl.getOptionValue("mode"); Configuration configuration = ConfigParser.parse(jobPath); // Bind i18n information MessageSource.init(configuration); MessageSource.reloadResourceBundle(Configuration.class); long jobId; if (!"-1".equalsIgnoreCase(jobIdString)) { jobId = Long.parseLong(jobIdString); } else { // only for dsc & ds & datax 3 update String dscJobUrlPatternString = "/instance/(\\d{1,})/config.xml"; String dsJobUrlPatternString = "/inner/job/(\\d{1,})/config"; String dsTaskGroupUrlPatternString = "/inner/job/(\\d{1,})/taskGroup/"; List<String> patternStringList = Arrays.asList(dscJobUrlPatternString, dsJobUrlPatternString, dsTaskGroupUrlPatternString); jobId = parseJobIdFromUrl(patternStringList, jobPath); } boolean isStandAloneMode =
"standalone".equalsIgnoreCase(RUNTIME_MODE); if (!isStandAloneMode && jobId == -1) { // Outside standalone mode, jobId must not be -1 throw DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR, "A valid jobId must be provided in the URL in non-standalone mode."); } configuration.set(CoreConstant.DATAX_CORE_CONTAINER_JOB_ID, jobId); // Print vmInfo VMInfo vmInfo = VMInfo.getVmInfo(); if (vmInfo != null) { LOG.info(vmInfo.toString()); } LOG.info("\n" + Engine.filterJobConfiguration(configuration) + "\n"); LOG.debug(configuration.toJSON()); ConfigurationValidate.doValidate(configuration); Engine engine = new Engine(); engine.start(configuration); } /** * Returns -1 if no jobId could be parsed from the URL. * * only for dsc & ds & datax 3 update */ private static long parseJobIdFromUrl(List<String> patternStringList, String url) { long result = -1; for (String patternString : patternStringList) { result = doParseJobIdFromUrl(patternString, url); if (result != -1) { return result; } } return result; } private static long doParseJobIdFromUrl(String patternString, String url) { Pattern pattern = Pattern.compile(patternString); Matcher matcher = pattern.matcher(url); if (matcher.find()) { return Long.parseLong(matcher.group(1)); } return -1; } public static void main(String[] args) throws Exception { int exitCode = 0; try { Engine.entry(args); } catch (Throwable e) { exitCode = 1; LOG.error("\n\nDataX analysis suggests the most likely cause of this task's failure is:\n" + ExceptionTracker.trace(e)); if (e instanceof DataXException) { DataXException tempException = (DataXException) e; ErrorCode errorCode = tempException.getErrorCode(); if (errorCode instanceof FrameworkErrorCode) { FrameworkErrorCode tempErrorCode = (FrameworkErrorCode) errorCode; exitCode = tempErrorCode.toExitValue(); } } System.exit(exitCode); } System.exit(exitCode); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/LocalStrings.properties ================================================ very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 engine.1=\u975e standalone
\u6a21\u5f0f\u5fc5\u987b\u5728 URL \u4e2d\u63d0\u4f9b\u6709\u6548\u7684 jobId. engine.2=\n\n\u7ecfDataX\u667a\u80fd\u5206\u6790,\u8be5\u4efb\u52a1\u6700\u53ef\u80fd\u7684\u9519\u8bef\u539f\u56e0\u662f:\n{0} ================================================ FILE: core/src/main/java/com/alibaba/datax/core/LocalStrings_en_US.properties ================================================ very_like_yixiao=1{0}2{1}3 engine.1=A valid job ID must be provided in the URL for the non-standalone mode. engine.2=\n\nThrough the intelligent analysis by DataX, the most likely error reason of this task is: \n{0} ================================================ FILE: core/src/main/java/com/alibaba/datax/core/LocalStrings_ja_JP.properties ================================================ very_like_yixiao=1{0}2{1}3 engine.1=\u975e standalone \u6a21\u5f0f\u5fc5\u987b\u5728 URL \u4e2d\u63d0\u4f9b\u6709\u6548\u7684 jobId. engine.2=\n\n\u7ecfDataX\u667a\u80fd\u5206\u6790,\u8be5\u4efb\u52a1\u6700\u53ef\u80fd\u7684\u9519\u8bef\u539f\u56e0\u662f:\n{0} ================================================ FILE: core/src/main/java/com/alibaba/datax/core/LocalStrings_zh_CN.properties ================================================ very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 engine.1=\u975e standalone \u6a21\u5f0f\u5fc5\u987b\u5728 URL \u4e2d\u63d0\u4f9b\u6709\u6548\u7684 jobId. engine.2=\n\n\u7ecfDataX\u667a\u80fd\u5206\u6790,\u8be5\u4efb\u52a1\u6700\u53ef\u80fd\u7684\u9519\u8bef\u539f\u56e0\u662f:\n{0} ================================================ FILE: core/src/main/java/com/alibaba/datax/core/LocalStrings_zh_HK.properties ================================================ very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 engine.1=\u975e standalone \u6a21\u5f0f\u5fc5\u987b\u5728 URL \u4e2d\u63d0\u4f9b\u6709\u6548\u7684 jobId. 
engine.2=\n\n\u7ecfDataX\u667a\u80fd\u5206\u6790,\u8be5\u4efb\u52a1\u6700\u53ef\u80fd\u7684\u9519\u8bef\u539f\u56e0\u662f:\n{0} very_like_yixiao=一{0}二{1}三 engine.1=非 standalone 模式必須在 URL 中提供有效的 jobId. engine.2=\n\n經DataX智能分析,該任務最可能的錯誤原因是:\n{0} ================================================ FILE: core/src/main/java/com/alibaba/datax/core/LocalStrings_zh_TW.properties ================================================ very_like_yixiao=\u4e00{0}\u4e8c{1}\u4e09 engine.1=\u975e standalone \u6a21\u5f0f\u5fc5\u987b\u5728 URL \u4e2d\u63d0\u4f9b\u6709\u6548\u7684 jobId. engine.2=\n\n\u7ecfDataX\u667a\u80fd\u5206\u6790,\u8be5\u4efb\u52a1\u6700\u53ef\u80fd\u7684\u9519\u8bef\u539f\u56e0\u662f:\n{0} very_like_yixiao=一{0}二{1}三 engine.1=非 standalone 模式必須在 URL 中提供有效的 jobId. engine.2=\n\n經DataX智能分析,該任務最可能的錯誤原因是:\n{0} ================================================ FILE: core/src/main/java/com/alibaba/datax/core/container/util/HookInvoker.java ================================================ package com.alibaba.datax.core.container.util; /** * Created by xiafei.qiuxf on 14/12/17. 
*/ import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.spi.Hook; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.container.JarLoader; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.File; import java.io.FilenameFilter; import java.util.HashMap; import java.util.Iterator; import java.util.Map; import java.util.ServiceLoader; /** * Scans every first-level subdirectory of the given directory and treats each one as a Hook directory. * Each subdirectory must follow the standard ServiceLoader layout; see http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html. * The jars inside are loaded and the hook is invoked through the ServiceLoader mechanism. */ public class HookInvoker { private static final Logger LOG = LoggerFactory.getLogger(HookInvoker.class); private final Map msg; private final Configuration conf; private File baseDir; public HookInvoker(String baseDirName, Configuration conf, Map msg) { this.baseDir = new File(baseDirName); this.conf = conf; this.msg = msg; } public void invokeAll() { if (!baseDir.exists() || baseDir.isFile()) { LOG.info("No hook invoked, because base dir does not exist or is a file: " + baseDir.getAbsolutePath()); return; } String[] subDirs = baseDir.list(new FilenameFilter() { @Override public boolean accept(File dir, String name) { return new File(dir, name).isDirectory(); } }); if (subDirs == null) { throw DataXException.asDataXException(FrameworkErrorCode.HOOK_LOAD_ERROR, "Listing the HOOK subdirectories returned null"); } for (String subDir : subDirs) { doInvoke(new File(baseDir, subDir).getAbsolutePath()); } } private void doInvoke(String path) { ClassLoader oldClassLoader = Thread.currentThread().getContextClassLoader(); try { JarLoader jarLoader = new JarLoader(new String[]{path}); Thread.currentThread().setContextClassLoader(jarLoader); Iterator<Hook> hookIt = ServiceLoader.load(Hook.class).iterator(); if (!hookIt.hasNext()) { LOG.warn("No hook defined under path: " + path); } else { Hook hook =
hookIt.next(); LOG.info("Invoke hook [{}], path: {}", hook.getName(), path); hook.invoke(conf, msg); } } catch (Exception e) { LOG.error("Exception when invoke hook", e); throw DataXException.asDataXException( CommonErrorCode.HOOK_INTERNAL_ERROR, "Exception when invoke hook", e); } finally { Thread.currentThread().setContextClassLoader(oldClassLoader); } } public static void main(String[] args) { new HookInvoker("/Users/xiafei/workspace/datax3/target/datax/datax/hook", null, new HashMap()).invokeAll(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/container/util/JobAssignUtil.java ================================================ package com.alibaba.datax.core.container.util; import com.alibaba.datax.common.constant.CommonConstant; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.util.container.CoreConstant; import org.apache.commons.lang.Validate; import org.apache.commons.lang3.StringUtils; import java.util.*; public final class JobAssignUtil { private JobAssignUtil() { } /** * Distributes tasks fairly across taskGroups. * "Fairly" means that the load-balance resource marks carried by the tasks are taken into account, so the work is spread more evenly. * TODO: document with a concrete example */ public static List<Configuration> assignFairly(Configuration configuration, int channelNumber, int channelsPerTaskGroup) { Validate.isTrue(configuration != null, "The Job obtained by the framework must not be null."); List<Configuration> contentConfig = configuration.getListConfiguration(CoreConstant.DATAX_JOB_CONTENT); Validate.isTrue(contentConfig.size() > 0, "The split Job obtained by the framework has no content."); Validate.isTrue(channelNumber > 0 && channelsPerTaskGroup > 0, "The channel count [channelNumber] and the channels per taskGroup [channelsPerTaskGroup] must both be positive"); int taskGroupNumber = (int) Math.ceil(1.0 * channelNumber / channelsPerTaskGroup); Configuration aTaskConfig = contentConfig.get(0); String readerResourceMark = aTaskConfig.getString(CoreConstant.JOB_READER_PARAMETER + "."
+ CommonConstant.LOAD_BALANCE_RESOURCE_MARK); String writerResourceMark = aTaskConfig.getString(CoreConstant.JOB_WRITER_PARAMETER + "." + CommonConstant.LOAD_BALANCE_RESOURCE_MARK); boolean hasLoadBalanceResourceMark = StringUtils.isNotBlank(readerResourceMark) || StringUtils.isNotBlank(writerResourceMark); if (!hasLoadBalanceResourceMark) { // Fake a fixed key as the resource mark (either the reader or the writer would do; the reader is used here) for (Configuration conf : contentConfig) { conf.set(CoreConstant.JOB_READER_PARAMETER + "." + CommonConstant.LOAD_BALANCE_RESOURCE_MARK, "aFakeResourceMarkForLoadBalance"); } // Shuffle once at random, because some plugins do not set a resource mark Collections.shuffle(contentConfig, new Random(System.currentTimeMillis())); } LinkedHashMap<String, List<Integer>> resourceMarkAndTaskIdMap = parseAndGetResourceMarkAndTaskIdMap(contentConfig); List<Configuration> taskGroupConfig = doAssign(resourceMarkAndTaskIdMap, configuration, taskGroupNumber); // Adjust the channel count of each taskGroup (an optimization) adjustChannelNumPerTaskGroup(taskGroupConfig, channelNumber); return taskGroupConfig; } private static void adjustChannelNumPerTaskGroup(List<Configuration> taskGroupConfig, int channelNumber) { int taskGroupNumber = taskGroupConfig.size(); int avgChannelsPerTaskGroup = channelNumber / taskGroupNumber; int remainderChannelCount = channelNumber % taskGroupNumber; // remainderChannelCount taskGroups get avgChannelsPerTaskGroup + 1 channels; // the other (taskGroupNumber - remainderChannelCount) taskGroups get avgChannelsPerTaskGroup channels int i = 0; for (; i < remainderChannelCount; i++) { taskGroupConfig.get(i).set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL, avgChannelsPerTaskGroup + 1); } for (int j = 0; j < taskGroupNumber - remainderChannelCount; j++) { taskGroupConfig.get(i + j).set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL, avgChannelsPerTaskGroup); } } /** * Builds, from the task configurations, a map of * resource mark --> taskIds (List) */ private static LinkedHashMap<String, List<Integer>> parseAndGetResourceMarkAndTaskIdMap(List<Configuration> contentConfig) { // key: resourceMark, value: taskId
LinkedHashMap<String, List<Integer>> readerResourceMarkAndTaskIdMap = new LinkedHashMap<String, List<Integer>>(); LinkedHashMap<String, List<Integer>> writerResourceMarkAndTaskIdMap = new LinkedHashMap<String, List<Integer>>(); for (Configuration aTaskConfig : contentConfig) { int taskId = aTaskConfig.getInt(CoreConstant.TASK_ID); // Add readerResourceMark to readerResourceMarkAndTaskIdMap String readerResourceMark = aTaskConfig.getString(CoreConstant.JOB_READER_PARAMETER + "." + CommonConstant.LOAD_BALANCE_RESOURCE_MARK); if (readerResourceMarkAndTaskIdMap.get(readerResourceMark) == null) { readerResourceMarkAndTaskIdMap.put(readerResourceMark, new LinkedList<Integer>()); } readerResourceMarkAndTaskIdMap.get(readerResourceMark).add(taskId); // Add writerResourceMark to writerResourceMarkAndTaskIdMap String writerResourceMark = aTaskConfig.getString(CoreConstant.JOB_WRITER_PARAMETER + "." + CommonConstant.LOAD_BALANCE_RESOURCE_MARK); if (writerResourceMarkAndTaskIdMap.get(writerResourceMark) == null) { writerResourceMarkAndTaskIdMap.put(writerResourceMark, new LinkedList<Integer>()); } writerResourceMarkAndTaskIdMap.get(writerResourceMark).add(taskId); } if (readerResourceMarkAndTaskIdMap.size() >= writerResourceMarkAndTaskIdMap.size()) { // The reader has at least as many distinct resource marks: spread tasks by the reader's marks return readerResourceMarkAndTaskIdMap; } else { // The writer has more distinct resource marks: spread tasks by the writer's marks return writerResourceMarkAndTaskIdMap; } } /** * The intended effect, illustrated by an example: *

     * database a has tables: 0, 1, 2
     * database b has tables: 3, 4
     * database c has tables: 5, 6, 7
     *
     * With 4 taskGroups,
     * the result after assign is:
     * taskGroup-0: 0,  4
     * taskGroup-1: 3,  6
     * taskGroup-2: 5,  2
     * taskGroup-3: 1,  7
     *
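The round-robin dealing shown in the example above can be reproduced outside the framework. The following is a hypothetical, self-contained sketch: the class RoundRobinAssignSketch and its methods are illustrative names, not part of DataX, and plain integer task ids stand in for Configuration objects. It mirrors the loop in doAssign, the taskGroup count computed in assignFairly (ceil(channelNumber / channelsPerTaskGroup)), and the channel split in adjustChannelNumPerTaskGroup. The inputs (7 channels, 2 channels per taskGroup) are assumed values chosen so that 4 taskGroups result, matching the example; with 8 tasks, 7 channels also survives the min(needChannelNumber, taskNumber) cap in schedule().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch, not the DataX API: plain task ids stand in for Configuration objects.
public class RoundRobinAssignSketch {

    // Mirrors assignFairly: taskGroupNumber = ceil(channelNumber / channelsPerTaskGroup).
    public static int taskGroupCount(int channelNumber, int channelsPerTaskGroup) {
        return (int) Math.ceil(1.0 * channelNumber / channelsPerTaskGroup);
    }

    // Mirrors doAssign: each round takes the head task of every resource mark,
    // in insertion order, and deals it to the next taskGroup (modulo taskGroupNumber).
    public static List<List<Integer>> assign(LinkedHashMap<String, List<Integer>> tasksByMark,
                                             int taskGroupNumber) {
        List<List<Integer>> groups = new ArrayList<>(taskGroupNumber);
        for (int i = 0; i < taskGroupNumber; i++) {
            groups.add(new ArrayList<>());
        }
        // Copy each task list so the caller's map is left untouched
        // (doAssign itself removes entries from the map as it goes).
        List<LinkedList<Integer>> queues = new ArrayList<>();
        int maxLength = 0;
        for (List<Integer> ids : tasksByMark.values()) {
            queues.add(new LinkedList<>(ids));
            maxLength = Math.max(maxLength, ids.size());
        }
        int taskGroupIndex = 0;
        for (int round = 0; round < maxLength; round++) {
            for (LinkedList<Integer> queue : queues) {
                if (!queue.isEmpty()) {
                    groups.get(taskGroupIndex % taskGroupNumber).add(queue.removeFirst());
                    taskGroupIndex++;
                }
            }
        }
        return groups;
    }

    // Mirrors adjustChannelNumPerTaskGroup: the first (channelNumber % n) groups
    // get one channel more than the average.
    public static int[] channelsPerGroup(int channelNumber, int taskGroupNumber) {
        int avg = channelNumber / taskGroupNumber;
        int remainder = channelNumber % taskGroupNumber;
        int[] result = new int[taskGroupNumber];
        for (int i = 0; i < taskGroupNumber; i++) {
            result[i] = i < remainder ? avg + 1 : avg;
        }
        return result;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, List<Integer>> tasksByMark = new LinkedHashMap<>();
        tasksByMark.put("a", Arrays.asList(0, 1, 2));
        tasksByMark.put("b", Arrays.asList(3, 4));
        tasksByMark.put("c", Arrays.asList(5, 6, 7));

        int channelNumber = 7;          // assumed value
        int channelsPerTaskGroup = 2;   // assumed value
        int n = taskGroupCount(channelNumber, channelsPerTaskGroup);

        List<List<Integer>> groups = assign(tasksByMark, n);
        for (int i = 0; i < n; i++) {
            System.out.println("taskGroup-" + i + ": " + groups.get(i));
        }
        System.out.println(Arrays.toString(channelsPerGroup(channelNumber, n)));
    }
}
```

Running main prints the same four groups as the example (taskGroup-0: [0, 4] through taskGroup-3: [1, 7]) followed by [2, 2, 2, 1]: 7 channels over 4 taskGroups is 1 each with a remainder of 3, so the first three groups receive one extra channel. Because marks are visited in a fixed order each round, consecutive tasks from the same resource land in different taskGroups, which is the load-spreading effect the comment describes.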
     * 
*/ private static List<Configuration> doAssign(LinkedHashMap<String, List<Integer>> resourceMarkAndTaskIdMap, Configuration jobConfiguration, int taskGroupNumber) { List<Configuration> contentConfig = jobConfiguration.getListConfiguration(CoreConstant.DATAX_JOB_CONTENT); Configuration taskGroupTemplate = jobConfiguration.clone(); taskGroupTemplate.remove(CoreConstant.DATAX_JOB_CONTENT); List<Configuration> result = new LinkedList<Configuration>(); List<List<Configuration>> taskGroupConfigList = new ArrayList<List<Configuration>>(taskGroupNumber); for (int i = 0; i < taskGroupNumber; i++) { taskGroupConfigList.add(new LinkedList<Configuration>()); } int mapValueMaxLength = -1; List<String> resourceMarks = new ArrayList<String>(); for (Map.Entry<String, List<Integer>> entry : resourceMarkAndTaskIdMap.entrySet()) { resourceMarks.add(entry.getKey()); if (entry.getValue().size() > mapValueMaxLength) { mapValueMaxLength = entry.getValue().size(); } } int taskGroupIndex = 0; for (int i = 0; i < mapValueMaxLength; i++) { for (String resourceMark : resourceMarks) { if (resourceMarkAndTaskIdMap.get(resourceMark).size() > 0) { int taskId = resourceMarkAndTaskIdMap.get(resourceMark).get(0); taskGroupConfigList.get(taskGroupIndex % taskGroupNumber).add(contentConfig.get(taskId)); taskGroupIndex++; resourceMarkAndTaskIdMap.get(resourceMark).remove(0); } } } Configuration tempTaskGroupConfig; for (int i = 0; i < taskGroupNumber; i++) { tempTaskGroupConfig = taskGroupTemplate.clone(); tempTaskGroupConfig.set(CoreConstant.DATAX_JOB_CONTENT, taskGroupConfigList.get(i)); tempTaskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID, i); result.add(tempTaskGroupConfig); } return result; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/job/JobContainer.java ================================================ package com.alibaba.datax.core.job; import com.alibaba.datax.common.constant.PluginType; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.AbstractJobPlugin; import com.alibaba.datax.common.plugin.JobPluginCollector; import
com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.statistics.PerfTrace; import com.alibaba.datax.common.statistics.VMInfo; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.StrUtil; import com.alibaba.datax.core.AbstractContainer; import com.alibaba.datax.core.Engine; import com.alibaba.datax.core.container.util.HookInvoker; import com.alibaba.datax.core.container.util.JobAssignUtil; import com.alibaba.datax.core.job.scheduler.AbstractScheduler; import com.alibaba.datax.core.job.scheduler.processinner.StandAloneScheduler; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.statistics.communication.CommunicationTool; import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator; import com.alibaba.datax.core.statistics.container.communicator.job.StandAloneJobContainerCommunicator; import com.alibaba.datax.core.statistics.plugin.DefaultJobPluginCollector; import com.alibaba.datax.core.util.ErrorRecordChecker; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.container.ClassLoaderSwapper; import com.alibaba.datax.core.util.container.CoreConstant; import com.alibaba.datax.core.util.container.LoadUtil; import com.alibaba.datax.dataxservice.face.domain.enums.ExecuteMode; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang.StringUtils; import org.apache.commons.lang.Validate; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.List; /** * Created by jingxing on 14-8-24. *

* job实例运行在jobContainer容器中,它是所有任务的master,负责初始化、拆分、调度、运行、回收、监控和汇报 * 但它并不做实际的数据同步操作 */ public class JobContainer extends AbstractContainer { private static final Logger LOG = LoggerFactory .getLogger(JobContainer.class); private static final SimpleDateFormat dateFormat = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss"); private ClassLoaderSwapper classLoaderSwapper = ClassLoaderSwapper .newCurrentThreadClassLoaderSwapper(); private long jobId; private String readerPluginName; private String writerPluginName; /** * reader和writer jobContainer的实例 */ private Reader.Job jobReader; private Writer.Job jobWriter; private Configuration userConf; private long startTimeStamp; private long endTimeStamp; private long startTransferTimeStamp; private long endTransferTimeStamp; private int needChannelNumber; private int totalStage = 1; private ErrorRecordChecker errorLimit; public JobContainer(Configuration configuration) { super(configuration); errorLimit = new ErrorRecordChecker(configuration); } /** * jobContainer主要负责的工作全部在start()里面,包括init、prepare、split、scheduler、 * post以及destroy和statistics */ @Override public void start() { LOG.info("DataX jobContainer starts job."); boolean hasException = false; boolean isDryRun = false; try { this.startTimeStamp = System.currentTimeMillis(); isDryRun = configuration.getBool(CoreConstant.DATAX_JOB_SETTING_DRYRUN, false); if(isDryRun) { LOG.info("jobContainer starts to do preCheck ..."); this.preCheck(); } else { userConf = configuration.clone(); LOG.debug("jobContainer starts to do preHandle ..."); this.preHandle(); LOG.debug("jobContainer starts to do init ..."); this.init(); LOG.info("jobContainer starts to do prepare ..."); this.prepare(); LOG.info("jobContainer starts to do split ..."); this.totalStage = this.split(); LOG.info("jobContainer starts to do schedule ..."); this.schedule(); LOG.debug("jobContainer starts to do post ..."); this.post(); LOG.debug("jobContainer starts to do postHandle ..."); this.postHandle(); LOG.info("DataX jobId [{}] 
completed successfully.", this.jobId); this.invokeHooks(); } } catch (Throwable e) { LOG.error("Exception when job run", e); hasException = true; if (e instanceof OutOfMemoryError) { this.destroy(); System.gc(); } if (super.getContainerCommunicator() == null) { // 由于 containerCollector 是在 scheduler() 中初始化的,所以当在 scheduler() 之前出现异常时,需要在此处对 containerCollector 进行初始化 AbstractContainerCommunicator tempContainerCollector; // standalone tempContainerCollector = new StandAloneJobContainerCommunicator(configuration); super.setContainerCommunicator(tempContainerCollector); } Communication communication = super.getContainerCommunicator().collect(); // 汇报前的状态,不需要手动进行设置 // communication.setState(State.FAILED); communication.setThrowable(e); communication.setTimestamp(this.endTimeStamp); Communication tempComm = new Communication(); tempComm.setTimestamp(this.startTransferTimeStamp); Communication reportCommunication = CommunicationTool.getReportCommunication(communication, tempComm, this.totalStage); super.getContainerCommunicator().report(reportCommunication); throw DataXException.asDataXException( FrameworkErrorCode.RUNTIME_ERROR, e); } finally { if(!isDryRun) { this.destroy(); this.endTimeStamp = System.currentTimeMillis(); if (!hasException) { //最后打印cpu的平均消耗,GC的统计 VMInfo vmInfo = VMInfo.getVmInfo(); if (vmInfo != null) { vmInfo.getDelta(false); LOG.info(vmInfo.totalString()); } LOG.info(PerfTrace.getInstance().summarizeNoException()); this.logStatistics(); } } } } private void preCheck() { this.preCheckInit(); this.adjustChannelNumber(); if (this.needChannelNumber <= 0) { this.needChannelNumber = 1; } this.preCheckReader(); this.preCheckWriter(); LOG.info("PreCheck通过"); } private void preCheckInit() { this.jobId = this.configuration.getLong( CoreConstant.DATAX_CORE_CONTAINER_JOB_ID, -1); if (this.jobId < 0) { LOG.info("Set jobId = 0"); this.jobId = 0; this.configuration.set(CoreConstant.DATAX_CORE_CONTAINER_JOB_ID, this.jobId); } Thread.currentThread().setName("job-" + 
this.jobId); JobPluginCollector jobPluginCollector = new DefaultJobPluginCollector( this.getContainerCommunicator()); this.jobReader = this.preCheckReaderInit(jobPluginCollector); this.jobWriter = this.preCheckWriterInit(jobPluginCollector); } private Reader.Job preCheckReaderInit(JobPluginCollector jobPluginCollector) { this.readerPluginName = this.configuration.getString( CoreConstant.DATAX_JOB_CONTENT_READER_NAME); classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.READER, this.readerPluginName)); Reader.Job jobReader = (Reader.Job) LoadUtil.loadJobPlugin( PluginType.READER, this.readerPluginName); this.configuration.set(CoreConstant.DATAX_JOB_CONTENT_READER_PARAMETER + ".dryRun", true); // 设置reader的jobConfig jobReader.setPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_READER_PARAMETER)); // 设置reader的readerConfig jobReader.setPeerPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_READER_PARAMETER)); jobReader.setJobPluginCollector(jobPluginCollector); classLoaderSwapper.restoreCurrentThreadClassLoader(); return jobReader; } private Writer.Job preCheckWriterInit(JobPluginCollector jobPluginCollector) { this.writerPluginName = this.configuration.getString( CoreConstant.DATAX_JOB_CONTENT_WRITER_NAME); classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.WRITER, this.writerPluginName)); Writer.Job jobWriter = (Writer.Job) LoadUtil.loadJobPlugin( PluginType.WRITER, this.writerPluginName); this.configuration.set(CoreConstant.DATAX_JOB_CONTENT_WRITER_PARAMETER + ".dryRun", true); // 设置writer的jobConfig jobWriter.setPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_WRITER_PARAMETER)); // 设置reader的readerConfig jobWriter.setPeerPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_READER_PARAMETER)); jobWriter.setPeerPluginName(this.readerPluginName); 
jobWriter.setJobPluginCollector(jobPluginCollector); classLoaderSwapper.restoreCurrentThreadClassLoader(); return jobWriter; } private void preCheckReader() { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.READER, this.readerPluginName)); LOG.info(String.format("DataX Reader.Job [%s] do preCheck work .", this.readerPluginName)); this.jobReader.preCheck(); classLoaderSwapper.restoreCurrentThreadClassLoader(); } private void preCheckWriter() { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.WRITER, this.writerPluginName)); LOG.info(String.format("DataX Writer.Job [%s] do preCheck work .", this.writerPluginName)); this.jobWriter.preCheck(); classLoaderSwapper.restoreCurrentThreadClassLoader(); } /** * reader和writer的初始化 */ private void init() { this.jobId = this.configuration.getLong( CoreConstant.DATAX_CORE_CONTAINER_JOB_ID, -1); if (this.jobId < 0) { LOG.info("Set jobId = 0"); this.jobId = 0; this.configuration.set(CoreConstant.DATAX_CORE_CONTAINER_JOB_ID, this.jobId); } Thread.currentThread().setName("job-" + this.jobId); JobPluginCollector jobPluginCollector = new DefaultJobPluginCollector( this.getContainerCommunicator()); //必须先Reader ,后Writer this.jobReader = this.initJobReader(jobPluginCollector); this.jobWriter = this.initJobWriter(jobPluginCollector); } private void prepare() { this.prepareJobReader(); this.prepareJobWriter(); } private void preHandle() { String handlerPluginTypeStr = this.configuration.getString( CoreConstant.DATAX_JOB_PREHANDLER_PLUGINTYPE); if(!StringUtils.isNotEmpty(handlerPluginTypeStr)){ return; } PluginType handlerPluginType; try { handlerPluginType = PluginType.valueOf(handlerPluginTypeStr.toUpperCase()); } catch (IllegalArgumentException e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, String.format("Job preHandler's pluginType(%s) set error, reason(%s)", handlerPluginTypeStr.toUpperCase(), e.getMessage())); } String handlerPluginName = 
this.configuration.getString( CoreConstant.DATAX_JOB_PREHANDLER_PLUGINNAME); classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( handlerPluginType, handlerPluginName)); AbstractJobPlugin handler = LoadUtil.loadJobPlugin( handlerPluginType, handlerPluginName); JobPluginCollector jobPluginCollector = new DefaultJobPluginCollector( this.getContainerCommunicator()); handler.setJobPluginCollector(jobPluginCollector); //todo the safety of configuration must be guaranteed in the future handler.preHandler(configuration); classLoaderSwapper.restoreCurrentThreadClassLoader(); LOG.info("After PreHandler: \n" + Engine.filterJobConfiguration(configuration) + "\n"); } private void postHandle() { String handlerPluginTypeStr = this.configuration.getString( CoreConstant.DATAX_JOB_POSTHANDLER_PLUGINTYPE); if(!StringUtils.isNotEmpty(handlerPluginTypeStr)){ return; } PluginType handlerPluginType; try { handlerPluginType = PluginType.valueOf(handlerPluginTypeStr.toUpperCase()); } catch (IllegalArgumentException e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, String.format("Job postHandler's pluginType(%s) set error, reason(%s)", handlerPluginTypeStr.toUpperCase(), e.getMessage())); } String handlerPluginName = this.configuration.getString( CoreConstant.DATAX_JOB_POSTHANDLER_PLUGINNAME); classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( handlerPluginType, handlerPluginName)); AbstractJobPlugin handler = LoadUtil.loadJobPlugin( handlerPluginType, handlerPluginName); JobPluginCollector jobPluginCollector = new DefaultJobPluginCollector( this.getContainerCommunicator()); handler.setJobPluginCollector(jobPluginCollector); handler.postHandler(configuration); classLoaderSwapper.restoreCurrentThreadClassLoader(); } /** * Performs the finest-grained split of the reader and the writer. Note that the writer's split must follow the reader's, * so that both produce the same number of slices as required by the 1:1 channel model; this is why the reader and writer configurations can be merged here. * Then, to keep ordering from causing long-tail effects on either end, the merged result is shuffled. */ private int split() { this.adjustChannelNumber(); if (this.needChannelNumber <= 0) {
this.needChannelNumber = 1; } List readerTaskConfigs = this .doReaderSplit(this.needChannelNumber); int taskNumber = readerTaskConfigs.size(); List writerTaskConfigs = this .doWriterSplit(taskNumber); List transformerList = this.configuration.getListConfiguration(CoreConstant.DATAX_JOB_CONTENT_TRANSFORMER); LOG.debug("transformer configuration: "+ JSON.toJSONString(transformerList)); /** * 输入是reader和writer的parameter list,输出是content下面元素的list */ List contentConfig = mergeReaderAndWriterTaskConfigs( readerTaskConfigs, writerTaskConfigs, transformerList); LOG.debug("contentConfig configuration: "+ JSON.toJSONString(contentConfig)); this.configuration.set(CoreConstant.DATAX_JOB_CONTENT, contentConfig); return contentConfig.size(); } private void adjustChannelNumber() { int needChannelNumberByByte = Integer.MAX_VALUE; int needChannelNumberByRecord = Integer.MAX_VALUE; boolean isByteLimit = (this.configuration.getInt( CoreConstant.DATAX_JOB_SETTING_SPEED_BYTE, 0) > 0); if (isByteLimit) { long globalLimitedByteSpeed = this.configuration.getInt( CoreConstant.DATAX_JOB_SETTING_SPEED_BYTE, 10 * 1024 * 1024); // 在byte流控情况下,单个Channel流量最大值必须设置,否则报错! Long channelLimitedByteSpeed = this.configuration .getLong(CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_BYTE); if (channelLimitedByteSpeed == null || channelLimitedByteSpeed <= 0) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, "在有总bps限速条件下,单个channel的bps值不能为空,也不能为非正数"); } needChannelNumberByByte = (int) (globalLimitedByteSpeed / channelLimitedByteSpeed); needChannelNumberByByte = needChannelNumberByByte > 0 ? 
needChannelNumberByByte : 1; LOG.info("Job set Max-Byte-Speed to " + globalLimitedByteSpeed + " bytes."); } boolean isRecordLimit = (this.configuration.getInt( CoreConstant.DATAX_JOB_SETTING_SPEED_RECORD, 0)) > 0; if (isRecordLimit) { long globalLimitedRecordSpeed = this.configuration.getInt( CoreConstant.DATAX_JOB_SETTING_SPEED_RECORD, 100000); Long channelLimitedRecordSpeed = this.configuration.getLong( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_RECORD); if (channelLimitedRecordSpeed == null || channelLimitedRecordSpeed <= 0) { throw DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR, "在有总tps限速条件下,单个channel的tps值不能为空,也不能为非正数"); } needChannelNumberByRecord = (int) (globalLimitedRecordSpeed / channelLimitedRecordSpeed); needChannelNumberByRecord = needChannelNumberByRecord > 0 ? needChannelNumberByRecord : 1; LOG.info("Job set Max-Record-Speed to " + globalLimitedRecordSpeed + " records."); } // 取较小值 this.needChannelNumber = needChannelNumberByByte < needChannelNumberByRecord ? 
needChannelNumberByByte : needChannelNumberByRecord; // 如果从byte或record上设置了needChannelNumber则退出 if (this.needChannelNumber < Integer.MAX_VALUE) { return; } boolean isChannelLimit = (this.configuration.getInt( CoreConstant.DATAX_JOB_SETTING_SPEED_CHANNEL, 0) > 0); if (isChannelLimit) { this.needChannelNumber = this.configuration.getInt( CoreConstant.DATAX_JOB_SETTING_SPEED_CHANNEL); LOG.info("Job set Channel-Number to " + this.needChannelNumber + " channels."); return; } throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, "Job运行速度必须设置"); } /** * schedule首先完成的工作是把上一步reader和writer split的结果整合到具体taskGroupContainer中, * 同时不同的执行模式调用不同的调度策略,将所有任务调度起来 */ private void schedule() { /** * 这里的全局speed和每个channel的速度设置为B/s */ int channelsPerTaskGroup = this.configuration.getInt( CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL, 5); int taskNumber = this.configuration.getList( CoreConstant.DATAX_JOB_CONTENT).size(); this.needChannelNumber = Math.min(this.needChannelNumber, taskNumber); PerfTrace.getInstance().setChannelNumber(needChannelNumber); /** * 通过获取配置信息得到每个taskGroup需要运行哪些tasks任务 */ List taskGroupConfigs = JobAssignUtil.assignFairly(this.configuration, this.needChannelNumber, channelsPerTaskGroup); LOG.info("Scheduler starts [{}] taskGroups.", taskGroupConfigs.size()); ExecuteMode executeMode = null; AbstractScheduler scheduler; try { executeMode = ExecuteMode.STANDALONE; scheduler = initStandaloneScheduler(this.configuration); //设置 executeMode for (Configuration taskGroupConfig : taskGroupConfigs) { taskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_JOB_MODE, executeMode.getValue()); } if (executeMode == ExecuteMode.LOCAL || executeMode == ExecuteMode.DISTRIBUTE) { if (this.jobId <= 0) { throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR, "在[ local | distribute ]模式下必须设置jobId,并且其值 > 0 ."); } } LOG.info("Running by {} Mode.", executeMode); this.startTransferTimeStamp = System.currentTimeMillis(); scheduler.schedule(taskGroupConfigs); 
this.endTransferTimeStamp = System.currentTimeMillis(); } catch (Exception e) { LOG.error("运行scheduler 模式[{}]出错.", executeMode); this.endTransferTimeStamp = System.currentTimeMillis(); throw DataXException.asDataXException( FrameworkErrorCode.RUNTIME_ERROR, e); } /** * 检查任务执行情况 */ this.checkLimit(); } private AbstractScheduler initStandaloneScheduler(Configuration configuration) { AbstractContainerCommunicator containerCommunicator = new StandAloneJobContainerCommunicator(configuration); super.setContainerCommunicator(containerCommunicator); return new StandAloneScheduler(containerCommunicator); } private void post() { this.postJobWriter(); this.postJobReader(); } private void destroy() { if (this.jobWriter != null) { this.jobWriter.destroy(); this.jobWriter = null; } if (this.jobReader != null) { this.jobReader.destroy(); this.jobReader = null; } } private void logStatistics() { long totalCosts = (this.endTimeStamp - this.startTimeStamp) / 1000; long transferCosts = (this.endTransferTimeStamp - this.startTransferTimeStamp) / 1000; if (0L == transferCosts) { transferCosts = 1L; } if (super.getContainerCommunicator() == null) { return; } Communication communication = super.getContainerCommunicator().collect(); communication.setTimestamp(this.endTimeStamp); Communication tempComm = new Communication(); tempComm.setTimestamp(this.startTransferTimeStamp); Communication reportCommunication = CommunicationTool.getReportCommunication(communication, tempComm, this.totalStage); // 字节速率 long byteSpeedPerSecond = communication.getLongCounter(CommunicationTool.READ_SUCCEED_BYTES) / transferCosts; long recordSpeedPerSecond = communication.getLongCounter(CommunicationTool.READ_SUCCEED_RECORDS) / transferCosts; reportCommunication.setLongCounter(CommunicationTool.BYTE_SPEED, byteSpeedPerSecond); reportCommunication.setLongCounter(CommunicationTool.RECORD_SPEED, recordSpeedPerSecond); super.getContainerCommunicator().report(reportCommunication); LOG.info(String.format( "\n" + 
"%-26s: %-18s\n" + "%-26s: %-18s\n" + "%-26s: %19s\n" + "%-26s: %19s\n" + "%-26s: %19s\n" + "%-26s: %19s\n" + "%-26s: %19s\n", "任务启动时刻", dateFormat.format(startTimeStamp), "任务结束时刻", dateFormat.format(endTimeStamp), "任务总计耗时", String.valueOf(totalCosts) + "s", "任务平均流量", StrUtil.stringify(byteSpeedPerSecond) + "/s", "记录写入速度", String.valueOf(recordSpeedPerSecond) + "rec/s", "读出记录总数", String.valueOf(CommunicationTool.getTotalReadRecords(communication)), "读写失败总数", String.valueOf(CommunicationTool.getTotalErrorRecords(communication)) )); if (communication.getLongCounter(CommunicationTool.TRANSFORMER_SUCCEED_RECORDS) > 0 || communication.getLongCounter(CommunicationTool.TRANSFORMER_FAILED_RECORDS) > 0 || communication.getLongCounter(CommunicationTool.TRANSFORMER_FILTER_RECORDS) > 0) { LOG.info(String.format( "\n" + "%-26s: %19s\n" + "%-26s: %19s\n" + "%-26s: %19s\n", "Transformer成功记录总数", communication.getLongCounter(CommunicationTool.TRANSFORMER_SUCCEED_RECORDS), "Transformer失败记录总数", communication.getLongCounter(CommunicationTool.TRANSFORMER_FAILED_RECORDS), "Transformer过滤记录总数", communication.getLongCounter(CommunicationTool.TRANSFORMER_FILTER_RECORDS) )); } } /** * reader job的初始化,返回Reader.Job * * @return */ private Reader.Job initJobReader( JobPluginCollector jobPluginCollector) { this.readerPluginName = this.configuration.getString( CoreConstant.DATAX_JOB_CONTENT_READER_NAME); classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.READER, this.readerPluginName)); Reader.Job jobReader = (Reader.Job) LoadUtil.loadJobPlugin( PluginType.READER, this.readerPluginName); // 设置reader的jobConfig jobReader.setPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_READER_PARAMETER)); // 设置reader的readerConfig jobReader.setPeerPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_WRITER_PARAMETER)); jobReader.setJobPluginCollector(jobPluginCollector); jobReader.init(); 
classLoaderSwapper.restoreCurrentThreadClassLoader(); return jobReader; } /** * Initializes the writer job and returns the Writer.Job * * @return the initialized Writer.Job */ private Writer.Job initJobWriter( JobPluginCollector jobPluginCollector) { this.writerPluginName = this.configuration.getString( CoreConstant.DATAX_JOB_CONTENT_WRITER_NAME); classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.WRITER, this.writerPluginName)); Writer.Job jobWriter = (Writer.Job) LoadUtil.loadJobPlugin( PluginType.WRITER, this.writerPluginName); // set the writer's own job config jobWriter.setPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_WRITER_PARAMETER)); // set the peer (reader) job config jobWriter.setPeerPluginJobConf(this.configuration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_READER_PARAMETER)); jobWriter.setPeerPluginName(this.readerPluginName); jobWriter.setJobPluginCollector(jobPluginCollector); jobWriter.init(); classLoaderSwapper.restoreCurrentThreadClassLoader(); return jobWriter; } private void prepareJobReader() { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.READER, this.readerPluginName)); LOG.info(String.format("DataX Reader.Job [%s] do prepare work .", this.readerPluginName)); this.jobReader.prepare(); classLoaderSwapper.restoreCurrentThreadClassLoader(); } private void prepareJobWriter() { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.WRITER, this.writerPluginName)); LOG.info(String.format("DataX Writer.Job [%s] do prepare work .", this.writerPluginName)); this.jobWriter.prepare(); classLoaderSwapper.restoreCurrentThreadClassLoader(); } // TODO: handle the case where the source has no data private List<Configuration> doReaderSplit(int adviceNumber) { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.READER, this.readerPluginName)); List<Configuration> readerSlicesConfigs = this.jobReader.split(adviceNumber); if (readerSlicesConfigs == null || readerSlicesConfigs.size() <= 0) { throw DataXException.asDataXException(
FrameworkErrorCode.PLUGIN_SPLIT_ERROR, "reader切分的task数目不能小于等于0"); } LOG.info("DataX Reader.Job [{}] splits to [{}] tasks.", this.readerPluginName, readerSlicesConfigs.size()); classLoaderSwapper.restoreCurrentThreadClassLoader(); return readerSlicesConfigs; } private List doWriterSplit(int readerTaskNumber) { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.WRITER, this.writerPluginName)); List writerSlicesConfigs = this.jobWriter .split(readerTaskNumber); if (writerSlicesConfigs == null || writerSlicesConfigs.size() <= 0) { throw DataXException.asDataXException( FrameworkErrorCode.PLUGIN_SPLIT_ERROR, "writer切分的task不能小于等于0"); } LOG.info("DataX Writer.Job [{}] splits to [{}] tasks.", this.writerPluginName, writerSlicesConfigs.size()); classLoaderSwapper.restoreCurrentThreadClassLoader(); return writerSlicesConfigs; } /** * 按顺序整合reader和writer的配置,这里的顺序不能乱! 输入是reader、writer级别的配置,输出是一个完整task的配置 * * @param readerTasksConfigs * @param writerTasksConfigs * @return */ private List mergeReaderAndWriterTaskConfigs( List readerTasksConfigs, List writerTasksConfigs) { return mergeReaderAndWriterTaskConfigs(readerTasksConfigs, writerTasksConfigs, null); } private List mergeReaderAndWriterTaskConfigs( List readerTasksConfigs, List writerTasksConfigs, List transformerConfigs) { if (readerTasksConfigs.size() != writerTasksConfigs.size()) { throw DataXException.asDataXException( FrameworkErrorCode.PLUGIN_SPLIT_ERROR, String.format("reader切分的task数目[%d]不等于writer切分的task数目[%d].", readerTasksConfigs.size(), writerTasksConfigs.size()) ); } List contentConfigs = new ArrayList(); for (int i = 0; i < readerTasksConfigs.size(); i++) { Configuration taskConfig = Configuration.newDefault(); taskConfig.set(CoreConstant.JOB_READER_NAME, this.readerPluginName); taskConfig.set(CoreConstant.JOB_READER_PARAMETER, readerTasksConfigs.get(i)); taskConfig.set(CoreConstant.JOB_WRITER_NAME, this.writerPluginName); taskConfig.set(CoreConstant.JOB_WRITER_PARAMETER, 
writerTasksConfigs.get(i)); if(transformerConfigs!=null && transformerConfigs.size()>0){ taskConfig.set(CoreConstant.JOB_TRANSFORMER, transformerConfigs); } taskConfig.set(CoreConstant.TASK_ID, i); contentConfigs.add(taskConfig); } return contentConfigs; } /** * This is done in two steps: 1. tasks to channels, 2. channels to taskGroups. * Taken together, the tasks are packed into taskGroups so that the computed channel count is met and no extra channels are started *

* example: *

* Preconditions: the split yields 1024 table shards; assume the user requests a total rate of 1000M/s, each channel runs at 3M/s, * and each taskGroup runs 7 channels *

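* Calculation: total channels = 1000M/s / 3M/s = * 333. To spread the 1024 tasks evenly, 308 channels get 3 tasks each and 25 channels get 4 tasks each; * required taskGroups = 333 / 7 = * 47 remainder 4, so 48 taskGroups are needed: 47 run 7 channels each and 1 runs the remaining 4 channels *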
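* As a quick check of the calculation above, here is a standalone sketch (ChannelPlanExample and plan() are hypothetical names, not DataX APIs). It assumes integer division, as adjustChannelNumber() uses, so the M/s units cancel. The returned array holds, in order: the channel count, the base tasks per channel, the channels with the base count, the channels with one extra task, the full taskGroups, the channels left for the last taskGroup, and the total taskGroups:
```java
import java.util.Arrays;

public class ChannelPlanExample {
    // Hypothetical helper reproducing the worked example with integer division.
    static int[] plan(int tasks, int totalSpeed, int channelSpeed, int channelsPerTaskGroup) {
        int channels = totalSpeed / channelSpeed;                       // 1000 / 3 = 333
        int baseTasksPerChannel = tasks / channels;                     // 1024 / 333 = 3
        int channelsWithExtraTask = tasks % channels;                   // 25 channels carry one extra task
        int channelsWithBaseTasks = channels - channelsWithExtraTask;   // 308 channels carry the base count
        int fullTaskGroups = channels / channelsPerTaskGroup;           // 333 / 7 = 47
        int leftChannels = channels % channelsPerTaskGroup;             // remainder 4
        int taskGroups = fullTaskGroups + (leftChannels > 0 ? 1 : 0);   // 48
        return new int[] {channels, baseTasksPerChannel, channelsWithBaseTasks,
                channelsWithExtraTask, fullTaskGroups, leftChannels, taskGroups};
    }

    public static void main(String[] args) {
        // Prints [333, 3, 308, 25, 47, 4, 48]
        System.out.println(Arrays.toString(plan(1024, 1000, 3, 7)));
    }
}
```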

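* Processing: first handle the taskGroup that runs 4 channels, as follows: * take 4 channels at the average of 3 tasks each and assign them taskGroupId 0; * then, like dealing cards, distribute the remaining tasks round-robin to the other taskGroups, each of which holds the standard number of channels *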
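* The dealing step above can be sketched as follows. The sketch assumes integer task ids stand in for the task Configuration objects, and DealTasksExample and deal() are hypothetical names. It is a simplified illustration, not the method below. Like the method below, it gives every task to a lone taskGroup. Unlike it, it bounds-checks taskGroup 0's share:
```java
import java.util.ArrayList;
import java.util.List;

public class DealTasksExample {
    // Hypothetical sketch: distribute task ids 0..taskCount-1 across taskGroups.
    static List<List<Integer>> deal(int taskCount, int averTaskPerChannel,
                                    int channelNumber, int channelsPerTaskGroup) {
        int taskGroupNumber = channelNumber / channelsPerTaskGroup;
        int leftChannelNumber = channelNumber % channelsPerTaskGroup;
        if (leftChannelNumber > 0) {
            taskGroupNumber += 1;
        }
        List<List<Integer>> groups = new ArrayList<>();
        for (int g = 0; g < taskGroupNumber; g++) {
            groups.add(new ArrayList<>());
        }
        int next = 0;
        if (taskGroupNumber == 1) {
            // A single taskGroup runs every task.
            while (next < taskCount) {
                groups.get(0).add(next++);
            }
            return groups;
        }
        int start = 0;
        if (leftChannelNumber > 0) {
            // Step 1: taskGroup 0 takes leftChannelNumber channels' worth of tasks.
            int share = leftChannelNumber * averTaskPerChannel;
            while (next < taskCount && groups.get(0).size() < share) {
                groups.get(0).add(next++);
            }
            start = 1;
        }
        // Step 2: deal the remaining tasks one at a time to the full-size taskGroups.
        while (next < taskCount) {
            for (int g = start; g < taskGroupNumber && next < taskCount; g++) {
                groups.get(g).add(next++);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Integer>> groups = deal(1024, 3, 333, 7);
        // Prints: 48 taskGroups; group 0 has 12 tasks, group 1 has 22, group 47 has 21
        System.out.println(groups.size() + " taskGroups; group 0 has " + groups.get(0).size()
                + " tasks, group 1 has " + groups.get(1).size()
                + ", group 47 has " + groups.get(47).size());
    }
}
```
* With the numbers from the example, taskGroup 0 receives 4 * 3 = 12 tasks, and the remaining 1012 tasks are dealt over 47 taskGroups, 21 or 22 each.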

* TODO delete it * * @param averTaskPerChannel * @param channelNumber * @param channelsPerTaskGroup * @return 每个taskGroup独立的全部配置 */ @SuppressWarnings("serial") private List distributeTasksToTaskGroup( int averTaskPerChannel, int channelNumber, int channelsPerTaskGroup) { Validate.isTrue(averTaskPerChannel > 0 && channelNumber > 0 && channelsPerTaskGroup > 0, "每个channel的平均task数[averTaskPerChannel],channel数目[channelNumber],每个taskGroup的平均channel数[channelsPerTaskGroup]都应该为正数"); List taskConfigs = this.configuration .getListConfiguration(CoreConstant.DATAX_JOB_CONTENT); int taskGroupNumber = channelNumber / channelsPerTaskGroup; int leftChannelNumber = channelNumber % channelsPerTaskGroup; if (leftChannelNumber > 0) { taskGroupNumber += 1; } /** * 如果只有一个taskGroup,直接打标返回 */ if (taskGroupNumber == 1) { final Configuration taskGroupConfig = this.configuration.clone(); /** * configure的clone不能clone出 */ taskGroupConfig.set(CoreConstant.DATAX_JOB_CONTENT, this.configuration .getListConfiguration(CoreConstant.DATAX_JOB_CONTENT)); taskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL, channelNumber); taskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID, 0); return new ArrayList() { { add(taskGroupConfig); } }; } List taskGroupConfigs = new ArrayList(); /** * 将每个taskGroup中content的配置清空 */ for (int i = 0; i < taskGroupNumber; i++) { Configuration taskGroupConfig = this.configuration.clone(); List taskGroupJobContent = taskGroupConfig .getListConfiguration(CoreConstant.DATAX_JOB_CONTENT); taskGroupJobContent.clear(); taskGroupConfig.set(CoreConstant.DATAX_JOB_CONTENT, taskGroupJobContent); taskGroupConfigs.add(taskGroupConfig); } int taskConfigIndex = 0; int channelIndex = 0; int taskGroupConfigIndex = 0; /** * 先处理掉taskGroup包含channel数不是平均值的taskGroup */ if (leftChannelNumber > 0) { Configuration taskGroupConfig = taskGroupConfigs.get(taskGroupConfigIndex); for (; channelIndex < leftChannelNumber; channelIndex++) { for (int i = 0; i < 
averTaskPerChannel; i++) { List taskGroupJobContent = taskGroupConfig .getListConfiguration(CoreConstant.DATAX_JOB_CONTENT); taskGroupJobContent.add(taskConfigs.get(taskConfigIndex++)); taskGroupConfig.set(CoreConstant.DATAX_JOB_CONTENT, taskGroupJobContent); } } taskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL, leftChannelNumber); taskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID, taskGroupConfigIndex++); } /** * 下面需要轮询分配,并打上channel数和taskGroupId标记 */ int equalDivisionStartIndex = taskGroupConfigIndex; for (; taskConfigIndex < taskConfigs.size() && equalDivisionStartIndex < taskGroupConfigs.size(); ) { for (taskGroupConfigIndex = equalDivisionStartIndex; taskGroupConfigIndex < taskGroupConfigs .size() && taskConfigIndex < taskConfigs.size(); taskGroupConfigIndex++) { Configuration taskGroupConfig = taskGroupConfigs.get(taskGroupConfigIndex); List taskGroupJobContent = taskGroupConfig .getListConfiguration(CoreConstant.DATAX_JOB_CONTENT); taskGroupJobContent.add(taskConfigs.get(taskConfigIndex++)); taskGroupConfig.set( CoreConstant.DATAX_JOB_CONTENT, taskGroupJobContent); } } for (taskGroupConfigIndex = equalDivisionStartIndex; taskGroupConfigIndex < taskGroupConfigs.size(); ) { Configuration taskGroupConfig = taskGroupConfigs.get(taskGroupConfigIndex); taskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL, channelsPerTaskGroup); taskGroupConfig.set(CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID, taskGroupConfigIndex++); } return taskGroupConfigs; } private void postJobReader() { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.READER, this.readerPluginName)); LOG.info("DataX Reader.Job [{}] do post work.", this.readerPluginName); this.jobReader.post(); classLoaderSwapper.restoreCurrentThreadClassLoader(); } private void postJobWriter() { classLoaderSwapper.setCurrentThreadClassLoader(LoadUtil.getJarLoader( PluginType.WRITER, this.writerPluginName)); LOG.info("DataX Writer.Job 
[{}] do post work.", this.writerPluginName); this.jobWriter.post(); classLoaderSwapper.restoreCurrentThreadClassLoader(); } /** * Checks whether the final result exceeds the error threshold. A threshold below 1 is treated as a percentage threshold; a threshold above 1 is treated as a record-count threshold. */ private void checkLimit() { Communication communication = super.getContainerCommunicator().collect(); errorLimit.checkRecordLimit(communication); errorLimit.checkPercentageLimit(communication); } /** * Invokes external hooks */ private void invokeHooks() { Communication comm = super.getContainerCommunicator().collect(); HookInvoker invoker = new HookInvoker(CoreConstant.DATAX_HOME + "/hook", configuration, comm.getCounter()); invoker.invokeAll(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/job/meta/ExecuteMode.java ================================================ package com.alibaba.datax.core.job.meta; /** * Created by liupeng on 15/12/21. */ public enum ExecuteMode { STANDALONE("standalone"), ; String value; private ExecuteMode(String value) { this.value = value; } public String value() { return this.value; } public String getValue() { return this.value; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/job/meta/State.java ================================================ package com.alibaba.datax.core.job.meta; /** * Created by liupeng on 15/12/21.
*/ public enum State { SUBMITTING(10), WAITING(20), RUNNING(30), KILLING(40), KILLED(50), FAILED(60), SUCCEEDED(70), ; int value; private State(int value) { this.value = value; } public int value() { return this.value; } public boolean isFinished() { return this == KILLED || this == FAILED || this == SUCCEEDED; } public boolean isRunning() { return !this.isFinished(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/job/scheduler/AbstractScheduler.java ================================================ package com.alibaba.datax.core.job.scheduler; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.statistics.communication.CommunicationTool; import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator; import com.alibaba.datax.core.util.ErrorRecordChecker; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.container.CoreConstant; import com.alibaba.datax.dataxservice.face.domain.enums.State; import org.apache.commons.lang.Validate; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public abstract class AbstractScheduler { private static final Logger LOG = LoggerFactory .getLogger(AbstractScheduler.class); private ErrorRecordChecker errorLimit; private AbstractContainerCommunicator containerCommunicator; private Long jobId; public Long getJobId() { return jobId; } public AbstractScheduler(AbstractContainerCommunicator containerCommunicator) { this.containerCommunicator = containerCommunicator; } public void schedule(List configurations) { Validate.notNull(configurations, "scheduler配置不能为空"); int jobReportIntervalInMillSec = configurations.get(0).getInt( CoreConstant.DATAX_CORE_CONTAINER_JOB_REPORTINTERVAL, 30000); int jobSleepIntervalInMillSec = 
configurations.get(0).getInt( CoreConstant.DATAX_CORE_CONTAINER_JOB_SLEEPINTERVAL, 10000); this.jobId = configurations.get(0).getLong( CoreConstant.DATAX_CORE_CONTAINER_JOB_ID); errorLimit = new ErrorRecordChecker(configurations.get(0)); /** * 给 taskGroupContainer 的 Communication 注册 */ this.containerCommunicator.registerCommunication(configurations); int totalTasks = calculateTaskCount(configurations); startAllTaskGroup(configurations); Communication lastJobContainerCommunication = new Communication(); long lastReportTimeStamp = System.currentTimeMillis(); try { while (true) { /** * step 1: collect job stat * step 2: getReport info, then report it * step 3: errorLimit do check * step 4: dealSucceedStat(); * step 5: dealKillingStat(); * step 6: dealFailedStat(); * step 7: refresh last job stat, and then sleep for next while * * above steps, some ones should report info to DS * */ Communication nowJobContainerCommunication = this.containerCommunicator.collect(); nowJobContainerCommunication.setTimestamp(System.currentTimeMillis()); LOG.debug(nowJobContainerCommunication.toString()); //汇报周期 long now = System.currentTimeMillis(); if (now - lastReportTimeStamp > jobReportIntervalInMillSec) { Communication reportCommunication = CommunicationTool .getReportCommunication(nowJobContainerCommunication, lastJobContainerCommunication, totalTasks); this.containerCommunicator.report(reportCommunication); lastReportTimeStamp = now; lastJobContainerCommunication = nowJobContainerCommunication; } errorLimit.checkRecordLimit(nowJobContainerCommunication); if (nowJobContainerCommunication.getState() == State.SUCCEEDED) { LOG.info("Scheduler accomplished all tasks."); break; } if (isJobKilling(this.getJobId())) { dealKillingStat(this.containerCommunicator, totalTasks); } else if (nowJobContainerCommunication.getState() == State.FAILED) { dealFailedStat(this.containerCommunicator, nowJobContainerCommunication.getThrowable()); } Thread.sleep(jobSleepIntervalInMillSec); } } catch 
(InterruptedException e) { // 以 failed 状态退出 LOG.error("捕获到InterruptedException异常!", e); throw DataXException.asDataXException( FrameworkErrorCode.RUNTIME_ERROR, e); } } protected abstract void startAllTaskGroup(List configurations); protected abstract void dealFailedStat(AbstractContainerCommunicator frameworkCollector, Throwable throwable); protected abstract void dealKillingStat(AbstractContainerCommunicator frameworkCollector, int totalTasks); private int calculateTaskCount(List configurations) { int totalTasks = 0; for (Configuration taskGroupConfiguration : configurations) { totalTasks += taskGroupConfiguration.getListConfiguration( CoreConstant.DATAX_JOB_CONTENT).size(); } return totalTasks; } // private boolean isJobKilling(Long jobId) { // Result jobInfo = DataxServiceUtil.getJobInfo(jobId); // return jobInfo.getData() == State.KILLING.value(); // } protected abstract boolean isJobKilling(Long jobId); } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/job/scheduler/processinner/ProcessInnerScheduler.java ================================================ package com.alibaba.datax.core.job.scheduler.processinner; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.job.scheduler.AbstractScheduler; import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator; import com.alibaba.datax.core.taskgroup.TaskGroupContainer; import com.alibaba.datax.core.taskgroup.runner.TaskGroupContainerRunner; import com.alibaba.datax.core.util.FrameworkErrorCode; import java.util.List; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; public abstract class ProcessInnerScheduler extends AbstractScheduler { private ExecutorService taskGroupContainerExecutorService; public ProcessInnerScheduler(AbstractContainerCommunicator containerCommunicator) { super(containerCommunicator); } 
@Override public void startAllTaskGroup(List configurations) { this.taskGroupContainerExecutorService = Executors .newFixedThreadPool(configurations.size()); for (Configuration taskGroupConfiguration : configurations) { TaskGroupContainerRunner taskGroupContainerRunner = newTaskGroupContainerRunner(taskGroupConfiguration); this.taskGroupContainerExecutorService.execute(taskGroupContainerRunner); } this.taskGroupContainerExecutorService.shutdown(); } @Override public void dealFailedStat(AbstractContainerCommunicator frameworkCollector, Throwable throwable) { this.taskGroupContainerExecutorService.shutdownNow(); throw DataXException.asDataXException( FrameworkErrorCode.PLUGIN_RUNTIME_ERROR, throwable); } @Override public void dealKillingStat(AbstractContainerCommunicator frameworkCollector, int totalTasks) { //通过进程退出返回码标示状态 this.taskGroupContainerExecutorService.shutdownNow(); throw DataXException.asDataXException(FrameworkErrorCode.KILLED_EXIT_VALUE, "job killed status"); } private TaskGroupContainerRunner newTaskGroupContainerRunner( Configuration configuration) { TaskGroupContainer taskGroupContainer = new TaskGroupContainer(configuration); return new TaskGroupContainerRunner(taskGroupContainer); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/job/scheduler/processinner/StandAloneScheduler.java ================================================ package com.alibaba.datax.core.job.scheduler.processinner; import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator; /** * Created by hongjiao.hj on 2014/12/22. 
*/ public class StandAloneScheduler extends ProcessInnerScheduler{ public StandAloneScheduler(AbstractContainerCommunicator containerCommunicator) { super(containerCommunicator); } @Override protected boolean isJobKilling(Long jobId) { return false; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/statistics/communication/Communication.java ================================================ package com.alibaba.datax.core.statistics.communication; import com.alibaba.datax.common.base.BaseObject; import com.alibaba.datax.dataxservice.face.domain.enums.State; import org.apache.commons.lang.StringUtils; import org.apache.commons.lang.Validate; import java.util.ArrayList; import java.util.List; import java.util.Map; import java.util.Map.Entry; import java.util.concurrent.ConcurrentHashMap; /** * Carries all DataX state and statistics; reports from the job, taskGroups, and tasks all go through this class. */ public class Communication extends BaseObject implements Cloneable { /** * All numeric key-value pairs * */ private Map<String, Number> counter; /** * Run state * */ private State state; /** * Recorded exception * */ private Throwable throwable; /** * Timestamp of this record * */ private long timestamp; /** * Messages from tasks to the job * */ Map<String, List<String>> message; public Communication() { this.init(); } public synchronized void reset() { this.init(); } private void init() { this.counter = new ConcurrentHashMap<String, Number>(); this.state = State.RUNNING; this.throwable = null; this.message = new ConcurrentHashMap<String, List<String>>(); this.timestamp = System.currentTimeMillis(); } public Map<String, Number> getCounter() { return this.counter; } public State getState() { return this.state; } public synchronized void setState(State state, boolean isForce) { if (!isForce && this.state.equals(State.FAILED)) { return; } this.state = state; } public synchronized void setState(State state) { setState(state, false); } public Throwable getThrowable() { return this.throwable; } public synchronized String getThrowableMessage() { return this.throwable == null ?
"" : this.throwable.getMessage(); } public void setThrowable(Throwable throwable) { setThrowable(throwable, false); } public synchronized void setThrowable(Throwable throwable, boolean isForce) { if (isForce) { this.throwable = throwable; } else { this.throwable = this.throwable == null ? throwable : this.throwable; } } public long getTimestamp() { return this.timestamp; } public void setTimestamp(long timestamp) { this.timestamp = timestamp; } public Map<String, List<String>> getMessage() { return this.message; } public List<String> getMessage(final String key) { return message.get(key); } public synchronized void addMessage(final String key, final String value) { Validate.isTrue(StringUtils.isNotBlank(key), "增加message的key不能为空"); List<String> valueList = this.message.get(key); if (null == valueList) { valueList = new ArrayList<String>(); this.message.put(key, valueList); } valueList.add(value); } public synchronized Long getLongCounter(final String key) { Number value = this.counter.get(key); return value == null ?
0.0d : value.doubleValue(); } public synchronized void setDoubleCounter(final String key, final double value) { Validate.isTrue(StringUtils.isNotBlank(key), "设置counter的key不能为空"); this.counter.put(key, value); } public synchronized void increaseCounter(final String key, final long deltaValue) { Validate.isTrue(StringUtils.isNotBlank(key), "增加counter的key不能为空"); long value = this.getLongCounter(key); this.counter.put(key, value + deltaValue); } @Override public Communication clone() { Communication communication = new Communication(); /** * clone counter */ if (this.counter != null) { for (Map.Entry<String, Number> entry : this.counter.entrySet()) { String key = entry.getKey(); Number value = entry.getValue(); if (value instanceof Long) { communication.setLongCounter(key, (Long) value); } else if (value instanceof Double) { communication.setDoubleCounter(key, (Double) value); } } } communication.setState(this.state, true); communication.setThrowable(this.throwable, true); communication.setTimestamp(this.timestamp); /** * clone message */ if (this.message != null) { for (final Map.Entry<String, List<String>> entry : this.message.entrySet()) { String key = entry.getKey(); List<String> value = new ArrayList<String>() {{ addAll(entry.getValue()); }}; communication.getMessage().put(key, value); } } return communication; } public synchronized Communication mergeFrom(final Communication otherComm) { if (otherComm == null) { return this; } /** * Merge counters: add otherComm's values into this one, creating any key that does not exist yet. * If both values are long, sum as long; otherwise sum as double. */ for (Entry<String, Number> entry : otherComm.getCounter().entrySet()) { String key = entry.getKey(); Number otherValue = entry.getValue(); if (otherValue == null) { continue; } Number value = this.counter.get(key); if (value == null) { value = otherValue; } else { if (value instanceof Long && otherValue instanceof Long) { value = value.longValue() + otherValue.longValue(); } else { value = value.doubleValue() + otherValue.doubleValue(); } } this.counter.put(key, value); } // merge state mergeStateFrom(otherComm); /** * Merge throwable: only when this throwable is null, *
take otherComm's throwable */ this.throwable = this.throwable == null ? otherComm.getThrowable() : this.throwable; /** * The timestamp belongs to the merge as a whole; it is not merged pairwise between communications. */ /** * Messages are merged by union: all values are accumulated together. */ for (Entry<String, List<String>> entry : otherComm.getMessage().entrySet()) { String key = entry.getKey(); List<String> valueList = this.message.get(key); if (valueList == null) { valueList = new ArrayList<String>(); this.message.put(key, valueList); } valueList.addAll(entry.getValue()); } return this; } /** * Merge state with priority (Failed | Killed) > Running > Success. * The Killing state never appears here; it exists only on the Job's own state. */ public synchronized State mergeStateFrom(final Communication otherComm) { State retState = this.getState(); if (otherComm == null) { return retState; } if (this.state == State.FAILED || otherComm.getState() == State.FAILED || this.state == State.KILLED || otherComm.getState() == State.KILLED) { retState = State.FAILED; } else if (this.state.isRunning() || otherComm.state.isRunning()) { retState = State.RUNNING; } this.setState(retState); return retState; } public synchronized boolean isFinished(){ return this.state == State.SUCCEEDED || this.state == State.FAILED || this.state == State.KILLED; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/statistics/communication/CommunicationTool.java ================================================ package com.alibaba.datax.core.statistics.communication; import com.alibaba.datax.common.statistics.PerfTrace; import com.alibaba.datax.common.util.StrUtil; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang.Validate; import java.text.DecimalFormat; import java.util.HashMap; import java.util.Map; /** * Business-level processing of communication statistics. */ public final class CommunicationTool { public static final String STAGE = "stage"; public static final String BYTE_SPEED = "byteSpeed"; public static final String RECORD_SPEED = "recordSpeed"; public static final String PERCENTAGE = "percentage"; public static final String
READ_SUCCEED_RECORDS = "readSucceedRecords"; public static final String READ_SUCCEED_BYTES = "readSucceedBytes"; public static final String READ_FAILED_RECORDS = "readFailedRecords"; public static final String READ_FAILED_BYTES = "readFailedBytes"; public static final String WRITE_RECEIVED_RECORDS = "writeReceivedRecords"; public static final String WRITE_RECEIVED_BYTES = "writeReceivedBytes"; public static final String WRITE_FAILED_RECORDS = "writeFailedRecords"; public static final String WRITE_FAILED_BYTES = "writeFailedBytes"; public static final String TOTAL_READ_RECORDS = "totalReadRecords"; private static final String TOTAL_READ_BYTES = "totalReadBytes"; private static final String TOTAL_ERROR_RECORDS = "totalErrorRecords"; private static final String TOTAL_ERROR_BYTES = "totalErrorBytes"; private static final String WRITE_SUCCEED_RECORDS = "writeSucceedRecords"; private static final String WRITE_SUCCEED_BYTES = "writeSucceedBytes"; public static final String WAIT_WRITER_TIME = "waitWriterTime"; public static final String WAIT_READER_TIME = "waitReaderTime"; public static final String TRANSFORMER_USED_TIME = "totalTransformerUsedTime"; public static final String TRANSFORMER_SUCCEED_RECORDS = "totalTransformerSuccessRecords"; public static final String TRANSFORMER_FAILED_RECORDS = "totalTransformerFailedRecords"; public static final String TRANSFORMER_FILTER_RECORDS = "totalTransformerFilterRecords"; public static final String TRANSFORMER_NAME_PREFIX = "usedTimeByTransformer_"; public static Communication getReportCommunication(Communication now, Communication old, int totalStage) { Validate.isTrue(now != null && old != null, "为汇报准备的新旧metric不能为null"); long totalReadRecords = getTotalReadRecords(now); long totalReadBytes = getTotalReadBytes(now); now.setLongCounter(TOTAL_READ_RECORDS, totalReadRecords); now.setLongCounter(TOTAL_READ_BYTES, totalReadBytes); now.setLongCounter(TOTAL_ERROR_RECORDS, getTotalErrorRecords(now)); now.setLongCounter(TOTAL_ERROR_BYTES, 
                getTotalErrorBytes(now));
        now.setLongCounter(WRITE_SUCCEED_RECORDS, getWriteSucceedRecords(now));
        now.setLongCounter(WRITE_SUCCEED_BYTES, getWriteSucceedBytes(now));

        // Elapsed time since the previous snapshot, in whole seconds. An interval of
        // 1 s or less counts as 1 s, so the divisions below never divide by zero.
        long timeInterval = now.getTimestamp() - old.getTimestamp();
        long sec = timeInterval <= 1000 ? 1 : timeInterval / 1000;
        // Average throughput over the interval; a negative delta is reported as 0.
        long bytesSpeed = (totalReadBytes - getTotalReadBytes(old)) / sec;
        long recordsSpeed = (totalReadRecords - getTotalReadRecords(old)) / sec;

        now.setLongCounter(BYTE_SPEED, bytesSpeed < 0 ? 0 : bytesSpeed);
        now.setLongCounter(RECORD_SPEED, recordsSpeed < 0 ? 0 : recordsSpeed);
        now.setDoubleCounter(PERCENTAGE, now.getLongCounter(STAGE) / (double) totalStage);

        // Carry a previously recorded error forward so the new snapshot does not lose it.
        if (old.getThrowable() != null) {
            now.setThrowable(old.getThrowable());
        }

        return now;
    }

    public static long getTotalReadRecords(final Communication communication) {
        return communication.getLongCounter(READ_SUCCEED_RECORDS) +
                communication.getLongCounter(READ_FAILED_RECORDS);
    }

    public static long getTotalReadBytes(final Communication communication) {
        return communication.getLongCounter(READ_SUCCEED_BYTES) +
                communication.getLongCounter(READ_FAILED_BYTES);
    }

    public static long getTotalErrorRecords(final Communication communication) {
        return communication.getLongCounter(READ_FAILED_RECORDS) +
                communication.getLongCounter(WRITE_FAILED_RECORDS);
    }

    public static long getTotalErrorBytes(final Communication communication) {
        return communication.getLongCounter(READ_FAILED_BYTES) +
                communication.getLongCounter(WRITE_FAILED_BYTES);
    }

    public static long getWriteSucceedRecords(final Communication communication) {
        return communication.getLongCounter(WRITE_RECEIVED_RECORDS) -
                communication.getLongCounter(WRITE_FAILED_RECORDS);
    }

    public static long getWriteSucceedBytes(final Communication communication) {
        return communication.getLongCounter(WRITE_RECEIVED_BYTES) -
                communication.getLongCounter(WRITE_FAILED_BYTES);
    }

    public static class Stringify {
        private final static DecimalFormat df = new DecimalFormat("0.00");

        public static String
        getSnapshot(final Communication communication) {
            StringBuilder sb = new StringBuilder();
            sb.append("Total ");
            sb.append(getTotal(communication));
            sb.append(" | ");

            sb.append("Speed ");
            sb.append(getSpeed(communication));
            sb.append(" | ");

            sb.append("Error ");
            sb.append(getError(communication));
            sb.append(" | ");

            sb.append(" All Task WaitWriterTime ");
            sb.append(PerfTrace.unitTime(communication.getLongCounter(WAIT_WRITER_TIME)));
            sb.append(" | ");

            sb.append(" All Task WaitReaderTime ");
            sb.append(PerfTrace.unitTime(communication.getLongCounter(WAIT_READER_TIME)));
            sb.append(" | ");

            // Transformer statistics are appended only when at least one transformer counter is non-zero.
            if (communication.getLongCounter(CommunicationTool.TRANSFORMER_USED_TIME) > 0
                    || communication.getLongCounter(CommunicationTool.TRANSFORMER_SUCCEED_RECORDS) > 0
                    || communication.getLongCounter(CommunicationTool.TRANSFORMER_FAILED_RECORDS) > 0
                    || communication.getLongCounter(CommunicationTool.TRANSFORMER_FILTER_RECORDS) > 0) {
                sb.append("Transformer Success ");
                sb.append(String.format("%d records", communication.getLongCounter(CommunicationTool.TRANSFORMER_SUCCEED_RECORDS)));
                sb.append(" | ");

                sb.append("Transformer Error ");
                sb.append(String.format("%d records", communication.getLongCounter(CommunicationTool.TRANSFORMER_FAILED_RECORDS)));
                sb.append(" | ");

                sb.append("Transformer Filter ");
                sb.append(String.format("%d records", communication.getLongCounter(CommunicationTool.TRANSFORMER_FILTER_RECORDS)));
                sb.append(" | ");

                sb.append("Transformer usedTime ");
                sb.append(PerfTrace.unitTime(communication.getLongCounter(CommunicationTool.TRANSFORMER_USED_TIME)));
                sb.append(" | ");
            }

            sb.append("Percentage ");
            sb.append(getPercentage(communication));
            return sb.toString();
        }

        private static String getTotal(final Communication communication) {
            return String.format("%d records, %d bytes",
                    communication.getLongCounter(TOTAL_READ_RECORDS),
                    communication.getLongCounter(TOTAL_READ_BYTES));
        }

        private static String getSpeed(final Communication communication) {
            return String.format("%s/s, %d records/s",
                    StrUtil.stringify(communication.getLongCounter(BYTE_SPEED)),
                    communication.getLongCounter(RECORD_SPEED));
        }

        private static String getError(final Communication communication) {
            return String.format("%d records, %d bytes",
                    communication.getLongCounter(TOTAL_ERROR_RECORDS),
                    communication.getLongCounter(TOTAL_ERROR_BYTES));
        }

        private static String getPercentage(final Communication communication) {
            return df.format(communication.getDoubleCounter(PERCENTAGE) * 100) + "%";
        }
    }

    public static class Jsonify {
        @SuppressWarnings("rawtypes")
        public static String getSnapshot(Communication communication) {
            Validate.notNull(communication);

            Map<String, Object> state = new HashMap<String, Object>();

            Pair pair = getTotalBytes(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getTotalRecords(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getSpeedRecord(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getSpeedByte(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getStage(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getErrorRecords(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getErrorBytes(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getErrorMessage(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getPercentage(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getWaitReaderTime(communication);
            state.put((String) pair.getKey(), pair.getValue());

            pair = getWaitWriterTime(communication);
            state.put((String) pair.getKey(), pair.getValue());

            return JSON.toJSONString(state);
        }

        private static Pair<String, Long> getTotalBytes(final Communication communication) {
            return new Pair<String, Long>("totalBytes", communication.getLongCounter(TOTAL_READ_BYTES));
        }

        private static Pair<String, Long> getTotalRecords(final Communication communication) {
            return new Pair<String, Long>("totalRecords", communication.getLongCounter(TOTAL_READ_RECORDS));
        }
        private static Pair<String, Long> getSpeedByte(final Communication communication) {
            return new Pair<String, Long>("speedBytes", communication.getLongCounter(BYTE_SPEED));
        }

        private static Pair<String, Long> getSpeedRecord(final Communication communication) {
            return new Pair<String, Long>("speedRecords", communication.getLongCounter(RECORD_SPEED));
        }

        private static Pair<String, Long> getErrorRecords(final Communication communication) {
            return new Pair<String, Long>("errorRecords", communication.getLongCounter(TOTAL_ERROR_RECORDS));
        }

        private static Pair<String, Long> getErrorBytes(final Communication communication) {
            return new Pair<String, Long>("errorBytes", communication.getLongCounter(TOTAL_ERROR_BYTES));
        }

        private static Pair<String, Long> getStage(final Communication communication) {
            return new Pair<String, Long>("stage", communication.getLongCounter(STAGE));
        }

        private static Pair<String, Double> getPercentage(final Communication communication) {
            return new Pair<String, Double>("percentage", communication.getDoubleCounter(PERCENTAGE));
        }

        private static Pair<String, String> getErrorMessage(final Communication communication) {
            return new Pair<String, String>("errorMessage", communication.getThrowableMessage());
        }

        private static Pair<String, Long> getWaitReaderTime(final Communication communication) {
            return new Pair<String, Long>("waitReaderTime", communication.getLongCounter(CommunicationTool.WAIT_READER_TIME));
        }

        private static Pair<String, Long> getWaitWriterTime(final Communication communication) {
            return new Pair<String, Long>("waitWriterTime", communication.getLongCounter(CommunicationTool.WAIT_WRITER_TIME));
        }

        static class Pair<K, V> {
            public Pair(final K key, final V value) {
                this.key = key;
                this.value = value;
            }

            public K getKey() {
                return key;
            }

            public V getValue() {
                return value;
            }

            private K key;

            private V value;
        }
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/communication/LocalTGCommunicationManager.java
================================================
package com.alibaba.datax.core.statistics.communication;

import com.alibaba.datax.dataxservice.face.domain.enums.State;
import org.apache.commons.lang3.Validate;

import java.util.Map;
import
java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public final class LocalTGCommunicationManager {
    private static Map<Integer, Communication> taskGroupCommunicationMap =
            new ConcurrentHashMap<Integer, Communication>();

    public static void registerTaskGroupCommunication(
            int taskGroupId, Communication communication) {
        taskGroupCommunicationMap.put(taskGroupId, communication);
    }

    public static Communication getJobCommunication() {
        Communication communication = new Communication();
        communication.setState(State.SUCCEEDED);

        for (Communication taskGroupCommunication :
                taskGroupCommunicationMap.values()) {
            communication.mergeFrom(taskGroupCommunication);
        }

        return communication;
    }

    /**
     * Callers should fetch the taskGroupId set first and then look up each
     * Communication by id, rather than iterating over the map directly. This
     * avoids modifying the map while it is being traversed, and also avoids
     * modifying its key-value pairs.
     *
     * @return the set of registered taskGroupIds
     */
    public static Set<Integer> getTaskGroupIdSet() {
        return taskGroupCommunicationMap.keySet();
    }

    public static Communication getTaskGroupCommunication(int taskGroupId) {
        Validate.isTrue(taskGroupId >= 0, "taskGroupId must not be less than 0");

        return taskGroupCommunicationMap.get(taskGroupId);
    }

    public static void updateTaskGroupCommunication(final int taskGroupId,
                                                    final Communication communication) {
        Validate.isTrue(taskGroupCommunicationMap.containsKey(
                taskGroupId), String.format("No Communication is registered for taskGroupId[%d] in taskGroupCommunicationMap; " +
                "cannot update the information of this taskGroup", taskGroupId));
        taskGroupCommunicationMap.put(taskGroupId, communication);
    }

    public static void clear() {
        taskGroupCommunicationMap.clear();
    }

    public static Map<Integer, Communication> getTaskGroupCommunicationMap() {
        return taskGroupCommunicationMap;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/collector/AbstractCollector.java
================================================
package com.alibaba.datax.core.statistics.container.collector;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.communication.LocalTGCommunicationManager;
import
com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.dataxservice.face.domain.enums.State;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public abstract class AbstractCollector {
    private Map<Integer, Communication> taskCommunicationMap = new ConcurrentHashMap<Integer, Communication>();
    private Long jobId;

    public Map<Integer, Communication> getTaskCommunicationMap() {
        return taskCommunicationMap;
    }

    public Long getJobId() {
        return jobId;
    }

    public void setJobId(Long jobId) {
        this.jobId = jobId;
    }

    public void registerTGCommunication(List<Configuration> taskGroupConfigurationList) {
        for (Configuration config : taskGroupConfigurationList) {
            int taskGroupId = config.getInt(
                    CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID);
            LocalTGCommunicationManager.registerTaskGroupCommunication(taskGroupId, new Communication());
        }
    }

    public void registerTaskCommunication(List<Configuration> taskConfigurationList) {
        for (Configuration taskConfig : taskConfigurationList) {
            int taskId = taskConfig.getInt(CoreConstant.TASK_ID);
            this.taskCommunicationMap.put(taskId, new Communication());
        }
    }

    public Communication collectFromTask() {
        Communication communication = new Communication();
        communication.setState(State.SUCCEEDED);

        for (Communication taskCommunication :
                this.taskCommunicationMap.values()) {
            communication.mergeFrom(taskCommunication);
        }

        return communication;
    }

    public abstract Communication collectFromTaskGroup();

    public Map<Integer, Communication> getTGCommunicationMap() {
        return LocalTGCommunicationManager.getTaskGroupCommunicationMap();
    }

    public Communication getTGCommunication(Integer taskGroupId) {
        return LocalTGCommunicationManager.getTaskGroupCommunication(taskGroupId);
    }

    public Communication getTaskCommunication(Integer taskId) {
        return this.taskCommunicationMap.get(taskId);
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/collector/ProcessInnerCollector.java
================================================
package
com.alibaba.datax.core.statistics.container.collector;

import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.communication.LocalTGCommunicationManager;

public class ProcessInnerCollector extends AbstractCollector {

    public ProcessInnerCollector(Long jobId) {
        super.setJobId(jobId);
    }

    @Override
    public Communication collectFromTaskGroup() {
        return LocalTGCommunicationManager.getJobCommunication();
    }

}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/communicator/AbstractContainerCommunicator.java
================================================
package com.alibaba.datax.core.statistics.container.communicator;

import com.alibaba.datax.common.statistics.VMInfo;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.container.collector.AbstractCollector;
import com.alibaba.datax.core.statistics.container.report.AbstractReporter;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.dataxservice.face.domain.enums.State;

import java.util.List;
import java.util.Map;

public abstract class AbstractContainerCommunicator {
    private Configuration configuration;
    private AbstractCollector collector;
    private AbstractReporter reporter;

    private Long jobId;

    private VMInfo vmInfo = VMInfo.getVmInfo();
    private long lastReportTime = System.currentTimeMillis();

    public Configuration getConfiguration() {
        return this.configuration;
    }

    public AbstractCollector getCollector() {
        return collector;
    }

    public AbstractReporter getReporter() {
        return reporter;
    }

    public void setCollector(AbstractCollector collector) {
        this.collector = collector;
    }

    public void setReporter(AbstractReporter reporter) {
        this.reporter = reporter;
    }

    public Long getJobId() {
        return jobId;
    }

    public AbstractContainerCommunicator(Configuration configuration) {
        this.configuration = configuration;
        this.jobId = configuration.getLong(CoreConstant.DATAX_CORE_CONTAINER_JOB_ID);
    }

    public abstract void registerCommunication(List<Configuration> configurationList);

    public abstract Communication collect();

    public abstract void report(Communication communication);

    public abstract State collectState();

    public abstract Communication getCommunication(Integer id);

    /**
     * When the implementation is a TGContainerCommunicator, the returned Map is key=taskId, value=Communication.
     * When the implementation is a JobContainerCommunicator, the returned Map is key=taskGroupId, value=Communication.
     */
    public abstract Map<Integer, Communication> getCommunicationMap();

    public void resetCommunication(Integer id) {
        Map<Integer, Communication> map = getCommunicationMap();
        map.put(id, new Communication());
    }

    public void reportVmInfo() {
        long now = System.currentTimeMillis();
        // Print at most once every 5 minutes (300000 ms).
        if (now - lastReportTime >= 300000) {
            // For now the VM info delta is only printed.
            if (vmInfo != null) {
                vmInfo.getDelta(true);
            }
            lastReportTime = now;
        }
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/communicator/job/StandAloneJobContainerCommunicator.java
================================================
package com.alibaba.datax.core.statistics.container.communicator.job;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.communication.CommunicationTool;
import com.alibaba.datax.core.statistics.container.collector.ProcessInnerCollector;
import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator;
import com.alibaba.datax.core.statistics.container.report.ProcessInnerReporter;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.dataxservice.face.domain.enums.State;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.Map;

public class StandAloneJobContainerCommunicator extends AbstractContainerCommunicator {
    private static final Logger LOG =
            LoggerFactory.getLogger(StandAloneJobContainerCommunicator.class);

    public StandAloneJobContainerCommunicator(Configuration configuration) {
        super(configuration);
        super.setCollector(new ProcessInnerCollector(configuration.getLong(
                CoreConstant.DATAX_CORE_CONTAINER_JOB_ID)));
        super.setReporter(new ProcessInnerReporter());
    }

    @Override
    public void registerCommunication(List<Configuration> configurationList) {
        super.getCollector().registerTGCommunication(configurationList);
    }

    @Override
    public Communication collect() {
        return super.getCollector().collectFromTaskGroup();
    }

    @Override
    public State collectState() {
        return this.collect().getState();
    }

    /**
     * Same as the report implementation of DistributeJobContainerCollector.
     */
    @Override
    public void report(Communication communication) {
        super.getReporter().reportJobCommunication(super.getJobId(), communication);

        LOG.info(CommunicationTool.Stringify.getSnapshot(communication));
        reportVmInfo();
    }

    @Override
    public Communication getCommunication(Integer taskGroupId) {
        return super.getCollector().getTGCommunication(taskGroupId);
    }

    @Override
    public Map<Integer, Communication> getCommunicationMap() {
        return super.getCollector().getTGCommunicationMap();
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/communicator/taskgroup/AbstractTGContainerCommunicator.java
================================================
package com.alibaba.datax.core.statistics.container.communicator.taskgroup;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.container.collector.ProcessInnerCollector;
import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.dataxservice.face.domain.enums.State;
import org.apache.commons.lang.Validate;

import java.util.List;
import java.util.Map;

/**
 * Parent class that collects and reports the communication of a taskGroupContainer.
 * Its taskCommunicationMap records the communication of each taskExecutor.
 */
public abstract class AbstractTGContainerCommunicator extends AbstractContainerCommunicator {

    protected long jobId;

    /**
     * Because a taskGroupContainer is scheduled inside the process, its
     * registerCommunication(), getCommunication(), getCommunications(),
     * collect() and related methods are the same for every implementation;
     * the collector of every TG is a ProcessInnerCollector.
     */
    protected int taskGroupId;

    public AbstractTGContainerCommunicator(Configuration configuration) {
        super(configuration);
        // jobId is a long everywhere else, so read it with getLong rather than getInt.
        this.jobId = configuration.getLong(
                CoreConstant.DATAX_CORE_CONTAINER_JOB_ID);
        super.setCollector(new ProcessInnerCollector(this.jobId));
        this.taskGroupId = configuration.getInt(
                CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID);
    }

    @Override
    public void registerCommunication(List<Configuration> configurationList) {
        super.getCollector().registerTaskCommunication(configurationList);
    }

    @Override
    public final Communication collect() {
        return this.getCollector().collectFromTask();
    }

    @Override
    public final State collectState() {
        Communication communication = new Communication();
        communication.setState(State.SUCCEEDED);

        for (Communication taskCommunication :
                super.getCollector().getTaskCommunicationMap().values()) {
            communication.mergeStateFrom(taskCommunication);
        }

        return communication.getState();
    }

    @Override
    public final Communication getCommunication(Integer taskId) {
        Validate.isTrue(taskId >= 0, "The registered taskId must not be less than 0");
        return super.getCollector().getTaskCommunication(taskId);
    }

    @Override
    public final Map<Integer, Communication> getCommunicationMap() {
        return super.getCollector().getTaskCommunicationMap();
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/communicator/taskgroup/StandaloneTGContainerCommunicator.java
================================================
package com.alibaba.datax.core.statistics.container.communicator.taskgroup;

import com.alibaba.datax.common.util.Configuration;
import
com.alibaba.datax.core.statistics.container.report.ProcessInnerReporter;
import com.alibaba.datax.core.statistics.communication.Communication;

public class StandaloneTGContainerCommunicator extends AbstractTGContainerCommunicator {

    public StandaloneTGContainerCommunicator(Configuration configuration) {
        super(configuration);
        super.setReporter(new ProcessInnerReporter());
    }

    @Override
    public void report(Communication communication) {
        super.getReporter().reportTGCommunication(super.taskGroupId, communication);
    }

}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/report/AbstractReporter.java
================================================
package com.alibaba.datax.core.statistics.container.report;

import com.alibaba.datax.core.statistics.communication.Communication;

public abstract class AbstractReporter {

    public abstract void reportJobCommunication(Long jobId, Communication communication);

    public abstract void reportTGCommunication(Integer taskGroupId, Communication communication);

}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/container/report/ProcessInnerReporter.java
================================================
package com.alibaba.datax.core.statistics.container.report;

import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.communication.LocalTGCommunicationManager;

public class ProcessInnerReporter extends AbstractReporter {

    @Override
    public void reportJobCommunication(Long jobId, Communication communication) {
        // do nothing
    }

    @Override
    public void reportTGCommunication(Integer taskGroupId, Communication communication) {
        LocalTGCommunicationManager.updateTaskGroupCommunication(taskGroupId, communication);
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/plugin/DefaultJobPluginCollector.java
================================================
package com.alibaba.datax.core.statistics.plugin;

import com.alibaba.datax.common.plugin.JobPluginCollector;
import com.alibaba.datax.core.statistics.container.communicator.AbstractContainerCommunicator;
import com.alibaba.datax.core.statistics.communication.Communication;

import java.util.List;
import java.util.Map;

/**
 * Created by jingxing on 14-9-9.
 */
public final class DefaultJobPluginCollector implements JobPluginCollector {
    private AbstractContainerCommunicator jobCollector;

    public DefaultJobPluginCollector(AbstractContainerCommunicator containerCollector) {
        this.jobCollector = containerCollector;
    }

    @Override
    public Map<String, List<String>> getMessage() {
        Communication totalCommunication = this.jobCollector.collect();
        return totalCommunication.getMessage();
    }

    @Override
    public List<String> getMessage(String key) {
        Communication totalCommunication = this.jobCollector.collect();
        return totalCommunication.getMessage(key);
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/AbstractTaskPluginCollector.java
================================================
package com.alibaba.datax.core.statistics.plugin.task;

import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.communication.CommunicationTool;
import com.alibaba.datax.common.constant.PluginType;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Created by jingxing on 14-9-11.
 */
public abstract class AbstractTaskPluginCollector extends TaskPluginCollector {
    private static final Logger LOG = LoggerFactory
            .getLogger(AbstractTaskPluginCollector.class);

    private Communication communication;

    private Configuration configuration;

    private PluginType pluginType;

    public AbstractTaskPluginCollector(Configuration conf, Communication communication,
                                       PluginType type) {
        this.configuration = conf;
        this.communication = communication;
        this.pluginType = type;
    }

    public Communication getCommunication() {
        return communication;
    }

    public Configuration getConfiguration() {
        return configuration;
    }

    public PluginType getPluginType() {
        return pluginType;
    }

    @Override
    public final void collectMessage(String key, String value) {
        this.communication.addMessage(key, value);
    }

    @Override
    public void collectDirtyRecord(Record dirtyRecord, Throwable t,
                                   String errorMessage) {

        if (null == dirtyRecord) {
            LOG.warn("Dirty data record=null.");
            return;
        }

        // A dirty record counts against the read or the write failure counters, depending on the plugin type.
        if (this.pluginType.equals(PluginType.READER)) {
            this.communication.increaseCounter(
                    CommunicationTool.READ_FAILED_RECORDS, 1);
            this.communication.increaseCounter(
                    CommunicationTool.READ_FAILED_BYTES, dirtyRecord.getByteSize());
        } else if (this.pluginType.equals(PluginType.WRITER)) {
            this.communication.increaseCounter(
                    CommunicationTool.WRITE_FAILED_RECORDS, 1);
            this.communication.increaseCounter(
                    CommunicationTool.WRITE_FAILED_BYTES, dirtyRecord.getByteSize());
        } else {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.RUNTIME_ERROR,
                    String.format("Unknown plugin type [%s].", this.pluginType));
        }
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/HttpPluginCollector.java
================================================
package com.alibaba.datax.core.statistics.plugin.task;

import com.alibaba.datax.common.constant.PluginType;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.util.Configuration;
import
com.alibaba.datax.core.statistics.communication.Communication;

/**
 * Created by jingxing on 14-9-9.
 */
public class HttpPluginCollector extends AbstractTaskPluginCollector {
    public HttpPluginCollector(Configuration configuration, Communication communication,
                               PluginType type) {
        super(configuration, communication, type);
    }

    @Override
    public void collectDirtyRecord(Record dirtyRecord, Throwable t,
                                   String errorMessage) {
        super.collectDirtyRecord(dirtyRecord, t, errorMessage);
    }

}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/StdoutPluginCollector.java
================================================
package com.alibaba.datax.core.statistics.plugin.task;

import com.alibaba.datax.common.constant.PluginType;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.core.statistics.plugin.task.util.DirtyRecord;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Created by jingxing on 14-9-9.
 */
public class StdoutPluginCollector extends AbstractTaskPluginCollector {
    private static final Logger LOG = LoggerFactory
            .getLogger(StdoutPluginCollector.class);

    private static final int DEFAULT_MAX_DIRTYNUM = 128;

    private AtomicInteger maxLogNum = new AtomicInteger(0);

    private AtomicInteger currentLogNum = new AtomicInteger(0);

    public StdoutPluginCollector(Configuration configuration, Communication communication,
                                 PluginType type) {
        super(configuration, communication, type);
        maxLogNum = new AtomicInteger(
                configuration.getInt(
                        CoreConstant.DATAX_CORE_STATISTICS_COLLECTOR_PLUGIN_MAXDIRTYNUM,
                        DEFAULT_MAX_DIRTYNUM));
    }

    private String formatDirty(final Record dirty, final Throwable t,
                               final String msg) {
        Map<String, Object> msgGroup = new HashMap<String, Object>();

        msgGroup.put("type", super.getPluginType().toString());
        if (StringUtils.isNotBlank(msg)) {
            msgGroup.put("message", msg);
        }
        if (null != t && StringUtils.isNotBlank(t.getMessage())) {
            msgGroup.put("exception", t.getMessage());
        }
        if (null != dirty) {
            msgGroup.put("record", DirtyRecord.asDirtyRecord(dirty)
                    .getColumns());
        }

        return JSON.toJSONString(msgGroup);
    }

    @Override
    public void collectDirtyRecord(Record dirtyRecord, Throwable t,
                                   String errorMessage) {
        // 0-based ordinal of this dirty record; getAndIncrement makes it unique across threads.
        int logNum = currentLogNum.getAndIncrement();
        // Only the first dirty record gets its full stack trace logged.
        if (logNum == 0 && t != null) {
            LOG.error("", t);
        }
        // A negative maxLogNum means no limit; otherwise log at most maxLogNum dirty records.
        // Compare the local ordinal rather than re-reading currentLogNum, which is already
        // incremented and may have been advanced further by other threads.
        if (maxLogNum.intValue() < 0 || logNum < maxLogNum.intValue()) {
            LOG.error("Dirty data: \n"
                    + this.formatDirty(dirtyRecord, t, errorMessage));
        }

        super.collectDirtyRecord(dirtyRecord, t, errorMessage);
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/statistics/plugin/task/util/DirtyRecord.java
================================================
package com.alibaba.datax.core.statistics.plugin.task.util;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import
com.alibaba.fastjson2.JSON;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;

public class DirtyRecord implements Record {
    private List<Column> columns = new ArrayList<Column>();
    private Map<String, String> meta;

    public static DirtyRecord asDirtyRecord(final Record record) {
        DirtyRecord result = new DirtyRecord();
        for (int i = 0; i < record.getColumnNumber(); i++) {
            result.addColumn(record.getColumn(i));
        }
        result.setMeta(record.getMeta());

        return result;
    }

    @Override
    public void addColumn(Column column) {
        this.columns.add(
                DirtyColumn.asDirtyColumn(column, this.columns.size()));
    }

    @Override
    public String toString() {
        return JSON.toJSONString(this.columns);
    }

    @Override
    public void setColumn(int i, Column column) {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public Column getColumn(int i) {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public int getColumnNumber() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public int getByteSize() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public int getMemorySize() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public void setMeta(Map<String, String> meta) {
        this.meta = meta;
    }

    @Override
    public Map<String, String> getMeta() {
        return this.meta;
    }

    public List<Column> getColumns() {
        return columns;
    }

    public void setColumns(List<Column> columns) {
        this.columns = columns;
    }
}

class DirtyColumn extends Column {
    private int index;

    public static Column asDirtyColumn(final Column column, int index) {
        return new DirtyColumn(column, index);
    }

    private DirtyColumn(Column column, int index) {
        this(null == column ? null : column.getRawData(),
                null == column ? Column.Type.NULL : column.getType(),
                null == column ?
                        0 : column.getByteSize(), index);
    }

    public int getIndex() {
        return index;
    }

    public void setIndex(int index) {
        this.index = index;
    }

    @Override
    public Long asLong() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public Double asDouble() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public String asString() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public Date asDate() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public Date asDate(String dateFormat) {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public byte[] asBytes() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public Boolean asBoolean() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public BigDecimal asBigDecimal() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    @Override
    public BigInteger asBigInteger() {
        throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                "This method is not supported!");
    }

    private DirtyColumn(Object object, Type type, int byteSize, int index) {
        super(object, type, byteSize);
        this.setIndex(index);
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/taskgroup/TaskGroupContainer.java
================================================
package com.alibaba.datax.core.taskgroup;

import com.alibaba.datax.common.constant.PluginType;
import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.statistics.PerfRecord;
import
com.alibaba.datax.common.statistics.PerfTrace; import com.alibaba.datax.common.statistics.VMInfo; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.AbstractContainer; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.statistics.communication.CommunicationTool; import com.alibaba.datax.core.statistics.container.communicator.taskgroup.StandaloneTGContainerCommunicator; import com.alibaba.datax.core.statistics.plugin.task.AbstractTaskPluginCollector; import com.alibaba.datax.core.taskgroup.runner.AbstractRunner; import com.alibaba.datax.core.taskgroup.runner.ReaderRunner; import com.alibaba.datax.core.taskgroup.runner.WriterRunner; import com.alibaba.datax.core.transport.channel.Channel; import com.alibaba.datax.core.transport.exchanger.BufferedRecordExchanger; import com.alibaba.datax.core.transport.exchanger.BufferedRecordTransformerExchanger; import com.alibaba.datax.core.transport.transformer.TransformerExecution; import com.alibaba.datax.core.util.ClassUtil; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.TransformerUtil; import com.alibaba.datax.core.util.container.CoreConstant; import com.alibaba.datax.core.util.container.LoadUtil; import com.alibaba.datax.dataxservice.face.domain.enums.State; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.Validate; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; public class TaskGroupContainer extends AbstractContainer { private static final Logger LOG = LoggerFactory .getLogger(TaskGroupContainer.class); /** * 当前taskGroup所属jobId */ private long jobId; /** * 当前taskGroupId */ private int taskGroupId; /** * 使用的channel类 */ private String channelClazz; /** * task收集器使用的类 */ private String taskCollectorClass; private TaskMonitor taskMonitor = TaskMonitor.getInstance(); public TaskGroupContainer(Configuration configuration) { super(configuration); 
        initCommunicator(configuration);

        this.jobId = this.configuration.getLong(
                CoreConstant.DATAX_CORE_CONTAINER_JOB_ID);
        this.taskGroupId = this.configuration.getInt(
                CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID);

        this.channelClazz = this.configuration.getString(
                CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_CLASS);
        this.taskCollectorClass = this.configuration.getString(
                CoreConstant.DATAX_CORE_STATISTICS_COLLECTOR_PLUGIN_TASKCLASS);
    }

    private void initCommunicator(Configuration configuration) {
        super.setContainerCommunicator(new StandaloneTGContainerCommunicator(configuration));
    }

    public long getJobId() {
        return jobId;
    }

    public int getTaskGroupId() {
        return taskGroupId;
    }

    @Override
    public void start() {
        try {
            /**
             * Interval between state checks. Kept short so that tasks are
             * dispatched to free channels promptly.
             */
            int sleepIntervalInMillSec = this.configuration.getInt(
                    CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_SLEEPINTERVAL, 100);
            /**
             * Interval between state reports. Somewhat longer, to avoid flooding reports.
             */
            long reportIntervalInMillSec = this.configuration.getLong(
                    CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_REPORTINTERVAL,
                    10000);
            /**
             * Performance statistics are reported every 2 minutes.
             */

            // Number of channels
            int channelNumber = this.configuration.getInt(
                    CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL);

            int taskMaxRetryTimes = this.configuration.getInt(
                    CoreConstant.DATAX_CORE_CONTAINER_TASK_FAILOVER_MAXRETRYTIMES, 1);

            long taskRetryIntervalInMsec = this.configuration.getLong(
                    CoreConstant.DATAX_CORE_CONTAINER_TASK_FAILOVER_RETRYINTERVALINMSEC, 10000);

            long taskMaxWaitInMsec = this.configuration.getLong(
                    CoreConstant.DATAX_CORE_CONTAINER_TASK_FAILOVER_MAXWAITINMSEC, 60000);

            List<Configuration> taskConfigs = this.configuration
                    .getListConfiguration(CoreConstant.DATAX_JOB_CONTENT);

            if (LOG.isDebugEnabled()) {
                LOG.debug("taskGroup[{}]'s task configs[{}]", this.taskGroupId,
                        JSON.toJSONString(taskConfigs));
            }

            int taskCountInThisTaskGroup = taskConfigs.size();
            LOG.info(String.format(
                    "taskGroupId=[%d] start [%d] channels for [%d] tasks.",
                    this.taskGroupId, channelNumber, taskCountInThisTaskGroup));
            this.containerCommunicator.registerCommunication(taskConfigs);

            Map<Integer, Configuration> taskConfigMap = buildTaskConfigMap(taskConfigs); // taskId -> task configuration
            List<Configuration> taskQueue = buildRemainTasks(taskConfigs); // tasks waiting to run
            Map<Integer, TaskExecutor> taskFailedExecutorMap = new HashMap<Integer, TaskExecutor>(); // taskId -> executor of its last failed attempt
            List<TaskExecutor> runTasks = new ArrayList<TaskExecutor>(channelNumber); // tasks currently running
            Map<Integer, Long> taskStartTimeMap = new HashMap<Integer, Long>(); // task start times

            long lastReportTimeStamp = 0;
            Communication lastTaskGroupContainerCommunication = new Communication();

            while (true) {
                // 1. Check the state of each task
                boolean failedOrKilled = false;
                Map<Integer, Communication> communicationMap = containerCommunicator.getCommunicationMap();
                for (Map.Entry<Integer, Communication> entry : communicationMap.entrySet()) {
                    Integer taskId = entry.getKey();
                    Communication taskCommunication = entry.getValue();
                    if (!taskCommunication.isFinished()) {
                        continue;
                    }
                    TaskExecutor taskExecutor = removeTask(runTasks, taskId);

                    // The task was removed from runTasks above, so remove it from the monitor as well
                    taskMonitor.removeTask(taskId);

                    // Failed: retry if the task supports failover and has not reached the maximum retry count
                    if (taskCommunication.getState() == State.FAILED) {
                        taskFailedExecutorMap.put(taskId, taskExecutor);
                        if (taskExecutor.supportFailOver() && taskExecutor.getAttemptCount() < taskMaxRetryTimes) {
                            taskExecutor.shutdown(); // shut down the old executor
                            containerCommunicator.resetCommunication(taskId); // reset the task's state
                            Configuration taskConfig = taskConfigMap.get(taskId);
                            taskQueue.add(taskConfig); // put the task back in the queue
                        } else {
                            failedOrKilled = true;
                            break;
                        }
                    } else if (taskCommunication.getState() == State.KILLED) {
                        failedOrKilled = true;
                        break;
                    } else if (taskCommunication.getState() == State.SUCCEEDED) {
                        Long taskStartTime = taskStartTimeMap.get(taskId);
                        if (taskStartTime != null) {
                            Long usedTime = System.currentTimeMillis() - taskStartTime;
                            LOG.info("taskGroup[{}] taskId[{}] succeeded, used[{}]ms",
                                    this.taskGroupId, taskId, usedTime);
                            // usedTime * 1000 * 1000 converts ms to the ns unit recorded by PerfRecord. This is only a
                            // lightweight registration used to print the longest-running tasks, hence the dedicated static method.
                            PerfRecord.addPerfRecord(taskGroupId, taskId, PerfRecord.PHASE.TASK_TOTAL, taskStartTime, usedTime * 1000L * 1000L);
                            taskStartTimeMap.remove(taskId);
                            taskConfigMap.remove(taskId);
                        }
                    }
                }

                // 2. If any taskExecutor in this taskGroup failed or was killed, report and raise the error
                if (failedOrKilled) {
                    lastTaskGroupContainerCommunication = reportTaskGroupCommunication(
                            lastTaskGroupContainerCommunication, taskCountInThisTaskGroup);

                    throw DataXException.asDataXException(
                            FrameworkErrorCode.PLUGIN_RUNTIME_ERROR, lastTaskGroupContainerCommunication.getThrowable());
                }

                // 3. Start pending tasks while the number of running tasks is below the channel limit
                Iterator<Configuration> iterator = taskQueue.iterator();
                while (iterator.hasNext() && runTasks.size() < channelNumber) {
                    Configuration taskConfig = iterator.next();
                    Integer taskId = taskConfig.getInt(CoreConstant.TASK_ID);
                    int attemptCount = 1;
                    TaskExecutor lastExecutor = taskFailedExecutorMap.get(taskId);
                    if (lastExecutor != null) {
                        attemptCount = lastExecutor.getAttemptCount() + 1;
                        long now = System.currentTimeMillis();
                        long failedTime = lastExecutor.getTimeStamp();
                        if (now - failedTime < taskRetryIntervalInMsec) { // retry interval not yet elapsed; keep the task in the queue
                            continue;
                        }
                        if (!lastExecutor.isShutdown()) { // the previously failed attempt has not stopped yet
                            if (now - failedTime > taskMaxWaitInMsec) {
                                markCommunicationFailed(taskId);
                                reportTaskGroupCommunication(lastTaskGroupContainerCommunication, taskCountInThisTaskGroup);
                                throw DataXException.asDataXException(CommonErrorCode.WAIT_TIME_EXCEED,
                                        "Timed out waiting for task failover");
                            } else {
                                lastExecutor.shutdown(); // try to shut it down again
                                continue;
                            }
                        } else {
                            LOG.info("taskGroup[{}] taskId[{}] attemptCount[{}] has already shut down",
                                    this.taskGroupId, taskId, lastExecutor.getAttemptCount());
                        }
                    }
                    Configuration taskConfigForRun = taskMaxRetryTimes > 1 ? taskConfig.clone() : taskConfig;
                    TaskExecutor taskExecutor = new TaskExecutor(taskConfigForRun, attemptCount);
                    taskStartTimeMap.put(taskId, System.currentTimeMillis());
                    taskExecutor.doStart();

                    iterator.remove();
                    runTasks.add(taskExecutor);

                    // The task was just added to runTasks, so register it with the monitor
                    taskMonitor.registerTask(taskId, this.containerCommunicator.getCommunication(taskId));

                    taskFailedExecutorMap.remove(taskId);
                    LOG.info("taskGroup[{}] taskId[{}] attemptCount[{}] is started",
                            this.taskGroupId, taskId, attemptCount);
                }

                // 4. Queue empty, all executors finished, and the collected state is SUCCEEDED -> success
                if (taskQueue.isEmpty() && isAllTaskDone(runTasks) && containerCommunicator.collectState() == State.SUCCEEDED) {
                    // Report once on success as well; otherwise the collected statistics are inaccurate when tasks finish very quickly
                    lastTaskGroupContainerCommunication = reportTaskGroupCommunication(
                            lastTaskGroupContainerCommunication, taskCountInThisTaskGroup);

                    LOG.info("taskGroup[{}] completed its tasks.", this.taskGroupId);
                    break;
                }

                // 5. If the report interval has elapsed, report immediately
                long now = System.currentTimeMillis();
                if (now - lastReportTimeStamp > reportIntervalInMillSec) {
                    lastTaskGroupContainerCommunication = reportTaskGroupCommunication(
                            lastTaskGroupContainerCommunication, taskCountInThisTaskGroup);

                    lastReportTimeStamp = now;

                    // Every reportIntervalInMillSec, let taskMonitor check each running task
                    for (TaskExecutor taskExecutor : runTasks) {
                        taskMonitor.report(taskExecutor.getTaskId(), this.containerCommunicator.getCommunication(taskExecutor.getTaskId()));
                    }
                }

                Thread.sleep(sleepIntervalInMillSec);
            }

            // 6. Report one final time
            reportTaskGroupCommunication(lastTaskGroupContainerCommunication, taskCountInThisTaskGroup);

        } catch (Throwable e) {
            Communication nowTaskGroupContainerCommunication = this.containerCommunicator.collect();

            if (nowTaskGroupContainerCommunication.getThrowable() == null) {
                nowTaskGroupContainerCommunication.setThrowable(e);
            }
            nowTaskGroupContainerCommunication.setState(State.FAILED);
            this.containerCommunicator.report(nowTaskGroupContainerCommunication);

            throw DataXException.asDataXException(
                    FrameworkErrorCode.RUNTIME_ERROR, e);
        } finally {
            if (!PerfTrace.getInstance().isJob()) {
                // Finally, log the average CPU usage and GC statistics
                VMInfo vmInfo = VMInfo.getVmInfo();
                if (vmInfo != null) {
                    vmInfo.getDelta(false);
                    LOG.info(vmInfo.totalString());
                }

                LOG.info(PerfTrace.getInstance().summarizeNoException());
            }
        }
    }

    private Map<Integer, Configuration> buildTaskConfigMap(List<Configuration> configurations) {
        Map<Integer, Configuration> map = new HashMap<Integer, Configuration>();
        for (Configuration taskConfig : configurations) {
            int taskId = taskConfig.getInt(CoreConstant.TASK_ID);
            map.put(taskId, taskConfig);
        }
        return map;
    }

    private List<Configuration> buildRemainTasks(List<Configuration> configurations) {
        List<Configuration> remainTasks = new LinkedList<Configuration>();
        for (Configuration taskConfig : configurations) {
            remainTasks.add(taskConfig);
        }
        return remainTasks;
    }

    private TaskExecutor removeTask(List<TaskExecutor> taskList, int taskId) {
        Iterator<TaskExecutor> iterator = taskList.iterator();
        while (iterator.hasNext()) {
            TaskExecutor taskExecutor = iterator.next();
            if (taskExecutor.getTaskId() == taskId) {
                iterator.remove();
                return taskExecutor;
            }
        }
        return null;
    }

    private boolean isAllTaskDone(List<TaskExecutor> taskList) {
        for (TaskExecutor taskExecutor : taskList) {
            if (!taskExecutor.isTaskFinished()) {
                return false;
            }
        }
        return true;
    }

    private Communication reportTaskGroupCommunication(Communication lastTaskGroupContainerCommunication, int taskCount) {
        Communication nowTaskGroupContainerCommunication = this.containerCommunicator.collect();
        nowTaskGroupContainerCommunication.setTimestamp(System.currentTimeMillis());
        Communication reportCommunication = CommunicationTool.getReportCommunication(nowTaskGroupContainerCommunication,
                lastTaskGroupContainerCommunication, taskCount);
        this.containerCommunicator.report(reportCommunication);
        return reportCommunication;
    }

    private void markCommunicationFailed(Integer taskId) {
        Communication communication = containerCommunicator.getCommunication(taskId);
        communication.setState(State.FAILED);
    }

    /**
     * TaskExecutor runs one complete task,
     * consisting of one reader paired 1:1 with one writer.
     */
    class TaskExecutor {
        private Configuration taskConfig;

        private int taskId;

        private int attemptCount;
        private Channel channel;

        private Thread readerThread;

        private Thread writerThread;

        private ReaderRunner readerRunner;

        private WriterRunner writerRunner;

        /**
         * This taskCommunication is shared by:
         * 1. the channel
         * 2. readerRunner and writerRunner
         * 3. the taskPluginCollector of the reader and of the writer
         */
        private Communication taskCommunication;

        public TaskExecutor(Configuration taskConf, int attemptCount) {
            // Configuration of this taskExecutor
            this.taskConfig = taskConf;
            Validate.isTrue(null != this.taskConfig.getConfiguration(CoreConstant.JOB_READER)
                            && null != this.taskConfig.getConfiguration(CoreConstant.JOB_WRITER),
                    "The [reader|writer] plugin parameters must not be empty!");

            // Resolve the taskId
            this.taskId = this.taskConfig.getInt(CoreConstant.TASK_ID);
            this.attemptCount = attemptCount;

            /**
             * Look up this taskExecutor's Communication by taskId.
             * It is passed to readerRunner and writerRunner, and to the channel for statistics.
             */
            this.taskCommunication = containerCommunicator
                    .getCommunication(taskId);
            Validate.notNull(this.taskCommunication,
                    String.format("Communication for taskId[%d] has not been registered", taskId));
            this.channel = ClassUtil.instantiate(channelClazz,
                    Channel.class, configuration);
            this.channel.setCommunication(this.taskCommunication);

            /**
             * Build the transformer parameters
             */
            List<TransformerExecution> transformerInfoExecs = TransformerUtil.buildTransformerInfo(taskConfig);

            /**
             * Create writerThread
             */
            writerRunner = (WriterRunner) generateRunner(PluginType.WRITER);
            this.writerThread = new Thread(writerRunner,
                    String.format("%d-%d-%d-writer",
                            jobId, taskGroupId, this.taskId));
            // Setting the thread's contextClassLoader lets the plugin load classes with a loader separate from the main program's
            this.writerThread.setContextClassLoader(LoadUtil.getJarLoader(
                    PluginType.WRITER, this.taskConfig.getString(
                            CoreConstant.JOB_WRITER_NAME)));

            /**
             * Create readerThread
             */
            readerRunner = (ReaderRunner) generateRunner(PluginType.READER, transformerInfoExecs);
            this.readerThread = new Thread(readerRunner,
                    String.format("%d-%d-%d-reader",
                            jobId, taskGroupId, this.taskId));
            /**
             * Setting the thread's contextClassLoader lets the plugin load classes with a loader separate from the main program's
             */
            this.readerThread.setContextClassLoader(LoadUtil.getJarLoader(
                    PluginType.READER,
                    this.taskConfig.getString(
                            CoreConstant.JOB_READER_NAME)));
        }

        public void doStart() {
            this.writerThread.start();

            // The reader has not started yet, so the writer cannot have finished normally;
            // a dead (or FAILED) writer at this point means it failed on startup
            if (!this.writerThread.isAlive() || this.taskCommunication.getState() == State.FAILED) {
                throw DataXException.asDataXException(
                        FrameworkErrorCode.RUNTIME_ERROR,
                        this.taskCommunication.getThrowable());
            }

            this.readerThread.start();

            // The reader may finish very quickly here
            if (!this.readerThread.isAlive() && this.taskCommunication.getState() == State.FAILED) {
                // The reader thread may have died immediately after starting; in that case throw right away
                throw DataXException.asDataXException(
                        FrameworkErrorCode.RUNTIME_ERROR,
                        this.taskCommunication.getThrowable());
            }
        }

        private AbstractRunner generateRunner(PluginType pluginType) {
            return generateRunner(pluginType, null);
        }

        private AbstractRunner generateRunner(PluginType pluginType, List<TransformerExecution> transformerInfoExecs) {
            AbstractRunner newRunner = null;
            TaskPluginCollector pluginCollector;

            switch (pluginType) {
                case READER:
                    newRunner = LoadUtil.loadPluginRunner(pluginType,
                            this.taskConfig.getString(CoreConstant.JOB_READER_NAME));
                    newRunner.setJobConf(this.taskConfig.getConfiguration(
                            CoreConstant.JOB_READER_PARAMETER));

                    pluginCollector = ClassUtil.instantiate(
                            taskCollectorClass, AbstractTaskPluginCollector.class,
                            configuration, this.taskCommunication,
                            PluginType.READER);

                    RecordSender recordSender;
                    if (transformerInfoExecs != null && transformerInfoExecs.size() > 0) {
                        recordSender = new BufferedRecordTransformerExchanger(taskGroupId, this.taskId,
                                this.channel, this.taskCommunication, pluginCollector, transformerInfoExecs);
                    } else {
                        recordSender = new BufferedRecordExchanger(this.channel, pluginCollector);
                    }

                    ((ReaderRunner) newRunner).setRecordSender(recordSender);

                    /**
                     * Set the taskPlugin's collector, used for dirty-data handling and job/task communication
                     */
                    newRunner.setTaskPluginCollector(pluginCollector);
                    break;
                case WRITER:
                    newRunner = LoadUtil.loadPluginRunner(pluginType,
                            this.taskConfig.getString(CoreConstant.JOB_WRITER_NAME));
                    newRunner.setJobConf(this.taskConfig
                            .getConfiguration(CoreConstant.JOB_WRITER_PARAMETER));

                    pluginCollector = ClassUtil.instantiate(
                            taskCollectorClass, AbstractTaskPluginCollector.class,
                            configuration, this.taskCommunication,
                            PluginType.WRITER);
                    ((WriterRunner) newRunner).setRecordReceiver(new BufferedRecordExchanger(
                            this.channel, pluginCollector));
                    /**
                     * Set the taskPlugin's collector, used for dirty-data handling and job/task communication
                     */
                    newRunner.setTaskPluginCollector(pluginCollector);
                    break;
                default:
                    throw DataXException.asDataXException(FrameworkErrorCode.ARGUMENT_ERROR,
                            "Can't generateRunner for: " + pluginType);
            }

            newRunner.setTaskGroupId(taskGroupId);
            newRunner.setTaskId(this.taskId);
            newRunner.setRunnerCommunication(this.taskCommunication);

            return newRunner;
        }

        // Check whether the task has finished
        private boolean isTaskFinished() {
            // If the reader or the writer is still running, the task has not finished
            if (readerThread.isAlive() || writerThread.isAlive()) {
                return false;
            }

            if (taskCommunication == null || !taskCommunication.isFinished()) {
                return false;
            }

            return true;
        }

        private int getTaskId() {
            return taskId;
        }

        private long getTimeStamp() {
            return taskCommunication.getTimestamp();
        }

        private int getAttemptCount() {
            return attemptCount;
        }

        private boolean supportFailOver() {
            return writerRunner.supportFailOver();
        }

        private void shutdown() {
            writerRunner.shutdown();
            readerRunner.shutdown();
            if (writerThread.isAlive()) {
                writerThread.interrupt();
            }
            if (readerThread.isAlive()) {
                readerThread.interrupt();
            }
        }

        private boolean isShutdown() {
            return !readerThread.isAlive() && !writerThread.isAlive();
        }
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/taskgroup/TaskMonitor.java
================================================
package com.alibaba.datax.core.taskgroup;

import com.alibaba.datax.common.exception.CommonErrorCode;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.communication.CommunicationTool;
import com.alibaba.datax.dataxservice.face.domain.enums.State;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.ConcurrentHashMap;

/**
 * Created by liqiang on 15/7/23.
 */
public class TaskMonitor {

    private static final Logger LOG = LoggerFactory.getLogger(TaskMonitor.class);
    private static final TaskMonitor instance = new TaskMonitor();
    private static long EXPIRED_TIME = 172800 * 1000; // 48 hours, in milliseconds

    private ConcurrentHashMap<Integer, TaskCommunication> tasks = new ConcurrentHashMap<Integer, TaskCommunication>();

    private TaskMonitor() {
    }

    public static TaskMonitor getInstance() {
        return instance;
    }

    public void registerTask(Integer taskid, Communication communication) {
        // If the task has already finished, return immediately
        if (communication.isFinished()) {
            return;
        }
        tasks.putIfAbsent(taskid, new TaskCommunication(taskid, communication));
    }

    public void removeTask(Integer taskid) {
        tasks.remove(taskid);
    }

    public void report(Integer taskid, Communication communication) {
        // If the task has already finished, return immediately
        if (communication.isFinished()) {
            return;
        }
        if (!tasks.containsKey(taskid)) {
            LOG.warn("unexpected: taskid({}) missed.", taskid);
            tasks.putIfAbsent(taskid, new TaskCommunication(taskid, communication));
        } else {
            tasks.get(taskid).report(communication);
        }
    }

    public TaskCommunication getTaskCommunication(Integer taskid) {
        return tasks.get(taskid);
    }

    public static class TaskCommunication {
        private Integer taskid;
        // Total read records as of the last recorded change
        private long lastAllReadRecords = -1;
        // Updated only on creation or when the statistics change
        private long lastUpdateComunicationTS;
        private long ttl;

        private TaskCommunication(Integer taskid, Communication communication) {
            this.taskid = taskid;
            lastAllReadRecords = CommunicationTool.getTotalReadRecords(communication);
            ttl = System.currentTimeMillis();
            lastUpdateComunicationTS = ttl;
        }

        public void report(Communication communication) {
            ttl = System.currentTimeMillis();
            // If the record count grew, update the snapshot. Check this condition first:
            // the goal is to catch a task that is stuck (making no progress), not one that merely runs long
            if (CommunicationTool.getTotalReadRecords(communication) > lastAllReadRecords) {
                lastAllReadRecords =
                        CommunicationTool.getTotalReadRecords(communication);
                lastUpdateComunicationTS = ttl;
            } else if (isExpired(lastUpdateComunicationTS)) {
                communication.setState(State.FAILED);
                communication.setTimestamp(ttl);
                communication.setThrowable(DataXException.asDataXException(CommonErrorCode.TASK_HUNG_EXPIRED,
                        String.format("task(%s) hung expired [allReadRecord(%s), elapsed(%s)]",
                                taskid, lastAllReadRecords, (ttl - lastUpdateComunicationTS))));
            }
        }

        private boolean isExpired(long lastUpdateComunicationTS) {
            return System.currentTimeMillis() - lastUpdateComunicationTS > EXPIRED_TIME;
        }

        public Integer getTaskid() {
            return taskid;
        }

        public long getLastAllReadRecords() {
            return lastAllReadRecords;
        }

        public long getLastUpdateComunicationTS() {
            return lastUpdateComunicationTS;
        }

        public long getTtl() {
            return ttl;
        }
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/taskgroup/runner/AbstractRunner.java
================================================
package com.alibaba.datax.core.taskgroup.runner;

import com.alibaba.datax.common.plugin.AbstractTaskPlugin;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.statistics.communication.Communication;
import com.alibaba.datax.core.statistics.communication.CommunicationTool;
import com.alibaba.datax.dataxservice.face.domain.enums.State;
import org.apache.commons.lang.Validate;

public abstract class AbstractRunner {
    private AbstractTaskPlugin plugin;

    private Configuration jobConf;

    private Communication runnerCommunication;

    private int taskGroupId;

    private int taskId;

    public AbstractRunner(AbstractTaskPlugin taskPlugin) {
        this.plugin = taskPlugin;
    }

    public void destroy() {
        if (this.plugin != null) {
            this.plugin.destroy();
        }
    }

    public State getRunnerState() {
        return this.runnerCommunication.getState();
    }

    public AbstractTaskPlugin getPlugin() {
        return plugin;
    }

    public void setPlugin(AbstractTaskPlugin plugin) {
        this.plugin = plugin;
    }

    public Configuration getJobConf() {
        return jobConf;
    }

    public void setJobConf(Configuration jobConf) {
        this.jobConf = jobConf;
        this.plugin.setPluginJobConf(jobConf);
    }

    public void setTaskPluginCollector(TaskPluginCollector pluginCollector) {
        this.plugin.setTaskPluginCollector(pluginCollector);
    }

    private void mark(State state) {
        this.runnerCommunication.setState(state);
        if (state == State.SUCCEEDED) {
            // Increment the stage counter by 1
            this.runnerCommunication.setLongCounter(CommunicationTool.STAGE,
                    this.runnerCommunication.getLongCounter(CommunicationTool.STAGE) + 1);
        }
    }

    public void markRun() {
        mark(State.RUNNING);
    }

    public void markSuccess() {
        mark(State.SUCCEEDED);
    }

    public void markFail(final Throwable throwable) {
        mark(State.FAILED);
        this.runnerCommunication.setTimestamp(System.currentTimeMillis());
        this.runnerCommunication.setThrowable(throwable);
    }

    /**
     * @param taskGroupId the taskGroupId to set
     */
    public void setTaskGroupId(int taskGroupId) {
        this.taskGroupId = taskGroupId;
        this.plugin.setTaskGroupId(taskGroupId);
    }

    /**
     * @return the taskGroupId
     */
    public int getTaskGroupId() {
        return taskGroupId;
    }

    public int getTaskId() {
        return taskId;
    }

    public void setTaskId(int taskId) {
        this.taskId = taskId;
        this.plugin.setTaskId(taskId);
    }

    public void setRunnerCommunication(final Communication runnerCommunication) {
        Validate.notNull(runnerCommunication,
                "The plugin's Communication must not be null");
        this.runnerCommunication = runnerCommunication;
    }

    public Communication getRunnerCommunication() {
        return runnerCommunication;
    }

    public abstract void shutdown();
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/taskgroup/runner/ReaderRunner.java
================================================
package com.alibaba.datax.core.taskgroup.runner;

import com.alibaba.datax.common.plugin.AbstractTaskPlugin;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.statistics.PerfRecord;
import com.alibaba.datax.core.statistics.communication.CommunicationTool;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Created by jingxing on 14-9-1.
 *
 * Runs the reader for a single slice.
 */
public class ReaderRunner extends AbstractRunner implements Runnable {

    private static final Logger LOG = LoggerFactory
            .getLogger(ReaderRunner.class);

    private RecordSender recordSender;

    public void setRecordSender(RecordSender recordSender) {
        this.recordSender = recordSender;
    }

    public ReaderRunner(AbstractTaskPlugin abstractTaskPlugin) {
        super(abstractTaskPlugin);
    }

    @Override
    public void run() {
        assert null != this.recordSender;

        Reader.Task taskReader = (Reader.Task) this.getPlugin();

        // Measure waitWriterTime; the record is only ended in the finally block
        PerfRecord channelWaitWrite = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.WAIT_WRITE_TIME);
        try {
            channelWaitWrite.start();

            LOG.debug("task reader starts to do init ...");
            PerfRecord initPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.READ_TASK_INIT);
            initPerfRecord.start();
            taskReader.init();
            initPerfRecord.end();

            LOG.debug("task reader starts to do prepare ...");
            PerfRecord preparePerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.READ_TASK_PREPARE);
            preparePerfRecord.start();
            taskReader.prepare();
            preparePerfRecord.end();

            LOG.debug("task reader starts to read ...");
            PerfRecord dataPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.READ_TASK_DATA);
            dataPerfRecord.start();
            taskReader.startRead(recordSender);
            recordSender.terminate();

            dataPerfRecord.addCount(CommunicationTool.getTotalReadRecords(super.getRunnerCommunication()));
            dataPerfRecord.addSize(CommunicationTool.getTotalReadBytes(super.getRunnerCommunication()));
            dataPerfRecord.end();

            LOG.debug("task reader starts to do post ...");
            PerfRecord postPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.READ_TASK_POST);
            postPerfRecord.start();
            taskReader.post();
            postPerfRecord.end();
            // automatic flush
            // super.markSuccess(); Do NOT mark success here. Success is marked by writerRunner;
            // otherwise the reader could finish first while the writer is still running, which is a serious bug.
        } catch (Throwable e) {
            LOG.error("Reader runner Received Exceptions:", e);
            super.markFail(e);
        } finally {
            LOG.debug("task reader starts to do destroy ...");
            PerfRecord desPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.READ_TASK_DESTROY);
            desPerfRecord.start();
            super.destroy();
            desPerfRecord.end();

            channelWaitWrite.end(super.getRunnerCommunication().getLongCounter(CommunicationTool.WAIT_WRITER_TIME));

            long transformerUsedTime = super.getRunnerCommunication().getLongCounter(CommunicationTool.TRANSFORMER_USED_TIME);
            if (transformerUsedTime > 0) {
                PerfRecord transformerRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.TRANSFORMER_TIME);
                transformerRecord.start();
                transformerRecord.end(transformerUsedTime);
            }
        }
    }

    public void shutdown() {
        recordSender.shutdown();
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/taskgroup/runner/TaskGroupContainerRunner.java
================================================
package com.alibaba.datax.core.taskgroup.runner;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.core.taskgroup.TaskGroupContainer;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import com.alibaba.datax.dataxservice.face.domain.enums.State;

public class TaskGroupContainerRunner implements Runnable {

    private TaskGroupContainer taskGroupContainer;

    private State state;

    public TaskGroupContainerRunner(TaskGroupContainer taskGroup) {
        this.taskGroupContainer = taskGroup;
        this.state = State.SUCCEEDED;
    }

    @Override
    public void run() {
        try {
            Thread.currentThread().setName(
                    String.format("taskGroup-%d", this.taskGroupContainer.getTaskGroupId()));
            this.taskGroupContainer.start();
            this.state = State.SUCCEEDED;
        } catch (Throwable e) {
            this.state = State.FAILED;
            throw DataXException.asDataXException(
                    FrameworkErrorCode.RUNTIME_ERROR, e);
        }
    }

    public TaskGroupContainer getTaskGroupContainer() {
        return taskGroupContainer;
    }

    public State getState() {
        return state;
    }

    public void setState(State state) {
        this.state = state;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/taskgroup/runner/WriterRunner.java
================================================
package com.alibaba.datax.core.taskgroup.runner;

import com.alibaba.datax.common.plugin.AbstractTaskPlugin;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.statistics.PerfRecord;
import com.alibaba.datax.core.statistics.communication.CommunicationTool;
import org.apache.commons.lang3.Validate;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Created by jingxing on 14-9-1.
 *
 * Runs the writer for a single slice.
 */
public class WriterRunner extends AbstractRunner implements Runnable {

    private static final Logger LOG = LoggerFactory
            .getLogger(WriterRunner.class);

    private RecordReceiver recordReceiver;

    public void setRecordReceiver(RecordReceiver receiver) {
        this.recordReceiver = receiver;
    }

    public WriterRunner(AbstractTaskPlugin abstractTaskPlugin) {
        super(abstractTaskPlugin);
    }

    @Override
    public void run() {
        Validate.isTrue(this.recordReceiver != null);

        Writer.Task taskWriter = (Writer.Task) this.getPlugin();
        // Measure waitReadTime; the record is ended in the finally block
        PerfRecord channelWaitRead = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.WAIT_READ_TIME);
        try {
            channelWaitRead.start();
            LOG.debug("task writer starts to do init ...");
            PerfRecord initPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.WRITE_TASK_INIT);
            initPerfRecord.start();
            taskWriter.init();
            initPerfRecord.end();

            LOG.debug("task writer starts to do prepare ...");
            PerfRecord preparePerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.WRITE_TASK_PREPARE);
            preparePerfRecord.start();
            taskWriter.prepare();
            preparePerfRecord.end();

            LOG.debug("task writer starts to write ...");
            PerfRecord dataPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.WRITE_TASK_DATA);
            dataPerfRecord.start();
            taskWriter.startWrite(recordReceiver);

            dataPerfRecord.addCount(CommunicationTool.getTotalReadRecords(super.getRunnerCommunication()));
            dataPerfRecord.addSize(CommunicationTool.getTotalReadBytes(super.getRunnerCommunication()));
            dataPerfRecord.end();

            LOG.debug("task writer starts to do post ...");
            PerfRecord postPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.WRITE_TASK_POST);
            postPerfRecord.start();
            taskWriter.post();
            postPerfRecord.end();

            super.markSuccess();
        } catch (Throwable e) {
            LOG.error("Writer Runner Received Exceptions:", e);
            super.markFail(e);
        } finally {
            LOG.debug("task writer starts to do destroy ...");
PerfRecord desPerfRecord = new PerfRecord(getTaskGroupId(), getTaskId(), PerfRecord.PHASE.WRITE_TASK_DESTROY); desPerfRecord.start(); super.destroy(); desPerfRecord.end(); channelWaitRead.end(super.getRunnerCommunication().getLongCounter(CommunicationTool.WAIT_READER_TIME)); } } public boolean supportFailOver(){ Writer.Task taskWriter = (Writer.Task) this.getPlugin(); return taskWriter.supportFailOver(); } public void shutdown(){ recordReceiver.shutdown(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/channel/Channel.java ================================================ package com.alibaba.datax.core.transport.channel; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.statistics.communication.CommunicationTool; import com.alibaba.datax.core.transport.record.TerminateRecord; import com.alibaba.datax.core.util.container.CoreConstant; import org.apache.commons.lang.Validate; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.Collection; /** * Created by jingxing on 14-8-25. *

* 统计和限速都在这里 */ public abstract class Channel { private static final Logger LOG = LoggerFactory.getLogger(Channel.class); protected int taskGroupId; protected int capacity; protected int byteCapacity; protected long byteSpeed; // bps: bytes/s protected long recordSpeed; // tps: records/s protected long flowControlInterval; protected volatile boolean isClosed = false; protected Configuration configuration = null; protected volatile long waitReaderTime = 0; protected volatile long waitWriterTime = 0; private static Boolean isFirstPrint = true; private Communication currentCommunication; private Communication lastCommunication = new Communication(); public Channel(final Configuration configuration) { //channel的queue里默认record为1万条。原来为512条 int capacity = configuration.getInt( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_CAPACITY, 2048); long byteSpeed = configuration.getLong( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_BYTE, 1024 * 1024); long recordSpeed = configuration.getLong( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_SPEED_RECORD, 10000); if (capacity <= 0) { throw new IllegalArgumentException(String.format( "通道容量[%d]必须大于0.", capacity)); } synchronized (isFirstPrint) { if (isFirstPrint) { Channel.LOG.info("Channel set byte_speed_limit to " + byteSpeed + (byteSpeed <= 0 ? ", No bps activated." : ".")); Channel.LOG.info("Channel set record_speed_limit to " + recordSpeed + (recordSpeed <= 0 ? ", No tps activated." 
: ".")); isFirstPrint = false; } } this.taskGroupId = configuration.getInt( CoreConstant.DATAX_CORE_CONTAINER_TASKGROUP_ID); this.capacity = capacity; this.byteSpeed = byteSpeed; this.recordSpeed = recordSpeed; this.flowControlInterval = configuration.getLong( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_FLOWCONTROLINTERVAL, 1000); /* Default queue byte capacity: 8 MB (originally 64 MB). */ this.byteCapacity = configuration.getInt( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_CAPACITY_BYTE, 8 * 1024 * 1024); this.configuration = configuration; } public void close() { this.isClosed = true; } public void open() { this.isClosed = false; } public boolean isClosed() { return isClosed; } public int getTaskGroupId() { return this.taskGroupId; } public int getCapacity() { return capacity; } public long getByteSpeed() { return byteSpeed; } public Configuration getConfiguration() { return this.configuration; } public void setCommunication(final Communication communication) { this.currentCommunication = communication; this.lastCommunication.reset(); } public void push(final Record r) { Validate.notNull(r, "record must not be null."); this.doPush(r); this.statPush(1L, r.getByteSize()); } public void pushTerminate(final TerminateRecord r) { Validate.notNull(r, "record must not be null."); this.doPush(r); /* Disabled: increment the stage counter by 1. currentCommunication.setLongCounter(CommunicationTool.STAGE, currentCommunication.getLongCounter(CommunicationTool.STAGE) + 1); */ } public void pushAll(final Collection<Record> rs) { Validate.notNull(rs); Validate.noNullElements(rs); this.doPushAll(rs); this.statPush(rs.size(), this.getByteSize(rs)); } public Record pull() { Record record = this.doPull(); this.statPull(1L, record.getByteSize()); return record; } public void pullAll(final Collection<Record> rs) { Validate.notNull(rs); this.doPullAll(rs); this.statPull(rs.size(), this.getByteSize(rs)); } protected abstract void doPush(Record r); protected abstract void doPushAll(Collection<Record> rs); protected abstract Record doPull(); protected abstract void doPullAll(Collection<Record> rs);
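/*
 * Illustrative worked example of the throttling arithmetic in statPush() (hypothetical numbers, not measured data).
 * Assume byteSpeed = 1048576 (1 MiB/s) and that 2097152 bytes were pushed during interval = 1000 ms:
 *   currentByteSpeed   = 2097152 * 1000 / 1000           = 2097152 bytes/s
 *   byteLimitSleepTime = 2097152 * 1000 / 1048576 - 1000 = 2000 - 1000 = 1000 ms
 * Sleeping 1000 ms spreads those 2 MiB over 2000 ms, which averages out to the 1 MiB/s limit.
 * The record-rate limit is computed the same way, and the larger of the two sleep times is used.
 */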
public abstract int size(); public abstract boolean isEmpty(); public abstract void clear(); private long getByteSize(final Collection<Record> rs) { long size = 0; for (final Record each : rs) { size += each.getByteSize(); } return size; } private void statPush(long recordSize, long byteSize) { currentCommunication.increaseCounter(CommunicationTool.READ_SUCCEED_RECORDS, recordSize); currentCommunication.increaseCounter(CommunicationTool.READ_SUCCEED_BYTES, byteSize); /* Updating the wait counters on the push (read) side is sufficient: the pulling (write) side may be blocked at this moment, but its accumulated wait time is already visible here. */ currentCommunication.setLongCounter(CommunicationTool.WAIT_READER_TIME, waitReaderTime); currentCommunication.setLongCounter(CommunicationTool.WAIT_WRITER_TIME, waitWriterTime); boolean isChannelByteSpeedLimit = (this.byteSpeed > 0); boolean isChannelRecordSpeedLimit = (this.recordSpeed > 0); if (!isChannelByteSpeedLimit && !isChannelRecordSpeedLimit) { return; } long lastTimestamp = lastCommunication.getTimestamp(); long nowTimestamp = System.currentTimeMillis(); long interval = nowTimestamp - lastTimestamp; if (interval - this.flowControlInterval >= 0) { long byteLimitSleepTime = 0; long recordLimitSleepTime = 0; if (isChannelByteSpeedLimit) { long currentByteSpeed = (CommunicationTool.getTotalReadBytes(currentCommunication) - CommunicationTool.getTotalReadBytes(lastCommunication)) * 1000 / interval; if (currentByteSpeed > this.byteSpeed) { /* Sleep time needed to bring the byte rate down to the limit. */ byteLimitSleepTime = currentByteSpeed * interval / this.byteSpeed - interval; } } if (isChannelRecordSpeedLimit) { long currentRecordSpeed = (CommunicationTool.getTotalReadRecords(currentCommunication) - CommunicationTool.getTotalReadRecords(lastCommunication)) * 1000 / interval; if (currentRecordSpeed > this.recordSpeed) { /* Sleep time needed to bring the record rate down to the limit. */ recordLimitSleepTime = currentRecordSpeed * interval / this.recordSpeed - interval; } } /* Use the larger of the two sleep times. */ long sleepTime = byteLimitSleepTime < recordLimitSleepTime ?
recordLimitSleepTime : byteLimitSleepTime; if (sleepTime > 0) { try { Thread.sleep(sleepTime); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } } lastCommunication.setLongCounter(CommunicationTool.READ_SUCCEED_BYTES, currentCommunication.getLongCounter(CommunicationTool.READ_SUCCEED_BYTES)); lastCommunication.setLongCounter(CommunicationTool.READ_FAILED_BYTES, currentCommunication.getLongCounter(CommunicationTool.READ_FAILED_BYTES)); lastCommunication.setLongCounter(CommunicationTool.READ_SUCCEED_RECORDS, currentCommunication.getLongCounter(CommunicationTool.READ_SUCCEED_RECORDS)); lastCommunication.setLongCounter(CommunicationTool.READ_FAILED_RECORDS, currentCommunication.getLongCounter(CommunicationTool.READ_FAILED_RECORDS)); lastCommunication.setTimestamp(nowTimestamp); } } private void statPull(long recordSize, long byteSize) { currentCommunication.increaseCounter( CommunicationTool.WRITE_RECEIVED_RECORDS, recordSize); currentCommunication.increaseCounter( CommunicationTool.WRITE_RECEIVED_BYTES, byteSize); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/channel/memory/MemoryChannel.java ================================================ package com.alibaba.datax.core.transport.channel.memory; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.transport.channel.Channel; import com.alibaba.datax.core.transport.record.TerminateRecord; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.container.CoreConstant; import java.util.Collection; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.ReentrantLock; /** * In-memory Channel implementation, backed by an ArrayBlockingQueue.
* */ public class MemoryChannel extends Channel { private int bufferSize = 0; private AtomicInteger memoryBytes = new AtomicInteger(0); private ArrayBlockingQueue<Record> queue = null; private ReentrantLock lock; private Condition notSufficient, notEmpty; public MemoryChannel(final Configuration configuration) { super(configuration); this.queue = new ArrayBlockingQueue<Record>(this.getCapacity()); this.bufferSize = configuration.getInt(CoreConstant.DATAX_CORE_TRANSPORT_EXCHANGER_BUFFERSIZE); lock = new ReentrantLock(); notSufficient = lock.newCondition(); notEmpty = lock.newCondition(); } @Override public void close() { super.close(); try { this.queue.put(TerminateRecord.get()); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); } } @Override public void clear(){ this.queue.clear(); } @Override protected void doPush(Record r) { try { long startTime = System.nanoTime(); this.queue.put(r); waitWriterTime += System.nanoTime() - startTime; memoryBytes.addAndGet(r.getMemorySize()); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); } } @Override protected void doPushAll(Collection<Record> rs) { try { long startTime = System.nanoTime(); /* Acquire the lock before entering try/finally so that unlock() only runs when the lock is actually held. */ lock.lockInterruptibly(); try { int bytes = getRecordBytes(rs); while (memoryBytes.get() + bytes > this.byteCapacity || rs.size() > this.queue.remainingCapacity()) { notSufficient.await(200L, TimeUnit.MILLISECONDS); } this.queue.addAll(rs); waitWriterTime += System.nanoTime() - startTime; memoryBytes.addAndGet(bytes); notEmpty.signalAll(); } finally { lock.unlock(); } } catch (InterruptedException e) { throw DataXException.asDataXException( FrameworkErrorCode.RUNTIME_ERROR, e); } } @Override protected Record doPull() { try { long startTime = System.nanoTime(); Record r = this.queue.take(); waitReaderTime += System.nanoTime() - startTime; memoryBytes.addAndGet(-r.getMemorySize()); return r; } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new IllegalStateException(e); } } @Override protected void
doPullAll(Collection<Record> rs) { assert rs != null; rs.clear(); try { long startTime = System.nanoTime(); /* Acquire the lock before entering try/finally so that unlock() only runs when the lock is actually held. */ lock.lockInterruptibly(); try { while (this.queue.drainTo(rs, bufferSize) <= 0) { notEmpty.await(200L, TimeUnit.MILLISECONDS); } waitReaderTime += System.nanoTime() - startTime; int bytes = getRecordBytes(rs); memoryBytes.addAndGet(-bytes); notSufficient.signalAll(); } finally { lock.unlock(); } } catch (InterruptedException e) { throw DataXException.asDataXException( FrameworkErrorCode.RUNTIME_ERROR, e); } } private int getRecordBytes(Collection<Record> rs){ int bytes = 0; for(Record r : rs){ bytes += r.getMemorySize(); } return bytes; } @Override public int size() { return this.queue.size(); } @Override public boolean isEmpty() { return this.queue.isEmpty(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/exchanger/BufferedRecordExchanger.java ================================================ package com.alibaba.datax.core.transport.exchanger; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.transport.channel.Channel; import com.alibaba.datax.core.transport.record.TerminateRecord; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.container.CoreConstant; import org.apache.commons.lang.Validate; import java.util.ArrayList; import java.util.List; import java.util.concurrent.atomic.AtomicInteger; public class BufferedRecordExchanger implements RecordSender, RecordReceiver { private final Channel channel; private final Configuration configuration; private final List<Record> buffer; private int bufferSize; protected final int
byteCapacity; private final AtomicInteger memoryBytes = new AtomicInteger(0); private int bufferIndex = 0; private static Class<? extends Record> RECORD_CLASS; private volatile boolean shutdown = false; private final TaskPluginCollector pluginCollector; @SuppressWarnings("unchecked") public BufferedRecordExchanger(final Channel channel, final TaskPluginCollector pluginCollector) { assert null != channel; assert null != channel.getConfiguration(); this.channel = channel; this.pluginCollector = pluginCollector; this.configuration = channel.getConfiguration(); this.bufferSize = configuration .getInt(CoreConstant.DATAX_CORE_TRANSPORT_EXCHANGER_BUFFERSIZE); this.buffer = new ArrayList<Record>(bufferSize); /* Default queue byte capacity: 8 MB (originally 64 MB). */ this.byteCapacity = configuration.getInt( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_CAPACITY_BYTE, 8 * 1024 * 1024); try { BufferedRecordExchanger.RECORD_CLASS = ((Class<? extends Record>) Class .forName(configuration.getString( CoreConstant.DATAX_CORE_TRANSPORT_RECORD_CLASS, "com.alibaba.datax.core.transport.record.DefaultRecord"))); } catch (Exception e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, e); } } @Override public Record createRecord() { try { return BufferedRecordExchanger.RECORD_CLASS.newInstance(); } catch (Exception e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, e); } } @Override public void sendToWriter(Record record) { if(shutdown){ throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } Validate.notNull(record, "record must not be null."); if (record.getMemorySize() > this.byteCapacity) { this.pluginCollector.collectDirtyRecord(record, new Exception(String.format("A single record exceeds the size limit; current limit: %s", this.byteCapacity))); return; } boolean isFull = (this.bufferIndex >= this.bufferSize || this.memoryBytes.get() + record.getMemorySize() > this.byteCapacity); if (isFull) { flush(); } this.buffer.add(record); this.bufferIndex++; memoryBytes.addAndGet(record.getMemorySize()); } @Override public void flush() {
if(shutdown){ throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } this.channel.pushAll(this.buffer); this.buffer.clear(); this.bufferIndex = 0; this.memoryBytes.set(0); } @Override public void terminate() { if(shutdown){ throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } flush(); this.channel.pushTerminate(TerminateRecord.get()); } @Override public Record getFromReader() { if(shutdown){ throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } boolean isEmpty = (this.bufferIndex >= this.buffer.size()); if (isEmpty) { receive(); } Record record = this.buffer.get(this.bufferIndex++); if (record instanceof TerminateRecord) { record = null; } return record; } @Override public void shutdown(){ shutdown = true; try{ buffer.clear(); channel.clear(); }catch(Throwable t){ t.printStackTrace(); } } private void receive() { this.channel.pullAll(this.buffer); this.bufferIndex = 0; this.bufferSize = this.buffer.size(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/exchanger/BufferedRecordTransformerExchanger.java ================================================ package com.alibaba.datax.core.transport.exchanger; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.transport.channel.Channel; import com.alibaba.datax.core.transport.record.TerminateRecord; import com.alibaba.datax.core.transport.transformer.TransformerExecution; import com.alibaba.datax.core.util.FrameworkErrorCode; import 
com.alibaba.datax.core.util.container.CoreConstant; import org.apache.commons.lang.Validate; import java.util.ArrayList; import java.util.List; import java.util.concurrent.atomic.AtomicInteger; public class BufferedRecordTransformerExchanger extends TransformerExchanger implements RecordSender, RecordReceiver { private final Channel channel; private final Configuration configuration; private final List<Record> buffer; private int bufferSize; protected final int byteCapacity; private final AtomicInteger memoryBytes = new AtomicInteger(0); private int bufferIndex = 0; private static Class<? extends Record> RECORD_CLASS; private volatile boolean shutdown = false; @SuppressWarnings("unchecked") public BufferedRecordTransformerExchanger(final int taskGroupId, final int taskId, final Channel channel, final Communication communication, final TaskPluginCollector pluginCollector, final List<TransformerExecution> tInfoExecs) { super(taskGroupId, taskId, communication, tInfoExecs, pluginCollector); assert null != channel; assert null != channel.getConfiguration(); this.channel = channel; this.configuration = channel.getConfiguration(); this.bufferSize = configuration .getInt(CoreConstant.DATAX_CORE_TRANSPORT_EXCHANGER_BUFFERSIZE); this.buffer = new ArrayList<Record>(bufferSize); /* Default queue byte capacity: 8 MB (originally 64 MB). */ this.byteCapacity = configuration.getInt( CoreConstant.DATAX_CORE_TRANSPORT_CHANNEL_CAPACITY_BYTE, 8 * 1024 * 1024); try { BufferedRecordTransformerExchanger.RECORD_CLASS = ((Class<? extends Record>) Class .forName(configuration.getString( CoreConstant.DATAX_CORE_TRANSPORT_RECORD_CLASS, "com.alibaba.datax.core.transport.record.DefaultRecord"))); } catch (Exception e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, e); } } @Override public Record createRecord() { try { return BufferedRecordTransformerExchanger.RECORD_CLASS.newInstance(); } catch (Exception e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, e); } } @Override public void sendToWriter(Record record) { if (shutdown) { throw
DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } Validate.notNull(record, "record must not be null."); record = doTransformer(record); if(record == null){ return; } if (record.getMemorySize() > this.byteCapacity) { this.pluginCollector.collectDirtyRecord(record, new Exception(String.format("A single record exceeds the size limit; current limit: %s", this.byteCapacity))); return; } boolean isFull = (this.bufferIndex >= this.bufferSize || this.memoryBytes.get() + record.getMemorySize() > this.byteCapacity); if (isFull) { flush(); } this.buffer.add(record); this.bufferIndex++; memoryBytes.addAndGet(record.getMemorySize()); } @Override public void flush() { if (shutdown) { throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } this.channel.pushAll(this.buffer); /* Keep in sync with the channel's statistics. */ doStat(); this.buffer.clear(); this.bufferIndex = 0; this.memoryBytes.set(0); } @Override public void terminate() { if (shutdown) { throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } flush(); this.channel.pushTerminate(TerminateRecord.get()); } @Override public Record getFromReader() { if (shutdown) { throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } boolean isEmpty = (this.bufferIndex >= this.buffer.size()); if (isEmpty) { receive(); } Record record = this.buffer.get(this.bufferIndex++); if (record instanceof TerminateRecord) { record = null; } return record; } @Override public void shutdown() { shutdown = true; try { buffer.clear(); channel.clear(); } catch (Throwable t) { t.printStackTrace(); } } private void receive() { this.channel.pullAll(this.buffer); this.bufferIndex = 0; this.bufferSize = this.buffer.size(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/exchanger/RecordExchanger.java ================================================ /** * (C) 2010-2014 Alibaba Group Holding Limited.
* * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package com.alibaba.datax.core.transport.exchanger; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.transport.channel.Channel; import com.alibaba.datax.core.transport.record.TerminateRecord; import com.alibaba.datax.core.transport.transformer.TransformerExecution; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.container.CoreConstant; import java.util.List; public class RecordExchanger extends TransformerExchanger implements RecordSender, RecordReceiver { private Channel channel; private Configuration configuration; private static Class<? extends Record> RECORD_CLASS; private volatile boolean shutdown = false; @SuppressWarnings("unchecked") public RecordExchanger(final int taskGroupId, final int taskId, final Channel channel, final Communication communication, List<TransformerExecution> transformerExecs, final TaskPluginCollector pluginCollector) { super(taskGroupId, taskId, communication, transformerExecs, pluginCollector); assert channel != null; this.channel = channel; this.configuration =
channel.getConfiguration(); try { RecordExchanger.RECORD_CLASS = (Class<? extends Record>) Class .forName(configuration.getString( CoreConstant.DATAX_CORE_TRANSPORT_RECORD_CLASS, "com.alibaba.datax.core.transport.record.DefaultRecord")); } catch (ClassNotFoundException e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, e); } } @Override public Record getFromReader() { if(shutdown){ throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } Record record = this.channel.pull(); return (record instanceof TerminateRecord ? null : record); } @Override public Record createRecord() { try { return RECORD_CLASS.newInstance(); } catch (Exception e) { throw DataXException.asDataXException( FrameworkErrorCode.CONFIG_ERROR, e); } } @Override public void sendToWriter(Record record) { if(shutdown){ throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } record = doTransformer(record); if (record == null) { return; } this.channel.push(record); /* Keep in sync with the channel's statistics. */ doStat(); } @Override public void flush() { } @Override public void terminate() { if(shutdown){ throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } this.channel.pushTerminate(TerminateRecord.get()); /* Keep in sync with the channel's statistics. */ doStat(); } @Override public void shutdown(){ shutdown = true; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/exchanger/TransformerExchanger.java ================================================ package com.alibaba.datax.core.transport.exchanger; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.statistics.communication.CommunicationTool; import com.alibaba.datax.core.transport.transformer.TransformerErrorCode; import
com.alibaba.datax.core.transport.transformer.TransformerExecution; import com.alibaba.datax.core.util.container.ClassLoaderSwapper; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; /** * no comments. * Created by liqiang on 16/3/9. */ public abstract class TransformerExchanger { private static final Logger LOG = LoggerFactory.getLogger(TransformerExchanger.class); protected final TaskPluginCollector pluginCollector; protected final int taskGroupId; protected final int taskId; protected final Communication currentCommunication; private long totalExaustedTime = 0; private long totalFilterRecords = 0; private long totalSuccessRecords = 0; private long totalFailedRecords = 0; private List<TransformerExecution> transformerExecs; private ClassLoaderSwapper classLoaderSwapper = ClassLoaderSwapper .newCurrentThreadClassLoaderSwapper(); public TransformerExchanger(int taskGroupId, int taskId, Communication communication, List<TransformerExecution> transformerExecs, final TaskPluginCollector pluginCollector) { this.transformerExecs = transformerExecs; this.pluginCollector = pluginCollector; this.taskGroupId = taskGroupId; this.taskId = taskId; this.currentCommunication = communication; } public Record doTransformer(Record record) { if (transformerExecs == null || transformerExecs.size() == 0) { return record; } Record result = record; long diffExaustedTime = 0; String errorMsg = null; boolean failed = false; for (TransformerExecution transformerInfoExec : transformerExecs) { long startTs = System.nanoTime(); if (transformerInfoExec.getClassLoader() != null) { classLoaderSwapper.setCurrentThreadClassLoader(transformerInfoExec.getClassLoader()); } /** * Validate the transformer parameters lazily, on the first record: an invalid parameter throws immediately and is not treated as dirty data. * Plugins do not need to re-check these parameters, but plugin-specific ones (such as the number of parameters) are validated inside the plugin. */ if (!transformerInfoExec.isChecked()) { if (transformerInfoExec.getColumnIndex() != null && transformerInfoExec.getColumnIndex() >= record.getColumnNumber()) { throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER,
String.format("columnIndex[%s] out of bounds[%s]. name=%s", transformerInfoExec.getColumnIndex(), record.getColumnNumber(), transformerInfoExec.getTransformerName())); } transformerInfoExec.setIsChecked(true); } try { result = transformerInfoExec.getTransformer().evaluate(result, transformerInfoExec.gettContext(), transformerInfoExec.getFinalParas()); } catch (Exception e) { errorMsg = String.format("transformer(%s) has Exception(%s)", transformerInfoExec.getTransformerName(), e.getMessage()); failed = true; /* LOG.error(errorMsg, e); transformerInfoExec.addFailedRecords(1); */ /* A failed record skips the remaining transformers: it is handled as dirty data and filtered out. */ break; } finally { if (transformerInfoExec.getClassLoader() != null) { classLoaderSwapper.restoreCurrentThreadClassLoader(); } } if (result == null) { /** * This null must not reach the writer; it has to be absorbed here. */ totalFilterRecords++; /* transformerInfoExec.addFilterRecords(1); */ break; } long diff = System.nanoTime() - startTs; /* transformerInfoExec.addExaustedTime(diff); */ diffExaustedTime += diff; /* transformerInfoExec.addSuccessRecords(1); */ } totalExaustedTime += diffExaustedTime; if (failed) { totalFailedRecords++; this.pluginCollector.collectDirtyRecord(record, errorMsg); return null; } else { totalSuccessRecords++; return result; } } public void doStat() { /** * TODO: with multiple transformers, display each transformer's statistics separately, then aggregate the total time spent by all transformers.
* Not collected for now. */ /* if (transformers.size() > 1) { for (TransformerExecution transformerInfoExec : transformers) { currentCommunication.setLongCounter(CommunicationTool.TRANSFORMER_NAME_PREFIX + transformerInfoExec.getTransformerName(), transformerInfoExec.getExaustedTime()); } } */ currentCommunication.setLongCounter(CommunicationTool.TRANSFORMER_SUCCEED_RECORDS, totalSuccessRecords); currentCommunication.setLongCounter(CommunicationTool.TRANSFORMER_FAILED_RECORDS, totalFailedRecords); currentCommunication.setLongCounter(CommunicationTool.TRANSFORMER_FILTER_RECORDS, totalFilterRecords); currentCommunication.setLongCounter(CommunicationTool.TRANSFORMER_USED_TIME, totalExaustedTime); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/record/DefaultRecord.java ================================================ package com.alibaba.datax.core.transport.record; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.core.util.ClassSize; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.fastjson2.JSON; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; /** * Created by jingxing on 14-8-24.
*/ public class DefaultRecord implements Record { private static final int RECORD_AVERGAE_COLUMN_NUMBER = 16; private List<Column> columns; private int byteSize; /* Start with the memory required by the Record object itself. */ private int memorySize = ClassSize.DefaultRecordHead; private Map<String, String> meta; public DefaultRecord() { this.columns = new ArrayList<Column>(RECORD_AVERGAE_COLUMN_NUMBER); } @Override public void addColumn(Column column) { columns.add(column); incrByteSize(column); } @Override public Column getColumn(int i) { if (i < 0 || i >= columns.size()) { return null; } return columns.get(i); } @Override public void setColumn(int i, final Column column) { if (i < 0) { throw DataXException.asDataXException(FrameworkErrorCode.ARGUMENT_ERROR, "Cannot set a column at an index less than 0"); } if (i >= columns.size()) { expandCapacity(i + 1); } decrByteSize(getColumn(i)); this.columns.set(i, column); incrByteSize(getColumn(i)); } @Override public String toString() { Map<String, Object> json = new HashMap<String, Object>(); json.put("size", this.getColumnNumber()); json.put("data", this.columns); return JSON.toJSONString(json); } @Override public int getColumnNumber() { return this.columns.size(); } @Override public int getByteSize() { return byteSize; } @Override public int getMemorySize(){ return memorySize; } @Override public void setMeta(Map<String, String> meta) { this.meta = meta; } @Override public Map<String, String> getMeta() { return this.meta; } private void decrByteSize(final Column column) { if (null == column) { return; } byteSize -= column.getByteSize(); /* Memory usage is the Column object header plus the column's actual data size. */ memorySize = memorySize - ClassSize.ColumnHead - column.getByteSize(); } private void incrByteSize(final Column column) { if (null == column) { return; } byteSize += column.getByteSize(); /* Memory usage is the Column object header plus the column's actual data size. */ memorySize = memorySize + ClassSize.ColumnHead + column.getByteSize(); } private void expandCapacity(int totalSize) { if (totalSize <= 0) { return; } int needToExpand = totalSize - columns.size(); while (needToExpand-- > 0) { this.columns.add(null); } } } ================================================ FILE:
core/src/main/java/com/alibaba/datax/core/transport/record/TerminateRecord.java ================================================ package com.alibaba.datax.core.transport.record; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import java.util.Map; /** * Marker record signalling that the producer has finished producing records. * */ public class TerminateRecord implements Record { private static final TerminateRecord SINGLE = new TerminateRecord(); private TerminateRecord() { } public static TerminateRecord get() { return SINGLE; } @Override public void addColumn(Column column) { } @Override public Column getColumn(int i) { return null; } @Override public int getColumnNumber() { return 0; } @Override public int getByteSize() { return 0; } @Override public int getMemorySize() { return 0; } @Override public void setMeta(Map<String, String> meta) { } @Override public Map<String, String> getMeta() { return null; } @Override public void setColumn(int i, Column column) { return; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/ComplexTransformerProxy.java ================================================ package com.alibaba.datax.core.transport.transformer; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.transformer.ComplexTransformer; import com.alibaba.datax.transformer.Transformer; import java.util.Map; /** * no comments. * Created by liqiang on 16/3/8. */ public class ComplexTransformerProxy extends ComplexTransformer { private Transformer realTransformer; public ComplexTransformerProxy(Transformer transformer) { setTransformerName(transformer.getTransformerName()); this.realTransformer = transformer; } @Override public Record evaluate(Record record, Map<String, Object> tContext, Object...
paras) { return this.realTransformer.evaluate(record, paras); } public Transformer getRealTransformer() { return realTransformer; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/DigestTransformer.java ================================================ package com.alibaba.datax.core.transport.transformer; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.transformer.Transformer; import org.apache.commons.codec.digest.DigestUtils; import org.apache.commons.lang.StringUtils; import java.util.Arrays; /** * no comments. * * @author XuDaojie * @since 2021-08-16 */ public class DigestTransformer extends Transformer { private static final String MD5 = "md5"; private static final String SHA1 = "sha1"; private static final String TO_UPPER_CASE = "toUpperCase"; private static final String TO_LOWER_CASE = "toLowerCase"; public DigestTransformer() { setTransformerName("dx_digest"); } @Override public Record evaluate(Record record, Object... 
paras) { int columnIndex; String type; String charType; try { if (paras.length != 3) { throw new RuntimeException("dx_digest paras length must be 3"); } columnIndex = (Integer) paras[0]; type = (String) paras[1]; charType = (String) paras[2]; if (!StringUtils.equalsIgnoreCase(MD5, type) && !StringUtils.equalsIgnoreCase(SHA1, type)) { throw new RuntimeException("dx_digest paras index 1 must be md5 or sha1"); } if (!StringUtils.equalsIgnoreCase(TO_UPPER_CASE, charType) && !StringUtils.equalsIgnoreCase(TO_LOWER_CASE, charType)) { throw new RuntimeException("dx_digest paras index 2 must be toUpperCase or toLowerCase"); } } catch (Exception e) { throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER, "paras:" + Arrays.asList(paras) + " => " + e.getMessage()); } Column column = record.getColumn(columnIndex); try { String oriValue = column.asString(); /* Treat a null field value as an empty string. */ if (oriValue == null) { oriValue = ""; } String newValue; /* Compare case-insensitively, matching the validation above, so that e.g. "MD5" does not fall through to SHA-1. */ if (MD5.equalsIgnoreCase(type)) { newValue = DigestUtils.md5Hex(oriValue); } else { newValue = DigestUtils.sha1Hex(oriValue); } if (TO_UPPER_CASE.equalsIgnoreCase(charType)) { newValue = newValue.toUpperCase(); } else { newValue = newValue.toLowerCase(); } record.setColumn(columnIndex, new StringColumn(newValue)); } catch (Exception e) { throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_RUN_EXCEPTION, e.getMessage(), e); } return record; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/FilterTransformer.java ================================================ package com.alibaba.datax.core.transport.transformer; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.transformer.Transformer; import org.apache.commons.lang3.StringUtils; import java.util.Arrays; /** * no comments. * Created by liqiang on 16/3/4.
 */
public class FilterTransformer extends Transformer {
    public FilterTransformer() {
        setTransformerName("dx_filter");
    }

    @Override
    public Record evaluate(Record record, Object... paras) {

        int columnIndex;
        String code;
        String value;

        try {
            if (paras.length != 3) {
                throw new RuntimeException("dx_filter paras must be 3");
            }

            columnIndex = (Integer) paras[0];
            code = (String) paras[1];
            value = (String) paras[2];

            if (StringUtils.isEmpty(value)) {
                throw new RuntimeException("dx_filter para 2 can't be null or empty");
            }

        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER, "paras:" + Arrays.asList(paras).toString() + " => " + e.getMessage());
        }

        Column column = record.getColumn(columnIndex);

        try {

            if (code.equalsIgnoreCase("like")) {
                return doLike(record, value, column);
            } else if (code.equalsIgnoreCase("not like")) {
                return doNotLike(record, value, column);
            } else if (code.equalsIgnoreCase(">")) {
                return doGreat(record, value, column, false);
            } else if (code.equalsIgnoreCase("<")) {
                return doLess(record, value, column, false);
            } else if (code.equalsIgnoreCase("=") || code.equalsIgnoreCase("==")) {
                return doEqual(record, value, column);
            } else if (code.equalsIgnoreCase("!=")) {
                return doNotEqual(record, value, column);
            } else if (code.equalsIgnoreCase(">=")) {
                return doGreat(record, value, column, true);
            } else if (code.equalsIgnoreCase("<=")) {
                return doLess(record, value, column, true);
            } else {
                throw new RuntimeException("dx_filter can't support code:" + code);
            }
        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_RUN_EXCEPTION, e.getMessage(), e);
        }
    }


    private Record doGreat(Record record, String value, Column column, boolean hasEqual) {

        // If the field is null, skip the comparison: null is treated as negative infinity
        if (column.getRawData() == null) {
            return record;
        }
        if (column instanceof DoubleColumn) {
            Double ori = column.asDouble();
            double val = Double.parseDouble(value);

            if (hasEqual) {
                if (ori >= val) {
                    return null;
                } else {
                    return record;
                }
            } else {
                if (ori > val) {
                    return null;
                } else {
                    return record;
                }
            }
        } else if (column instanceof LongColumn || column instanceof DateColumn) {
            Long ori = column.asLong();
            long val = Long.parseLong(value);

            if (hasEqual) {
                if (ori >= val) {
                    return null;
                } else {
                    return record;
                }
            } else {
                if (ori > val) {
                    return null;
                } else {
                    return record;
                }
            }
        } else if (column instanceof StringColumn || column instanceof BytesColumn || column instanceof BoolColumn) {
            String ori = column.asString();
            if (hasEqual) {
                if (ori.compareTo(value) >= 0) {
                    return null;
                } else {
                    return record;
                }
            } else {
                if (ori.compareTo(value) > 0) {
                    return null;
                } else {
                    return record;
                }
            }
        } else {
            throw new RuntimeException(">=,> can't support this columnType:" + column.getClass().getSimpleName());
        }
    }

    private Record doLess(Record record, String value, Column column, boolean hasEqual) {

        // If the field is null, skip the comparison: null is treated as positive infinity
        if (column.getRawData() == null) {
            return record;
        }

        if (column instanceof DoubleColumn) {
            Double ori = column.asDouble();
            double val = Double.parseDouble(value);

            if (hasEqual) {
                if (ori <= val) {
                    return null;
                } else {
                    return record;
                }
            } else {
                if (ori < val) {
                    return null;
                } else {
                    return record;
                }
            }
        } else if (column instanceof LongColumn || column instanceof DateColumn) {
            Long ori = column.asLong();
            long val = Long.parseLong(value);

            if (hasEqual) {
                if (ori <= val) {
                    return null;
                } else {
                    return record;
                }
            } else {
                if (ori < val) {
                    return null;
                } else {
                    return record;
                }
            }
        } else if (column instanceof StringColumn || column instanceof BytesColumn || column instanceof BoolColumn) {
            String ori = column.asString();
            if (hasEqual) {
                if (ori.compareTo(value) <= 0) {
                    return null;
                } else {
                    return record;
                }
            } else {
                if (ori.compareTo(value) < 0) {
                    return null;
                } else {
                    return record;
                }
            }
        } else {
            throw new RuntimeException("<=,< can't support this columnType:" + column.getClass().getSimpleName());
        }

    }

    /**
     * DateColumn is compared by its long value; StringColumn, BytesColumn and BoolColumn are compared by their String value.
     *
     * @param record
     * @param value
     * @param column
     * @return null (the record is filtered out) if the values are equal, otherwise the record
     */
    private Record doEqual(Record record, String value, Column column) {

        // If the field is null, filter it only when the target value is "null"; otherwise null fields are never filtered
        if (column.getRawData() == null) {
            if (value.equalsIgnoreCase("null")) {
                return null;
            } else {
                return record;
            }
        }

        if (column instanceof DoubleColumn) {
            Double ori = column.asDouble();
            double val = Double.parseDouble(value);

            if (ori == val) {
                return null;
            } else {
                return record;
            }
        } else if (column instanceof LongColumn || column instanceof DateColumn) {
            Long ori = column.asLong();
            long val = Long.parseLong(value);

            if (ori == val) {
                return null;
            } else {
                return record;
            }
        } else if (column instanceof StringColumn || column instanceof BytesColumn || column instanceof BoolColumn) {
            String ori = column.asString();
            if (ori.compareTo(value) == 0) {
                return null;
            } else {
                return record;
            }
        } else {
            throw new RuntimeException("== can't support this columnType:" + column.getClass().getSimpleName());
        }

    }

    /**
     * DateColumn is compared by its long value; StringColumn, BytesColumn and BoolColumn are compared by their String value.
     *
     * @param record
     * @param value
     * @param column
     * @return null (the record is filtered out) if the values differ, otherwise the record
     */
    private Record doNotEqual(Record record, String value, Column column) {

        // If the field is null, keep it only when the target value is "null"; otherwise null fields are always filtered
        if (column.getRawData() == null) {
            if (value.equalsIgnoreCase("null")) {
                return record;
            } else {
                return null;
            }
        }

        if (column instanceof DoubleColumn) {
            Double ori = column.asDouble();
            double val = Double.parseDouble(value);

            if (ori != val) {
                return null;
            } else {
                return record;
            }
        } else if (column instanceof LongColumn || column instanceof DateColumn) {
            Long ori = column.asLong();
            long val = Long.parseLong(value);

            if (ori != val) {
                return null;
            } else {
                return record;
            }
        } else if (column instanceof StringColumn || column instanceof BytesColumn || column instanceof BoolColumn) {
            String ori = column.asString();
            if (ori.compareTo(value) != 0) {
                return null;
            } else {
                return record;
            }
        } else {
            throw new RuntimeException("!= can't support this columnType:" + column.getClass().getSimpleName());
        }
    }

    private Record doLike(Record record, String value, Column column) {
        String orivalue = column.asString();
        if (orivalue != null && orivalue.matches(value)) {
            return null;
        } else {
            return record;
        }
    }

    private Record doNotLike(Record record, String value, Column column) {
        String orivalue = column.asString();
        if (orivalue != null && orivalue.matches(value)) {
            return record;
        } else {
            return null;
        }
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/GroovyTransformer.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.transformer.Transformer;
import groovy.lang.GroovyClassLoader;
import org.apache.commons.lang3.StringUtils;
import org.codehaus.groovy.control.CompilationFailedException;

import java.util.Arrays;
import java.util.List;

/**
 * no comments.
 * Created by liqiang on 16/3/4.
 */
public class GroovyTransformer extends Transformer {
    public GroovyTransformer() {
        setTransformerName("dx_groovy");
    }

    // volatile is required for the double-checked locking in evaluate() to be safe
    private volatile Transformer groovyTransformer;

    @Override
    public Record evaluate(Record record, Object... paras) {

        if (groovyTransformer == null) {
            // Compiled once and shared globally
            if (paras.length < 1 || paras.length > 2) {
                throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER, "dx_groovy paras must be 1 or 2. now paras is: " + Arrays.asList(paras).toString());
            }
            synchronized (this) {

                if (groovyTransformer == null) {
                    String code = (String) paras[0];
                    @SuppressWarnings("unchecked")
                    List<String> extraPackage = paras.length == 2 ? (List<String>) paras[1] : null;
                    initGroovyTransformer(code, extraPackage);
                }
            }
        }

        return this.groovyTransformer.evaluate(record);
    }

    private void initGroovyTransformer(String code, List<String> extraPackage) {
        GroovyClassLoader loader = new GroovyClassLoader(GroovyTransformer.class.getClassLoader());
        String groovyRule = getGroovyRule(code, extraPackage);

        Class<?> groovyClass;
        try {
            groovyClass = loader.parseClass(groovyRule);
        } catch (CompilationFailedException cfe) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_GROOVY_INIT_EXCEPTION, cfe);
        }

        try {
            Object t = groovyClass.newInstance();
            if (!(t instanceof Transformer)) {
                throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_GROOVY_INIT_EXCEPTION, "datax bug! contact askdatax");
            }
            this.groovyTransformer = (Transformer) t;
        } catch (Throwable ex) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_GROOVY_INIT_EXCEPTION, ex);
        }
    }


    private String getGroovyRule(String expression, List<String> extraPackagesStrList) {
        StringBuffer sb = new StringBuffer();
        if (extraPackagesStrList != null) {
            for (String extraPackagesStr : extraPackagesStrList) {
                if (StringUtils.isNotEmpty(extraPackagesStr)) {
                    sb.append(extraPackagesStr);
                }
            }
        }
        sb.append("import static com.alibaba.datax.core.transport.transformer.GroovyTransformerStaticUtil.*;");
        sb.append("import com.alibaba.datax.common.element.*;");
        sb.append("import com.alibaba.datax.common.exception.DataXException;");
        sb.append("import com.alibaba.datax.transformer.Transformer;");
        sb.append("import java.util.*;");
        sb.append("public class RULE extends Transformer").append("{");
        sb.append("public Record evaluate(Record record, Object... paras) {");
        sb.append(expression);
        sb.append("}}");

        return sb.toString();
    }


}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/GroovyTransformerStaticUtil.java
================================================
package com.alibaba.datax.core.transport.transformer;

import org.apache.commons.codec.digest.DigestUtils;

/**
 * Helper class for GroovyTransformer, callable from Groovy rule code; every method must be static.
 * Created by liqiang on 16/3/4.
 */
public class GroovyTransformerStaticUtil {

    public static String md5(final String data) {
        return DigestUtils.md5Hex(data);
    }

    public static String sha1(final String data) {
        return DigestUtils.sha1Hex(data);
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/PadTransformer.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.transformer.Transformer;

import java.util.Arrays;

/**
 * no comments.
 * Created by liqiang on 16/3/4.
 */
public class PadTransformer extends Transformer {
    public PadTransformer() {
        setTransformerName("dx_pad");
    }

    @Override
    public Record evaluate(Record record, Object... paras) {

        int columnIndex;
        String padType;
        int length;
        String padString;

        try {
            if (paras.length != 4) {
                throw new RuntimeException("dx_pad paras must be 4");
            }

            columnIndex = (Integer) paras[0];
            padType = (String) paras[1];
            length = Integer.valueOf((String) paras[2]);
            padString = (String) paras[3];
        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER, "paras:" + Arrays.asList(paras).toString() + " => " + e.getMessage());
        }

        Column column = record.getColumn(columnIndex);

        try {
            String oriValue = column.asString();

            // If the field is null, treat it as an empty string
            if (oriValue == null) {
                oriValue = "";
            }
            String newValue;
            if (!padType.equalsIgnoreCase("r") && !padType.equalsIgnoreCase("l")) {
                throw new RuntimeException(String.format("dx_pad first para(%s) must be l or r", padType));
            }
            if (length <= oriValue.length()) {
                newValue = oriValue.substring(0, length);
            } else {
                newValue = doPad(padType, oriValue, length, padString);
            }

            record.setColumn(columnIndex, new StringColumn(newValue));

        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_RUN_EXCEPTION, e.getMessage(), e);
        }
        return record;
    }

    private String doPad(String padType, String oriValue, int length, String padString) {

        // An empty pad string never reduces needLength, so the loop below would never end
        if (padString == null || padString.isEmpty()) {
            throw new RuntimeException("dx_pad padString can't be null or empty");
        }

        String finalPad = "";
        int needLength = length - oriValue.length();
        while (needLength > 0) {

            if (needLength >= padString.length()) {
                finalPad += padString;
                needLength -= padString.length();
            } else {
                finalPad += padString.substring(0, needLength);
                needLength = 0;
            }
        }

        if (padType.equalsIgnoreCase("l")) {
            return finalPad + oriValue;
        } else {
            return oriValue + finalPad;
        }
    }

}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/ReplaceTransformer.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.transformer.Transformer;

import java.util.Arrays;

/**
 * no comments.
 * Created by liqiang on 16/3/4.
 */
public class ReplaceTransformer extends Transformer {
    public ReplaceTransformer() {
        setTransformerName("dx_replace");
    }

    @Override
    public Record evaluate(Record record, Object... paras) {

        int columnIndex;
        int startIndex;
        int length;
        String replaceString;
        try {
            if (paras.length != 4) {
                throw new RuntimeException("dx_replace paras must be 4");
            }

            columnIndex = (Integer) paras[0];
            startIndex = Integer.valueOf((String) paras[1]);
            length = Integer.valueOf((String) paras[2]);
            replaceString = (String) paras[3];
        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER, "paras:" + Arrays.asList(paras).toString() + " => " + e.getMessage());
        }

        Column column = record.getColumn(columnIndex);

        try {
            String oriValue = column.asString();

            // If the field is null, skip the replacement
            if (oriValue == null) {
                return record;
            }
            String newValue;
            if (startIndex > oriValue.length()) {
                throw new RuntimeException(String.format("dx_replace startIndex(%s) out of range(%s)", startIndex, oriValue.length()));
            }
            if (startIndex + length >= oriValue.length()) {
                newValue = oriValue.substring(0, startIndex) + replaceString;
            } else {
                newValue = oriValue.substring(0, startIndex) + replaceString + oriValue.substring(startIndex + length, oriValue.length());
            }

            record.setColumn(columnIndex, new StringColumn(newValue));

        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_RUN_EXCEPTION, e.getMessage(), e);
        }
        return record;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/SubstrTransformer.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.transformer.Transformer;

import java.util.Arrays;

/**
 * no comments.
 * Created by liqiang on 16/3/4.
 */
public class SubstrTransformer extends Transformer {
    public SubstrTransformer() {
        setTransformerName("dx_substr");
    }

    @Override
    public Record evaluate(Record record, Object... paras) {

        int columnIndex;
        int startIndex;
        int length;

        try {
            if (paras.length != 3) {
                throw new RuntimeException("dx_substr paras must be 3");
            }

            columnIndex = (Integer) paras[0];
            startIndex = Integer.valueOf((String) paras[1]);
            length = Integer.valueOf((String) paras[2]);

        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER, "paras:" + Arrays.asList(paras).toString() + " => " + e.getMessage());
        }

        Column column = record.getColumn(columnIndex);

        try {
            String oriValue = column.asString();
            // If the field is null, skip the substring step
            if (oriValue == null) {
                return record;
            }
            String newValue;
            if (startIndex > oriValue.length()) {
                throw new RuntimeException(String.format("dx_substr startIndex(%s) out of range(%s)", startIndex, oriValue.length()));
            }
            if (startIndex + length >= oriValue.length()) {
                newValue = oriValue.substring(startIndex, oriValue.length());
            } else {
                newValue = oriValue.substring(startIndex, startIndex + length);
            }

            record.setColumn(columnIndex, new StringColumn(newValue));

        } catch (Exception e) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_RUN_EXCEPTION, e.getMessage(), e);
        }
        return record;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerErrorCode.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.common.spi.ErrorCode;

public enum TransformerErrorCode implements ErrorCode {
    // naming errors: illegal or duplicate transformer names
    TRANSFORMER_NAME_ERROR("TransformerErrorCode-01", "Transformer name illegal"),
    TRANSFORMER_DUPLICATE_ERROR("TransformerErrorCode-02", "Transformer name already exists"),
    TRANSFORMER_NOTFOUND_ERROR("TransformerErrorCode-03", "Transformer name not found"),
    TRANSFORMER_CONFIGURATION_ERROR("TransformerErrorCode-04", "Transformer configuration error"),
    TRANSFORMER_ILLEGAL_PARAMETER("TransformerErrorCode-05", "Transformer parameter illegal"),
    TRANSFORMER_RUN_EXCEPTION("TransformerErrorCode-06", "Transformer run exception"),
    TRANSFORMER_GROOVY_INIT_EXCEPTION("TransformerErrorCode-07", "Transformer Groovy init exception"),
    ;

    private final String code;

    private final String description;

    private TransformerErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code,
                this.description);
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerExecution.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.transformer.ComplexTransformer;

import java.util.Map;

/**
 * One instance per configured function (transformer step).
 * Created by liqiang on 16/3/16.
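 * <p>
 * For example (illustrative values), {@link #genFinalParas()} turns a dx_substr
 * step configured with columnIndex 1 and paras ["0", "3"] into the argument array
 * <pre>
 * finalParas = {1, "0", "3"}   (Integer, String, String)
 * </pre>
 * which SubstrTransformer reads as columnIndex, startIndex and length. dx_groovy
 * is special-cased and receives {code, extraPackage} instead.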
 */
public class TransformerExecution {

    private Object[] finalParas;

    private final TransformerExecutionParas transformerExecutionParas;
    private final TransformerInfo transformerInfo;

    public TransformerExecution(TransformerInfo transformerInfo, TransformerExecutionParas transformerExecutionParas) {
        this.transformerExecutionParas = transformerExecutionParas;
        this.transformerInfo = transformerInfo;
    }

    /**
     * Runtime statistics; currently unused.
     */
    private long exaustedTime = 0;
    private long successRecords = 0;
    private long failedRecords = 0;
    private long filterRecords = 0;

    /**
     * Parameters are checked lazily.
     */
    private boolean isChecked = false;

    public void genFinalParas() {

        /**
         * dx_groovy takes no positional paras: it receives only the code and the extra packages.
         */
        if (transformerInfo.getTransformer().getTransformerName().equals("dx_groovy")) {
            finalParas = new Object[2];
            finalParas[0] = transformerExecutionParas.getCode();
            finalParas[1] = transformerExecutionParas.getExtraPackage();
            return;
        }
        /**
         * Other functions: columnIndex first, followed by paras in order; if columnIndex is null, it is omitted.
         */
        if (transformerExecutionParas.getColumnIndex() != null) {
            if (transformerExecutionParas.getParas() != null) {
                finalParas = new Object[transformerExecutionParas.getParas().length + 1];
                System.arraycopy(transformerExecutionParas.getParas(), 0, finalParas, 1, transformerExecutionParas.getParas().length);
            } else {
                finalParas = new Object[1];
            }
            finalParas[0] = transformerExecutionParas.getColumnIndex();

        } else {
            if (transformerExecutionParas.getParas() != null) {
                finalParas = transformerExecutionParas.getParas();
            } else {
                finalParas = null;
            }

        }
    }


    public Object[] getFinalParas() {
        return finalParas;
    }

    public long getExaustedTime() {
        return exaustedTime;
    }

    public long getSuccessRecords() {
        return successRecords;
    }

    public long getFailedRecords() {
        return failedRecords;
    }

    public long getFilterRecords() {
        return filterRecords;
    }

    public void setIsChecked(boolean isChecked) {
        this.isChecked = isChecked;
    }

    public boolean isChecked() {
        return isChecked;
    }

    /**
     * Delegate methods.
     */
    public ClassLoader getClassLoader() {
        return transformerInfo.getClassLoader();
    }

    public Integer getColumnIndex() {
        return transformerExecutionParas.getColumnIndex();
    }

    public String getTransformerName() {
        return transformerInfo.getTransformer().getTransformerName();
    }

    public ComplexTransformer getTransformer() {
        return transformerInfo.getTransformer();
    }

    public Map<String, Object> gettContext() {
        return transformerExecutionParas.gettContext();
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerExecutionParas.java
================================================
package com.alibaba.datax.core.transport.transformer;

import java.util.List;
import java.util.Map;

/**
 * no comments.
 * Created by liqiang on 16/3/16.
 */
public class TransformerExecutionParas {

    /**
     * Function parameters.
     */
    private Integer columnIndex;
    private String[] paras;
    private Map<String, Object> tContext;
    private String code;
    private List<String> extraPackage;


    public Integer getColumnIndex() {
        return columnIndex;
    }

    public String[] getParas() {
        return paras;
    }

    public Map<String, Object> gettContext() {
        return tContext;
    }

    public String getCode() {
        return code;
    }

    public List<String> getExtraPackage() {
        return extraPackage;
    }

    public void setColumnIndex(Integer columnIndex) {
        this.columnIndex = columnIndex;
    }

    public void setParas(String[] paras) {
        this.paras = paras;
    }

    public void settContext(Map<String, Object> tContext) {
        this.tContext = tContext;
    }

    public void setCode(String code) {
        this.code = code;
    }

    public void setExtraPackage(List<String> extraPackage) {
        this.extraPackage = extraPackage;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerInfo.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.transformer.ComplexTransformer;

/**
 * Single instance (one per registered transformer).
 * Created by liqiang on 16/3/9.
 */
public class TransformerInfo {

    /**
     * Basic function information.
     */
    private ComplexTransformer transformer;
    private ClassLoader classLoader;
    private boolean isNative;


    public ComplexTransformer getTransformer() {
        return transformer;
    }

    public ClassLoader getClassLoader() {
        return classLoader;
    }

    public boolean isNative() {
        return isNative;
    }

    public void setTransformer(ComplexTransformer transformer) {
        this.transformer = transformer;
    }

    public void setClassLoader(ClassLoader classLoader) {
        this.classLoader = classLoader;
    }

    public void setIsNative(boolean isNative) {
        this.isNative = isNative;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/transport/transformer/TransformerRegistry.java
================================================
package com.alibaba.datax.core.transport.transformer;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.util.container.CoreConstant;
import com.alibaba.datax.core.util.container.JarLoader;
import com.alibaba.datax.transformer.ComplexTransformer;
import com.alibaba.datax.transformer.Transformer;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * no comments.
 * Created by liqiang on 16/3/3.
 */
public class TransformerRegistry {

    private static final Logger LOG = LoggerFactory.getLogger(TransformerRegistry.class);
    private static Map<String, TransformerInfo> registedTransformer = new HashMap<String, TransformerInfo>();

    static {
        /**
         * Register the native transformers;
         * transformers from local storage and from the server are loaded lazily.
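         * Non-native transformers are read from
         * DATAX_STORAGE_TRANSFORMER_HOME/<dir>/transformer.json by loadTransformer(); only
         * its "class" and "name" keys are used, and the directory name (not "name") becomes
         * the registered name, which must not start with "dx_". A hypothetical example
         * (the names are placeholders, not shipped classes):
         * {"name": "my_mask", "class": "com.example.transformer.MyMaskTransformer"}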
         */
        registTransformer(new SubstrTransformer());
        registTransformer(new PadTransformer());
        registTransformer(new ReplaceTransformer());
        registTransformer(new FilterTransformer());
        registTransformer(new GroovyTransformer());
        registTransformer(new DigestTransformer());
    }

    public static void loadTransformerFromLocalStorage() {
        // add transformers from local storage
        loadTransformerFromLocalStorage(null);
    }


    public static void loadTransformerFromLocalStorage(List<String> transformers) {

        String[] paths = new File(CoreConstant.DATAX_STORAGE_TRANSFORMER_HOME).list();
        if (null == paths) {
            return;
        }

        for (final String each : paths) {
            try {
                if (transformers == null || transformers.contains(each)) {
                    loadTransformer(each);
                }
            } catch (Exception e) {
                LOG.error(String.format("skip transformer(%s) loadTransformer has Exception(%s)", each, e.getMessage()), e);
            }

        }
    }

    public static void loadTransformer(String each) {
        String transformerPath = CoreConstant.DATAX_STORAGE_TRANSFORMER_HOME + File.separator + each;
        Configuration transformerConfiguration;
        try {
            transformerConfiguration = loadTransFormerConfig(transformerPath);
        } catch (Exception e) {
            LOG.error(String.format("skip transformer(%s),load transformer.json error, path = %s, ", each, transformerPath), e);
            return;
        }

        String className = transformerConfiguration.getString("class");
        if (StringUtils.isEmpty(className)) {
            LOG.error(String.format("skip transformer(%s),class not configured, path = %s, config = %s", each, transformerPath, transformerConfiguration.beautify()));
            return;
        }

        String funName = transformerConfiguration.getString("name");
        if (!each.equals(funName)) {
            LOG.warn(String.format("transformer(%s) name not match transformer.json config name[%s], will ignore json's name, path = %s, config = %s", each, funName, transformerPath, transformerConfiguration.beautify()));
        }
        JarLoader jarLoader = new JarLoader(new String[]{transformerPath});

        try {
            Class<?> transformerClass = jarLoader.loadClass(className);
            Object transformer = transformerClass.newInstance();
            if (ComplexTransformer.class.isAssignableFrom(transformer.getClass())) {
                ((ComplexTransformer) transformer).setTransformerName(each);
                registComplexTransformer((ComplexTransformer) transformer, jarLoader, false);
            } else if (Transformer.class.isAssignableFrom(transformer.getClass())) {
                ((Transformer) transformer).setTransformerName(each);
                registTransformer((Transformer) transformer, jarLoader, false);
            } else {
                LOG.error(String.format("load Transformer class(%s) error, path = %s", className, transformerPath));
            }
        } catch (Exception e) {
            // skip the faulty function
            LOG.error(String.format("skip transformer(%s),load Transformer class error, path = %s ", each, transformerPath), e);
        }
    }

    private static Configuration loadTransFormerConfig(String transformerPath) {
        return Configuration.from(new File(transformerPath + File.separator + "transformer.json"));
    }

    public static TransformerInfo getTransformer(String transformerName) {

        TransformerInfo result = registedTransformer.get(transformerName);

        //if (result == null) {
        //todo: try loading it from disk again
        //}

        return result;
    }

    public static synchronized void registTransformer(Transformer transformer) {
        registTransformer(transformer, null, true);
    }

    public static synchronized void registTransformer(Transformer transformer, ClassLoader classLoader, boolean isNative) {

        checkName(transformer.getTransformerName(), isNative);

        if (registedTransformer.containsKey(transformer.getTransformerName())) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_DUPLICATE_ERROR, " name=" + transformer.getTransformerName());
        }

        registedTransformer.put(transformer.getTransformerName(), buildTransformerInfo(new ComplexTransformerProxy(transformer), isNative, classLoader));

    }

    public static synchronized void registComplexTransformer(ComplexTransformer complexTransformer, ClassLoader classLoader, boolean isNative) {

        checkName(complexTransformer.getTransformerName(), isNative);

        if (registedTransformer.containsKey(complexTransformer.getTransformerName())) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_DUPLICATE_ERROR, " name=" + complexTransformer.getTransformerName());
        }

        registedTransformer.put(complexTransformer.getTransformerName(), buildTransformerInfo(complexTransformer, isNative, classLoader));
    }

    private static void checkName(String functionName, boolean isNative) {
        boolean checkResult = true;
        if (isNative) {
            if (!functionName.startsWith("dx_")) {
                checkResult = false;
            }
        } else {
            if (functionName.startsWith("dx_")) {
                checkResult = false;
            }
        }

        if (!checkResult) {
            throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_NAME_ERROR, " name=" + functionName + ": isNative=" + isNative);
        }

    }

    private static TransformerInfo buildTransformerInfo(ComplexTransformer complexTransformer, boolean isNative, ClassLoader classLoader) {
        TransformerInfo transformerInfo = new TransformerInfo();
        transformerInfo.setClassLoader(classLoader);
        transformerInfo.setIsNative(isNative);
        transformerInfo.setTransformer(complexTransformer);
        return transformerInfo;
    }

    public static List<String> getAllSuportTransformer() {
        return new ArrayList<String>(registedTransformer.keySet());
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/ClassSize.java
================================================
package com.alibaba.datax.core.util;

/**
 * Created by liqiang on 15/12/12.
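 * <p>
 * Worked values of the constants below, assuming 8-byte references as the static
 * block does, with {@code align(n) = ((n + 7) >> 3) << 3} (round up to a multiple of 8):
 * <pre>
 * REFERENCE         = 8
 * OBJECT            = 2 * 8                           = 16
 * ARRAY             = align(3 * 8)                    = 24
 * ARRAYLIST         = align(16 + 8 + 24 + 2 * 64 / 8) = align(64) = 64
 * DefaultRecordHead = align(8 + 64 + 2 * 32 / 8)      = align(80) = 80
 * ColumnHead        = align(2 * 8 + 32 / 8)           = align(20) = 24
 * </pre>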
*/ public class ClassSize { public static final int DefaultRecordHead; public static final int ColumnHead; //objectHead的大小 public static final int REFERENCE; public static final int OBJECT; public static final int ARRAY; public static final int ARRAYLIST; static { //only 64位 REFERENCE = 8; OBJECT = 2 * REFERENCE; ARRAY = align(3 * REFERENCE); // 16+8+24+16 ARRAYLIST = align(OBJECT + align(REFERENCE) + align(ARRAY) + (2 * Long.SIZE / Byte.SIZE)); // 8+64+8 DefaultRecordHead = align(align(REFERENCE) + ClassSize.ARRAYLIST + 2 * Integer.SIZE / Byte.SIZE); //16+4 ColumnHead = align(2 * REFERENCE + Integer.SIZE / Byte.SIZE); } public static int align(int num) { return (int)(align((long)num)); } public static long align(long num) { //The 7 comes from that the alignSize is 8 which is the number of bytes //stored and sent together return ((num + 7) >> 3) << 3; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/ClassUtil.java ================================================ package com.alibaba.datax.core.util; import java.lang.reflect.Constructor; public final class ClassUtil { /** * 通过反射构造类对象 * * @param className * 反射的类名称 * @param t * 反射类的类型Class对象 * @param args * 构造参数 * * */ @SuppressWarnings({ "rawtypes", "unchecked" }) public static T instantiate(String className, Class t, Object... 
args) { try { Constructor constructor = (Constructor) Class.forName(className) .getConstructor(ClassUtil.toClassType(args)); return (T) constructor.newInstance(args); } catch (Exception e) { throw new IllegalArgumentException(e); } } private static Class[] toClassType(Object[] args) { Class[] clazzs = new Class[args.length]; for (int i = 0, length = args.length; i < length; i++) { clazzs[i] = args[i].getClass(); } return clazzs; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/ConfigParser.java ================================================ package com.alibaba.datax.core.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.util.container.CoreConstant; import org.apache.commons.io.FileUtils; import org.apache.commons.lang.StringUtils; import org.apache.http.client.methods.HttpGet; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.File; import java.io.IOException; import java.net.URL; import java.util.ArrayList; import java.util.HashSet; import java.util.List; import java.util.Set; public final class ConfigParser { private static final Logger LOG = LoggerFactory.getLogger(ConfigParser.class); /** * 指定Job配置路径,ConfigParser会解析Job、Plugin、Core全部信息,并以Configuration返回 */ public static Configuration parse(final String jobPath) { Configuration configuration = ConfigParser.parseJobConfig(jobPath); configuration.merge( ConfigParser.parseCoreConfig(CoreConstant.DATAX_CONF_PATH), false); // todo config优化,只捕获需要的plugin String readerPluginName = configuration.getString( CoreConstant.DATAX_JOB_CONTENT_READER_NAME); String writerPluginName = configuration.getString( CoreConstant.DATAX_JOB_CONTENT_WRITER_NAME); String preHandlerName = configuration.getString( CoreConstant.DATAX_JOB_PREHANDLER_PLUGINNAME); String postHandlerName = configuration.getString( CoreConstant.DATAX_JOB_POSTHANDLER_PLUGINNAME); Set pluginList 
= new HashSet<String>(); pluginList.add(readerPluginName); pluginList.add(writerPluginName); if(StringUtils.isNotEmpty(preHandlerName)) { pluginList.add(preHandlerName); } if(StringUtils.isNotEmpty(postHandlerName)) { pluginList.add(postHandlerName); } try { configuration.merge(parsePluginConfig(new ArrayList<String>(pluginList)), false); }catch (Exception e){ // Swallow the exception to keep the log clean; the message here is sufficient. LOG.warn(String.format("插件[%s,%s]加载失败,1s后重试... Exception:%s ", readerPluginName, writerPluginName, e.getMessage())); try { Thread.sleep(1000); } catch (InterruptedException e1) { // } configuration.merge(parsePluginConfig(new ArrayList<String>(pluginList)), false); } return configuration; } private static Configuration parseCoreConfig(final String path) { return Configuration.from(new File(path)); } public static Configuration parseJobConfig(final String path) { String jobContent = getJobContent(path); Configuration config = Configuration.from(jobContent); return SecretUtil.decryptSecretKey(config); } private static String getJobContent(String jobResource) { String jobContent; boolean isJobResourceFromHttp = jobResource.trim().toLowerCase().startsWith("http"); if (isJobResourceFromHttp) { // Set the httpclient HTTP_TIMEOUT_INMILLIONSECONDS Configuration coreConfig = ConfigParser.parseCoreConfig(CoreConstant.DATAX_CONF_PATH); int httpTimeOutInMillionSeconds = coreConfig.getInt( CoreConstant.DATAX_CORE_DATAXSERVER_TIMEOUT, 5000); HttpClientUtil.setHttpTimeoutInMillionSeconds(httpTimeOutInMillionSeconds); HttpClientUtil httpClientUtil = new HttpClientUtil(); try { URL url = new URL(jobResource); HttpGet httpGet = HttpClientUtil.getGetRequest(); httpGet.setURI(url.toURI()); jobContent = httpClientUtil.executeAndGetWithFailedRetry(httpGet, 1, 1000L); } catch (Exception e) { throw DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR, "获取作业配置信息失败:" + jobResource, e); } } else { // jobResource is an absolute path to a local file try { jobContent = 
FileUtils.readFileToString(new File(jobResource)); } catch (IOException e) { throw
DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR, "获取作业配置信息失败:" + jobResource, e); } } if (jobContent == null) { throw DataXException.asDataXException(FrameworkErrorCode.CONFIG_ERROR, "获取作业配置信息失败:" + jobResource); } return jobContent; } public static Configuration parsePluginConfig(List<String> wantPluginNames) { Configuration configuration = Configuration.newDefault(); Set<String> replicaCheckPluginSet = new HashSet<String>(); int complete = 0; for (final String each : ConfigParser .getDirAsList(CoreConstant.DATAX_PLUGIN_READER_HOME)) { Configuration eachReaderConfig = ConfigParser.parseOnePluginConfig(each, "reader", replicaCheckPluginSet, wantPluginNames); if(eachReaderConfig!=null) { configuration.merge(eachReaderConfig, true); complete += 1; } } for (final String each : ConfigParser .getDirAsList(CoreConstant.DATAX_PLUGIN_WRITER_HOME)) { Configuration eachWriterConfig = ConfigParser.parseOnePluginConfig(each, "writer", replicaCheckPluginSet, wantPluginNames); if(eachWriterConfig!=null) { configuration.merge(eachWriterConfig, true); complete += 1; } } if (wantPluginNames != null && wantPluginNames.size() > 0 && wantPluginNames.size() != complete) { throw DataXException.asDataXException(FrameworkErrorCode.PLUGIN_INIT_ERROR, "插件加载失败,未完成指定插件加载:" + wantPluginNames); } return configuration; } public static Configuration parseOnePluginConfig(final String path, final String type, Set<String> pluginSet, List<String> wantPluginNames) { String filePath = path + File.separator + "plugin.json"; Configuration configuration = Configuration.from(new File(filePath)); String pluginPath = configuration.getString("path"); String pluginName = configuration.getString("name"); if(!pluginSet.contains(pluginName)) { pluginSet.add(pluginName); } else { throw DataXException.asDataXException(FrameworkErrorCode.PLUGIN_INIT_ERROR, "插件加载失败,存在重复插件:" + filePath); } // Not one of the requested plugins: return null if (wantPluginNames != null && wantPluginNames.size() > 0 && !wantPluginNames.contains(pluginName)) { return null; } boolean
isDefaultPath = StringUtils.isBlank(pluginPath); if (isDefaultPath) { configuration.set("path", path); configuration.set("loadType","jarLoader"); } Configuration result = Configuration.newDefault(); result.set( String.format("plugin.%s.%s", type, pluginName), configuration.getInternal()); return result; } private static List<String> getDirAsList(String path) { List<String> result = new ArrayList<String>(); String[] paths = new File(path).list(); if (null == paths) { return result; } for (final String each : paths) { result.add(path + File.separator + each); } return result; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/ConfigurationValidate.java ================================================ package com.alibaba.datax.core.util; import com.alibaba.datax.common.util.Configuration; import org.apache.commons.lang.Validate; /** * Created by jingxing on 14-9-16. * * Performs an overall validation of the configuration. */ public class ConfigurationValidate { public static void doValidate(Configuration allConfig) { Validate.isTrue(allConfig!=null, ""); coreValidate(allConfig); pluginValidate(allConfig); jobValidate(allConfig); } private static void coreValidate(Configuration allconfig) { return; } private static void pluginValidate(Configuration allConfig) { return; } private static void jobValidate(Configuration allConfig) { return; } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/ErrorRecordChecker.java ================================================ package com.alibaba.datax.core.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.statistics.communication.Communication; import com.alibaba.datax.core.statistics.communication.CommunicationTool; import com.alibaba.datax.core.util.container.CoreConstant; import org.apache.commons.lang3.Validate; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /**
Checks whether a job has hit its error-record limit. Two kinds of limit are supported: a record count (recordLimit) and a percentage (percentageLimit). * 1. errorRecord is the maximum number of error records allowed; the job fails once the count exceeds it. For example, errorRecord = 0 means no dirty data is tolerated. * 2. errorPercentage is the maximum ratio of error records, checked when the job finishes. * 3. errorRecord takes precedence over errorPercentage. */ public final class ErrorRecordChecker { private static final Logger LOG = LoggerFactory .getLogger(ErrorRecordChecker.class); private Long recordLimit; private Double percentageLimit; public ErrorRecordChecker(Configuration configuration) { this(configuration.getLong(CoreConstant.DATAX_JOB_SETTING_ERRORLIMIT_RECORD), configuration.getDouble(CoreConstant.DATAX_JOB_SETTING_ERRORLIMIT_PERCENT)); } public ErrorRecordChecker(Long rec, Double percentage) { recordLimit = rec; percentageLimit = percentage; if (percentageLimit != null) { Validate.isTrue(0.0 <= percentageLimit && percentageLimit <= 1.0, "脏数据百分比限制应该在[0.0, 1.0]之间"); } if (recordLimit != null) { Validate.isTrue(recordLimit >= 0, "脏数据条数限制应该为非负整数"); // errorRecord takes precedence over errorPercentage. percentageLimit = null; } } public void checkRecordLimit(Communication communication) { if (recordLimit == null) { return; } long errorNumber = CommunicationTool.getTotalErrorRecords(communication); if (recordLimit < errorNumber) { LOG.debug( String.format("Error-limit set to %d, error count check.", recordLimit)); throw DataXException.asDataXException( FrameworkErrorCode.PLUGIN_DIRTY_DATA_LIMIT_EXCEED, String.format("脏数据条数检查不通过,限制是[%d]条,但实际上捕获了[%d]条.", recordLimit, errorNumber)); } } public void checkPercentageLimit(Communication communication) { if (percentageLimit == null) { return; } LOG.debug(String.format( "Error-limit set to %f, error percent check.", percentageLimit)); long total = CommunicationTool.getTotalReadRecords(communication); long error = 
CommunicationTool.getTotalErrorRecords(communication); if (total > 0 && ((double) error / (double) total) > percentageLimit) { throw DataXException.asDataXException( FrameworkErrorCode.PLUGIN_DIRTY_DATA_LIMIT_EXCEED, String.format("脏数据百分比检查不通过,限制是[%f],但实际上捕获到[%f].",
percentageLimit, ((double) error / (double) total))); } } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/ExceptionTracker.java ================================================ package com.alibaba.datax.core.util; import java.io.PrintWriter; import java.io.StringWriter; public class ExceptionTracker { public static final int STRING_BUFFER = 4096; public static String trace(Throwable ex) { StringWriter sw = new StringWriter(STRING_BUFFER); PrintWriter pw = new PrintWriter(sw); ex.printStackTrace(pw); return sw.toString(); } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/FrameworkErrorCode.java ================================================ package com.alibaba.datax.core.util; import com.alibaba.datax.common.spi.ErrorCode; /** * TODO: analyze the various error types using existing log data and refine them. * *
Please do not reformat the code in this class.
*/ public enum FrameworkErrorCode implements ErrorCode { INSTALL_ERROR("Framework-00", "DataX引擎安装错误, 请联系您的运维解决 ."), ARGUMENT_ERROR("Framework-01", "DataX引擎运行错误,该问题通常是由于内部编程错误引起,请联系DataX开发团队解决 ."), RUNTIME_ERROR("Framework-02", "DataX引擎运行过程出错,具体原因请参看DataX运行结束时的错误诊断信息 ."), CONFIG_ERROR("Framework-03", "DataX引擎配置错误,该问题通常是由于DataX安装错误引起,请联系您的运维解决 ."), SECRET_ERROR("Framework-04", "DataX引擎加解密出错,该问题通常是由于DataX密钥配置错误引起,请联系您的运维解决 ."), HOOK_LOAD_ERROR("Framework-05", "加载外部Hook出现错误,通常是由于DataX安装引起的"), HOOK_FAIL_ERROR("Framework-06", "执行外部Hook出现错误"), PLUGIN_INSTALL_ERROR("Framework-10", "DataX插件安装错误, 该问题通常是由于DataX安装错误引起,请联系您的运维解决 ."), PLUGIN_NOT_FOUND("Framework-11", "DataX插件配置错误, 该问题通常是由于DataX安装错误引起,请联系您的运维解决 ."), PLUGIN_INIT_ERROR("Framework-12", "DataX插件初始化错误, 该问题通常是由于DataX安装错误引起,请联系您的运维解决 ."), PLUGIN_RUNTIME_ERROR("Framework-13", "DataX插件运行时出错, 具体原因请参看DataX运行结束时的错误诊断信息 ."), PLUGIN_DIRTY_DATA_LIMIT_EXCEED("Framework-14", "DataX传输脏数据超过用户预期,该错误通常是由于源端数据存在较多业务脏数据导致,请仔细检查DataX汇报的脏数据日志信息, 或者您可以适当调大脏数据阈值 ."), PLUGIN_SPLIT_ERROR("Framework-15", "DataX插件切分出错, 该问题通常是由于DataX各个插件编程错误引起,请联系DataX开发团队解决"), KILL_JOB_TIMEOUT_ERROR("Framework-16", "kill 任务超时,请联系PE解决"), START_TASKGROUP_ERROR("Framework-17", "taskGroup启动失败,请联系DataX开发团队解决"), CALL_DATAX_SERVICE_FAILED("Framework-18", "请求 DataX Service 出错."), CALL_REMOTE_FAILED("Framework-19", "远程调用失败"), KILLED_EXIT_VALUE("Framework-143", "Job 收到了 Kill 命令."); private final String code; private final String description; private FrameworkErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. 
", this.code, this.description); } /** * "Framework-143" indicates that the job is in the Killed state. */ public int toExitValue() { if (this == FrameworkErrorCode.KILLED_EXIT_VALUE) { return 143; } else { return 1; } } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/HttpClientUtil.java ================================================ package com.alibaba.datax.core.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.RetryUtil; import org.apache.http.Consts; import org.apache.http.HttpEntity; import org.apache.http.HttpResponse; import org.apache.http.HttpStatus; import org.apache.http.auth.AuthScope; import org.apache.http.auth.UsernamePasswordCredentials; import org.apache.http.client.CredentialsProvider; import org.apache.http.client.config.RequestConfig; import org.apache.http.client.methods.*; import org.apache.http.impl.client.BasicCredentialsProvider; import org.apache.http.impl.client.CloseableHttpClient; import org.apache.http.impl.client.HttpClientBuilder; import org.apache.http.util.EntityUtils; import java.io.IOException; import java.util.Properties; import java.util.concurrent.Callable; import java.util.concurrent.ThreadPoolExecutor; public class HttpClientUtil { private static CredentialsProvider provider; private CloseableHttpClient httpClient; private volatile static HttpClientUtil clientUtil; // The two parameters below must always be set when building the httpclient; many Taobao production incidents were caused by omitting them. private static int HTTP_TIMEOUT_INMILLIONSECONDS = 5000; private static final int POOL_SIZE = 20; private static ThreadPoolExecutor asyncExecutor = RetryUtil.createThreadPoolExecutor(); public static void setHttpTimeoutInMillionSeconds(int httpTimeoutInMillionSeconds) { HTTP_TIMEOUT_INMILLIONSECONDS = httpTimeoutInMillionSeconds; } public static synchronized HttpClientUtil getHttpClientUtil() { if (null == clientUtil) { synchronized (HttpClientUtil.class) { if (null == clientUtil) { 
clientUtil = new HttpClientUtil(); } } } return clientUtil;
} public HttpClientUtil() { Properties prob = SecretUtil.getSecurityProperties(); HttpClientUtil.setBasicAuth(prob.getProperty("auth.user"),prob.getProperty("auth.pass")); initApacheHttpClient(); } public void destroy() { destroyApacheHttpClient(); } public static void setBasicAuth(String username, String password) { provider = new BasicCredentialsProvider(); provider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(username,password)); } // Create a client with a connection pool and timeout settings private void initApacheHttpClient() { RequestConfig requestConfig = RequestConfig.custom().setSocketTimeout(HTTP_TIMEOUT_INMILLIONSECONDS) .setConnectTimeout(HTTP_TIMEOUT_INMILLIONSECONDS).setConnectionRequestTimeout(HTTP_TIMEOUT_INMILLIONSECONDS) .setStaleConnectionCheckEnabled(true).build(); if(null == provider) { httpClient = HttpClientBuilder.create().setMaxConnTotal(POOL_SIZE).setMaxConnPerRoute(POOL_SIZE) .setDefaultRequestConfig(requestConfig).build(); } else { httpClient = HttpClientBuilder.create().setMaxConnTotal(POOL_SIZE).setMaxConnPerRoute(POOL_SIZE) .setDefaultRequestConfig(requestConfig).setDefaultCredentialsProvider(provider).build(); } } private void destroyApacheHttpClient() { try { if (httpClient != null) { httpClient.close(); httpClient = null; } } catch (IOException e) { e.printStackTrace(); } } public static HttpGet getGetRequest() { return new HttpGet(); } public static HttpPost getPostRequest() { return new HttpPost(); } public static HttpPut getPutRequest() { return new HttpPut(); } public static HttpDelete getDeleteRequest() { return new HttpDelete(); } public String executeAndGet(HttpRequestBase httpRequestBase) throws Exception { HttpResponse response; String entiStr = ""; try { response = httpClient.execute(httpRequestBase); if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) { System.err.println("请求地址:" + httpRequestBase.getURI() + ", 请求方法:" + httpRequestBase.getMethod() + ",STATUS CODE = " + response.getStatusLine().getStatusCode()); if
(httpRequestBase != null) { httpRequestBase.abort(); } throw new Exception("Response Status Code : " + response.getStatusLine().getStatusCode()); } else { HttpEntity entity = response.getEntity(); if (entity != null) { entiStr = EntityUtils.toString(entity, Consts.UTF_8); } else { throw new Exception("Response Entity Is Null"); } } } catch (Exception e) { throw e; } return entiStr; } public String executeAndGetWithRetry(final HttpRequestBase httpRequestBase, final int retryTimes, final long retryInterval) { try { return RetryUtil.asyncExecuteWithRetry(new Callable<String>() { @Override public String call() throws Exception { return executeAndGet(httpRequestBase); } }, retryTimes, retryInterval, true, HTTP_TIMEOUT_INMILLIONSECONDS + 1000, asyncExecutor); } catch (Exception e) { throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR, e); } } public String executeAndGetWithFailedRetry(final HttpRequestBase httpRequestBase, final int retryTimes, final long retryInterval){ try { return RetryUtil.asyncExecuteWithRetry(new Callable<String>() { @Override public String call() throws Exception { String result = executeAndGet(httpRequestBase); if(result!=null && result.startsWith("{\"result\":-1")){ throw DataXException.asDataXException(FrameworkErrorCode.CALL_REMOTE_FAILED, "远程接口返回-1,将重试"); } return result; } }, retryTimes, retryInterval, true, HTTP_TIMEOUT_INMILLIONSECONDS + 1000, asyncExecutor); } catch (Exception e) { throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR, e); } } } ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/LocalStrings.properties ================================================ configparser.1=\u63D2\u4EF6[{0},{1}]\u52A0\u8F7D\u5931\u8D25\uFF0C1s\u540E\u91CD\u8BD5...
Exception:{2} configparser.2=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0} configparser.3=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0} configparser.4=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0} configparser.5=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25\uFF0C\u672A\u5B8C\u6210\u6307\u5B9A\u63D2\u4EF6\u52A0\u8F7D:{0} configparser.6=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25,\u5B58\u5728\u91CD\u590D\u63D2\u4EF6:{0} dataxserviceutil.1=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38NoSuchAlgorithmException, [{0}] dataxserviceutil.2=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38InvalidKeyException, [{0}] dataxserviceutil.3=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38UnsupportedEncodingException, [{0}] errorrecordchecker.1=\u810F\u6570\u636E\u767E\u5206\u6BD4\u9650\u5236\u5E94\u8BE5\u5728[0.0, 1.0]\u4E4B\u95F4 errorrecordchecker.2=\u810F\u6570\u636E\u6761\u6570\u73B0\u5728\u5E94\u8BE5\u4E3A\u975E\u8D1F\u6574\u6570 errorrecordchecker.3=\u810F\u6570\u636E\u6761\u6570\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\u6761\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u4E86[{1}]\u6761. errorrecordchecker.4=\u810F\u6570\u636E\u767E\u5206\u6BD4\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u5230[{1}]. errorcode.install_error=DataX\u5F15\u64CE\u5B89\u88C5\u9519\u8BEF, \u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.argument_error=DataX\u5F15\u64CE\u8FD0\u884C\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8E\u5185\u90E8\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3 . errorcode.runtime_error=DataX\u5F15\u64CE\u8FD0\u884C\u8FC7\u7A0B\u51FA\u9519\uFF0C\u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F . 
errorcode.config_error=DataX\u5F15\u64CE\u914D\u7F6E\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.secret_error=DataX\u5F15\u64CE\u52A0\u89E3\u5BC6\u51FA\u9519\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.hook_load_error=\u52A0\u8F7D\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF\uFF0C\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u5F15\u8D77\u7684 errorcode.hook_fail_error=\u6267\u884C\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF errorcode.plugin_install_error=DataX\u63D2\u4EF6\u5B89\u88C5\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.plugin_not_found=DataX\u63D2\u4EF6\u914D\u7F6E\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.plugin_init_error=DataX\u63D2\u4EF6\u521D\u59CB\u5316\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.plugin_runtime_error=DataX\u63D2\u4EF6\u8FD0\u884C\u65F6\u51FA\u9519, \u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F . 
errorcode.plugin_dirty_data_limit_exceed=DataX\u4F20\u8F93\u810F\u6570\u636E\u8D85\u8FC7\u7528\u6237\u9884\u671F\uFF0C\u8BE5\u9519\u8BEF\u901A\u5E38\u662F\u7531\u4E8E\u6E90\u7AEF\u6570\u636E\u5B58\u5728\u8F83\u591A\u4E1A\u52A1\u810F\u6570\u636E\u5BFC\u81F4\uFF0C\u8BF7\u4ED4\u7EC6\u68C0\u67E5DataX\u6C47\u62A5\u7684\u810F\u6570\u636E\u65E5\u5FD7\u4FE1\u606F, \u6216\u8005\u60A8\u53EF\u4EE5\u9002\u5F53\u8C03\u5927\u810F\u6570\u636E\u9608\u503C . errorcode.plugin_split_error=DataX\u63D2\u4EF6\u5207\u5206\u51FA\u9519, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5404\u4E2A\u63D2\u4EF6\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3 errorcode.kill_job_timeout_error=kill \u4EFB\u52A1\u8D85\u65F6\uFF0C\u8BF7\u8054\u7CFBPE\u89E3\u51B3 errorcode.start_taskgroup_error=taskGroup\u542F\u52A8\u5931\u8D25,\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3 errorcode.call_datax_service_failed=\u8BF7\u6C42 DataX Service \u51FA\u9519. errorcode.call_remote_failed=\u8FDC\u7A0B\u8C03\u7528\u5931\u8D25 errorcode.killed_exit_value=Job \u6536\u5230\u4E86 Kill \u547D\u4EE4. 
httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1}, STATUS CODE = {2}, Response Entity: {3} httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5 secretutil.1=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B secretutil.2=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B secretutil.3=rsa\u52A0\u5BC6\u51FA\u9519 secretutil.4=rsa\u89E3\u5BC6\u51FA\u9519 secretutil.5=3\u91CDDES\u52A0\u5BC6\u51FA\u9519 secretutil.6=rsa\u89E3\u5BC6\u51FA\u9519 secretutil.7=\u6784\u5EFA\u4E09\u91CDDES\u5BC6\u5319\u51FA\u9519 secretutil.8=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u5BC6\u94A5\u7684\u914D\u7F6E\u6587\u4EF6 secretutil.9=\u8BFB\u53D6\u52A0\u89E3\u5BC6\u914D\u7F6E\u6587\u4EF6\u51FA\u9519 secretutil.10=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C secretutil.11=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898 secretutil.12=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C secretutil.13=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898 
secretutil.14=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C[{0}]\u5B58\u5728\u5BC6\u94A5\u4E3A\u7A7A\u7684\u60C5\u51B5 secretutil.15=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u516C\u79C1\u94A5\u5BF9\u5B58\u5728\u4E3A\u7A7A\u7684\u60C5\u51B5\uFF0C\u7248\u672C[{0}] secretutil.16=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u52A0\u89E3\u5BC6\u914D\u7F6E ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/LocalStrings_en_US.properties ================================================ configparser.1=Failed to load the plug-ins [{0},{1}]. Retrying in 1s... Exception: {2} configparser.2=Failed to obtain the job configuration information: {0} configparser.3=Failed to obtain the job configuration information: {0} configparser.4=Failed to obtain the job configuration information: {0} configparser.5=Failed to load the plug-ins. Not all of the specified plug-ins were loaded: {0} configparser.6=Failed to load the plug-ins. A duplicate plug-in exists: {0} dataxserviceutil.1=Exception while creating the signature: NoSuchAlgorithmException, [{0}] dataxserviceutil.2=Exception while creating the signature: InvalidKeyException, [{0}] dataxserviceutil.3=Exception while creating the signature: UnsupportedEncodingException, [{0}] errorrecordchecker.1=The dirty data percentage limit should be within [0.0, 1.0] errorrecordchecker.2=The dirty data record limit should be a non-negative integer errorrecordchecker.3=The dirty data record count check failed. The limit is [{0}] records, but [{1}] records were captured. errorrecordchecker.4=The dirty data percentage check failed. The limit is [{0}], but [{1}] was captured. errorcode.install_error=Error in installing the DataX engine. Please contact your O&M team to solve the problem.
errorcode.argument_error=Error in running the DataX engine. This problem is generally caused by an internal programming error. Please contact the DataX developer team to solve the problem. errorcode.runtime_error=The DataX engine encountered an error while running. For the specific cause, refer to the error diagnosis printed when DataX finishes running. errorcode.config_error=Error in the DataX engine configuration. This problem is generally caused by a DataX installation error. Please contact your O&M team to solve the problem. errorcode.secret_error=Error in DataX engine encryption or decryption. This problem is generally caused by a DataX key configuration error. Please contact your O&M team to solve the problem. errorcode.hook_load_error=Error in loading the external hook. This problem is generally caused by the DataX installation. errorcode.hook_fail_error=Error in executing the external hook errorcode.plugin_install_error=Error in installing a DataX plug-in. This problem is generally caused by a DataX installation error. Please contact your O&M team to solve the problem. errorcode.plugin_not_found=Error in DataX plug-in configuration. This problem is generally caused by a DataX installation error. Please contact your O&M team to solve the problem. errorcode.plugin_init_error=Error in DataX plug-in initialization. This problem is generally caused by a DataX installation error. Please contact your O&M team to solve the problem. errorcode.plugin_runtime_error=The DataX plug-in encountered an error while running. For the specific cause, refer to the error diagnosis printed when DataX finishes running. errorcode.plugin_dirty_data_limit_exceed=The dirty data transmitted by DataX exceeds the user-configured limit. This error usually occurs when the source data contains a lot of dirty business data. Please carefully check the dirty data log reported by DataX, or raise the dirty data threshold as appropriate. errorcode.plugin_split_error=Error in DataX plug-in splitting.
This problem is generally caused by a programming error in a DataX plug-in. Please contact the DataX developer team to solve the problem. errorcode.kill_job_timeout_error=Killing the job timed out. Please contact the PE to solve the problem errorcode.start_taskgroup_error=Failed to start the task group. Please contact the DataX developer team to solve the problem errorcode.call_datax_service_failed=Error in requesting DataX Service. errorcode.call_remote_failed=Remote call failure errorcode.killed_exit_value=The job has received a Kill command. httpclientutil.1=Request address: {0}. Request method: {1}. STATUS CODE = {2}, Response Entity: {3} httpclientutil.2=The remote interface returned -1. Retrying secretutil.1=System programming error. Unsupported encryption type secretutil.2=System programming error. Unsupported encryption type secretutil.3=RSA encryption error secretutil.4=RSA decryption error secretutil.5=Triple DES encryption error secretutil.6=RSA decryption error secretutil.7=Error in building the Triple DES key secretutil.8=DataX configuration requires encryption and decryption, but the key configuration file cannot be found secretutil.9=Error in reading the encryption and decryption configuration file secretutil.10=The key version configured for DataX is [{0}], but that version is not configured in the system. The task key configuration is wrong: the key version you configured does not exist secretutil.11=The key version configured for DataX is [{0}], but that version is not configured in the system. There may be an error in task key configuration, or a problem in system maintenance secretutil.12=The key version configured for DataX is [{0}], but that version is not configured in the system. The task key configuration is wrong: the key version you configured does not exist secretutil.13=The key version configured for DataX is [{0}], but that version is not configured in the system.
There may be an error in task key configuration, or a problem in system maintenance secretutil.14=DataX configuration requires encryption and decryption, but some key in the configured key version [{0}] is empty secretutil.15=DataX configuration requires encryption and decryption, but some configured public/private key pairs are empty and the version is [{0}] secretutil.16=DataX configuration requires encryption and decryption, but the encryption and decryption configuration cannot be found ================================================ FILE: core/src/main/java/com/alibaba/datax/core/util/LocalStrings_ja_JP.properties ================================================ configparser.1=\u63D2\u4EF6[{0},{1}]\u52A0\u8F7D\u5931\u8D25\uFF0C1s\u540E\u91CD\u8BD5... Exception:{2} configparser.2=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0} configparser.3=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0} configparser.4=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0} configparser.5=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25\uFF0C\u672A\u5B8C\u6210\u6307\u5B9A\u63D2\u4EF6\u52A0\u8F7D:{0} configparser.6=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25,\u5B58\u5728\u91CD\u590D\u63D2\u4EF6:{0} dataxserviceutil.1=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38NoSuchAlgorithmException, [{0}] dataxserviceutil.2=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38InvalidKeyException, [{0}] dataxserviceutil.3=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38UnsupportedEncodingException, [{0}] errorrecordchecker.1=\u810F\u6570\u636E\u767E\u5206\u6BD4\u9650\u5236\u5E94\u8BE5\u5728[0.0, 1.0]\u4E4B\u95F4 errorrecordchecker.2=\u810F\u6570\u636E\u6761\u6570\u73B0\u5728\u5E94\u8BE5\u4E3A\u975E\u8D1F\u6574\u6570 errorrecordchecker.3=\u810F\u6570\u636E\u6761\u6570\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\u6761\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u4E86[{1}]\u6761. 
errorrecordchecker.4=\u810F\u6570\u636E\u767E\u5206\u6BD4\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u5230[{1}]. errorcode.install_error=DataX\u5F15\u64CE\u5B89\u88C5\u9519\u8BEF, \u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.argument_error=DataX\u5F15\u64CE\u8FD0\u884C\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8E\u5185\u90E8\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3 . errorcode.runtime_error=DataX\u5F15\u64CE\u8FD0\u884C\u8FC7\u7A0B\u51FA\u9519\uFF0C\u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F . errorcode.config_error=DataX\u5F15\u64CE\u914D\u7F6E\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.secret_error=DataX\u5F15\u64CE\u52A0\u89E3\u5BC6\u51FA\u9519\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.hook_load_error=\u52A0\u8F7D\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF\uFF0C\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u5F15\u8D77\u7684 errorcode.hook_fail_error=\u6267\u884C\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF errorcode.plugin_install_error=DataX\u63D2\u4EF6\u5B89\u88C5\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . errorcode.plugin_not_found=DataX\u63D2\u4EF6\u914D\u7F6E\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 . 
errorcode.plugin_init_error=DataX\u63D2\u4EF6\u521D\u59CB\u5316\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_runtime_error=DataX\u63D2\u4EF6\u8FD0\u884C\u65F6\u51FA\u9519, \u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F .
errorcode.plugin_dirty_data_limit_exceed=DataX\u4F20\u8F93\u810F\u6570\u636E\u8D85\u8FC7\u7528\u6237\u9884\u671F\uFF0C\u8BE5\u9519\u8BEF\u901A\u5E38\u662F\u7531\u4E8E\u6E90\u7AEF\u6570\u636E\u5B58\u5728\u8F83\u591A\u4E1A\u52A1\u810F\u6570\u636E\u5BFC\u81F4\uFF0C\u8BF7\u4ED4\u7EC6\u68C0\u67E5DataX\u6C47\u62A5\u7684\u810F\u6570\u636E\u65E5\u5FD7\u4FE1\u606F, \u6216\u8005\u60A8\u53EF\u4EE5\u9002\u5F53\u8C03\u5927\u810F\u6570\u636E\u9608\u503C .
errorcode.plugin_split_error=DataX\u63D2\u4EF6\u5207\u5206\u51FA\u9519, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5404\u4E2A\u63D2\u4EF6\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.kill_job_timeout_error=kill \u4EFB\u52A1\u8D85\u65F6\uFF0C\u8BF7\u8054\u7CFBPE\u89E3\u51B3
errorcode.start_taskgroup_error=taskGroup\u542F\u52A8\u5931\u8D25,\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.call_datax_service_failed=\u8BF7\u6C42 DataX Service \u51FA\u9519.
errorcode.call_remote_failed=\u8FDC\u7A0B\u8C03\u7528\u5931\u8D25
errorcode.killed_exit_value=Job \u6536\u5230\u4E86 Kill \u547D\u4EE4.
httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3}
httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5
secretutil.1=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.2=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.3=rsa\u52A0\u5BC6\u51FA\u9519
secretutil.4=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.5=3\u91CDDES\u52A0\u5BC6\u51FA\u9519
secretutil.6=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.7=\u6784\u5EFA\u4E09\u91CDDES\u5BC6\u5319\u51FA\u9519
secretutil.8=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u5BC6\u94A5\u7684\u914D\u7F6E\u6587\u4EF6
secretutil.9=\u8BFB\u53D6\u52A0\u89E3\u5BC6\u914D\u7F6E\u6587\u4EF6\u51FA\u9519
secretutil.10=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.11=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.12=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.13=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.14=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C[{0}]\u5B58\u5728\u5BC6\u94A5\u4E3A\u7A7A\u7684\u60C5\u51B5
secretutil.15=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u516C\u79C1\u94A5\u5BF9\u5B58\u5728\u4E3A\u7A7A\u7684\u60C5\u51B5\uFF0C\u7248\u672C[{0}]
secretutil.16=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u52A0\u89E3\u5BC6\u914D\u7F6E
================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/LocalStrings_zh_CN.properties
================================================
configparser.1=\u63D2\u4EF6[{0},{1}]\u52A0\u8F7D\u5931\u8D25\uFF0C1s\u540E\u91CD\u8BD5... Exception:{2}
configparser.2=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.3=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.4=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.5=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25\uFF0C\u672A\u5B8C\u6210\u6307\u5B9A\u63D2\u4EF6\u52A0\u8F7D:{0}
configparser.6=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25,\u5B58\u5728\u91CD\u590D\u63D2\u4EF6:{0}
dataxserviceutil.1=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38NoSuchAlgorithmException, [{0}]
dataxserviceutil.2=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38InvalidKeyException, [{0}]
dataxserviceutil.3=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38UnsupportedEncodingException, [{0}]
errorrecordchecker.1=\u810F\u6570\u636E\u767E\u5206\u6BD4\u9650\u5236\u5E94\u8BE5\u5728[0.0, 1.0]\u4E4B\u95F4
errorrecordchecker.2=\u810F\u6570\u636E\u6761\u6570\u73B0\u5728\u5E94\u8BE5\u4E3A\u975E\u8D1F\u6574\u6570
errorrecordchecker.3=\u810F\u6570\u636E\u6761\u6570\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\u6761\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u4E86[{1}]\u6761.
errorrecordchecker.4=\u810F\u6570\u636E\u767E\u5206\u6BD4\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u5230[{1}].
errorcode.install_error=DataX\u5F15\u64CE\u5B89\u88C5\u9519\u8BEF, \u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.argument_error=DataX\u5F15\u64CE\u8FD0\u884C\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8E\u5185\u90E8\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3 .
errorcode.runtime_error=DataX\u5F15\u64CE\u8FD0\u884C\u8FC7\u7A0B\u51FA\u9519\uFF0C\u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F .
errorcode.config_error=DataX\u5F15\u64CE\u914D\u7F6E\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.secret_error=DataX\u5F15\u64CE\u52A0\u89E3\u5BC6\u51FA\u9519\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.hook_load_error=\u52A0\u8F7D\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF\uFF0C\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u5F15\u8D77\u7684
errorcode.hook_fail_error=\u6267\u884C\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF
errorcode.plugin_install_error=DataX\u63D2\u4EF6\u5B89\u88C5\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_not_found=DataX\u63D2\u4EF6\u914D\u7F6E\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_init_error=DataX\u63D2\u4EF6\u521D\u59CB\u5316\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_runtime_error=DataX\u63D2\u4EF6\u8FD0\u884C\u65F6\u51FA\u9519, \u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F .
errorcode.plugin_dirty_data_limit_exceed=DataX\u4F20\u8F93\u810F\u6570\u636E\u8D85\u8FC7\u7528\u6237\u9884\u671F\uFF0C\u8BE5\u9519\u8BEF\u901A\u5E38\u662F\u7531\u4E8E\u6E90\u7AEF\u6570\u636E\u5B58\u5728\u8F83\u591A\u4E1A\u52A1\u810F\u6570\u636E\u5BFC\u81F4\uFF0C\u8BF7\u4ED4\u7EC6\u68C0\u67E5DataX\u6C47\u62A5\u7684\u810F\u6570\u636E\u65E5\u5FD7\u4FE1\u606F, \u6216\u8005\u60A8\u53EF\u4EE5\u9002\u5F53\u8C03\u5927\u810F\u6570\u636E\u9608\u503C .
errorcode.plugin_split_error=DataX\u63D2\u4EF6\u5207\u5206\u51FA\u9519, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5404\u4E2A\u63D2\u4EF6\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.kill_job_timeout_error=kill \u4EFB\u52A1\u8D85\u65F6\uFF0C\u8BF7\u8054\u7CFBPE\u89E3\u51B3
errorcode.start_taskgroup_error=taskGroup\u542F\u52A8\u5931\u8D25,\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.call_datax_service_failed=\u8BF7\u6C42 DataX Service \u51FA\u9519.
errorcode.call_remote_failed=\u8FDC\u7A0B\u8C03\u7528\u5931\u8D25
errorcode.killed_exit_value=Job \u6536\u5230\u4E86 Kill \u547D\u4EE4.
httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3}
httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5
secretutil.1=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.2=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.3=rsa\u52A0\u5BC6\u51FA\u9519
secretutil.4=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.5=3\u91CDDES\u52A0\u5BC6\u51FA\u9519
secretutil.6=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.7=\u6784\u5EFA\u4E09\u91CDDES\u5BC6\u5319\u51FA\u9519
secretutil.8=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u5BC6\u94A5\u7684\u914D\u7F6E\u6587\u4EF6
secretutil.9=\u8BFB\u53D6\u52A0\u89E3\u5BC6\u914D\u7F6E\u6587\u4EF6\u51FA\u9519
secretutil.10=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.11=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.12=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.13=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.14=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C[{0}]\u5B58\u5728\u5BC6\u94A5\u4E3A\u7A7A\u7684\u60C5\u51B5
secretutil.15=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u516C\u79C1\u94A5\u5BF9\u5B58\u5728\u4E3A\u7A7A\u7684\u60C5\u51B5\uFF0C\u7248\u672C[{0}]
secretutil.16=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u52A0\u89E3\u5BC6\u914D\u7F6E
================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/LocalStrings_zh_HK.properties
================================================
configparser.1=\u63D2\u4EF6[{0},{1}]\u52A0\u8F7D\u5931\u8D25\uFF0C1s\u540E\u91CD\u8BD5... Exception:{2}
configparser.2=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.3=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.4=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.5=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25\uFF0C\u672A\u5B8C\u6210\u6307\u5B9A\u63D2\u4EF6\u52A0\u8F7D:{0}
configparser.6=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25,\u5B58\u5728\u91CD\u590D\u63D2\u4EF6:{0}
dataxserviceutil.1=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38NoSuchAlgorithmException, [{0}]
dataxserviceutil.2=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38InvalidKeyException, [{0}]
dataxserviceutil.3=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38UnsupportedEncodingException, [{0}]
errorrecordchecker.1=\u810F\u6570\u636E\u767E\u5206\u6BD4\u9650\u5236\u5E94\u8BE5\u5728[0.0, 1.0]\u4E4B\u95F4
errorrecordchecker.2=\u810F\u6570\u636E\u6761\u6570\u73B0\u5728\u5E94\u8BE5\u4E3A\u975E\u8D1F\u6574\u6570
errorrecordchecker.3=\u810F\u6570\u636E\u6761\u6570\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\u6761\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u4E86[{1}]\u6761.
errorrecordchecker.4=\u810F\u6570\u636E\u767E\u5206\u6BD4\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u5230[{1}].
errorcode.install_error=DataX\u5F15\u64CE\u5B89\u88C5\u9519\u8BEF, \u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.argument_error=DataX\u5F15\u64CE\u8FD0\u884C\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8E\u5185\u90E8\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3 .
errorcode.runtime_error=DataX\u5F15\u64CE\u8FD0\u884C\u8FC7\u7A0B\u51FA\u9519\uFF0C\u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F .
errorcode.config_error=DataX\u5F15\u64CE\u914D\u7F6E\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.secret_error=DataX\u5F15\u64CE\u52A0\u89E3\u5BC6\u51FA\u9519\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.hook_load_error=\u52A0\u8F7D\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF\uFF0C\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u5F15\u8D77\u7684
errorcode.hook_fail_error=\u6267\u884C\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF
errorcode.plugin_install_error=DataX\u63D2\u4EF6\u5B89\u88C5\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_not_found=DataX\u63D2\u4EF6\u914D\u7F6E\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_init_error=DataX\u63D2\u4EF6\u521D\u59CB\u5316\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_runtime_error=DataX\u63D2\u4EF6\u8FD0\u884C\u65F6\u51FA\u9519, \u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F .
errorcode.plugin_dirty_data_limit_exceed=DataX\u4F20\u8F93\u810F\u6570\u636E\u8D85\u8FC7\u7528\u6237\u9884\u671F\uFF0C\u8BE5\u9519\u8BEF\u901A\u5E38\u662F\u7531\u4E8E\u6E90\u7AEF\u6570\u636E\u5B58\u5728\u8F83\u591A\u4E1A\u52A1\u810F\u6570\u636E\u5BFC\u81F4\uFF0C\u8BF7\u4ED4\u7EC6\u68C0\u67E5DataX\u6C47\u62A5\u7684\u810F\u6570\u636E\u65E5\u5FD7\u4FE1\u606F, \u6216\u8005\u60A8\u53EF\u4EE5\u9002\u5F53\u8C03\u5927\u810F\u6570\u636E\u9608\u503C .
errorcode.plugin_split_error=DataX\u63D2\u4EF6\u5207\u5206\u51FA\u9519, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5404\u4E2A\u63D2\u4EF6\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.kill_job_timeout_error=kill \u4EFB\u52A1\u8D85\u65F6\uFF0C\u8BF7\u8054\u7CFBPE\u89E3\u51B3
errorcode.start_taskgroup_error=taskGroup\u542F\u52A8\u5931\u8D25,\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.call_datax_service_failed=\u8BF7\u6C42 DataX Service \u51FA\u9519.
errorcode.call_remote_failed=\u8FDC\u7A0B\u8C03\u7528\u5931\u8D25
errorcode.killed_exit_value=Job \u6536\u5230\u4E86 Kill \u547D\u4EE4.
httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3}
httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5
secretutil.1=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.2=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.3=rsa\u52A0\u5BC6\u51FA\u9519
secretutil.4=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.5=3\u91CDDES\u52A0\u5BC6\u51FA\u9519
secretutil.6=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.7=\u6784\u5EFA\u4E09\u91CDDES\u5BC6\u5319\u51FA\u9519
secretutil.8=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u5BC6\u94A5\u7684\u914D\u7F6E\u6587\u4EF6
secretutil.9=\u8BFB\u53D6\u52A0\u89E3\u5BC6\u914D\u7F6E\u6587\u4EF6\u51FA\u9519
secretutil.10=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.11=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.12=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.13=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.14=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C[{0}]\u5B58\u5728\u5BC6\u94A5\u4E3A\u7A7A\u7684\u60C5\u51B5
secretutil.15=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u516C\u79C1\u94A5\u5BF9\u5B58\u5728\u4E3A\u7A7A\u7684\u60C5\u51B5\uFF0C\u7248\u672C[{0}]
secretutil.16=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u52A0\u89E3\u5BC6\u914D\u7F6E
configparser.1=\u5916\u639B\u7A0B\u5F0F[{0},{1}]\u8F09\u5165\u5931\u6557\uFF0C1s\u5F8C\u91CD\u8A66... Exception:{2}
configparser.2=\u7372\u53D6\u4F5C\u696D\u914D\u7F6E\u8CC7\u8A0A\u5931\u6557:{0}
configparser.3=\u7372\u53D6\u4F5C\u696D\u914D\u7F6E\u8CC7\u8A0A\u5931\u6557:{0}
configparser.4=\u7372\u53D6\u4F5C\u696D\u914D\u7F6E\u8CC7\u8A0A\u5931\u6557:{0}
configparser.5=\u5916\u639B\u7A0B\u5F0F\u8F09\u5165\u5931\u6557\uFF0C\u672A\u5B8C\u6210\u6307\u5B9A\u5916\u639B\u7A0B\u5F0F\u8F09\u5165:{0}
configparser.6=\u5916\u639B\u7A0B\u5F0F\u8F09\u5165\u5931\u6557,\u5B58\u5728\u91CD\u8907\u5916\u639B\u7A0B\u5F0F:{0}
dataxserviceutil.1=\u5EFA\u7ACB\u7C3D\u540D\u7570\u5E38NoSuchAlgorithmException, [{0}]
dataxserviceutil.2=\u5EFA\u7ACB\u7C3D\u540D\u7570\u5E38InvalidKeyException, [{0}]
dataxserviceutil.3=\u5EFA\u7ACB\u7C3D\u540D\u7570\u5E38UnsupportedEncodingException, [{0}]
errorrecordchecker.1=\u9AD2\u6578\u64DA\u767E\u5206\u6BD4\u9650\u5236\u61C9\u8A72\u5728[0.0, 1.0]\u4E4B\u9593
errorrecordchecker.2=\u9AD2\u6578\u64DA\u689D\u6578\u73FE\u5728\u61C9\u8A72\u70BA\u975E\u8CA0\u6574\u6578
errorrecordchecker.3=\u9AD2\u6578\u64DA\u689D\u6578\u6AA2\u67E5\u4E0D\u901A\u904E\uFF0C\u9650\u5236\u662F[{0}]\u689D\uFF0C\u4F46\u5BE6\u969B\u4E0A\u6355\u7372\u4E86[{1}]\u689D.
errorrecordchecker.4=\u9AD2\u6578\u64DA\u767E\u5206\u6BD4\u6AA2\u67E5\u4E0D\u901A\u904E\uFF0C\u9650\u5236\u662F[{0}]\uFF0C\u4F46\u5BE6\u969B\u4E0A\u6355\u7372\u5230[{1}].
errorcode.install_error=DataX\u5F15\u64CE\u5B89\u88DD\u932F\u8AA4, \u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A .
errorcode.argument_error=DataX\u5F15\u64CE\u904B\u884C\u932F\u8AA4\uFF0C\u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BC\u5167\u90E8\u7DE8\u7A0B\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61DataX\u958B\u767C\u5718\u968A\u89E3\u6C7A .
errorcode.runtime_error=DataX\u5F15\u64CE\u904B\u884C\u904E\u7A0B\u51FA\u932F\uFF0C\u5177\u9AD4\u539F\u56E0\u8ACB\u53C3\u770BDataX\u904B\u884C\u7D50\u675F\u6642\u7684\u932F\u8AA4\u8A3A\u65B7\u8CC7\u8A0A .
errorcode.config_error=DataX\u5F15\u64CE\u914D\u7F6E\u932F\u8AA4\uFF0C\u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A .
errorcode.secret_error=DataX\u5F15\u64CE\u52A0\u89E3\u5BC6\u51FA\u932F\uFF0C\u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A .
errorcode.hook_load_error=\u8F09\u5165\u5916\u90E8Hook\u51FA\u73FE\u932F\u8AA4\uFF0C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u5F15\u8D77\u7684
errorcode.hook_fail_error=\u57F7\u884C\u5916\u90E8Hook\u51FA\u73FE\u932F\u8AA4
errorcode.plugin_install_error=DataX\u5916\u639B\u7A0B\u5F0F\u5B89\u88DD\u932F\u8AA4, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A .
errorcode.plugin_not_found=DataX\u5916\u639B\u7A0B\u5F0F\u914D\u7F6E\u932F\u8AA4, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A .
errorcode.plugin_init_error=DataX\u5916\u639B\u7A0B\u5F0F\u521D\u59CB\u5316\u932F\u8AA4, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A .
errorcode.plugin_runtime_error=DataX\u5916\u639B\u7A0B\u5F0F\u904B\u884C\u6642\u51FA\u932F, \u5177\u9AD4\u539F\u56E0\u8ACB\u53C3\u770BDataX\u904B\u884C\u7D50\u675F\u6642\u7684\u932F\u8AA4\u8A3A\u65B7\u8CC7\u8A0A .
errorcode.plugin_dirty_data_limit_exceed=DataX\u50B3\u8F38\u9AD2\u6578\u64DA\u8D85\u904E\u7528\u6236\u9810\u671F\uFF0C\u8A72\u932F\u8AA4\u901A\u5E38\u662F\u7531\u65BC\u6E90\u7AEF\u6578\u64DA\u5B58\u5728\u8F03\u591A\u696D\u52D9\u9AD2\u6578\u64DA\u5C0E\u81F4\uFF0C\u8ACB\u4ED4\u7D30\u6AA2\u67E5DataX\u5F59\u5831\u7684\u9AD2\u6578\u64DA\u65E5\u8A8C\u8CC7\u8A0A, \u6216\u8005\u60A8\u53EF\u4EE5\u9069\u7576\u8ABF\u5927\u9AD2\u6578\u64DA\u95BE\u503C .
errorcode.plugin_split_error=DataX\u5916\u639B\u7A0B\u5F0F\u5207\u5206\u51FA\u932F, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5404\u500B\u5916\u639B\u7A0B\u5F0F\u7DE8\u7A0B\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61DataX\u958B\u767C\u5718\u968A\u89E3\u6C7A
errorcode.kill_job_timeout_error=kill \u4EFB\u52D9\u903E\u6642\uFF0C\u8ACB\u806F\u7D61PE\u89E3\u6C7A
errorcode.start_taskgroup_error=taskGroup\u555F\u52D5\u5931\u6557,\u8ACB\u806F\u7D61DataX\u958B\u767C\u5718\u968A\u89E3\u6C7A
errorcode.call_datax_service_failed=\u8ACB\u6C42 DataX Service \u51FA\u932F.
errorcode.call_remote_failed=\u9060\u7A0B\u8ABF\u7528\u5931\u6557
errorcode.killed_exit_value=Job \u6536\u5230\u4E86 Kill \u547D\u4EE4.
httpclientutil.1=\u8ACB\u6C42\u5730\u5740\uFF1A{0}, \u8ACB\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3}
httpclientutil.2=\u9060\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C07\u91CD\u8A66
secretutil.1=\u7CFB\u7D71\u7DE8\u7A0B\u932F\u8AA4,\u4E0D\u652F\u63F4\u7684\u52A0\u5BC6\u985E\u578B
secretutil.2=\u7CFB\u7D71\u7DE8\u7A0B\u932F\u8AA4,\u4E0D\u652F\u63F4\u7684\u52A0\u5BC6\u985E\u578B
secretutil.3=rsa\u52A0\u5BC6\u51FA\u932F
secretutil.4=rsa\u89E3\u5BC6\u51FA\u932F
secretutil.5=3\u91CDDES\u52A0\u5BC6\u51FA\u932F
secretutil.6=rsa\u89E3\u5BC6\u51FA\u932F
secretutil.7=\u69CB\u5EFA\u4E09\u91CDDES\u5BC6\u5319\u51FA\u932F
secretutil.8=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u7121\u6CD5\u627E\u5230\u5BC6\u9470\u7684\u914D\u7F6E\u6A94\u6848
secretutil.9=\u8B80\u53D6\u52A0\u89E3\u5BC6\u914D\u7F6E\u6A94\u6848\u51FA\u932F
secretutil.10=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C
secretutil.11=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7D71\u7DAD\u8B77\u554F\u984C
secretutil.12=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C
secretutil.13=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7D71\u7DAD\u8B77\u554F\u984C
secretutil.14=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C[{0}]\u5B58\u5728\u5BC6\u9470\u70BA\u7A7A\u7684\u60C5\u6CC1
secretutil.15=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u516C\u79C1\u9470\u5C0D\u5B58\u5728\u70BA\u7A7A\u7684\u60C5\u6CC1\uFF0C\u7248\u672C[{0}]
secretutil.16=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u7121\u6CD5\u627E\u5230\u52A0\u89E3\u5BC6\u914D\u7F6E
================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/LocalStrings_zh_TW.properties
================================================
configparser.1=\u63D2\u4EF6[{0},{1}]\u52A0\u8F7D\u5931\u8D25\uFF0C1s\u540E\u91CD\u8BD5... Exception:{2}
configparser.2=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.3=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.4=\u83B7\u53D6\u4F5C\u4E1A\u914D\u7F6E\u4FE1\u606F\u5931\u8D25:{0}
configparser.5=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25\uFF0C\u672A\u5B8C\u6210\u6307\u5B9A\u63D2\u4EF6\u52A0\u8F7D:{0}
configparser.6=\u63D2\u4EF6\u52A0\u8F7D\u5931\u8D25,\u5B58\u5728\u91CD\u590D\u63D2\u4EF6:{0}
dataxserviceutil.1=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38NoSuchAlgorithmException, [{0}]
dataxserviceutil.2=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38InvalidKeyException, [{0}]
dataxserviceutil.3=\u521B\u5EFA\u7B7E\u540D\u5F02\u5E38UnsupportedEncodingException, [{0}]
errorrecordchecker.1=\u810F\u6570\u636E\u767E\u5206\u6BD4\u9650\u5236\u5E94\u8BE5\u5728[0.0, 1.0]\u4E4B\u95F4
errorrecordchecker.2=\u810F\u6570\u636E\u6761\u6570\u73B0\u5728\u5E94\u8BE5\u4E3A\u975E\u8D1F\u6574\u6570
errorrecordchecker.3=\u810F\u6570\u636E\u6761\u6570\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\u6761\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u4E86[{1}]\u6761.
errorrecordchecker.4=\u810F\u6570\u636E\u767E\u5206\u6BD4\u68C0\u67E5\u4E0D\u901A\u8FC7\uFF0C\u9650\u5236\u662F[{0}]\uFF0C\u4F46\u5B9E\u9645\u4E0A\u6355\u83B7\u5230[{1}].
errorcode.install_error=DataX\u5F15\u64CE\u5B89\u88C5\u9519\u8BEF, \u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.argument_error=DataX\u5F15\u64CE\u8FD0\u884C\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8E\u5185\u90E8\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3 .
errorcode.runtime_error=DataX\u5F15\u64CE\u8FD0\u884C\u8FC7\u7A0B\u51FA\u9519\uFF0C\u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F .
errorcode.config_error=DataX\u5F15\u64CE\u914D\u7F6E\u9519\u8BEF\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.secret_error=DataX\u5F15\u64CE\u52A0\u89E3\u5BC6\u51FA\u9519\uFF0C\u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.hook_load_error=\u52A0\u8F7D\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF\uFF0C\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u5F15\u8D77\u7684
errorcode.hook_fail_error=\u6267\u884C\u5916\u90E8Hook\u51FA\u73B0\u9519\u8BEF
errorcode.plugin_install_error=DataX\u63D2\u4EF6\u5B89\u88C5\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_not_found=DataX\u63D2\u4EF6\u914D\u7F6E\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_init_error=DataX\u63D2\u4EF6\u521D\u59CB\u5316\u9519\u8BEF, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5B89\u88C5\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFB\u60A8\u7684\u8FD0\u7EF4\u89E3\u51B3 .
errorcode.plugin_runtime_error=DataX\u63D2\u4EF6\u8FD0\u884C\u65F6\u51FA\u9519, \u5177\u4F53\u539F\u56E0\u8BF7\u53C2\u770BDataX\u8FD0\u884C\u7ED3\u675F\u65F6\u7684\u9519\u8BEF\u8BCA\u65AD\u4FE1\u606F .
errorcode.plugin_dirty_data_limit_exceed=DataX\u4F20\u8F93\u810F\u6570\u636E\u8D85\u8FC7\u7528\u6237\u9884\u671F\uFF0C\u8BE5\u9519\u8BEF\u901A\u5E38\u662F\u7531\u4E8E\u6E90\u7AEF\u6570\u636E\u5B58\u5728\u8F83\u591A\u4E1A\u52A1\u810F\u6570\u636E\u5BFC\u81F4\uFF0C\u8BF7\u4ED4\u7EC6\u68C0\u67E5DataX\u6C47\u62A5\u7684\u810F\u6570\u636E\u65E5\u5FD7\u4FE1\u606F, \u6216\u8005\u60A8\u53EF\u4EE5\u9002\u5F53\u8C03\u5927\u810F\u6570\u636E\u9608\u503C .
errorcode.plugin_split_error=DataX\u63D2\u4EF6\u5207\u5206\u51FA\u9519, \u8BE5\u95EE\u9898\u901A\u5E38\u662F\u7531\u4E8EDataX\u5404\u4E2A\u63D2\u4EF6\u7F16\u7A0B\u9519\u8BEF\u5F15\u8D77\uFF0C\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.kill_job_timeout_error=kill \u4EFB\u52A1\u8D85\u65F6\uFF0C\u8BF7\u8054\u7CFBPE\u89E3\u51B3
errorcode.start_taskgroup_error=taskGroup\u542F\u52A8\u5931\u8D25,\u8BF7\u8054\u7CFBDataX\u5F00\u53D1\u56E2\u961F\u89E3\u51B3
errorcode.call_datax_service_failed=\u8BF7\u6C42 DataX Service \u51FA\u9519.
errorcode.call_remote_failed=\u8FDC\u7A0B\u8C03\u7528\u5931\u8D25
errorcode.killed_exit_value=Job \u6536\u5230\u4E86 Kill \u547D\u4EE4.
httpclientutil.1=\u8BF7\u6C42\u5730\u5740\uFF1A{0}, \u8BF7\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3}
httpclientutil.2=\u8FDC\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C06\u91CD\u8BD5
secretutil.1=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.2=\u7CFB\u7EDF\u7F16\u7A0B\u9519\u8BEF,\u4E0D\u652F\u6301\u7684\u52A0\u5BC6\u7C7B\u578B
secretutil.3=rsa\u52A0\u5BC6\u51FA\u9519
secretutil.4=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.5=3\u91CDDES\u52A0\u5BC6\u51FA\u9519
secretutil.6=rsa\u89E3\u5BC6\u51FA\u9519
secretutil.7=\u6784\u5EFA\u4E09\u91CDDES\u5BC6\u5319\u51FA\u9519
secretutil.8=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u5BC6\u94A5\u7684\u914D\u7F6E\u6587\u4EF6
secretutil.9=\u8BFB\u53D6\u52A0\u89E3\u5BC6\u914D\u7F6E\u6587\u4EF6\u51FA\u9519
secretutil.10=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.11=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.12=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C
secretutil.13=DataX\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C\u4E3A[{0}]\uFF0C\u4F46\u5728\u7CFB\u7EDF\u4E2D\u6CA1\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52A1\u5BC6\u94A5\u914D\u7F6E\u9519\u8BEF\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7EDF\u7EF4\u62A4\u95EE\u9898
secretutil.14=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u5BC6\u94A5\u7248\u672C[{0}]\u5B58\u5728\u5BC6\u94A5\u4E3A\u7A7A\u7684\u60C5\u51B5 secretutil.15=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u516C\u79C1\u94A5\u5BF9\u5B58\u5728\u4E3A\u7A7A\u7684\u60C5\u51B5\uFF0C\u7248\u672C[{0}] secretutil.16=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u65E0\u6CD5\u627E\u5230\u52A0\u89E3\u5BC6\u914D\u7F6E configparser.1=\u5916\u639B\u7A0B\u5F0F[{0},{1}]\u8F09\u5165\u5931\u6557\uFF0C1s\u5F8C\u91CD\u8A66... Exception:{2} configparser.2=\u7372\u53D6\u4F5C\u696D\u914D\u7F6E\u8CC7\u8A0A\u5931\u6557:{0} configparser.3=\u7372\u53D6\u4F5C\u696D\u914D\u7F6E\u8CC7\u8A0A\u5931\u6557:{0} configparser.4=\u7372\u53D6\u4F5C\u696D\u914D\u7F6E\u8CC7\u8A0A\u5931\u6557:{0} configparser.5=\u5916\u639B\u7A0B\u5F0F\u8F09\u5165\u5931\u6557\uFF0C\u672A\u5B8C\u6210\u6307\u5B9A\u5916\u639B\u7A0B\u5F0F\u8F09\u5165:{0} configparser.6=\u5916\u639B\u7A0B\u5F0F\u8F09\u5165\u5931\u6557,\u5B58\u5728\u91CD\u8907\u5916\u639B\u7A0B\u5F0F:{0} dataxserviceutil.1=\u5EFA\u7ACB\u7C3D\u540D\u7570\u5E38NoSuchAlgorithmException, [{0}] dataxserviceutil.2=\u5EFA\u7ACB\u7C3D\u540D\u7570\u5E38InvalidKeyException, [{0}] dataxserviceutil.3=\u5EFA\u7ACB\u7C3D\u540D\u7570\u5E38UnsupportedEncodingException, [{0}] errorrecordchecker.1=\u9AD2\u6578\u64DA\u767E\u5206\u6BD4\u9650\u5236\u61C9\u8A72\u5728[0.0, 1.0]\u4E4B\u9593 errorrecordchecker.2=\u9AD2\u6578\u64DA\u689D\u6578\u73FE\u5728\u61C9\u8A72\u70BA\u975E\u8CA0\u6574\u6578 errorrecordchecker.3=\u9AD2\u6578\u64DA\u689D\u6578\u6AA2\u67E5\u4E0D\u901A\u904E\uFF0C\u9650\u5236\u662F[{0}]\u689D\uFF0C\u4F46\u5BE6\u969B\u4E0A\u6355\u7372\u4E86[{1}]\u689D. errorrecordchecker.4=\u9AD2\u6578\u64DA\u767E\u5206\u6BD4\u6AA2\u67E5\u4E0D\u901A\u904E\uFF0C\u9650\u5236\u662F[{0}]\uFF0C\u4F46\u5BE6\u969B\u4E0A\u6355\u7372\u5230[{1}]. 
errorcode.install_error=DataX\u5F15\u64CE\u5B89\u88DD\u932F\u8AA4, \u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A . errorcode.argument_error=DataX\u5F15\u64CE\u904B\u884C\u932F\u8AA4\uFF0C\u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BC\u5167\u90E8\u7DE8\u7A0B\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61DataX\u958B\u767C\u5718\u968A\u89E3\u6C7A . errorcode.runtime_error=DataX\u5F15\u64CE\u904B\u884C\u904E\u7A0B\u51FA\u932F\uFF0C\u5177\u9AD4\u539F\u56E0\u8ACB\u53C3\u770BDataX\u904B\u884C\u7D50\u675F\u6642\u7684\u932F\u8AA4\u8A3A\u65B7\u8CC7\u8A0A . errorcode.config_error=DataX\u5F15\u64CE\u914D\u7F6E\u932F\u8AA4\uFF0C\u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A . errorcode.secret_error=DataX\u5F15\u64CE\u52A0\u89E3\u5BC6\u51FA\u932F\uFF0C\u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A . errorcode.hook_load_error=\u8F09\u5165\u5916\u90E8Hook\u51FA\u73FE\u932F\u8AA4\uFF0C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u5F15\u8D77\u7684 errorcode.hook_fail_error=\u57F7\u884C\u5916\u90E8Hook\u51FA\u73FE\u932F\u8AA4 errorcode.plugin_install_error=DataX\u5916\u639B\u7A0B\u5F0F\u5B89\u88DD\u932F\u8AA4, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A . errorcode.plugin_not_found=DataX\u5916\u639B\u7A0B\u5F0F\u914D\u7F6E\u932F\u8AA4, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A . errorcode.plugin_init_error=DataX\u5916\u639B\u7A0B\u5F0F\u521D\u59CB\u5316\u932F\u8AA4, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5B89\u88DD\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61\u60A8\u7684\u904B\u7DAD\u89E3\u6C7A . 
errorcode.plugin_runtime_error=DataX\u5916\u639B\u7A0B\u5F0F\u904B\u884C\u6642\u51FA\u932F, \u5177\u9AD4\u539F\u56E0\u8ACB\u53C3\u770BDataX\u904B\u884C\u7D50\u675F\u6642\u7684\u932F\u8AA4\u8A3A\u65B7\u8CC7\u8A0A . errorcode.plugin_dirty_data_limit_exceed=DataX\u50B3\u8F38\u9AD2\u6578\u64DA\u8D85\u904E\u7528\u6236\u9810\u671F\uFF0C\u8A72\u932F\u8AA4\u901A\u5E38\u662F\u7531\u65BC\u6E90\u7AEF\u6578\u64DA\u5B58\u5728\u8F03\u591A\u696D\u52D9\u9AD2\u6578\u64DA\u5C0E\u81F4\uFF0C\u8ACB\u4ED4\u7D30\u6AA2\u67E5DataX\u5F59\u5831\u7684\u9AD2\u6578\u64DA\u65E5\u8A8C\u8CC7\u8A0A, \u6216\u8005\u60A8\u53EF\u4EE5\u9069\u7576\u8ABF\u5927\u9AD2\u6578\u64DA\u95BE\u503C . errorcode.plugin_split_error=DataX\u5916\u639B\u7A0B\u5F0F\u5207\u5206\u51FA\u932F, \u8A72\u554F\u984C\u901A\u5E38\u662F\u7531\u65BCDataX\u5404\u500B\u5916\u639B\u7A0B\u5F0F\u7DE8\u7A0B\u932F\u8AA4\u5F15\u8D77\uFF0C\u8ACB\u806F\u7D61DataX\u958B\u767C\u5718\u968A\u89E3\u6C7A errorcode.kill_job_timeout_error=kill \u4EFB\u52D9\u903E\u6642\uFF0C\u8ACB\u806F\u7D61PE\u89E3\u6C7A errorcode.start_taskgroup_error=taskGroup\u555F\u52D5\u5931\u6557,\u8ACB\u806F\u7D61DataX\u958B\u767C\u5718\u968A\u89E3\u6C7A errorcode.call_datax_service_failed=\u8ACB\u6C42 DataX Service \u51FA\u932F. errorcode.call_remote_failed=\u9060\u7A0B\u8ABF\u7528\u5931\u6557 errorcode.killed_exit_value=Job \u6536\u5230\u4E86 Kill \u547D\u4EE4. 
httpclientutil.1=\u8ACB\u6C42\u5730\u5740\uFF1A{0}, \u8ACB\u6C42\u65B9\u6CD5\uFF1A{1},STATUS CODE = {2}, Response Entity: {3} httpclientutil.2=\u9060\u7A0B\u63A5\u53E3\u8FD4\u56DE-1,\u5C07\u91CD\u8A66 secretutil.1=\u7CFB\u7D71\u7DE8\u7A0B\u932F\u8AA4,\u4E0D\u652F\u63F4\u7684\u52A0\u5BC6\u985E\u578B secretutil.2=\u7CFB\u7D71\u7DE8\u7A0B\u932F\u8AA4,\u4E0D\u652F\u63F4\u7684\u52A0\u5BC6\u985E\u578B secretutil.3=rsa\u52A0\u5BC6\u51FA\u932F secretutil.4=rsa\u89E3\u5BC6\u51FA\u932F secretutil.5=3\u91CDDES\u52A0\u5BC6\u51FA\u932F secretutil.6=rsa\u89E3\u5BC6\u51FA\u932F secretutil.7=\u69CB\u5EFA\u4E09\u91CDDES\u5BC6\u5319\u51FA\u932F secretutil.8=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u7121\u6CD5\u627E\u5230\u5BC6\u9470\u7684\u914D\u7F6E\u6A94\u6848 secretutil.9=\u8B80\u53D6\u52A0\u89E3\u5BC6\u914D\u7F6E\u6A94\u6848\u51FA\u932F secretutil.10=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C secretutil.11=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7D71\u7DAD\u8B77\u554F\u984C secretutil.12=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E0D\u5B58\u5728\u60A8\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C secretutil.13=DataX\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C\u70BA[{0}]\uFF0C\u4F46\u5728\u7CFB\u7D71\u4E2D\u6C92\u6709\u914D\u7F6E\uFF0C\u53EF\u80FD\u662F\u4EFB\u52D9\u5BC6\u9470\u914D\u7F6E\u932F\u8AA4\uFF0C\u4E5F\u53EF\u80FD\u662F\u7CFB\u7D71\u7DAD\u8B77\u554F\u984C 
secretutil.14=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u5BC6\u9470\u7248\u672C[{0}]\u5B58\u5728\u5BC6\u9470\u70BA\u7A7A\u7684\u60C5\u6CC1
secretutil.15=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u914D\u7F6E\u7684\u516C\u79C1\u9470\u5C0D\u5B58\u5728\u70BA\u7A7A\u7684\u60C5\u6CC1\uFF0C\u7248\u672C[{0}]
secretutil.16=DataX\u914D\u7F6E\u8981\u6C42\u52A0\u89E3\u5BC6\uFF0C\u4F46\u7121\u6CD5\u627E\u5230\u52A0\u89E3\u5BC6\u914D\u7F6E

================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/SecretUtil.java
================================================
package com.alibaba.datax.core.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.util.container.CoreConstant;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.tuple.ImmutableTriple;
import org.apache.commons.lang3.tuple.Triple;

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.security.Key;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.interfaces.RSAPrivateKey;
import java.security.interfaces.RSAPublicKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Created by jingxing on 14/12/15.
 */
public class SecretUtil {
    private static Properties properties;

    // RSA:    key = keyVersion; value = (left: privateKey, middle: type, right: publicKey)
    // DESede: key = keyVersion; value = (left: keyContent, middle: type, right: keyContent)
    private static Map<String, Triple<String, String, String>> versionKeyMap;

    private static final String ENCODING = "UTF-8";

    public static final String KEY_ALGORITHM_RSA = "RSA";

    public static final String KEY_ALGORITHM_3DES = "DESede";

    private static final String CIPHER_ALGORITHM_3DES = "DESede/ECB/PKCS5Padding";

    private static final Base64 base64 = new Base64();

    /**
     * Base64-encodes raw bytes.
     *
     * @param plaintextBytes the bytes to encode
     * @return the Base64 string
     * @throws Exception if the charset is not supported
     */
    public static String encryptBASE64(byte[] plaintextBytes) throws Exception {
        return new String(base64.encode(plaintextBytes), ENCODING);
    }

    /**
     * Decodes a Base64 string.
     *
     * @param cipherText the Base64 string
     * @return the decoded bytes
     */
    public static byte[] decryptBASE64(String cipherText) {
        return base64.decode(cipherText);
    }

    /**
     * Encrypts data with the given method.
     *
     * @param data   raw plaintext
     * @param key    Base64-encoded public key (RSA) or raw key (3DES)
     * @param method {@link #KEY_ALGORITHM_RSA} or {@link #KEY_ALGORITHM_3DES}
     */
    public static String encrypt(String data, String key, String method) {
        if (SecretUtil.KEY_ALGORITHM_RSA.equals(method)) {
            return SecretUtil.encryptRSA(data, key);
        } else if (SecretUtil.KEY_ALGORITHM_3DES.equals(method)) {
            return SecretUtil.encrypt3DES(data, key);
        } else {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR,
                    String.format("系统编程错误,不支持的加密类型[%s]", method));
        }
    }

    /**
     * Decrypts data with the given method.
     *
     * @param data   Base64-encoded ciphertext
     * @param key    Base64-encoded private key (RSA) or raw key (3DES)
     * @param method {@link #KEY_ALGORITHM_RSA} or {@link #KEY_ALGORITHM_3DES}
     */
    public static String decrypt(String data, String key, String method) {
        if (SecretUtil.KEY_ALGORITHM_RSA.equals(method)) {
            return SecretUtil.decryptRSA(data, key);
        } else if (SecretUtil.KEY_ALGORITHM_3DES.equals(method)) {
            return SecretUtil.decrypt3DES(data, key);
        } else {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR,
                    String.format("系统编程错误,不支持的加密类型[%s]", method));
        }
    }

    /**
     * Encrypts with the public key (encryptByPublicKey).
     *
     * @param data raw plaintext
     * @param key  Base64-encoded public key
     * @return Base64-encoded ciphertext
     * @throws DataXException if encryption fails
     */
    public static String encryptRSA(String data, String key) {
        try {
            // Decode the public key, which is stored Base64-encoded
            byte[] keyBytes = decryptBASE64(key);
            // Rebuild the public key
            X509EncodedKeySpec x509KeySpec = new X509EncodedKeySpec(keyBytes);
            KeyFactory keyFactory = KeyFactory.getInstance(KEY_ALGORITHM_RSA);
            Key publicKey = keyFactory.generatePublic(x509KeySpec);
            // Encrypt the data
            Cipher cipher = Cipher.getInstance(keyFactory.getAlgorithm());
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return encryptBASE64(cipher.doFinal(data.getBytes(ENCODING)));
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR, "rsa加密出错", e);
        }
    }

    /**
     * Decrypts with the private key.
     *
     * @param data Base64-encoded ciphertext
     * @param key  Base64-encoded private key
     * @return the plaintext
     * @throws DataXException if decryption fails
     */
    public static String decryptRSA(String data, String key) {
        try {
            // Decode the private key
            byte[] keyBytes = decryptBASE64(key);
            // Rebuild the private key
            PKCS8EncodedKeySpec pkcs8KeySpec = new PKCS8EncodedKeySpec(keyBytes);
            KeyFactory keyFactory = KeyFactory.getInstance(KEY_ALGORITHM_RSA);
            Key privateKey = keyFactory.generatePrivate(pkcs8KeySpec);
            // Decrypt the data
            Cipher cipher = Cipher.getInstance(keyFactory.getAlgorithm());
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return new String(cipher.doFinal(decryptBASE64(data)), ENCODING);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR, "rsa解密出错", e);
        }
    }

    /**
     * Generates a new RSA key pair.
     *
     * @return {Base64-encoded public key, Base64-encoded private key}
     * @throws Exception if key generation fails
     */
    public static String[] initKey() throws Exception {
        KeyPairGenerator keyPairGen = KeyPairGenerator
                .getInstance(KEY_ALGORITHM_RSA);
        keyPairGen.initialize(1024);

        KeyPair keyPair = keyPairGen.generateKeyPair();

        // public key
        RSAPublicKey publicKey = (RSAPublicKey) keyPair.getPublic();

        // private key
        RSAPrivateKey privateKey = (RSAPrivateKey) keyPair.getPrivate();

        String[] publicAndPrivateKey = {
                encryptBASE64(publicKey.getEncoded()),
                encryptBASE64(privateKey.getEncoded())};

        return publicAndPrivateKey;
    }

    /**
     * Encrypts with DESede (3DES) using the given key.
     *
     * @param data raw plaintext
     * @param key  the encryption key
     * @return Base64-encoded ciphertext
     * @throws DataXException if encryption fails
     */
    public static String encrypt3DES(String data, String key) {
        try {
            // Build the key
            SecretKey desKey = new SecretKeySpec(build3DesKey(key),
                    KEY_ALGORITHM_3DES);
            // Encrypt the data
            Cipher cipher = Cipher.getInstance(CIPHER_ALGORITHM_3DES);
            cipher.init(Cipher.ENCRYPT_MODE, desKey);
            return encryptBASE64(cipher.doFinal(data.getBytes(ENCODING)));
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR, "3重DES加密出错", e);
        }
    }

    /**
     * Decrypts with DESede (3DES) using the given key.
     *
     * @param data Base64-encoded ciphertext
     * @param key  the decryption key
     * @return the plaintext
     * @throws DataXException if decryption fails
     */
    public static String decrypt3DES(String data, String key) {
        try {
            // Build the key
            SecretKey desKey = new SecretKeySpec(build3DesKey(key),
                    KEY_ALGORITHM_3DES);
            // Decrypt the data
            Cipher cipher = Cipher.getInstance(CIPHER_ALGORITHM_3DES);
            cipher.init(Cipher.DECRYPT_MODE, desKey);
            return new String(cipher.doFinal(decryptBASE64(data)), ENCODING);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR, "3重DES解密出错", e);
        }
    }

    /**
     * Builds a key byte array from a string.
     *
     * @param keyStr the key string
     * @return a 24-byte array, the key length DESede requires
     */
    private static byte[] build3DesKey(String keyStr) {
        try {
            // A 24-byte array padded with the character '0' (byte 48), not the numeric default 0
            byte[] key = "000000000000000000000000".getBytes(ENCODING);
            byte[] temp = keyStr.getBytes(ENCODING);
            if (key.length > temp.length) {
                // temp is shorter than 24 bytes: copy all of temp into key
                System.arraycopy(temp, 0, key, 0, temp.length);
            } else {
                // temp is 24 bytes or longer: copy only its first 24 bytes
                System.arraycopy(temp, 0, key, 0, key.length);
            }
            return key;
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR, "构建三重DES密匙出错", e);
        }
    }

    public static synchronized Properties getSecurityProperties() {
        if (properties == null) {
            InputStream secretStream = null;
            try {
                secretStream = new FileInputStream(
                        CoreConstant.DATAX_SECRET_PATH);
            } catch (FileNotFoundException e) {
                throw DataXException.asDataXException(
                        FrameworkErrorCode.SECRET_ERROR,
                        "DataX配置要求加解密,但无法找到密钥的配置文件");
            }

            // Cache the properties only after a successful load, and always close the stream
            Properties loaded = new Properties();
            try {
                loaded.load(secretStream);
            } catch (IOException e) {
                throw DataXException.asDataXException(
                        FrameworkErrorCode.SECRET_ERROR, "读取加解密配置文件出错", e);
            } finally {
                try {
                    secretStream.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
            properties = loaded;
        }

        return properties;
    }

    public static Configuration encryptSecretKey(Configuration configuration) {
        String keyVersion = configuration
                .getString(CoreConstant.DATAX_JOB_SETTING_KEYVERSION);
        // No keyVersion set: nothing needs to be encrypted
        if (StringUtils.isBlank(keyVersion)) {
            return configuration;
        }

        Map<String, Triple<String, String, String>> versionKeyMap = getPrivateKeyMap();

        if (null == versionKeyMap.get(keyVersion)) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR,
                    String.format("DataX配置的密钥版本为[%s],但在系统中没有配置,任务密钥配置错误,不存在您配置的密钥版本", keyVersion));
        }

        String key = versionKeyMap.get(keyVersion).getRight();
        String method = versionKeyMap.get(keyVersion).getMiddle();
        // The encryption key required by keyVersion is not configured
        if (StringUtils.isBlank(key)) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR,
                    String.format("DataX配置的密钥版本为[%s],但在系统中没有配置,可能是任务密钥配置错误,也可能是系统维护问题", keyVersion));
        }

        String tempEncryptedData = null;
        for (String path : configuration.getSecretKeyPathSet()) {
            tempEncryptedData = SecretUtil.encrypt(configuration.getString(path), key, method);
            int lastPathIndex = path.lastIndexOf(".") + 1;
            String lastPathKey = path.substring(lastPathIndex);

            // Store the ciphertext under "*<key>" to mark it as encrypted
            String newPath = path.substring(0, lastPathIndex) + "*" + lastPathKey;
            configuration.set(newPath, tempEncryptedData);
            configuration.remove(path);
        }

        return configuration;
    }

    public static Configuration decryptSecretKey(Configuration config) {
        String keyVersion = config
                .getString(CoreConstant.DATAX_JOB_SETTING_KEYVERSION);
        // No keyVersion set: nothing needs to be decrypted
        if (StringUtils.isBlank(keyVersion)) {
            return config;
        }

        Map<String, Triple<String, String, String>> versionKeyMap = getPrivateKeyMap();
        if (null == versionKeyMap.get(keyVersion)) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR,
                    String.format("DataX配置的密钥版本为[%s],但在系统中没有配置,任务密钥配置错误,不存在您配置的密钥版本", keyVersion));
        }
        String decryptKey = versionKeyMap.get(keyVersion).getLeft();
        String method = versionKeyMap.get(keyVersion).getMiddle();
        // The decryption key required by keyVersion is not configured
        if (StringUtils.isBlank(decryptKey)) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR,
                    String.format("DataX配置的密钥版本为[%s],但在系统中没有配置,可能是任务密钥配置错误,也可能是系统维护问题", keyVersion));
        }

        // Decrypt every value whose last path segment starts with a single '*'
        for (String key : config.getKeys()) {
            int lastPathIndex = key.lastIndexOf(".") + 1;
            String lastPathKey = key.substring(lastPathIndex);
            if (lastPathKey.length() > 1 && lastPathKey.charAt(0) == '*'
                    && lastPathKey.charAt(1) != '*') {
                Object value = config.get(key);
                if (value instanceof String) {
                    String newKey = key.substring(0, lastPathIndex)
                            + lastPathKey.substring(1);
                    config.set(newKey,
                            SecretUtil.decrypt((String) value, decryptKey, method));
                    config.addSecretKeyPath(newKey);
                    config.remove(key);
                }
            }
        }

        return config;
    }

    private static synchronized Map<String, Triple<String, String, String>> getPrivateKeyMap() {
        if (versionKeyMap == null) {
            versionKeyMap = new HashMap<String, Triple<String, String, String>>();
            Properties properties = SecretUtil.getSecurityProperties();

            String[] serviceUsernames = new String[] {
                    CoreConstant.LAST_SERVICE_USERNAME,
                    CoreConstant.CURRENT_SERVICE_USERNAME };
            String[] servicePasswords = new String[] {
                    CoreConstant.LAST_SERVICE_PASSWORD,
                    CoreConstant.CURRENT_SERVICE_PASSWORD };

            // 3DES: the service username is the key version, the service password is the key
            for (int i = 0; i < serviceUsernames.length; i++) {
                String serviceUsername = properties
                        .getProperty(serviceUsernames[i]);
                if (StringUtils.isNotBlank(serviceUsername)) {
                    String servicePassword = properties
                            .getProperty(servicePasswords[i]);
                    if (StringUtils.isNotBlank(servicePassword)) {
                        versionKeyMap.put(serviceUsername, ImmutableTriple.of(
                                servicePassword, SecretUtil.KEY_ALGORITHM_3DES,
                                servicePassword));
                    } else {
                        throw DataXException.asDataXException(
                                FrameworkErrorCode.SECRET_ERROR, String.format(
                                        "DataX配置要求加解密,但配置的密钥版本[%s]存在密钥为空的情况",
                                        serviceUsername));
                    }
                }
            }

            String[] keyVersions = new String[] { CoreConstant.LAST_KEYVERSION,
                    CoreConstant.CURRENT_KEYVERSION };
            String[] privateKeys = new String[] { CoreConstant.LAST_PRIVATEKEY,
                    CoreConstant.CURRENT_PRIVATEKEY };
            String[] publicKeys = new String[] { CoreConstant.LAST_PUBLICKEY,
                    CoreConstant.CURRENT_PUBLICKEY };
            // RSA: each key version needs both a private and a public key
            for (int i = 0; i < keyVersions.length; i++) {
                String keyVersion = properties.getProperty(keyVersions[i]);
                if (StringUtils.isNotBlank(keyVersion)) {
                    String privateKey = properties.getProperty(privateKeys[i]);
                    String publicKey = properties.getProperty(publicKeys[i]);
                    if (StringUtils.isNotBlank(privateKey)
                            && StringUtils.isNotBlank(publicKey)) {
                        versionKeyMap.put(keyVersion, ImmutableTriple.of(
                                privateKey, SecretUtil.KEY_ALGORITHM_RSA,
                                publicKey));
                    } else {
                        throw DataXException.asDataXException(
                                FrameworkErrorCode.SECRET_ERROR, String.format(
                                        "DataX配置要求加解密,但配置的公私钥对存在为空的情况,版本[%s]",
                                        keyVersion));
                    }
                }
            }
        }
        if (versionKeyMap.size() <= 0) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.SECRET_ERROR, "DataX配置要求加解密,但无法找到加解密配置");
        }
        return versionKeyMap;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/TransformerUtil.java
================================================
package com.alibaba.datax.core.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.transport.transformer.*;
import com.alibaba.datax.core.util.container.CoreConstant;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

/**
 * no comments.
 * Created by liqiang on 16/3/9.
 */
public class TransformerUtil {

    private static final Logger LOG = LoggerFactory.getLogger(TransformerUtil.class);

    public static List<TransformerExecution> buildTransformerInfo(Configuration taskConfig) {
        List<Configuration> tfConfigs = taskConfig.getListConfiguration(CoreConstant.JOB_TRANSFORMER);
        if (tfConfigs == null || tfConfigs.size() == 0) {
            return null;
        }

        List<TransformerExecution> result = new ArrayList<TransformerExecution>();

        List<String> functionNames = new ArrayList<String>();

        for (Configuration configuration : tfConfigs) {
            String functionName = configuration.getString("name");
            if (StringUtils.isEmpty(functionName)) {
                throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_CONFIGURATION_ERROR,
                        "config=" + configuration.toJSON());
            }

            if (functionName.equals("dx_groovy") && functionNames.contains("dx_groovy")) {
                throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_CONFIGURATION_ERROR,
                        "dx_groovy can only be invoked once.");
            }
            functionNames.add(functionName);
        }

        /**
         * Load third-party transformer functions lazily, and only the ones that are needed.
         */
        LOG.info(String.format(" user config transformers [%s], loading...", functionNames));
        TransformerRegistry.loadTransformerFromLocalStorage(functionNames);

        int i = 0;

        for (Configuration configuration : tfConfigs) {
            String functionName = configuration.getString("name");
            TransformerInfo transformerInfo = TransformerRegistry.getTransformer(functionName);
            if (transformerInfo == null) {
                throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_NOTFOUND_ERROR,
                        "name=" + functionName);
            }

            /**
             * Each UDF gets its own parameter object.
             */
            TransformerExecutionParas transformerExecutionParas = new TransformerExecutionParas();

            /**
             * Groovy functions take only code (no columnIndex or paras).
             */
            if (!functionName.equals("dx_groovy") && !functionName.equals("dx_fackGroovy")) {
                Integer columnIndex = configuration.getInt(CoreConstant.TRANSFORMER_PARAMETER_COLUMNINDEX);

                if (columnIndex == null) {
                    throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER,
                            "columnIndex must be set by UDF:name=" + functionName);
                }

                transformerExecutionParas.setColumnIndex(columnIndex);
                List<String> paras = configuration.getList(CoreConstant.TRANSFORMER_PARAMETER_PARAS, String.class);
                if (paras != null && paras.size() > 0) {
                    transformerExecutionParas.setParas(paras.toArray(new String[0]));
                }
            } else {
                String code = configuration.getString(CoreConstant.TRANSFORMER_PARAMETER_CODE);
                if (StringUtils.isEmpty(code)) {
                    throw DataXException.asDataXException(TransformerErrorCode.TRANSFORMER_ILLEGAL_PARAMETER,
                            "groovy code must be set by UDF:name=" + functionName);
                }
                transformerExecutionParas.setCode(code);

                List<String> extraPackage = configuration.getList(CoreConstant.TRANSFORMER_PARAMETER_EXTRAPACKAGE, String.class);
                if (extraPackage != null && extraPackage.size() > 0) {
                    transformerExecutionParas.setExtraPackage(extraPackage);
                }
            }
            transformerExecutionParas.settContext(configuration.getMap(CoreConstant.TRANSFORMER_PARAMETER_CONTEXT));

            TransformerExecution transformerExecution = new TransformerExecution(transformerInfo, transformerExecutionParas);

            transformerExecution.genFinalParas();
            result.add(transformerExecution);
            i++;
            LOG.info(String.format(" %s of transformer init success. name=%s, isNative=%s parameter = %s"
                    , i, transformerInfo.getTransformer().getTransformerName()
                    , transformerInfo.isNative(), configuration.getConfiguration("parameter")));
        }

        return result;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/container/ClassLoaderSwapper.java
================================================
package com.alibaba.datax.core.util.container;

/**
 * Created by jingxing on 14-8-29.
 *
 * To avoid jar conflicts (for example, HBase may need several versions of its reader/writer dependency jars),
 * JobContainer and TaskGroupContainer must load those jars outside the current classLoader; once done, they
 * switch back to the original classLoader to run the code that follows.
 */
public final class ClassLoaderSwapper {
    private ClassLoader storeClassLoader = null;

    private ClassLoaderSwapper() {
    }

    public static ClassLoaderSwapper newCurrentThreadClassLoaderSwapper() {
        return new ClassLoaderSwapper();
    }

    /**
     * Saves the current thread's context classLoader and replaces it with the given one.
     *
     * @param classLoader the classLoader to install on the current thread
     * @return the previously installed (now saved) context classLoader
     */
    public ClassLoader setCurrentThreadClassLoader(ClassLoader classLoader) {
        this.storeClassLoader = Thread.currentThread().getContextClassLoader();
        Thread.currentThread().setContextClassLoader(classLoader);
        return this.storeClassLoader;
    }

    /**
     * Restores the saved classLoader as the current thread's context classLoader.
     *
     * @return the context classLoader that was installed before the restore
     */
    public ClassLoader restoreCurrentThreadClassLoader() {
        ClassLoader classLoader = Thread.currentThread()
                .getContextClassLoader();
        Thread.currentThread().setContextClassLoader(this.storeClassLoader);
        return classLoader;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/container/CoreConstant.java
================================================
package com.alibaba.datax.core.util.container;

import org.apache.commons.lang.StringUtils;

import java.io.File;

/**
 * Created by jingxing on 14-8-25.
 */
public class CoreConstant {
    // --------------------------- Globally used constants (ideally, reorder these members into a logical order)
    // --------------------------------

    public static final String DATAX_CORE_CONTAINER_TASKGROUP_CHANNEL = "core.container.taskGroup.channel";

    public static final String DATAX_CORE_CONTAINER_MODEL = "core.container.model";

    public static final String DATAX_CORE_CONTAINER_JOB_ID = "core.container.job.id";

    public static final String DATAX_CORE_CONTAINER_TRACE_ENABLE = "core.container.trace.enable";

    public static final String DATAX_CORE_CONTAINER_JOB_MODE = "core.container.job.mode";

    public static final String DATAX_CORE_CONTAINER_JOB_REPORTINTERVAL = "core.container.job.reportInterval";

    public static final String DATAX_CORE_CONTAINER_JOB_SLEEPINTERVAL = "core.container.job.sleepInterval";

    public static final String DATAX_CORE_CONTAINER_TASKGROUP_ID = "core.container.taskGroup.id";

    public static final String DATAX_CORE_CONTAINER_TASKGROUP_SLEEPINTERVAL = "core.container.taskGroup.sleepInterval";

    public static final String DATAX_CORE_CONTAINER_TASKGROUP_REPORTINTERVAL = "core.container.taskGroup.reportInterval";

    public static final String DATAX_CORE_CONTAINER_TASK_FAILOVER_MAXRETRYTIMES = "core.container.task.failOver.maxRetryTimes";

    public static final String DATAX_CORE_CONTAINER_TASK_FAILOVER_RETRYINTERVALINMSEC = "core.container.task.failOver.retryIntervalInMsec";

    public static final String DATAX_CORE_CONTAINER_TASK_FAILOVER_MAXWAITINMSEC = "core.container.task.failOver.maxWaitInMsec";

    public static final String DATAX_CORE_DATAXSERVER_ADDRESS = "core.dataXServer.address";

    public static final String DATAX_CORE_DSC_ADDRESS = "core.dsc.address";

    public static final String DATAX_CORE_DATAXSERVER_TIMEOUT = "core.dataXServer.timeout";

    public static final String DATAX_CORE_REPORT_DATAX_LOG = "core.dataXServer.reportDataxLog";

    public static final String DATAX_CORE_REPORT_DATAX_PERFLOG = "core.dataXServer.reportPerfLog";

    public static final String DATAX_CORE_TRANSPORT_CHANNEL_CLASS = "core.transport.channel.class";

    public static final String DATAX_CORE_TRANSPORT_CHANNEL_CAPACITY = "core.transport.channel.capacity";

    public static final String DATAX_CORE_TRANSPORT_CHANNEL_CAPACITY_BYTE = "core.transport.channel.byteCapacity";

    public static final String DATAX_CORE_TRANSPORT_CHANNEL_SPEED_BYTE = "core.transport.channel.speed.byte";

    public static final String DATAX_CORE_TRANSPORT_CHANNEL_SPEED_RECORD = "core.transport.channel.speed.record";

    public static final String DATAX_CORE_TRANSPORT_CHANNEL_FLOWCONTROLINTERVAL = "core.transport.channel.flowControlInterval";

    public static final String DATAX_CORE_TRANSPORT_EXCHANGER_BUFFERSIZE = "core.transport.exchanger.bufferSize";

    public static final String DATAX_CORE_TRANSPORT_RECORD_CLASS = "core.transport.record.class";

    public static final String DATAX_CORE_STATISTICS_COLLECTOR_PLUGIN_TASKCLASS = "core.statistics.collector.plugin.taskClass";

    public static final String DATAX_CORE_STATISTICS_COLLECTOR_PLUGIN_MAXDIRTYNUM = "core.statistics.collector.plugin.maxDirtyNumber";

    public static final String DATAX_JOB_CONTENT_READER_NAME = "job.content[0].reader.name";

    public static final String DATAX_JOB_CONTENT_READER_PARAMETER = "job.content[0].reader.parameter";

    public static final String DATAX_JOB_CONTENT_WRITER_NAME = "job.content[0].writer.name";

    public static final String DATAX_JOB_CONTENT_WRITER_PARAMETER = "job.content[0].writer.parameter";

    public static final String DATAX_JOB_JOBINFO = "job.jobInfo";

    public static final String DATAX_JOB_CONTENT = "job.content";

    public static final String DATAX_JOB_CONTENT_TRANSFORMER = "job.content[0].transformer";

    public static final String DATAX_JOB_SETTING_KEYVERSION = "job.setting.keyVersion";

    public static final String DATAX_JOB_SETTING_SPEED_BYTE = "job.setting.speed.byte";

    public static final String DATAX_JOB_SETTING_SPEED_RECORD = "job.setting.speed.record";

    public static final String DATAX_JOB_SETTING_SPEED_CHANNEL = "job.setting.speed.channel";

    public static final String DATAX_JOB_SETTING_ERRORLIMIT = "job.setting.errorLimit";

    public static final String DATAX_JOB_SETTING_ERRORLIMIT_RECORD = "job.setting.errorLimit.record";

    public static final String DATAX_JOB_SETTING_ERRORLIMIT_PERCENT = "job.setting.errorLimit.percentage";

    public static final String DATAX_JOB_SETTING_DRYRUN = "job.setting.dryRun";

    public static final String DATAX_JOB_PREHANDLER_PLUGINTYPE = "job.preHandler.pluginType";

    public static final String DATAX_JOB_PREHANDLER_PLUGINNAME = "job.preHandler.pluginName";

    public static final String DATAX_JOB_POSTHANDLER_PLUGINTYPE = "job.postHandler.pluginType";

    public static final String DATAX_JOB_POSTHANDLER_PLUGINNAME = "job.postHandler.pluginName";

    // ----------------------------- Locally used constants
    public static final String JOB_WRITER = "writer";

    public static final String JOB_READER = "reader";

    public static final String JOB_TRANSFORMER = "transformer";

    public static final String JOB_READER_NAME = "reader.name";

    public static final String JOB_READER_PARAMETER = "reader.parameter";

    public static final String JOB_WRITER_NAME = "writer.name";

    public static final String JOB_WRITER_PARAMETER = "writer.parameter";

    public static final String TRANSFORMER_PARAMETER_COLUMNINDEX = "parameter.columnIndex";
    public static final String TRANSFORMER_PARAMETER_PARAS = "parameter.paras";
    public static final String TRANSFORMER_PARAMETER_CONTEXT = "parameter.context";
    public static final String TRANSFORMER_PARAMETER_CODE = "parameter.code";
    public static final String TRANSFORMER_PARAMETER_EXTRAPACKAGE = "parameter.extraPackage";

    public static final String TASK_ID = "taskId";

    // ----------------------------- Security module constants ------------------

    public static final String LAST_KEYVERSION = "last.keyVersion";

    public static final String LAST_PUBLICKEY = "last.publicKey";

    public static final String LAST_PRIVATEKEY = "last.privateKey";

    public static final String LAST_SERVICE_USERNAME = "last.service.username";

    public static final String LAST_SERVICE_PASSWORD = "last.service.password";

    public static final String CURRENT_KEYVERSION = "current.keyVersion";

    public static final String CURRENT_PUBLICKEY = "current.publicKey";

    public static final String CURRENT_PRIVATEKEY = "current.privateKey";

    public static final String CURRENT_SERVICE_USERNAME = "current.service.username";

    public static final String CURRENT_SERVICE_PASSWORD = "current.service.password";

    // ----------------------------- Environment variables ---------------------------------

    public static String DATAX_HOME = System.getProperty("datax.home");

    public static String DATAX_CONF_PATH = StringUtils.join(new String[] {
            DATAX_HOME, "conf", "core.json" }, File.separator);

    public static String DATAX_CONF_LOG_PATH = StringUtils.join(new String[] {
            DATAX_HOME, "conf", "logback.xml" }, File.separator);

    public static String DATAX_SECRET_PATH = StringUtils.join(new String[] {
            DATAX_HOME, "conf", ".secret.properties" }, File.separator);

    public static String DATAX_PLUGIN_HOME = StringUtils.join(new String[] {
            DATAX_HOME, "plugin" }, File.separator);

    public static String DATAX_PLUGIN_READER_HOME = StringUtils.join(
            new String[] { DATAX_HOME, "plugin", "reader" }, File.separator);

    public static String DATAX_PLUGIN_WRITER_HOME = StringUtils.join(
            new String[] { DATAX_HOME, "plugin", "writer" }, File.separator);

    public static String DATAX_BIN_HOME = StringUtils.join(new String[] {
            DATAX_HOME, "bin" }, File.separator);

    public static String DATAX_JOB_HOME = StringUtils.join(new String[] {
            DATAX_HOME, "job" }, File.separator);

    public static String DATAX_STORAGE_TRANSFORMER_HOME = StringUtils.join(
            new String[] { DATAX_HOME, "local_storage", "transformer" }, File.separator);

    public static String DATAX_STORAGE_PLUGIN_READ_HOME = StringUtils.join(
            new String[] { DATAX_HOME, "local_storage", "plugin", "reader" }, File.separator);

    public static String DATAX_STORAGE_PLUGIN_WRITER_HOME = StringUtils.join(
            new String[] { DATAX_HOME, "local_storage", "plugin", "writer" }, File.separator);

}
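SecretUtil does two things with the security-module property names defined in CoreConstant above. First, it reads `current.keyVersion`, `current.publicKey`, `current.privateKey`, the `*.service.*` pairs and their `last.*` counterparts from `conf/.secret.properties`. Second, it uses those keys to encrypt and decrypt job values with RSA or 3DES.

The following is a minimal, JDK-only sketch of those round trips, not DataX code. Assumptions to note:

- `SecretRoundTripSketch` and its methods are hypothetical names.
- It swaps commons-codec `Base64` for `java.util.Base64`.
- It wraps checked JCA exceptions in `IllegalStateException` rather than `DataXException`.
- It generates a 2048-bit RSA key where `SecretUtil.initKey()` uses 1024 bits.
- The key version `v1` is made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical JDK-only sketch of SecretUtil's 3DES and RSA round trips; not DataX code.
final class SecretRoundTripSketch {

    private static final String CIPHER_3DES = "DESede/ECB/PKCS5Padding"; // same as SecretUtil

    // Like SecretUtil.build3DesKey: keep at most 24 bytes, pad the rest with the character '0' (byte 48).
    static byte[] build3DesKey(String keyStr) {
        byte[] key = new byte[24];
        Arrays.fill(key, (byte) '0');
        byte[] raw = keyStr.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(raw, 0, key, 0, Math.min(raw.length, key.length));
        return key;
    }

    // One Cipher pass; checked JCA exceptions are wrapped, much as SecretUtil wraps them in DataXException.
    private static byte[] run(String transformation, int mode, Key key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(transformation);
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(transformation + " failed", e);
        }
    }

    static String encrypt3DES(String data, String key) {
        Key k = new SecretKeySpec(build3DesKey(key), "DESede");
        byte[] out = run(CIPHER_3DES, Cipher.ENCRYPT_MODE, k, data.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(out);
    }

    static String decrypt3DES(String data, String key) {
        Key k = new SecretKeySpec(build3DesKey(key), "DESede");
        byte[] out = run(CIPHER_3DES, Cipher.DECRYPT_MODE, k, Base64.getDecoder().decode(data));
        return new String(out, StandardCharsets.UTF_8);
    }

    // Returns {Base64 X.509 public key, Base64 PKCS#8 private key}, the order SecretUtil.initKey() uses.
    static String[] initRsaKeys() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048); // SecretUtil.initKey() uses 1024 bits
            KeyPair pair = gen.generateKeyPair();
            return new String[] {
                    Base64.getEncoder().encodeToString(pair.getPublic().getEncoded()),
                    Base64.getEncoder().encodeToString(pair.getPrivate().getEncoded())};
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA key generation failed", e);
        }
    }

    static String encryptRSA(String data, String publicKeyB64) {
        try {
            Key pub = KeyFactory.getInstance("RSA").generatePublic(
                    new X509EncodedKeySpec(Base64.getDecoder().decode(publicKeyB64)));
            byte[] out = run("RSA", Cipher.ENCRYPT_MODE, pub, data.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("invalid RSA public key", e);
        }
    }

    static String decryptRSA(String data, String privateKeyB64) {
        try {
            Key priv = KeyFactory.getInstance("RSA").generatePrivate(
                    new PKCS8EncodedKeySpec(Base64.getDecoder().decode(privateKeyB64)));
            byte[] out = run("RSA", Cipher.DECRYPT_MODE, priv, Base64.getDecoder().decode(data));
            return new String(out, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("invalid RSA private key", e);
        }
    }

    public static void main(String[] args) {
        String[] keys = initRsaKeys();
        // Property names from CoreConstant; the key version "v1" is made up.
        System.out.println("current.keyVersion=v1");
        System.out.println("current.publicKey=" + keys[0]);
        System.out.println("current.privateKey=" + keys[1]);
        String cipherText = encryptRSA("my-db-password", keys[0]);
        System.out.println(decryptRSA(cipherText, keys[1]));
        System.out.println(decrypt3DES(encrypt3DES("my-db-password", "service-pwd"), "service-pwd"));
    }
}
```

Running `main` prints three property lines in the style of `.secret.properties`, followed by `my-db-password` twice (once per round trip). As in `SecretUtil`, the 3DES path pads short keys with the character `'0'`. Two keys that differ only by trailing `'0'` characters, such as `k1` and `k10`, therefore produce the same cipher key.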
================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/container/JarLoader.java
================================================
package com.alibaba.datax.core.util.container;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.Validate;

import java.io.File;
import java.io.FileFilter;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/**
 * Provides an isolated jar-loading mechanism: the given paths, all of their
 * subdirectories, and every jar file in those directories are added to the class path.
 */
public class JarLoader extends URLClassLoader {
    public JarLoader(String[] paths) {
        this(paths, JarLoader.class.getClassLoader());
    }

    public JarLoader(String[] paths, ClassLoader parent) {
        super(getURLs(paths), parent);
    }

    private static URL[] getURLs(String[] paths) {
        Validate.isTrue(null != paths && 0 != paths.length,
                "The jar path must not be empty.");

        List<String> dirs = new ArrayList<String>();
        for (String path : paths) {
            dirs.add(path);
            JarLoader.collectDirs(path, dirs);
        }

        List<URL> urls = new ArrayList<URL>();
        for (String path : dirs) {
            urls.addAll(doGetURLs(path));
        }

        return urls.toArray(new URL[0]);
    }

    private static void collectDirs(String path, List<String> collector) {
        if (null == path || StringUtils.isBlank(path)) {
            return;
        }

        File current = new File(path);
        if (!current.exists() || !current.isDirectory()) {
            return;
        }

        for (File child : current.listFiles()) {
            if (!child.isDirectory()) {
                continue;
            }

            collector.add(child.getAbsolutePath());
            collectDirs(child.getAbsolutePath(), collector);
        }
    }

    private static List<URL> doGetURLs(final String path) {
        Validate.isTrue(!StringUtils.isBlank(path), "The jar path must not be empty.");

        File jarPath = new File(path);

        Validate.isTrue(jarPath.exists() && jarPath.isDirectory(),
                "The jar path must exist and be a directory.");

        /* set filter */
        FileFilter jarFilter = new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                return pathname.getName().endsWith(".jar");
            }
        };

        /* iterate all jar */
        File[] allJars = new File(path).listFiles(jarFilter);
        List<URL> jarURLs = new ArrayList<URL>(allJars.length);

        for (int i = 0; i < allJars.length; i++) {
            try {
                jarURLs.add(allJars[i].toURI().toURL());
            } catch (Exception e) {
                throw DataXException.asDataXException(
                        FrameworkErrorCode.PLUGIN_INIT_ERROR,
                        "Failed to load jar file", e);
            }
        }

        return jarURLs;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/core/util/container/LoadUtil.java
================================================
package com.alibaba.datax.core.util.container;

import com.alibaba.datax.common.constant.PluginType;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.AbstractJobPlugin;
import com.alibaba.datax.common.plugin.AbstractPlugin;
import com.alibaba.datax.common.plugin.AbstractTaskPlugin;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.taskgroup.runner.AbstractRunner;
import com.alibaba.datax.core.taskgroup.runner.ReaderRunner;
import com.alibaba.datax.core.taskgroup.runner.WriterRunner;
import com.alibaba.datax.core.util.FrameworkErrorCode;
import org.apache.commons.lang3.StringUtils;

import java.util.HashMap;
import java.util.Map;

/**
 * Created by jingxing on 14-8-24.

 * Plugin loader. There are broadly three plugin types: reader, transformer (not yet
 * implemented), and writer. At execution time, readers and writers run in one of two
 * runtimes, Job or Task, and each runtime loads a different class.
 */
public class LoadUtil {
    private static final String pluginTypeNameFormat = "plugin.%s.%s";

    private LoadUtil() {
    }

    private enum ContainerType {
        Job("Job"), Task("Task");
        private String type;

        private ContainerType(String type) {
            this.type = type;
        }

        public String value() {
            return type;
        }
    }

    /**
     * All plugin configurations are stored in pluginRegisterCenter. To distinguish
     * readers, transformers, and writers, as well as individual plugin names, each
     * configuration is keyed by "plugin.%s.%s" filled with the plugin type and name.
     */
    private static Configuration pluginRegisterCenter;

    /**
     * Cache of JarLoader instances, keyed the same way as pluginRegisterCenter.
     */
    private static Map<String, JarLoader> jarLoaderCenter = new HashMap<String, JarLoader>();

    /**
     * Sets pluginConfigs so that plugins can retrieve them later.
     *
     * @param pluginConfigs
     */
    public static void bind(Configuration pluginConfigs) {
        pluginRegisterCenter = pluginConfigs;
    }

    private static String generatePluginKey(PluginType pluginType,
                                            String pluginName) {
        return String.format(pluginTypeNameFormat, pluginType.toString(),
                pluginName);
    }

    private static Configuration getPluginConf(PluginType pluginType,
                                               String pluginName) {
        Configuration pluginConf = pluginRegisterCenter
                .getConfiguration(generatePluginKey(pluginType, pluginName));

        if (null == pluginConf) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.PLUGIN_INSTALL_ERROR,
                    String.format("DataX cannot find the configuration of plugin [%s].",
                            pluginName));
        }

        return pluginConf;
    }

    /**
     * Loads a Job plugin; both readers and writers may need one.
     *
     * @param pluginType
     * @param pluginName
     * @return
     */
    public static AbstractJobPlugin loadJobPlugin(PluginType pluginType,
                                                  String pluginName) {
        Class<? extends AbstractPlugin> clazz = LoadUtil.loadPluginClass(
                pluginType, pluginName, ContainerType.Job);

        try {
            AbstractJobPlugin jobPlugin = (AbstractJobPlugin) clazz
                    .newInstance();
            jobPlugin.setPluginConf(getPluginConf(pluginType, pluginName));
            return jobPlugin;
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    FrameworkErrorCode.RUNTIME_ERROR,
                    String.format("DataX cannot find the Job configuration of plugin [%s].",
                            pluginName), e);
        }
    }

    /**
     * Loads a Task plugin; both readers and writers may need one.
     *
     * @param pluginType
     * @param pluginName
     * @return
     */
    public static AbstractTaskPlugin loadTaskPlugin(PluginType pluginType,
                                                    String pluginName) {
        Class<? extends AbstractPlugin> clazz = LoadUtil.loadPluginClass(
                pluginType, pluginName, ContainerType.Task);

        try {
            AbstractTaskPlugin taskPlugin = (AbstractTaskPlugin) clazz
                    .newInstance();
            taskPlugin.setPluginConf(getPluginConf(pluginType, pluginName));
            return taskPlugin;
        } catch (Exception e) {
            throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR,
                    String.format("DataX cannot find the Task configuration of plugin [%s].",
                            pluginName), e);
        }
    }

    /**
     * Loads the runner that matches the given plugin type and plugin name.
     *
     * @param pluginType
     * @param pluginName
     * @return
     */
    public static AbstractRunner loadPluginRunner(PluginType pluginType,
                                                  String pluginName) {
        AbstractTaskPlugin taskPlugin = LoadUtil.loadTaskPlugin(pluginType,
                pluginName);

        switch (pluginType) {
            case READER:
                return new ReaderRunner(taskPlugin);
            case WRITER:
                return new WriterRunner(taskPlugin);
            default:
                throw DataXException.asDataXException(
                        FrameworkErrorCode.RUNTIME_ERROR,
                        String.format("The type of plugin [%s] must be [reader] or [writer]!",
                                pluginName));
        }
    }

    /**
     * Reflectively loads the concrete plugin class.
     *
     * @param pluginType
     * @param pluginName
     * @param pluginRunType
     * @return
     */
    @SuppressWarnings("unchecked")
    private static synchronized Class<? extends AbstractPlugin> loadPluginClass(
            PluginType pluginType, String pluginName,
            ContainerType pluginRunType) {
        Configuration pluginConf = getPluginConf(pluginType, pluginName);
        JarLoader jarLoader = LoadUtil.getJarLoader(pluginType, pluginName);
        try {
            return (Class<? extends AbstractPlugin>) jarLoader
                    .loadClass(pluginConf.getString("class") + "$"
                            + pluginRunType.value());
        } catch (Exception e) {
            throw DataXException.asDataXException(FrameworkErrorCode.RUNTIME_ERROR, e);
        }
    }

    public static synchronized JarLoader getJarLoader(PluginType pluginType,
                                                      String pluginName) {
        Configuration pluginConf = getPluginConf(pluginType, pluginName);

        JarLoader jarLoader = jarLoaderCenter.get(generatePluginKey(pluginType,
                pluginName));
        if (null == jarLoader) {
            String pluginPath = pluginConf.getString("path");
            if (StringUtils.isBlank(pluginPath)) {
                throw DataXException.asDataXException(
                        FrameworkErrorCode.RUNTIME_ERROR,
                        String.format(
                                "Invalid path for %s plugin [%s]!",
                                pluginType, pluginName));
            }
            jarLoader = new JarLoader(new String[]{pluginPath});
            jarLoaderCenter.put(generatePluginKey(pluginType, pluginName),
                    jarLoader);
        }

        return jarLoader;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/dataxservice/face/domain/enums/EnumStrVal.java
================================================
package com.alibaba.datax.dataxservice.face.domain.enums;

public interface EnumStrVal {

    public String value();

}

================================================
FILE: core/src/main/java/com/alibaba/datax/dataxservice/face/domain/enums/EnumVal.java
================================================
package com.alibaba.datax.dataxservice.face.domain.enums;

public interface EnumVal {

    public int value();

}

================================================
FILE: core/src/main/java/com/alibaba/datax/dataxservice/face/domain/enums/ExecuteMode.java
================================================
package com.alibaba.datax.dataxservice.face.domain.enums;

public enum ExecuteMode implements EnumStrVal {

    STANDALONE("standalone"),
    LOCAL("local"),
    DISTRIBUTE("distribute");

    String value;

    ExecuteMode(String value) {
        this.value = value;
    }

    @Override
    public String value() {
        return value;
    }

    public String getValue() {
        return this.value;
    }

    public static boolean isLocal(String mode) {
        return equalsIgnoreCase(LOCAL.getValue(), mode);
    }

    public static boolean isDistribute(String mode) {
        return equalsIgnoreCase(DISTRIBUTE.getValue(), mode);
    }

    public static ExecuteMode toExecuteMode(String modeName) {
        for (ExecuteMode mode : ExecuteMode.values()) {
            if (mode.value().equals(modeName)) {
                return mode;
            }
        }
        throw new RuntimeException("no such mode: " + modeName);
    }

    private static boolean equalsIgnoreCase(String str1, String str2) {
        return str1 == null ? str2 == null : str1.equalsIgnoreCase(str2);
    }

    @Override
    public String toString() {
        return this.value;
    }
}

================================================
FILE: core/src/main/java/com/alibaba/datax/dataxservice/face/domain/enums/State.java
================================================
package com.alibaba.datax.dataxservice.face.domain.enums;

public enum State implements EnumVal {

    SUBMITTING(10),
    WAITING(20),
    RUNNING(30),
    KILLING(40),
    KILLED(50),
    FAILED(60),
    SUCCEEDED(70);

    /* Always initialized */
    int value;

    State(int value) {
        this.value = value;
    }

    @Override
    public int value() {
        return value;
    }

    public boolean isFinished() {
        return this == KILLED || this == FAILED || this == SUCCEEDED;
    }

    public boolean isRunning() {
        return !isFinished();
    }

}

================================================
FILE: core/src/main/job/job.json
================================================
{
  "job": {
    "setting": {
      "speed": {
        "channel": 2
      },
      "errorLimit": {
        "record": 0
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "value": "DataX", "type": "string" },
              { "value": 1724154616370, "type": "long" },
              { "value": "2024-01-01 00:00:00", "type": "date" },
              { "value": true, "type": "bool" },
              { "value": "TestRawData", "type": "bytes" }
            ],
            "sliceRecordCount": 100
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": false,
            "encoding": "UTF-8"
          }
        }
      }
    ]
  }
}

================================================
FILE: core/src/main/script/Readme.md
================================================
some script here.
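`LoadUtil` looks up a plugin's configuration under the key produced by `"plugin.%s.%s"` (plugin type, then plugin name). It then loads the nested class named `<class>$Job` or `<class>$Task`, which is the binary name of a static nested class. The sketch below reproduces only that naming convention with the JDK. It assumes a hypothetical `DemoPlugin` whose static nested `Job` and `Task` classes stand in for a real plugin jar, and it uses the application class loader instead of `JarLoader`. It is an illustration, not DataX code.

```java
public class PluginNamingDemo {

    // Hypothetical plugin with the same shape DataX plugins use (for example
    // DatabendWriter): an outer class with static nested Job and Task classes.
    public static class DemoPlugin {
        public static class Job { }
        public static class Task { }
    }

    // Same format string as LoadUtil.pluginTypeNameFormat.
    static String pluginKey(String pluginType, String pluginName) {
        return String.format("plugin.%s.%s", pluginType, pluginName);
    }

    // Loads "<outer binary name>$<runtime>", i.e. the static nested class.
    static Class<?> loadRuntimeClass(String outerClassName, String runtime) {
        try {
            return PluginNamingDemo.class.getClassLoader()
                    .loadClass(outerClassName + "$" + runtime);
        } catch (ClassNotFoundException e) {
            // LoadUtil wraps this in a DataXException; a plain unchecked exception here.
            throw new IllegalStateException("runtime class not found: " + runtime, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pluginKey("reader", "streamreader"));
        Class<?> job = loadRuntimeClass(DemoPlugin.class.getName(), "Job");
        System.out.println(job == DemoPlugin.Job.class);
    }
}
```

Because the outer class name comes from `Class.getName()`, it is already a binary name (nested levels joined by `$`), so appending `"$Job"` resolves correctly even when the plugin class is itself nested.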
================================================
FILE: databendwriter/doc/databendwriter-CN.md
================================================
# DataX DatabendWriter

[简体中文](./databendwriter-CN.md) | [English](./databendwriter.md)

## 1 Introduction

Databend Writer is a DataX plugin that writes data from DataX into Databend tables.
The plugin is based on the [databend JDBC driver](https://github.com/databendcloud/databend-jdbc), which uses the [RESTful http protocol](https://databend.rs/doc/integrations/api/rest) to execute queries on open-source Databend and [databend cloud](https://app.databend.com/).

In each write batch, the databend writer uploads the batch data to an internal S3 stage and then executes the corresponding INSERT SQL to load the data into the Databend table.

For the best user experience with the Databend community edition, use [S3](https://aws.amazon.com/s3/)/[minio](https://min.io/)/[OSS](https://www.alibabacloud.com/product/object-storage-service) as the underlying storage layer, because they support presigned uploads; otherwise you may incur unnecessary data-transfer costs.

See the [documentation](https://databend.rs/doc/deploy/deploying-databend) for more details.

## 2 Implementation

Databend Writer uses DataX to fetch the records produced by the DataX reader and batch-inserts them into the specified columns of the Databend table.

## 3 Features

### 3.1 Example Configuration

* The following configuration reads some generated data from memory and uploads it into a Databend table.

#### Preparation

```sql
--- create table in databend
drop table if exists datax.sample1;
drop database if exists datax;
create database if not exists datax;
create table if not exists datax.sample1(a string, b int64, c date, d timestamp, e bool, f string, g variant);
```

#### Configuration

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column" : [
              { "value": "DataX", "type": "string" },
              { "value": 19880808, "type": "long" },
              { "value": "1926-08-08 08:08:08", "type": "date" },
              { "value": "1988-08-08 08:08:08", "type": "date" },
              { "value": true, "type": "bool" },
              { "value": "test", "type": "bytes" },
              { "value": "{\"type\": \"variant\", \"value\": \"test\"}", "type": "string" }
            ],
            "sliceRecordCount": 10000
          }
        },
        "writer": {
          "name": "databendwriter",
          "parameter": {
            "writeMode": "replace",
            "onConflictColumn": ["a"],
            "username": "databend",
            "password": "databend",
            "column": ["a", "b", "c", "d", "e", "f", "g"],
            "batchSize": 1000,
            "preSql": [
            ],
            "postSql": [
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:databend://localhost:8000/datax",
                "table": [
                  "sample1"
                ]
              }
            ]
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

### 3.2 Parameter Description

* jdbcUrl
  * Description: JDBC data source URL. See the detailed [documentation](https://github.com/databendcloud/databend-jdbc) in the driver repository.
  * Required: yes
  * Default: none
  * Example: jdbc:databend://localhost:8000/datax

* username
  * Description: JDBC data source user name
  * Required: yes
  * Default: none
  * Example: databend

* password
  * Description: JDBC data source password
  * Required: yes
  * Default: none
  * Example: databend

* table
  * Description: A list of table names; the table must contain all of the columns listed in the column parameter.
  * Required: yes
  * Default: none
  * Example: ["sample1"]

* column
  * Description: A list of column names in the table. The field order must match the column types in the reader's records.
  * Required: yes
  * Default: none
  * Example: ["a", "b", "c", "d", "e", "f", "g"]

* batchSize
  * Description: The number of records in each batch
  * Required: no
  * Default: 1000
  * Example: 1000

* preSql
  * Description: SQL statements executed before data is written
  * Required: no
  * Default: none
  * Example: ["delete from datax.sample1"]

* postSql
  * Description: SQL statements executed after data is written
  * Required: no
  * Default: none
  * Example: ["select count(*) from datax.sample1"]

* writeMode
  * Description: The write mode; supports insert and replace, and defaults to insert. If set to replace, the onConflictColumn parameter must also be provided.
  * Required: no
  * Default: insert
  * Example: "replace"

* onConflictColumn
  * Description: The ON CONFLICT columns; required when writeMode is set to replace.
  * Required: no
  * Default: none
  * Example: ["id","user"]

### 3.3 Type Conversion

DataX data types can be converted to the corresponding Databend data types. The following table shows the correspondence between the two.

| DataX Internal Type | Databend Data Type                                        |
|---------------------|-----------------------------------------------------------|
| INT                 | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| LONG                | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| STRING              | STRING, VARCHAR                                           |
| DOUBLE              | FLOAT, DOUBLE                                             |
| BOOL                | BOOLEAN, BOOL                                             |
| DATE                | DATE, TIMESTAMP                                           |
| BYTES               | STRING, VARCHAR                                           |

## 4 Performance Test

## 5 Restrictions

Currently, support for complex data types is unstable. If you want to use complex data types such as tuples or arrays, check later releases of Databend and the JDBC driver.

## FAQ

================================================
FILE: databendwriter/doc/databendwriter.md
================================================
# DataX DatabendWriter

[简体中文](./databendwriter-CN.md) | [English](./databendwriter.md)

## 1 Introduction

Databend Writer is a DataX plugin that writes DataX records into a Databend table.
The plugin is based on the [databend JDBC driver](https://github.com/databendcloud/databend-jdbc), which uses the [RESTful http protocol](https://databend.rs/doc/integrations/api/rest) to execute queries on open-source Databend and [databend cloud](https://app.databend.com/).

In each write batch, the databend writer uploads the batch data into an internal S3 stage and then executes the corresponding INSERT SQL to load the data into the Databend table.

For the best user experience with the Databend community distribution, use [S3](https://aws.amazon.com/s3/)/[minio](https://min.io/)/[OSS](https://www.alibabacloud.com/product/object-storage-service) as the underlying storage layer, because they support presigned uploads; otherwise you may incur unnecessary data-transfer costs.

See the [documentation](https://databend.rs/doc/deploy/deploying-databend) for more details.

## 2 Detailed Implementation

Databend Writer uses DataX to fetch the records generated by the DataX reader and then batch-inserts them into the designated columns of your Databend table.
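The per-batch behavior described above is governed by the `batchSize` parameter (section 3.2): records are grouped into consecutive chunks, and each chunk is written as one batch. The sketch below illustrates only that grouping step, independent of any database. The class and method names are hypothetical. The real writer delegates batching to DataX's `CommonRdbmsWriter` and the Databend JDBC driver, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    // Splits records into consecutive batches of at most batchSize records.
    static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < records.size(); from += batchSize) {
            int to = Math.min(from + batchSize, records.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(records.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            records.add(i);
        }
        // 2500 records with batchSize 1000 yield batches of 1000, 1000 and 500.
        for (List<Integer> batch : toBatches(records, 1000)) {
            System.out.println(batch.size());
        }
    }
}
```

A larger `batchSize` means fewer round trips (and, per section 1, fewer stage uploads) at the cost of holding more records in memory per batch.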
## 3 Features

### 3.1 Example Configurations

* The following configuration reads some generated data from memory and uploads it into a Databend table.

#### Preparation

```sql
--- create table in databend
drop table if exists datax.sample1;
drop database if exists datax;
create database if not exists datax;
create table if not exists datax.sample1(a string, b int64, c date, d timestamp, e bool, f string, g variant);
```

#### Configurations

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column" : [
              { "value": "DataX", "type": "string" },
              { "value": 19880808, "type": "long" },
              { "value": "1926-08-08 08:08:08", "type": "date" },
              { "value": "1988-08-08 08:08:08", "type": "date" },
              { "value": true, "type": "bool" },
              { "value": "test", "type": "bytes" },
              { "value": "{\"type\": \"variant\", \"value\": \"test\"}", "type": "string" }
            ],
            "sliceRecordCount": 10000
          }
        },
        "writer": {
          "name": "databendwriter",
          "parameter": {
            "username": "databend",
            "password": "databend",
            "column": ["a", "b", "c", "d", "e", "f", "g"],
            "batchSize": 1000,
            "preSql": [
            ],
            "postSql": [
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:databend://localhost:8000/datax",
                "table": [
                  "sample1"
                ]
              }
            ]
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

### 3.2 Configuration Description

* jdbcUrl
  * Description: JDBC data source URL for Databend. See the repository for detailed [doc](https://github.com/databendcloud/databend-jdbc)
  * Required: yes
  * Default: none
  * Example: jdbc:databend://localhost:8000/datax

* username
  * Description: Databend user name
  * Required: yes
  * Default: none
  * Example: databend

* password
  * Description: Databend user password
  * Required: yes
  * Default: none
  * Example: databend

* table
  * Description: A list of table names; each table must contain all of the columns in the column parameter.
  * Required: yes
  * Default: none
  * Example: ["sample1"]

* column
  * Description: A list of column names to insert into the table. To insert all columns, use `["*"]` instead.
  * Required: yes
  * Default: none
  * Example: ["a", "b", "c", "d", "e", "f", "g"]

* batchSize
  * Description: The number of records inserted in each batch.
  * Required: no
  * Default: 1024

* preSql
  * Description: A list of SQL statements executed before the write operation.
  * Required: no
  * Default: none

* postSql
  * Description: A list of SQL statements executed after the write operation.
  * Required: no
  * Default: none

* writeMode
  * Description: The write mode; supports `insert` and `replace`. When set to `replace`, `onConflictColumn` must also be set.
  * Required: no
  * Default: insert
  * Example: "replace"

* onConflictColumn
  * Description: The list of ON CONFLICT columns; required when `writeMode` is `replace`.
  * Required: no
  * Default: none
  * Example: ["id","user"]

### 3.3 Type Conversion

DataX data types can be converted to the corresponding Databend data types. The following table shows the correspondence between the two.

| DataX Type | Databend Type                                             |
|------------|-----------------------------------------------------------|
| INT        | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| LONG       | TINYINT, INT8, SMALLINT, INT16, INT, INT32, BIGINT, INT64 |
| STRING     | STRING, VARCHAR                                           |
| DOUBLE     | FLOAT, DOUBLE                                             |
| BOOL       | BOOLEAN, BOOL                                             |
| DATE       | DATE, TIMESTAMP                                           |
| BYTES      | STRING, VARCHAR                                           |

## 4 Performance Test

## 5 Restrictions

Currently, support for complex data types is not stable. If you want to use complex data types such as tuples or arrays, check later releases of Databend and the JDBC driver.
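The `writeMode` and `onConflictColumn` parameters from section 3.2 determine the SQL template the writer prepares. The sketch below mirrors the string building in `DatabendWriterUtil.dealWriteMode`: the table name is left as a `%s` placeholder, and row values are appended after `VALUES` later. It is a simplified, stand-alone re-creation for illustration. The class name is hypothetical, and a plain `IllegalArgumentException` stands in for the plugin's `DataXException`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class WriteModeTemplate {

    // Builds the statement prefix the way DatabendWriterUtil.dealWriteMode does.
    static String buildTemplate(String writeMode, List<String> columns,
                                List<String> onConflictColumns) {
        String mode = writeMode == null ? "INSERT" : writeMode; // plugin default
        String cols = String.join(",", columns);
        if (mode.toLowerCase(Locale.ROOT).contains("replace")) {
            if (onConflictColumns == null || onConflictColumns.isEmpty()) {
                throw new IllegalArgumentException(
                        "Replace mode must have the onConflictColumn config.");
            }
            // REPLACE INTO <table> (cols) ON (conflict keys) VALUES ...
            return "REPLACE INTO %s (" + cols + ") "
                    + " ON (" + String.join(",", onConflictColumns) + ") "
                    + " VALUES";
        }
        // INSERT INTO <table>(cols) VALUES ...
        return "INSERT INTO %s(" + cols + ") VALUES";
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("a", "b", "c");
        System.out.println(String.format(
                buildTemplate("replace", cols, Arrays.asList("a")), "sample1"));
        System.out.println(String.format(
                buildTemplate("insert", cols, null), "sample1"));
    }
}
```

Checking for a missing `onConflictColumn` while the template is being built lets a misconfigured replace job fail during job initialization, before any data is read.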
## FAQ ================================================ FILE: databendwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 databendwriter databendwriter jar com.databend databend-jdbc 0.1.0 com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-common ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.google.guava guava junit junit test src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: databendwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/databendwriter target/ databendwriter-0.0.1-SNAPSHOT.jar plugin/writer/databendwriter false plugin/writer/databendwriter/libs ================================================ FILE: databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriter.java ================================================ package com.alibaba.datax.plugin.writer.databendwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.writer.databendwriter.util.DatabendWriterUtil; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.*; import java.util.List; import 
java.util.regex.Pattern; public class DatabendWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.Databend; public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig; private CommonRdbmsWriter.Job commonRdbmsWriterMaster; @Override public void init() throws DataXException { this.originalConfig = super.getPluginJobConf(); this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE); this.commonRdbmsWriterMaster.init(this.originalConfig); // placeholder currently not supported by databend driver, needs special treatment DatabendWriterUtil.dealWriteMode(this.originalConfig); } @Override public void preCheck() { this.init(); this.commonRdbmsWriterMaster.writerPreCheck(this.originalConfig, DATABASE_TYPE); } @Override public void prepare() { this.commonRdbmsWriterMaster.prepare(this.originalConfig); } @Override public List split(int mandatoryNumber) { return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber); } @Override public void post() { this.commonRdbmsWriterMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterMaster.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterSlave; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DataBaseType.Databend) { @Override protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException { try { if (column.getRawData() == null) { preparedStatement.setNull(columnIndex + 1, columnSqltype); return preparedStatement; } java.util.Date utilDate; 
switch (columnSqltype) { case Types.TINYINT: case Types.SMALLINT: case Types.INTEGER: preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue()); break; case Types.BIGINT: preparedStatement.setLong(columnIndex + 1, column.asLong()); break; case Types.DECIMAL: preparedStatement.setBigDecimal(columnIndex + 1, column.asBigDecimal()); break; case Types.FLOAT: case Types.REAL: preparedStatement.setFloat(columnIndex + 1, column.asDouble().floatValue()); break; case Types.DOUBLE: preparedStatement.setDouble(columnIndex + 1, column.asDouble()); break; case Types.DATE: java.sql.Date sqlDate = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "Date type conversion error: [%s]", column)); } if (null != utilDate) { sqlDate = new java.sql.Date(utilDate.getTime()); } preparedStatement.setDate(columnIndex + 1, sqlDate); break; case Types.TIME: java.sql.Time sqlTime = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "Date type conversion error: [%s]", column)); } if (null != utilDate) { sqlTime = new java.sql.Time(utilDate.getTime()); } preparedStatement.setTime(columnIndex + 1, sqlTime); break; case Types.TIMESTAMP: Timestamp sqlTimestamp = null; if (column instanceof StringColumn && column.asString() != null) { String timeStampStr = column.asString(); // JAVA TIMESTAMP 类型入参必须是 "2017-07-12 14:39:00.123566" 格式 String pattern = "^\\d+-\\d+-\\d+ \\d+:\\d+:\\d+.\\d+"; boolean isMatch = Pattern.matches(pattern, timeStampStr); if (isMatch) { sqlTimestamp = Timestamp.valueOf(timeStampStr); preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp); break; } } try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "Date type conversion error: [%s]", column)); } if (null != utilDate) { sqlTimestamp = new Timestamp( utilDate.getTime()); } preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp); break; case 
Types.BINARY: case Types.VARBINARY: case Types.BLOB: case Types.LONGVARBINARY: preparedStatement.setBytes(columnIndex + 1, column .asBytes()); break; case Types.BOOLEAN: // warn: bit(1) -> Types.BIT 可使用setBoolean // warn: bit(>1) -> Types.VARBINARY 可使用setBytes case Types.BIT: if (this.dataBaseType == DataBaseType.MySql) { Boolean asBoolean = column.asBoolean(); if (asBoolean != null) { preparedStatement.setBoolean(columnIndex + 1, asBoolean); } else { preparedStatement.setNull(columnIndex + 1, Types.BIT); } } else { preparedStatement.setString(columnIndex + 1, column.asString()); } break; default: // cast variant / array into string is fine. preparedStatement.setString(columnIndex + 1, column.asString()); break; } return preparedStatement; } catch (DataXException e) { // fix类型转换或者溢出失败时,将具体哪一列打印出来 if (e.getErrorCode() == CommonErrorCode.CONVERT_NOT_SUPPORT || e.getErrorCode() == CommonErrorCode.CONVERT_OVER_FLOW) { throw DataXException .asDataXException( e.getErrorCode(), String.format( "type conversion error. columnName: [%s], columnType:[%d], columnJavaType: [%s]. 
please change the data type in given column field or do not sync on the column.", this.resultSetMetaData.getLeft() .get(columnIndex), this.resultSetMetaData.getMiddle() .get(columnIndex), this.resultSetMetaData.getRight() .get(columnIndex))); } else { throw e; } } } }; this.commonRdbmsWriterSlave.init(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig); } @Override public void prepare() { this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig); } @Override public void post() { this.commonRdbmsWriterSlave.post(this.writerSliceConfig); } @Override public void startWrite(RecordReceiver lineReceiver) { this.commonRdbmsWriterSlave.startWrite(lineReceiver, this.writerSliceConfig, this.getTaskPluginCollector()); } } } ================================================ FILE: databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/DatabendWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.databendwriter; import com.alibaba.datax.common.spi.ErrorCode; public enum DatabendWriterErrorCode implements ErrorCode { CONF_ERROR("DatabendWriter-00", "配置错误."), WRITE_DATA_ERROR("DatabendWriter-01", "写入数据时失败."), ; private final String code; private final String description; private DatabendWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: databendwriter/src/main/java/com/alibaba/datax/plugin/writer/databendwriter/util/DatabendWriterUtil.java ================================================ package com.alibaba.datax.plugin.writer.databendwriter.util; import 
com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.writer.databendwriter.DatabendWriterErrorCode; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import javax.xml.crypto.Data; import java.util.List; import java.util.StringJoiner; public final class DatabendWriterUtil { private static final Logger LOG = LoggerFactory.getLogger(DatabendWriterUtil.class); private DatabendWriterUtil() { } public static void dealWriteMode(Configuration originalConfig) throws DataXException { List columns = originalConfig.getList(Key.COLUMN, String.class); List onConflictColumns = originalConfig.getList(Key.ONCONFLICT_COLUMN, String.class); StringBuilder writeDataSqlTemplate = new StringBuilder(); String jdbcUrl = originalConfig.getString(String.format("%s[0].%s", Constant.CONN_MARK, Key.JDBC_URL, String.class)); String writeMode = originalConfig.getString(Key.WRITE_MODE, "INSERT"); LOG.info("write mode is {}", writeMode); if (writeMode.toLowerCase().contains("replace")) { if (onConflictColumns == null || onConflictColumns.size() == 0) { throw DataXException .asDataXException( DatabendWriterErrorCode.CONF_ERROR, String.format( "Replace mode must has onConflictColumn config." 
)); } // for databend if you want to use replace mode, the writeMode should be: "writeMode": "replace" writeDataSqlTemplate.append("REPLACE INTO %s (") .append(StringUtils.join(columns, ",")).append(") ").append(onConFlictDoString(onConflictColumns)) .append(" VALUES"); LOG.info("Replace data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl); originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate); } else { writeDataSqlTemplate.append("INSERT INTO %s"); StringJoiner columnString = new StringJoiner(","); for (String column : columns) { columnString.add(column); } writeDataSqlTemplate.append(String.format("(%s)", columnString)); writeDataSqlTemplate.append(" VALUES"); LOG.info("Insert data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl); originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate); } } public static String onConFlictDoString(List conflictColumns) { return " ON " + "(" + StringUtils.join(conflictColumns, ",") + ") "; } } ================================================ FILE: databendwriter/src/main/resources/plugin.json ================================================ { "name": "databendwriter", "class": "com.alibaba.datax.plugin.writer.databendwriter.DatabendWriter", "description": "execute batch insert sql to write dataX data into databend", "developer": "databend" } ================================================ FILE: databendwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "databendwriter", "parameter": { "username": "username", "password": "password", "column": ["col1", "col2", "col3"], "connection": [ { "jdbcUrl": "jdbc:databend://:[/]", "table": "table1" } ], "preSql": [], "postSql": [], "maxBatchRows": 65536, "maxBatchSize": 134217728 } } ================================================ FILE: datahubreader/pom.xml ================================================ datax-all com.alibaba.datax 
0.0.1-SNAPSHOT 4.0.0 datahubreader 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.aliyun.datahub aliyun-sdk-datahub 2.21.6-public junit junit 4.12 test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: datahubreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin/reader/datahubreader target/ datahubreader-0.0.1-SNAPSHOT.jar plugin/reader/datahubreader false plugin/reader/datahubreader/libs runtime ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.datahubreader; public class Constant { public static String DATETIME_FORMAT = "yyyyMMddHHmmss"; public static String DATE_FORMAT = "yyyyMMdd"; } ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubClientHelper.java ================================================ package com.alibaba.datax.plugin.reader.datahubreader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; import com.aliyun.datahub.client.DatahubClient; import com.aliyun.datahub.client.DatahubClientBuilder; import com.aliyun.datahub.client.auth.Account; import com.aliyun.datahub.client.auth.AliyunAccount; import com.aliyun.datahub.client.common.DatahubConfig; import com.aliyun.datahub.client.http.HttpConfig; import org.apache.commons.lang3.StringUtils; public class DatahubClientHelper { public static DatahubClient getDatahubClient(Configuration jobConfig) { String accessId = 
jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_ID, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); String accessKey = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_KEY, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); String endpoint = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ENDPOINT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); Account account = new AliyunAccount(accessId, accessKey); // Whether to enable binary transfer; supported by the server since version 2.12 boolean enableBinary = jobConfig.getBool("enableBinary", false); DatahubConfig datahubConfig = new DatahubConfig(endpoint, account, enableBinary); // HttpConfig is optional; defaults are used when it is not set // Enabling LZ4 compression for network transfer is recommended when reading and writing data HttpConfig httpConfig = null; String httpConfigStr = jobConfig.getString("httpConfig"); if (StringUtils.isNotBlank(httpConfigStr)) { httpConfig = JSON.parseObject(httpConfigStr, new TypeReference<HttpConfig>() { }); } DatahubClientBuilder builder = DatahubClientBuilder.newBuilder().setDatahubConfig(datahubConfig); if (null != httpConfig) { builder.setHttpConfig(httpConfig); } DatahubClient datahubClient = builder.build(); return datahubClient; } } ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReader.java ================================================ package com.alibaba.datax.plugin.reader.datahubreader; import java.text.ParseException; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import com.aliyun.datahub.client.model.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.aliyun.datahub.client.DatahubClient; public class DatahubReader extends Reader { public static class Job
extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig; private Long beginTimestampMillis; private Long endTimestampMillis; DatahubClient datahubClient; @Override public void init() { LOG.info("datahub reader job init begin ..."); this.originalConfig = super.getPluginJobConf(); validateParameter(originalConfig); this.datahubClient = DatahubClientHelper.getDatahubClient(this.originalConfig); LOG.info("datahub reader job init end."); } private void validateParameter(Configuration conf){ conf.getNecessaryValue(Key.ENDPOINT,DatahubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.ACCESSKEYID,DatahubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.ACCESSKEYSECRET,DatahubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.PROJECT,DatahubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.TOPIC,DatahubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.COLUMN,DatahubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.BEGINDATETIME,DatahubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.ENDDATETIME,DatahubReaderErrorCode.REQUIRE_VALUE); int batchSize = this.originalConfig.getInt(Key.BATCHSIZE, 1024); if (batchSize <= 0 || batchSize > 10000) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid batchSize[" + batchSize + "], it must be in the range (0, 10000]!"); } String beginDateTime = this.originalConfig.getString(Key.BEGINDATETIME); if (beginDateTime != null) { try { beginTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(beginDateTime); } catch (ParseException e) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid beginDateTime[" + beginDateTime + "], format [yyyyMMddHHmmss]!"); } } if (beginTimestampMillis != null && beginTimestampMillis <= 0) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid beginTimestampMillis[" + beginTimestampMillis + "]!"); } String
endDateTime = this.originalConfig.getString(Key.ENDDATETIME); if (endDateTime != null) { try { endTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(endDateTime); } catch (ParseException e) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid endDateTime[" + endDateTime + "], format [yyyyMMddHHmmss]!"); } } if (endTimestampMillis != null && endTimestampMillis <= 0) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid endTimestampMillis[" + endTimestampMillis + "]!"); } if (beginTimestampMillis != null && endTimestampMillis != null && endTimestampMillis <= beginTimestampMillis) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "endTimestampMillis[" + endTimestampMillis + "] must be greater than beginTimestampMillis[" + beginTimestampMillis + "]!"); } } @Override public void prepare() { // create datahub client String project = originalConfig.getNecessaryValue(Key.PROJECT, DatahubReaderErrorCode.REQUIRE_VALUE); String topic = originalConfig.getNecessaryValue(Key.TOPIC, DatahubReaderErrorCode.REQUIRE_VALUE); RecordType recordType = null; try { DatahubClient client = DatahubClientHelper.getDatahubClient(this.originalConfig); GetTopicResult getTopicResult = client.getTopic(project, topic); recordType = getTopicResult.getRecordType(); } catch (Exception e) { LOG.warn("get topic type error: {}", e.getMessage()); } if (null != recordType) { if (recordType == RecordType.BLOB) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "DatahubReader only supports 'Tuple' RecordType now, but your RecordType is 'BLOB'"); } } } @Override public void destroy() { } @Override public List<Configuration> split(int adviceNumber) { LOG.info("split() begin..."); List<Configuration> readerSplitConfigs = new ArrayList<Configuration>(); String project = this.originalConfig.getString(Key.PROJECT); String topic = this.originalConfig.getString(Key.TOPIC); List<ShardEntry> shardEntrys =
DatahubReaderUtils.getShardsWithRetry(this.datahubClient, project, topic); if (shardEntrys == null || shardEntrys.isEmpty()) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "Project [" + project + "] Topic [" + topic + "] has no shards, please check!"); } for (ShardEntry shardEntry : shardEntrys) { Configuration splitedConfig = this.originalConfig.clone(); splitedConfig.set(Key.SHARDID, shardEntry.getShardId()); readerSplitConfigs.add(splitedConfig); } LOG.info("split() ok and end..."); return readerSplitConfigs; } } public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration taskConfig; private String accessId; private String accessKey; private String endpoint; private String project; private String topic; private String shardId; private Long beginTimestampMillis; private Long endTimestampMillis; private int batchSize; private List<String> columns; private RecordSchema schema; private String timeStampUnit; DatahubClient datahubClient; @Override public void init() { this.taskConfig = super.getPluginJobConf(); this.accessId = this.taskConfig.getString(Key.ACCESSKEYID); this.accessKey = this.taskConfig.getString(Key.ACCESSKEYSECRET); this.endpoint = this.taskConfig.getString(Key.ENDPOINT); this.project = this.taskConfig.getString(Key.PROJECT); this.topic = this.taskConfig.getString(Key.TOPIC); this.shardId = this.taskConfig.getString(Key.SHARDID); this.batchSize = this.taskConfig.getInt(Key.BATCHSIZE, 1024); this.timeStampUnit = this.taskConfig.getString(Key.TIMESTAMP_UNIT, "MICROSECOND"); try { this.beginTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(this.taskConfig.getString(Key.BEGINDATETIME)); } catch (ParseException e) { /* not reached: getUnixTimeFromDateTime rethrows parse failures as DataXException, and Job.validateParameter has already checked this value */ } try { this.endTimestampMillis = DatahubReaderUtils.getUnixTimeFromDateTime(this.taskConfig.getString(Key.ENDDATETIME)); } catch (ParseException e) { /* not reached, for the same reason as above */ } this.columns =
this.taskConfig.getList(Key.COLUMN, String.class); this.datahubClient = DatahubClientHelper.getDatahubClient(this.taskConfig); this.schema = DatahubReaderUtils.getDatahubSchemaWithRetry(this.datahubClient, this.project, topic); LOG.info("init datahub reader task finished. project:{} topic:{} batchSize:{}", project, topic, batchSize); } @Override public void destroy() { } @Override public void startRead(RecordSender recordSender) { LOG.info("read start"); String beginCursor = DatahubReaderUtils.getCursorWithRetry(this.datahubClient, this.project, this.topic, this.shardId, this.beginTimestampMillis); String endCursor = DatahubReaderUtils.getCursorWithRetry(this.datahubClient, this.project, this.topic, this.shardId, this.endTimestampMillis); if (beginCursor == null) { LOG.info("Shard:{} has no data!", this.shardId); return; } else if (endCursor == null) { endCursor = DatahubReaderUtils.getLatestCursorWithRetry(this.datahubClient, this.project, this.topic, this.shardId); } String curCursor = beginCursor; boolean exit = false; while (true) { GetRecordsResult result = DatahubReaderUtils.getRecordsResultWithRetry(this.datahubClient, this.project, this.topic, this.shardId, this.batchSize, curCursor, this.schema); List<RecordEntry> records = result.getRecords(); if (records.size() > 0) { for (RecordEntry record : records) { if (record.getSystemTime() >= this.endTimestampMillis) { exit = true; break; } HashMap<String, Column> dataMap = new HashMap<String, Column>(); List<Field> fields = ((TupleRecordData) record.getRecordData()).getRecordSchema().getFields(); for (int i = 0; i < fields.size(); i++) { Field field = fields.get(i); Column column = DatahubReaderUtils.getColumnFromField(record, field, this.timeStampUnit); dataMap.put(field.getName(), column); } Record dataxRecord = recordSender.createRecord(); if (null != this.columns && 1 == this.columns.size()) { String columnsInStr = this.columns.get(0); if ("\"*\"".equals(columnsInStr) || "*".equals(columnsInStr)) { for (int i = 0; i < fields.size(); i++) {
dataxRecord.addColumn(dataMap.get(fields.get(i).getName())); } } else { if (dataMap.containsKey(columnsInStr)) { dataxRecord.addColumn(dataMap.get(columnsInStr)); } else { dataxRecord.addColumn(new StringColumn(null)); } } } else { for (String col : this.columns) { if (dataMap.containsKey(col)) { dataxRecord.addColumn(dataMap.get(col)); } else { dataxRecord.addColumn(new StringColumn(null)); } } } recordSender.sendToWriter(dataxRecord); } } else { break; } if (exit) { break; } curCursor = result.getNextCursor(); } LOG.info("end read datahub shard..."); } } } ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.datahubreader; import com.alibaba.datax.common.spi.ErrorCode; public enum DatahubReaderErrorCode implements ErrorCode { BAD_CONFIG_VALUE("DatahubReader-00", "The value you configured is invalid."), LOG_HUB_ERROR("DatahubReader-01","Datahub exception"), REQUIRE_VALUE("DatahubReader-02","Missing parameters"), EMPTY_LOGSTORE_VALUE("DatahubReader-03","There is no shard under this LogStore"); private final String code; private final String description; private DatahubReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. 
", this.code, this.description); } } ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubReaderUtils.java ================================================ package com.alibaba.datax.plugin.reader.datahubreader; import java.math.BigDecimal; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; import java.util.List; import java.util.concurrent.Callable; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.DataXCaseEnvUtil; import com.alibaba.datax.common.util.RetryUtil; import com.aliyun.datahub.client.DatahubClient; import com.aliyun.datahub.client.exception.InvalidParameterException; import com.aliyun.datahub.client.model.*; public class DatahubReaderUtils { public static long getUnixTimeFromDateTime(String dateTime) throws ParseException { try { String format = Constant.DATETIME_FORMAT; SimpleDateFormat simpleDateFormat = new SimpleDateFormat(format); return simpleDateFormat.parse(dateTime).getTime(); } catch (ParseException ignored) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid DateTime[" + dateTime + "]!"); } } public static List<ShardEntry> getShardsWithRetry(final DatahubClient datahubClient, final String project, final String topic) { List<ShardEntry> shards = null; try { shards = RetryUtil.executeWithRetry(new Callable<List<ShardEntry>>() { @Override public List<ShardEntry> call() throws Exception { ListShardResult listShardResult = datahubClient.listShard(project, topic); return listShardResult.getShards(); } }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "get Shards error, please check !
detail error message: " + e.toString()); } return shards; } public static String getCursorWithRetry(final DatahubClient datahubClient, final String project, final String topic, final String shardId, final long timestamp) { String cursor; try { cursor = RetryUtil.executeWithRetry(new Callable<String>() { @Override public String call() throws Exception { try { return datahubClient.getCursor(project, topic, shardId, CursorType.SYSTEM_TIME, timestamp).getCursor(); } catch (InvalidParameterException e) { if (e.getErrorMessage().indexOf("Time in seek request is out of range") >= 0) { return null; } else { throw e; } } } }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "get Cursor error, please check ! detail error message: " + e.toString()); } return cursor; } public static String getLatestCursorWithRetry(final DatahubClient datahubClient, final String project, final String topic, final String shardId) { String cursor; try { cursor = RetryUtil.executeWithRetry(new Callable<String>() { @Override public String call() throws Exception { return datahubClient.getCursor(project, topic, shardId, CursorType.LATEST).getCursor(); } }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "get Cursor error, please check !
detail error message: " + e.toString()); } return cursor; } public static RecordSchema getDatahubSchemaWithRetry(final DatahubClient datahubClient, final String project, final String topic) { RecordSchema schema; try { schema = RetryUtil.executeWithRetry(new Callable<RecordSchema>() { @Override public RecordSchema call() throws Exception { return datahubClient.getTopic(project, topic).getRecordSchema(); } }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "get Topic Schema error, please check ! detail error message: " + e.toString()); } return schema; } public static GetRecordsResult getRecordsResultWithRetry(final DatahubClient datahubClient, final String project, final String topic, final String shardId, final int batchSize, final String cursor, final RecordSchema schema) { GetRecordsResult result; try { result = RetryUtil.executeWithRetry(new Callable<GetRecordsResult>() { @Override public GetRecordsResult call() throws Exception { return datahubClient.getRecords(project, topic, shardId, schema, cursor, batchSize); } }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(DatahubReaderErrorCode.BAD_CONFIG_VALUE, "get Record Result error, please check ! detail error message: " + e.toString()); } return result; } public static Column getColumnFromField(RecordEntry record, Field field, String timeStampUnit) { Column col = null; TupleRecordData o = (TupleRecordData) record.getRecordData(); switch (field.getType()) { case SMALLINT: Short shortValue = ((Short) o.getField(field.getName())); col = new LongColumn(shortValue == null ?
null: shortValue.longValue()); break; case INTEGER: col = new LongColumn((Integer) o.getField(field.getName())); break; case BIGINT: { col = new LongColumn((Long) o.getField(field.getName())); break; } case TINYINT: { Byte byteValue = ((Byte) o.getField(field.getName())); col = new LongColumn(byteValue == null ? null : byteValue.longValue()); break; } case BOOLEAN: { col = new BoolColumn((Boolean) o.getField(field.getName())); break; } case FLOAT: col = new DoubleColumn((Float) o.getField(field.getName())); break; case DOUBLE: { col = new DoubleColumn((Double) o.getField(field.getName())); break; } case STRING: { col = new StringColumn((String) o.getField(field.getName())); break; } case DECIMAL: { BigDecimal value = (BigDecimal) o.getField(field.getName()); col = new DoubleColumn(value == null ? null : value.doubleValue()); break; } case TIMESTAMP: { Long value = (Long) o.getField(field.getName()); if ("MILLISECOND".equals(timeStampUnit)) { // MILLISECOND: 13-digit precision, passed directly to new Date() col = new DateColumn(value == null ? null : new Date(value)); } else if ("SECOND".equals(timeStampUnit)){ col = new DateColumn(value == null ? null : new Date(value * 1000)); } else { // The default is MICROSECOND (16-digit precision), consistent with the previous behavior. col = new DateColumn(value == null ?
null : new Date(value / 1000)); } break; } default: throw new RuntimeException("Unknown column type: " + field.getType()); } return col; } } ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/DatahubWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.datahubreader; import com.alibaba.datax.common.spi.ErrorCode; import com.alibaba.datax.common.util.MessageSource; public enum DatahubWriterErrorCode implements ErrorCode { MISSING_REQUIRED_VALUE("DatahubWriter-01", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.missing_required_value")), INVALID_CONFIG_VALUE("DatahubWriter-02", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.invalid_config_value")), GET_TOPOIC_INFO_FAIL("DatahubWriter-03", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.get_topic_info_fail")), WRITE_DATAHUB_FAIL("DatahubWriter-04", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.write_datahub_fail")), SCHEMA_NOT_MATCH("DatahubWriter-05", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.schema_not_match")), ; private final String code; private final String description; private DatahubWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. 
", this.code, this.description); } } ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/Key.java ================================================ package com.alibaba.datax.plugin.reader.datahubreader; public final class Key { /** * Configuration keys used by the plugin that must be provided by the plugin user */ public static final String ENDPOINT = "endpoint"; public static final String ACCESSKEYID = "accessId"; public static final String ACCESSKEYSECRET = "accessKey"; public static final String PROJECT = "project"; public static final String TOPIC = "topic"; public static final String BEGINDATETIME = "beginDateTime"; public static final String ENDDATETIME = "endDateTime"; public static final String BATCHSIZE = "batchSize"; public static final String COLUMN = "column"; public static final String SHARDID = "shardId"; public static final String CONFIG_KEY_ENDPOINT = "endpoint"; public static final String CONFIG_KEY_ACCESS_ID = "accessId"; public static final String CONFIG_KEY_ACCESS_KEY = "accessKey"; public static final String TIMESTAMP_UNIT = "timeStampUnit"; } ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.
================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_en_US.properties ================================================ errorcode.missing_required_value=You are missing a required parameter value. errorcode.invalid_config_value=Your parameter configuration is incorrect. errorcode.get_topic_info_fail=Failed to get the shard list. errorcode.write_datahub_fail=Failed to write data. errorcode.schema_not_match=Data format error. ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_ja_JP.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.
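The DataHub reader above performs two time conversions: `validateParameter` parses `beginDateTime`/`endDateTime` (pattern `yyyyMMddHHmmss`) into epoch milliseconds and requires `0 < begin < end`, and the `TIMESTAMP` branch of `getColumnFromField` turns a raw `long` into a date according to `timeStampUnit` (default `MICROSECOND`). The following is a minimal, dependency-free sketch of both, offered as an illustration rather than the plugin's code. `DatahubTimeSketch` is a hypothetical class. It throws `IllegalArgumentException` where the plugin throws `DataXException`, returns plain `java.util.Date` instead of a DataX `DateColumn`, and takes the `TimeZone` as an explicit argument; `getUnixTimeFromDateTime` relies on the JVM default zone, so the same string may map to different epoch values on hosts in different zones.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/**
 * Hypothetical sketch (not part of DataX) of the DataHub reader's time handling:
 * parsing the beginDateTime/endDateTime window and converting raw TIMESTAMP values.
 */
public final class DatahubTimeSketch {

    // Same pattern as Constant.DATETIME_FORMAT in the reader.
    private static final String DATETIME_FORMAT = "yyyyMMddHHmmss";

    private DatahubTimeSketch() {
    }

    /** Parses yyyyMMddHHmmss into epoch millis; the zone is explicit here (assumption). */
    public static long toEpochMillis(String dateTime, TimeZone zone) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
        SimpleDateFormat format = new SimpleDateFormat(DATETIME_FORMAT);
        format.setTimeZone(zone);
        try {
            return format.parse(dateTime).getTime();
        } catch (ParseException e) {
            // The reader likewise rethrows parse failures as an unchecked exception.
            throw new IllegalArgumentException("Invalid DateTime[" + dateTime + "]", e);
        }
    }

    /** Mirrors validateParameter: requires 0 < begin < end and returns {begin, end}. */
    public static long[] window(String begin, String end, TimeZone zone) {
        long beginMillis = toEpochMillis(begin, zone);
        long endMillis = toEpochMillis(end, zone);
        if (beginMillis <= 0) {
            throw new IllegalArgumentException("Invalid beginTimestampMillis[" + beginMillis + "]");
        }
        // end > begin > 0 also rules out a non-positive end timestamp.
        if (endMillis <= beginMillis) {
            throw new IllegalArgumentException("endTimestampMillis[" + endMillis
                    + "] must be greater than beginTimestampMillis[" + beginMillis + "]");
        }
        return new long[] {beginMillis, endMillis};
    }

    /** Mirrors the TIMESTAMP branch of getColumnFromField; null stays null. */
    public static Date toDate(Long value, String timeStampUnit) {
        if (value == null) {
            return null;
        }
        if ("MILLISECOND".equals(timeStampUnit)) {
            // 13-digit epoch milliseconds: already the unit Date expects.
            return new Date(value);
        } else if ("SECOND".equals(timeStampUnit)) {
            // Epoch seconds: scale up to milliseconds.
            return new Date(value * 1000L);
        }
        // Any other unit, including the MICROSECOND default: drop sub-millisecond digits.
        return new Date(value / 1000L);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        long[] w = window("19700101000001", "19700101000100", utc);
        System.out.println(w[0] + " " + w[1]); // 1000 60000
        System.out.println(toDate(1536811819123456L, "MICROSECOND").getTime()); // 1536811819123
        System.out.println(toDate(1536811819L, "SECOND").getTime()); // 1536811819000
    }
}
```

Two behaviors of the original are worth keeping in mind. `startRead` stops at the first record whose system time is at or after the end timestamp, so the window is effectively half-open. Also, an unrecognized `timeStampUnit` string is treated as microseconds rather than rejected.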
================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_HK.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. errorcode.missing_required_value=您缺失了必須填寫的參數值. errorcode.invalid_config_value=您的參數配置錯誤. errorcode.get_topic_info_fail=獲取shard清單失敗. errorcode.write_datahub_fail=寫數據失敗. errorcode.schema_not_match=數據格式錯誤. ================================================ FILE: datahubreader/src/main/java/com/alibaba/datax/plugin/reader/datahubreader/LocalStrings_zh_TW.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. errorcode.missing_required_value=您缺失了必須填寫的參數值. errorcode.invalid_config_value=您的參數配置錯誤. errorcode.get_topic_info_fail=獲取shard清單失敗. errorcode.write_datahub_fail=寫數據失敗. errorcode.schema_not_match=數據格式錯誤.
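`DatahubReader.Task.startRead` builds each output record from the configured `column` list. A single `*` (or the quoted form `"*"`) emits every field in schema order. Otherwise each configured name is looked up exactly, and a name missing from the record yields a null `StringColumn`. The following is a minimal sketch of that projection rule only, not the plugin's code. `ColumnProjectionSketch` is hypothetical and uses plain `Object` values in place of DataX `Column`s, so a missing name and a present-but-null field both come out as `null`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch (not part of DataX) of the column projection performed in
 * DatahubReader.Task.startRead, using plain Objects instead of DataX Columns.
 */
public final class ColumnProjectionSketch {

    private ColumnProjectionSketch() {
    }

    public static List<Object> project(List<String> schemaFields,
                                       Map<String, Object> valuesByName,
                                       List<String> columns) {
        // "column" is a required setting, so the reader never sees null here.
        Objects.requireNonNull(columns, "columns");
        List<Object> row = new ArrayList<>();
        boolean selectAll = columns.size() == 1
                && ("*".equals(columns.get(0)) || "\"*\"".equals(columns.get(0)));
        if (selectAll) {
            // "*" emits every field of the record in schema order.
            for (String field : schemaFields) {
                row.add(valuesByName.get(field));
            }
            return row;
        }
        // Otherwise emit the configured names in configured order. Lookups are
        // case-sensitive, and unknown names become null instead of failing.
        for (String column : columns) {
            row.add(valuesByName.get(column));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "ts");
        Map<String, Object> values = new HashMap<>();
        values.put("id", 1L);
        values.put("name", "alice");
        values.put("ts", null);
        System.out.println(project(schema, values, Arrays.asList("*")));               // [1, alice, null]
        System.out.println(project(schema, values, Arrays.asList("name", "x", "id"))); // [alice, null, 1]
    }
}
```

This contrasts with the DataHub writer further down, which resolves `column` names with `StringUtils.equalsIgnoreCase` and throws `SCHEMA_NOT_MATCH` for an unknown name. The same column list can therefore behave differently on the read and write sides.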
================================================ FILE: datahubreader/src/main/resources/job_config_template.json ================================================ { "name": "datahubreader", "parameter": { "endpoint":"", "accessId": "", "accessKey": "", "project": "", "topic": "", "beginDateTime": "20180913121019", "endDateTime": "20180913121119", "batchSize": 1024, "column": [] } } ================================================ FILE: datahubreader/src/main/resources/plugin.json ================================================ { "name": "datahubreader", "class": "com.alibaba.datax.plugin.reader.datahubreader.DatahubReader", "description": "datahub reader", "developer": "alibaba" } ================================================ FILE: datahubwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 datahubwriter 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.aliyun.datahub aliyun-sdk-datahub 2.21.6-public junit junit 4.12 test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: datahubwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin/writer/datahubwriter target/ datahubwriter-0.0.1-SNAPSHOT.jar plugin/writer/datahubwriter false plugin/writer/datahubwriter/libs runtime ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubClientHelper.java ================================================ package com.alibaba.datax.plugin.writer.datahubwriter; import org.apache.commons.lang3.StringUtils; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import 
com.alibaba.fastjson2.TypeReference; import com.aliyun.datahub.client.DatahubClient; import com.aliyun.datahub.client.DatahubClientBuilder; import com.aliyun.datahub.client.auth.Account; import com.aliyun.datahub.client.auth.AliyunAccount; import com.aliyun.datahub.client.common.DatahubConfig; import com.aliyun.datahub.client.http.HttpConfig; public class DatahubClientHelper { public static DatahubClient getDatahubClient(Configuration jobConfig) { String accessId = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_ID, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); String accessKey = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_KEY, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); String endpoint = jobConfig.getNecessaryValue(Key.CONFIG_KEY_ENDPOINT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); Account account = new AliyunAccount(accessId, accessKey); // Whether to enable binary transfer; supported by the server since version 2.12 boolean enableBinary = jobConfig.getBool("enableBinary", false); DatahubConfig datahubConfig = new DatahubConfig(endpoint, account, enableBinary); // HttpConfig is optional; defaults are used when it is not set // Enabling LZ4 compression for network transfer is recommended when reading and writing data HttpConfig httpConfig = null; String httpConfigStr = jobConfig.getString("httpConfig"); if (StringUtils.isNotBlank(httpConfigStr)) { httpConfig = JSON.parseObject(httpConfigStr, new TypeReference<HttpConfig>() { }); } DatahubClientBuilder builder = DatahubClientBuilder.newBuilder().setDatahubConfig(datahubConfig); if (null != httpConfig) { builder.setHttpConfig(httpConfig); } DatahubClient datahubClient = builder.build(); return datahubClient; } } ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriter.java ================================================ package com.alibaba.datax.plugin.writer.datahubwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.DataXCaseEnvUtil; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.fastjson2.JSON; import com.aliyun.datahub.client.DatahubClient; import com.aliyun.datahub.client.model.FieldType; import com.aliyun.datahub.client.model.GetTopicResult; import com.aliyun.datahub.client.model.ListShardResult; import com.aliyun.datahub.client.model.PutErrorEntry; import com.aliyun.datahub.client.model.PutRecordsResult; import com.aliyun.datahub.client.model.RecordEntry; import com.aliyun.datahub.client.model.RecordSchema; import com.aliyun.datahub.client.model.RecordType; import com.aliyun.datahub.client.model.ShardEntry; import com.aliyun.datahub.client.model.ShardState; import com.aliyun.datahub.client.model.TupleRecordData; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import java.util.Random; import java.util.concurrent.Callable; public class DatahubWriter extends Writer { /** * Methods in Job run only once; the framework starts multiple Task threads that run the Task methods in parallel. *

* The overall Writer execution flow is: *

     * Job: init-->prepare-->split
     *
     *                          Task: init-->prepare-->startWrite-->post-->destroy
     *                          Task: init-->prepare-->startWrite-->post-->destroy
     *
     *                                                                            Job: post-->destroy
     * 
*/ public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private Configuration jobConfig = null; @Override public void init() { this.jobConfig = super.getPluginJobConf(); jobConfig.getNecessaryValue(Key.CONFIG_KEY_ENDPOINT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_ID, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); jobConfig.getNecessaryValue(Key.CONFIG_KEY_ACCESS_KEY, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); jobConfig.getNecessaryValue(Key.CONFIG_KEY_PROJECT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); jobConfig.getNecessaryValue(Key.CONFIG_KEY_TOPIC, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); } @Override public void prepare() { String project = jobConfig.getNecessaryValue(Key.CONFIG_KEY_PROJECT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); String topic = jobConfig.getNecessaryValue(Key.CONFIG_KEY_TOPIC, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); RecordType recordType = null; DatahubClient client = DatahubClientHelper.getDatahubClient(this.jobConfig); try { GetTopicResult getTopicResult = client.getTopic(project, topic); recordType = getTopicResult.getRecordType(); } catch (Exception e) { LOG.warn("get topic type error: {}", e.getMessage()); } if (null != recordType) { if (recordType == RecordType.BLOB) { throw DataXException.asDataXException(DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, "DatahubWriter only supports 'Tuple' RecordType now, but your RecordType is 'BLOB'"); } } } @Override public List<Configuration> split(int mandatoryNumber) { List<Configuration> configs = new ArrayList<Configuration>(); for (int i = 0; i < mandatoryNumber; ++i) { configs.add(jobConfig.clone()); } return configs; } @Override public void post() {} @Override public void destroy() {} } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory .getLogger(Task.class); private static final List<String> FATAL_ERRORS_DEFAULT = Arrays.asList( "InvalidParameterM",
"MalformedRecord", "INVALID_SHARDID", "NoSuchTopic", "NoSuchShard" ); private Configuration taskConfig; private DatahubClient client; private String project; private String topic; private List shards; private int maxCommitSize; private int maxRetryCount; private RecordSchema schema; private long retryInterval; private Random random; private List column; private List columnIndex; private boolean enableColumnConfig; private List fatalErrors; @Override public void init() { this.taskConfig = super.getPluginJobConf(); project = taskConfig.getNecessaryValue(Key.CONFIG_KEY_PROJECT, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); topic = taskConfig.getNecessaryValue(Key.CONFIG_KEY_TOPIC, DatahubWriterErrorCode.MISSING_REQUIRED_VALUE); maxCommitSize = taskConfig.getInt(Key.CONFIG_KEY_MAX_COMMIT_SIZE, 1024*1024); maxRetryCount = taskConfig.getInt(Key.CONFIG_KEY_MAX_RETRY_COUNT, 500); this.retryInterval = taskConfig.getInt(Key.RETRY_INTERVAL, 650); this.random = new Random(); this.column = this.taskConfig.getList(Key.CONFIG_KEY_COLUMN, String.class); // ["*"] if (null != this.column && 1 == this.column.size()) { if (StringUtils.equals("*", this.column.get(0))) { this.column = null; } } this.columnIndex = new ArrayList(); // 留个开关保平安 this.enableColumnConfig = this.taskConfig.getBool("enableColumnConfig", true); this.fatalErrors = this.taskConfig.getList("fatalErrors", Task.FATAL_ERRORS_DEFAULT, String.class); this.client = DatahubClientHelper.getDatahubClient(this.taskConfig); } @Override public void prepare() { final String shardIdConfig = this.taskConfig.getString(Key.CONFIG_KEY_SHARD_ID); this.shards = new ArrayList(); try { RetryUtil.executeWithRetry(new Callable() { @Override public Void call() throws Exception { ListShardResult result = client.listShard(project, topic); if (StringUtils.isNotBlank(shardIdConfig)) { shards.add(shardIdConfig); } else { for (ShardEntry shard : result.getShards()) { if (shard.getState() == ShardState.ACTIVE || shard.getState() == 
ShardState.OPENING) { shards.add(shard.getShardId()); } } } schema = client.getTopic(project, topic).getRecordSchema(); return null; } }, DataXCaseEnvUtil.getRetryTimes(5), DataXCaseEnvUtil.getRetryInterval(10000L), DataXCaseEnvUtil.getRetryExponential(false)); } catch (Exception e) { throw DataXException.asDataXException(DatahubWriterErrorCode.GET_TOPOIC_INFO_FAIL, "get topic info failed", e); } LOG.info("datahub topic {} shard to write: {}", this.topic, JSON.toJSONString(this.shards)); LOG.info("datahub topic {} has schema: {}", this.topic, JSON.toJSONString(this.schema)); // map the user-configured columns onto the topic schema order to compute the write order, so that columns can be reordered // from here on, DataHub is always written in the order given by columnIndex int totalSize = this.schema.getFields().size(); if (null != this.column && !this.column.isEmpty() && this.enableColumnConfig) { for (String eachCol : this.column) { int indexFound = -1; for (int i = 0; i < totalSize; i++) { // warn: column names are matched case-insensitively if (StringUtils.equalsIgnoreCase(eachCol, this.schema.getField(i).getName())) { indexFound = i; break; } } if (indexFound >= 0) { this.columnIndex.add(indexFound); } else { throw DataXException.asDataXException(DatahubWriterErrorCode.SCHEMA_NOT_MATCH, String.format("can not find column %s in datahub topic %s", eachCol, this.topic)); } } } else { for (int i = 0; i < totalSize; i++) { this.columnIndex.add(i); } } } @Override public void startWrite(RecordReceiver recordReceiver) { Record record; List<RecordEntry> records = new ArrayList<RecordEntry>(); String shardId = null; if (1 == this.shards.size()) { shardId = shards.get(0); } else { shardId = shards.get(this.random.nextInt(shards.size())); } int commitSize = 0; try { while ((record = recordReceiver.getFromReader()) != null) { RecordEntry dhRecord = convertRecord(record, shardId); if (dhRecord != null) { records.add(dhRecord); } commitSize += record.getByteSize(); if (commitSize >= 
maxCommitSize) { commit(records); records.clear(); commitSize = 0; if (1 == this.shards.size()) { shardId = shards.get(0); } else { shardId =
shards.get(this.random.nextInt(shards.size())); } } } if (commitSize > 0) { commit(records); } } catch (Exception e) { throw DataXException.asDataXException( DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, e); } } @Override public void post() {} @Override public void destroy() {} private void commit(List<RecordEntry> records) throws InterruptedException { PutRecordsResult result = client.putRecords(project, topic, records); if (result.getFailedRecordCount() > 0) { for (int i = 0; i < maxRetryCount; ++i) { boolean limitExceededMessagePrinted = false; for (PutErrorEntry error : result.getPutErrorEntries()) { // for LimitExceeded, log only once per retry round instead of once per failed record if (StringUtils.equalsIgnoreCase("LimitExceeded", error.getErrorcode())) { if (!limitExceededMessagePrinted) { LOG.warn("write record error, request id: {}, error code: {}, error message: {}", result.getRequestId(), error.getErrorcode(), error.getMessage()); limitExceededMessagePrinted = true; } } else { LOG.error("write record error, request id: {}, error code: {}, error message: {}", result.getRequestId(), error.getErrorcode(), error.getMessage()); } if (this.fatalErrors.contains(error.getErrorcode())) { throw DataXException.asDataXException( DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, error.getMessage()); } } if (this.retryInterval >= 0) { Thread.sleep(this.retryInterval); } else { Thread.sleep(new Random().nextInt(700) + 300); } result = client.putRecords(project, topic, result.getFailedRecords()); if (result.getFailedRecordCount() == 0) { return; } } throw DataXException.asDataXException( DatahubWriterErrorCode.WRITE_DATAHUB_FAIL, "write datahub failed"); } } private RecordEntry convertRecord(Record dxRecord, String shardId) { try { RecordEntry dhRecord = new RecordEntry(); dhRecord.setShardId(shardId); TupleRecordData data = new TupleRecordData(this.schema); for (int i = 0; i < this.columnIndex.size(); ++i) { int orderInSchema = this.columnIndex.get(i); FieldType type = 
this.schema.getField(orderInSchema).getType(); Column column =
dxRecord.getColumn(i); switch (type) { case BIGINT: data.setField(orderInSchema, column.asLong()); break; case DOUBLE: data.setField(orderInSchema, column.asDouble()); break; case STRING: data.setField(orderInSchema, column.asString()); break; case BOOLEAN: data.setField(orderInSchema, column.asBoolean()); break; case TIMESTAMP: if (null == column.asDate()) { data.setField(orderInSchema, null); } else { data.setField(orderInSchema, column.asDate().getTime() * 1000); } break; case DECIMAL: // warn data.setField(orderInSchema, column.asBigDecimal()); break; case INTEGER: data.setField(orderInSchema, column.asLong()); break; case FLOAT: data.setField(orderInSchema, column.asDouble()); break; case TINYINT: data.setField(orderInSchema, column.asLong()); break; case SMALLINT: data.setField(orderInSchema, column.asLong()); break; default: throw DataXException.asDataXException( DatahubWriterErrorCode.SCHEMA_NOT_MATCH, String.format("does not support type: %s", type)); } } dhRecord.setRecordData(data); return dhRecord; } catch (Exception e) { super.getTaskPluginCollector().collectDirtyRecord(dxRecord, e, "convert record failed"); } return null; } } } ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/DatahubWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.datahubwriter; import com.alibaba.datax.common.spi.ErrorCode; import com.alibaba.datax.common.util.MessageSource; public enum DatahubWriterErrorCode implements ErrorCode { MISSING_REQUIRED_VALUE("DatahubWriter-01", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.missing_required_value")), INVALID_CONFIG_VALUE("DatahubWriter-02", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.invalid_config_value")), GET_TOPOIC_INFO_FAIL("DatahubWriter-03",
MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.get_topic_info_fail")), WRITE_DATAHUB_FAIL("DatahubWriter-04", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.write_datahub_fail")), SCHEMA_NOT_MATCH("DatahubWriter-05", MessageSource.loadResourceBundle(DatahubWriterErrorCode.class).message("errorcode.schema_not_match")), ; private final String code; private final String description; private DatahubWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.datahubwriter; public final class Key { /** * Configuration keys used by the plugin that must be supplied by the plugin's user */ public static final String CONFIG_KEY_ENDPOINT = "endpoint"; public static final String CONFIG_KEY_ACCESS_ID = "accessId"; public static final String CONFIG_KEY_ACCESS_KEY = "accessKey"; public static final String CONFIG_KEY_PROJECT = "project"; public static final String CONFIG_KEY_TOPIC = "topic"; public static final String CONFIG_KEY_WRITE_MODE = "mode"; public static final String CONFIG_KEY_SHARD_ID = "shardId"; public static final String CONFIG_KEY_MAX_COMMIT_SIZE = "maxCommitSize"; public static final String CONFIG_KEY_MAX_RETRY_COUNT = "maxRetryCount"; public static final String CONFIG_VALUE_SEQUENCE_MODE = "sequence"; public static final String CONFIG_VALUE_RANDOM_MODE = "random"; public final static String MAX_RETRY_TIME = "maxRetryTime"; public final static String RETRY_INTERVAL = "retryInterval"; public final static String 
CONFIG_KEY_COLUMN = "column"; }
================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_en_US.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_ja_JP.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. 
================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_CN.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF. ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_HK.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.errorcode.missing_required_value=您缺失了必須填寫的參數值. errorcode.invalid_config_value=您的參數配寘錯誤. errorcode.get_topic_info_fail=獲取shard清單失敗. errorcode.write_datahub_fail=寫數據失敗. errorcode.schema_not_match=數據格式錯誤. ================================================ FILE: datahubwriter/src/main/java/com/alibaba/datax/plugin/writer/datahubwriter/LocalStrings_zh_TW.properties ================================================ errorcode.missing_required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C. errorcode.invalid_config_value=\u60A8\u7684\u53C2\u6570\u914D\u7F6E\u9519\u8BEF. errorcode.get_topic_info_fail=\u83B7\u53D6shard\u5217\u8868\u5931\u8D25. errorcode.write_datahub_fail=\u5199\u6570\u636E\u5931\u8D25. 
errorcode.schema_not_match=\u6570\u636E\u683C\u5F0F\u9519\u8BEF.errorcode.missing_required_value=您缺失了必須填寫的參數值. errorcode.invalid_config_value=您的參數配寘錯誤. errorcode.get_topic_info_fail=獲取shard清單失敗. errorcode.write_datahub_fail=寫數據失敗. errorcode.schema_not_match=數據格式錯誤. ================================================ FILE: datahubwriter/src/main/resources/job_config_template.json ================================================ { "name": "datahubwriter", "parameter": { "endpoint":"", "accessId": "", "accessKey": "", "project": "", "topic": "", "mode": "random", "shardId": "", "maxCommitSize": 524288, "maxRetryCount": 500 } } ================================================ FILE: datahubwriter/src/main/resources/plugin.json ================================================ { "name": "datahubwriter", "class": "com.alibaba.datax.plugin.writer.datahubwriter.DatahubWriter", "description": "datahub writer", "developer": "alibaba" } ================================================ FILE: datax-example/datax-example-core/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-example 0.0.1-SNAPSHOT datax-example-core 8 8 UTF-8 ================================================ FILE: datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/ExampleContainer.java ================================================ package com.alibaba.datax.example; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.Engine; import com.alibaba.datax.example.util.ExampleConfigParser; /** * {@code Date} 2023/8/6 11:22 * * @author fuyouj */ public class ExampleContainer { /** * Public startup entry point exposed by the example module. * Reading datax-example/doc/README.MD before using it is recommended. * @param jobPath absolute path of the job JSON file */ public static void start(String jobPath) { Configuration configuration = ExampleConfigParser.parse(jobPath); Engine engine = new Engine(); engine.start(configuration); } } 
================================================ FILE:
datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/Main.java ================================================ package com.alibaba.datax.example; import com.alibaba.datax.example.util.PathUtil; /** * @author fuyouj */ public class Main { /** * 1. Add the plugins you want to debug as dependencies in the example module's pom file; * open this module's pom file to see how streamreader and streamwriter are included. * 2. Specify your job file here. */ public static void main(String[] args) { String classPathJobPath = "/job/stream2stream.json"; String absJobPath = PathUtil.getAbsolutePathFromClassPath(classPathJobPath); ExampleContainer.start(absJobPath); } } ================================================ FILE: datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/ExampleConfigParser.java ================================================ package com.alibaba.datax.example.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.util.ConfigParser; import com.alibaba.datax.core.util.FrameworkErrorCode; import com.alibaba.datax.core.util.container.CoreConstant; import java.io.File; import java.io.IOException; import java.net.URL; import java.nio.file.Paths; import java.util.*; /** * @author fuyouj */ public class ExampleConfigParser { private static final String CORE_CONF = "/example/conf/core.json"; private static final String PLUGIN_DESC_FILE = "plugin.json"; /** * Given the job config path, parses all Job, Plugin and Core information and returns it as a Configuration. * Unlike the core ConfigParser, the core and plugin configs here do not depend on a built datax.home; instead, the program's compiled target directories are scanned. 
*/ public static Configuration parse(final String jobPath) { Configuration configuration = ConfigParser.parseJobConfig(jobPath); configuration.merge(coreConfig(), false); Map<String, String> pluginTypeMap = new HashMap<>(); String readerName = configuration.getString(CoreConstant.DATAX_JOB_CONTENT_READER_NAME); String writerName = configuration.getString(CoreConstant.DATAX_JOB_CONTENT_WRITER_NAME); pluginTypeMap.put(readerName, "reader");
pluginTypeMap.put(writerName, "writer"); Configuration pluginsDescConfig = parsePluginsConfig(pluginTypeMap); configuration.merge(pluginsDescConfig, false); return configuration; } private static Configuration parsePluginsConfig(Map<String, String> pluginTypeMap) { Configuration configuration = Configuration.newDefault(); // the original plan was to locate the working directory via user.dir and scan it for plugins, // but user.dir is not reliable across environments, so that approach was dropped for (File basePackage : runtimeBasePackages()) { if (pluginTypeMap.isEmpty()) { break; } scanPluginByPackage(basePackage, configuration, basePackage.listFiles(), pluginTypeMap); } if (!pluginTypeMap.isEmpty()) { String failedPlugin = pluginTypeMap.keySet().toString(); String message = "\nplugin %s load failed: try to analyze the reasons from the following aspects.\n" + "1: Check if the name of the plugin is spelled correctly, and verify whether DataX supports this plugin\n" + "2: Verify that the <resource> tag has been added under the <build> section in the pom file of the relevant plugin.\n" + "<resource> <directory>src/main/resources</directory>\n" + " <includes>\n" + " <include>**/*.*</include>\n" + " </includes>\n" + " <filtering>true</filtering>\n" + "</resource>\n [Refer to the streamreader pom file] \n" + "3: Check that the datax-yourPlugin-example module imports your test plugin"; message = String.format(message, failedPlugin); throw DataXException.asDataXException(FrameworkErrorCode.PLUGIN_INIT_ERROR, message); } return configuration; } /** * Obtains the program's compiled output directories through the classLoader * * @return File[/datax-example/target/classes,xxReader/target/classes,xxWriter/target/classes] */ private static File[] runtimeBasePackages() { List<File> basePackages = new ArrayList<>(); ClassLoader classLoader = Thread.currentThread().getContextClassLoader(); Enumeration<URL> resources = null; try { resources = classLoader.getResources(""); } catch (IOException e) { throw 
DataXException.asDataXException(e.getMessage()); } while (resources.hasMoreElements()) { URL resource = resources.nextElement(); File file = new File(resource.getFile()); if (file.isDirectory()) { basePackages.add(file); } } return basePackages.toArray(new File[0]); } /** * @param packageFile
the compiled target/classes root directory, used as the plugin's URL directory once the plugin is found; the root directory is the safest choice * @param configuration pluginConfig * @param files files to scan * @param needPluginTypeMap plugins still to be found */ private static void scanPluginByPackage(File packageFile, Configuration configuration, File[] files, Map<String, String> needPluginTypeMap) { if (files == null) { return; } for (File file : files) { if (file.isFile() && PLUGIN_DESC_FILE.equals(file.getName())) { Configuration pluginDesc = Configuration.from(file); String descPluginName = pluginDesc.getString("name", ""); if (needPluginTypeMap.containsKey(descPluginName)) { String type = needPluginTypeMap.get(descPluginName); configuration.merge(parseOnePlugin(packageFile.getAbsolutePath(), type, descPluginName, pluginDesc), false); needPluginTypeMap.remove(descPluginName); } } else { scanPluginByPackage(packageFile, configuration, file.listFiles(), needPluginTypeMap); } } } private static Configuration parseOnePlugin(String packagePath, String pluginType, String pluginName, Configuration pluginDesc) { // set path so that JarLoader's URLClassLoader-based loading keeps working pluginDesc.set("path", packagePath); Configuration pluginConfInJob = Configuration.newDefault(); pluginConfInJob.set( String.format("plugin.%s.%s", pluginType, pluginName), pluginDesc.getInternal()); return pluginConfInJob; } private static Configuration coreConfig() { try { URL resource = ExampleConfigParser.class.getResource(CORE_CONF); return Configuration.from(Paths.get(resource.toURI()).toFile()); } catch (Exception ignore) { throw DataXException.asDataXException("Failed to load the configuration file core.json. "
" + "Please check whether /example/conf/core.json exists!"); } } } ================================================ FILE: datax-example/datax-example-core/src/main/java/com/alibaba/datax/example/util/PathUtil.java ================================================ package com.alibaba.datax.example.util; import com.alibaba.datax.common.exception.DataXException; import java.net.URI; import java.net.URISyntaxException; import java.net.URL; import java.nio.file.Paths; /** * @author fuyouj */ public class PathUtil { public static String getAbsolutePathFromClassPath(String path) { URL resource = PathUtil.class.getResource(path); try { assert resource != null; URI uri = resource.toURI(); return Paths.get(uri).toString(); } catch (NullPointerException | URISyntaxException e) { throw DataXException.asDataXException("path error,please check whether the path is correct"); } } } ================================================ FILE: datax-example/datax-example-core/src/main/resources/example/conf/core.json ================================================ { "entry": { "jvm": "-Xms1G -Xmx1G", "environment": {} }, "common": { "column": { "datetimeFormat": "yyyy-MM-dd HH:mm:ss", "timeFormat": "HH:mm:ss", "dateFormat": "yyyy-MM-dd", "extraFormats":["yyyyMMdd"], "timeZone": "GMT+8", "encoding": "utf-8" } }, "core": { "dataXServer": { "address": "http://localhost:7001/api", "timeout": 10000, "reportDataxLog": false, "reportPerfLog": false }, "transport": { "channel": { "class": "com.alibaba.datax.core.transport.channel.memory.MemoryChannel", "speed": { "byte": -1, "record": -1 }, "flowControlInterval": 20, "capacity": 512, "byteCapacity": 67108864 }, "exchanger": { "class": "com.alibaba.datax.core.plugin.BufferedRecordExchanger", "bufferSize": 32 } }, "container": { "job": { "reportInterval": 10000 }, "taskGroup": { "channel": 5 }, "trace": { "enable": "false" } }, "statistics": { "collector": { "plugin": { "taskClass": 
"com.alibaba.datax.core.statistics.plugin.task.StdoutPluginCollector", "maxDirtyNumber": 10 } } } } } ================================================ FILE: datax-example/datax-example-core/src/test/java/com/alibaba/datax/example/util/PathUtilTest.java ================================================ package com.alibaba.datax.example.util; import org.junit.Assert; import org.junit.Test; /** * {@code Author} FuYouJ * {@code Date} 2023/8/19 21:38 */ public class PathUtilTest { @Test public void testParseClassPathFile() { String path = "/pathTest.json"; String absolutePathFromClassPath = PathUtil.getAbsolutePathFromClassPath(path); Assert.assertNotNull(absolutePathFromClassPath); } } ================================================ FILE: datax-example/datax-example-core/src/test/resources/pathTest.json ================================================ {} ================================================ FILE: datax-example/datax-example-neo4j/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-example 0.0.1-SNAPSHOT datax-example-neo4j 8 8 UTF-8 1.17.6 4.4.9 com.alibaba.datax datax-example-core 0.0.1-SNAPSHOT org.testcontainers testcontainers ${test.container.version} com.alibaba.datax neo4jwriter 0.0.1-SNAPSHOT com.alibaba.datax datax-example-streamreader 0.0.1-SNAPSHOT ================================================ FILE: datax-example/datax-example-neo4j/src/test/java/com/alibaba/datax/example/neo4j/StreamReader2Neo4jWriterTest.java ================================================ package com.alibaba.datax.example.neo4j; import com.alibaba.datax.example.ExampleContainer; import com.alibaba.datax.example.util.PathUtil; import org.junit.After; import org.junit.Assert; import org.junit.Before; import org.junit.Test; import org.neo4j.driver.*; import org.neo4j.driver.types.Node; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.testcontainers.containers.GenericContainer; import org.testcontainers.containers.Network; 
import org.testcontainers.containers.output.Slf4jLogConsumer; import org.testcontainers.lifecycle.Startables; import org.testcontainers.shaded.org.awaitility.Awaitility; import org.testcontainers.utility.DockerImageName; import org.testcontainers.utility.DockerLoggerFactory; import java.net.URI; import java.util.Arrays; import java.util.concurrent.TimeUnit; import java.util.stream.Stream; /** * {@code Author} FuYouJ * {@code Date} 2023/8/19 21:48 */ public class StreamReader2Neo4jWriterTest { private static final Logger LOGGER = LoggerFactory.getLogger(StreamReader2Neo4jWriterTest.class); private static final String CONTAINER_IMAGE = "neo4j:5.9.0"; private static final String CONTAINER_HOST = "neo4j-host"; private static final int HTTP_PORT = 7474; private static final int BOLT_PORT = 7687; private static final String CONTAINER_NEO4J_USERNAME = "neo4j"; private static final String CONTAINER_NEO4J_PASSWORD = "Test@12343"; private static final URI CONTAINER_URI = URI.create("neo4j://localhost:" + BOLT_PORT); protected static final Network NETWORK = Network.newNetwork(); private GenericContainer<?> container; protected Driver neo4jDriver; protected Session neo4jSession; private static final int CHANNEL = 5; private static final int READER_NUM = 10; @Before public void init() { DockerImageName imageName = DockerImageName.parse(CONTAINER_IMAGE); container = new GenericContainer<>(imageName) .withNetwork(NETWORK) .withNetworkAliases(CONTAINER_HOST) .withExposedPorts(HTTP_PORT, BOLT_PORT) .withEnv( "NEO4J_AUTH", CONTAINER_NEO4J_USERNAME + "/" + CONTAINER_NEO4J_PASSWORD) .withEnv("apoc.export.file.enabled", "true") .withEnv("apoc.import.file.enabled", "true") .withEnv("apoc.import.file.use_neo4j_config", "true") .withEnv("NEO4J_PLUGINS", "[\"apoc\"]") .withLogConsumer( new Slf4jLogConsumer( DockerLoggerFactory.getLogger(CONTAINER_IMAGE))); container.setPortBindings( Arrays.asList( String.format("%s:%s", HTTP_PORT, HTTP_PORT), String.format("%s:%s", BOLT_PORT, BOLT_PORT)));
Startables.deepStart(Stream.of(container)).join(); LOGGER.info("container started"); Awaitility.given() .ignoreExceptions() .await() .atMost(30, TimeUnit.SECONDS) .untilAsserted(this::initConnection); } // run the whole job through the example module against neo4jWriter, to surface problems anywhere in the end-to-end flow @Test public void streamReader2Neo4j() { deleteHistoryIfExist(); String path = "/streamreader2neo4j.json"; String jobPath = PathUtil.getAbsolutePathFromClassPath(path); ExampleContainer.start(jobPath); // verify that the result set matches what the channel count and the reader's mock data imply verifyWriteResult(); } private void deleteHistoryIfExist() { String query = "match (n:StreamReader) return n limit 1"; String delete = "match (n:StreamReader) delete n"; if (neo4jSession.run(query).hasNext()) { neo4jSession.run(delete); } } private void verifyWriteResult() { int total = CHANNEL * READER_NUM; String query = "match (n:StreamReader) return n"; Result run = neo4jSession.run(query); int count = 0; while (run.hasNext()) { Record record = run.next(); Node node = record.get("n").asNode(); if (node.hasLabel("StreamReader")) { count++; } } Assert.assertEquals(total, count); } @After public void destroy() { if (neo4jSession != null) { neo4jSession.close(); } if (neo4jDriver != null) { neo4jDriver.close(); } if (container != null) { container.close(); } } private void initConnection() { neo4jDriver = GraphDatabase.driver( CONTAINER_URI, AuthTokens.basic(CONTAINER_NEO4J_USERNAME, CONTAINER_NEO4J_PASSWORD)); neo4jSession = neo4jDriver.session(SessionConfig.forDatabase("neo4j")); } } ================================================ FILE: datax-example/datax-example-neo4j/src/test/resources/streamreader2neo4j.json ================================================ { "job": { "content": [ { "reader": { "name": "streamreader", "parameter": { "sliceRecordCount": 10, "column": [ { "type": "string", "value": "StreamReader" }, { "type": "string", "value": "1997" } ] } }, "writer": { 
"name": "neo4jWriter", "parameter": { "uri": "bolt://localhost:7687", "username":"neo4j",
"password":"Test@12343", "database":"neo4j", "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ", "batchDataVariableName": "batch", "batchSize": "3", "properties": [ { "name": "Label", "type": "string" }, { "name": "id", "type": "STRING" } ] } } } ], "setting": { "speed": { "channel": 5 } } } } ================================================ FILE: datax-example/datax-example-streamreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-example 0.0.1-SNAPSHOT datax-example-streamreader 8 8 UTF-8 com.alibaba.datax datax-example-core 0.0.1-SNAPSHOT com.alibaba.datax streamreader 0.0.1-SNAPSHOT com.alibaba.datax streamwriter 0.0.1-SNAPSHOT ================================================ FILE: datax-example/datax-example-streamreader/src/test/java/com/alibaba/datax/example/streamreader/StreamReader2StreamWriterTest.java ================================================ package com.alibaba.datax.example.streamreader; import com.alibaba.datax.example.ExampleContainer; import com.alibaba.datax.example.util.PathUtil; import org.junit.Test; /** * {@code Author} FuYouJ * {@code Date} 2023/8/14 20:16 */ public class StreamReader2StreamWriterTest { @Test public void testStreamReader2StreamWriter() { String path = "/stream2stream.json"; String jobPath = PathUtil.getAbsolutePathFromClassPath(path); ExampleContainer.start(jobPath); } } ================================================ FILE: datax-example/datax-example-streamreader/src/test/resources/stream2stream.json ================================================ { "job": { "content": [ { "reader": { "name": "streamreader", "parameter": { "sliceRecordCount": 10, "column": [ { "type": "long", "value": "10" }, { "type": "string", "value": "hello,你好,世界-DataX" } ] } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "UTF-8", "print": true } } } ], "setting": { "speed": { "channel": 5 } } } } 
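================================================ FILE: datax-example/doc/README.md ================================================

## [DataX-Example] A module for debugging DataX plugins

### Why this module exists

A DataX sync job is normally launched from the datax.py script, which takes the DataX package directory and sets it as the system variable datax.home. From then on, loading the core and the plugins and initializing the configuration all depend on datax.home. This makes local work awkward; take debugging the streamreader plugin locally as an example:

- Build DataX with Maven to produce the datax directory.
- In the IDE, set the system environment variable datax.home, or hard-code datax.home in the Engine startup class.
- Modify the streamreader plugin code.
- Build with Maven again so that JarLoader can load the latest streamreader code.
- Debug the code.

Of these steps, the packaging is completely unnecessary, the most time-consuming, and the most tedious to wait for. So I wrote a new module, datax-example, dedicated to local debugging and to reproducing bugs. With this module in place, the workflow above shrinks to two steps:

- Modify the streamreader plugin code.
- Debug the code.

### Directory structure

This directory structure shows how to write test cases with datax-example-core and how to verify the code flow.

### How it works

- The original ConfigParser is left unchanged. A new ExampleConfigParser, used only by the example module, does not depend on datax.home; it relies on the target directories compiled by the IDE instead.
- The IDE's target directory is used as each plugin's class-loading directory.

![img](img/img02.png)

### How to use

1. Modify the plugin's pom file as follows, using streamreader as an example.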
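Before the change:

```xml
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>${jdk-version}</source>
                <target>${jdk-version}</target>
                <encoding>${project-sourceEncoding}</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
```

After the change:

```xml
<build>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <includes>
                <include>**/*.*</include>
            </includes>
            <filtering>true</filtering>
        </resource>
    </resources>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>${jdk-version}</source>
                <target>${jdk-version}</target>
                <encoding>${project-sourceEncoding}</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
```

#### Using it in a test module

See StreamReader2StreamWriterTest.java in datax-example/datax-example-streamreader:

```java
import com.alibaba.datax.example.ExampleContainer;
import com.alibaba.datax.example.util.PathUtil;
import org.junit.Test;

public class StreamReader2StreamWriterTest {
    @Test
    public void testStreamReader2StreamWriter() {
        String path = "/stream2stream.json";
        String jobPath = PathUtil.getAbsolutePathFromClassPath(path);
        ExampleContainer.start(jobPath);
    }
}
```

See StreamReader2Neo4jWriterTest in datax-example/datax-example-neo4j:

```java
// Excerpt: container setup and the deleteHistoryIfExist/verifyWriteResult helpers
// live in the full test class.
public class StreamReader2Neo4jWriterTest {
    @Test
    public void streamReader2Neo4j() {
        deleteHistoryIfExist();
        String path = "/streamreader2neo4j.json";
        String jobPath = PathUtil.getAbsolutePathFromClassPath(path);
        ExampleContainer.start(jobPath);
        // verify that the result set matches what the channel count and the reader's mock data imply
        verifyWriteResult();
    }
}
```

================================================ FILE: datax-example/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT datax-example pom datax-example-core datax-example-streamreader datax-example-neo4j 8 8 UTF-8 4.13.2 com.alibaba.datax datax-common 0.0.1-SNAPSHOT com.alibaba.datax datax-core 0.0.1-SNAPSHOT junit junit ${junit4.version} test src/main/resources **/*.* true maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding}
================================================ FILE: dataxPluginDev.md ================================================

# DataX Plugin Development Handbook

This document is for DataX plugin developers. It tries to describe, as comprehensively as possible, the whole journey of developing a DataX plugin, with the aim of clearing up developers' confusion and making plugin development simple.

## 1. Before You Start

> If you are on the right road, the distance is nothing to fear. ✓
> Once you are far down a road, you stop caring whether it is the right one. ✕

Since you have opened this document, there is presumably no need to explain what `DataX` is. The next question, then, is:

### Why does `DataX` use a plugin mechanism?

From the very beginning, `DataX` has taken synchronization between heterogeneous data sources as its mission. To cope with the differences between data sources while providing consistent synchronization primitives and extensibility, `DataX` naturally adopted a `framework` + `plugin` model:

- A plugin only needs to care about reading or writing the data itself.
- Common synchronization concerns, such as type conversion, performance, and statistics, are handled by the framework.

As a plugin developer, you need to focus on two things:

1. The correctness of reading and writing data against the data source itself.
2.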
如何与框架沟通、合理正确地使用框架。 ### 开工前需要想明白的问题 就插件本身而言,希望在您动手coding之前,能够回答我们列举的这些问题,不然路走远了发现没走对,就尴尬了。 ## 二、插件视角看框架 ### 逻辑执行模型 插件开发者不用关心太多,基本只需要关注特定系统读和写,以及自己的代码在逻辑上是怎样被执行的,哪一个方法是在什么时候被调用的。在此之前,需要明确以下概念: - `Job`: `Job`是DataX用以描述从一个源头到一个目的端的同步作业,是DataX数据同步的最小业务单元。比如:从一张mysql的表同步到odps的一个表的特定分区。 - `Task`: `Task`是为最大化而把`Job`拆分得到的最小执行单元。比如:读一张有1024个分表的mysql分库分表的`Job`,拆分成1024个读`Task`,用若干个并发执行。 - `TaskGroup`: 描述的是一组`Task`集合。在同一个`TaskGroupContainer`执行下的`Task`集合称之为`TaskGroup` - `JobContainer`: `Job`执行器,负责`Job`全局拆分、调度、前置语句和后置语句等工作的工作单元。类似Yarn中的JobTracker - `TaskGroupContainer`: `TaskGroup`执行器,负责执行一组`Task`的工作单元,类似Yarn中的TaskTracker。 简而言之, **`Job`拆分成`Task`,在分别在框架提供的容器中执行,插件只需要实现`Job`和`Task`两部分逻辑**。 ### 物理执行模型 框架为插件提供物理上的执行能力(线程)。`DataX`框架有三种运行模式: - `Standalone`: 单进程运行,没有外部依赖。 - `Local`: 单进程运行,统计信息、错误信息汇报到集中存储。 - `Distrubuted`: 分布式多进程运行,依赖`DataX Service`服务。 当然,上述三种模式对插件的编写而言没有什么区别,你只需要避开一些小错误,插件就能够在单机/分布式之间无缝切换了。 当`JobContainer`和`TaskGroupContainer`运行在同一个进程内时,就是单机模式(`Standalone`和`Local`);当它们分布在不同的进程中执行时,就是分布式(`Distributed`)模式。 是不是很简单? ### 编程接口 那么,`Job`和`Task`的逻辑应是怎么对应到具体的代码中的? 
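First, the plugin's entry class must extend the `Reader` or `Writer` abstract class and implement its two inner abstract classes, `Job` and `Task`. The `Job` and `Task` implementations must be **inner classes**; see the **Loading mechanism** section for why. Taking a Reader as an example:

```java
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;

import java.util.List;

public class SomeReader extends Reader {

    public static class Job extends Reader.Job {

        @Override
        public void init() {
        }

        @Override
        public void prepare() {
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            // return one Configuration per Task
            return null;
        }

        @Override
        public void post() {
        }

        @Override
        public void destroy() {
        }
    }

    public static class Task extends Reader.Task {

        @Override
        public void init() {
        }

        @Override
        public void prepare() {
        }

        @Override
        public void startRead(RecordSender recordSender) {
        }

        @Override
        public void post() {
        }

        @Override
        public void destroy() {
        }
    }
}
```

The `Job` methods:

- `init`: Initializes the Job object. At this point, `super.getPluginJobConf()` returns the configuration for this plugin: a reader plugin gets the `reader` part of the configuration, and a writer plugin gets the `writer` part.
- `prepare`: Global preparation, for example odpswriter truncating the target table.
- `split`: Splits the job into `Task`s. The parameter `adviceNumber` is the number of splits the framework suggests, usually the concurrency configured for the run. The return value is the list of `Task` configurations.
- `post`: Global post-processing, for example the rename that mysqlwriter performs after syncing into a shadow table.
- `destroy`: Cleanup of the Job object itself.

The `Task` methods:

- `init`: Initializes the Task object. At this point, `super.getPluginJobConf()` returns the configuration for this `Task`, which is one element of the list returned by the `Job`'s `split` method.
- `prepare`: Local preparation.
- `startRead`: Reads data from the data source and writes it to the `RecordSender`. The `RecordSender` puts the data into the buffer queue that connects the Reader and the Writer.
- `startWrite`: Reads data from the `RecordReceiver` and writes it to the target data source. The data in the `RecordReceiver` comes from the buffer queue between the Reader and the Writer.
- `post`: Local post-processing.
- `destroy`: Cleanup of the Task object itself.

Note that:

- `Job` and `Task` must never share variables, because in distributed mode there is no guarantee that shared variables are initialized correctly. They may depend on each other only through the configuration.
- Both `Job` and `Task` have `prepare` and `post`. The plugin must decide where each operation belongs.

The framework calls the `Job` and `Task` methods in the following order:

![DataXReaderWriter (2)](https://github.com/alibaba/DataX/blob/master/images/plugin_dev_guide_1.png)

In the figure above, yellow marks the `Job` phases, blue marks the `Task` phases, and green marks the framework phases.

The related classes are related as follows:

![DataX](https://github.com/alibaba/DataX/blob/master/images/plugin_dev_guide_2.png)

### Plugin definition

Once the code is written, have you wondered how the framework finds the plugin's entry class and how it loads the plugin?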
Every plugin project contains a `plugin.json` file that describes the plugin, including its entry class. For example:

```json
{
    "name": "mysqlwriter",
    "class": "com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter",
    "description": "Use Jdbc connect to database, execute insert sql.",
    "developer": "alibaba"
}
```

- `name`: The plugin name, case-sensitive. The framework looks up plugins by the name the user specifies in the job configuration. **Very important.**
- `class`: The fully qualified name of the entry class. The framework instantiates the entry class via reflection. **Very important.**
- `description`: A description.
- `developer`: The developer.

### Packaging and release

`DataX` is packaged with `assembly`; for how to use `assembly`, search Google or Baidu. The packaging command is:

```bash
mvn clean package -DskipTests assembly:assembly
```

`DataX` plugins must follow a uniform directory layout:

```
${DATAX_HOME}
|-- bin
|   `-- datax.py
|-- conf
|   |-- core.json
|   `-- logback.xml
|-- lib
|   `-- datax-core-dependencies.jar
`-- plugin
    |-- reader
    |   `-- mysqlreader
    |       |-- libs
    |       |   `-- mysql-reader-plugin-dependencies.jar
    |       |-- mysqlreader-0.0.1-SNAPSHOT.jar
    |       `-- plugin.json
    `-- writer
        |-- mysqlwriter
        |   |-- libs
        |   |   `-- mysql-writer-plugin-dependencies.jar
        |   |-- mysqlwriter-0.0.1-SNAPSHOT.jar
        |   `-- plugin.json
        |-- oceanbasewriter
        `-- odpswriter
```

- `${DATAX_HOME}/bin`: executables.
- `${DATAX_HOME}/conf`: framework configuration.
- `${DATAX_HOME}/lib`: framework libraries.
- `${DATAX_HOME}/plugin`: plugins.

The plugin directory is divided into `reader` and `writer` subdirectories, which hold read and write plugins respectively. A plugin directory is laid out as follows:

- `${PLUGIN_HOME}/libs`: the plugin's dependencies.
- `${PLUGIN_HOME}/plugin-name-version.jar`: the plugin's own jar.
- `${PLUGIN_HOME}/plugin.json`: the plugin descriptor.

Although the framework puts every jar under `${PLUGIN_HOME}` on the `classpath` when it loads a plugin, keeping dependency jars separate from the plugin's own jar is still recommended.

Note: **the plugin's directory name must match the plugin name defined in `plugin.json`.**

### Configuration file

`DataX` uses `json` as its configuration format. A typical `DataX` job configuration looks like this:

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "odpsreader",
          "parameter": {
            "accessKey": "",
            "accessId": "",
            "column": [""],
            "isCompress": "",
            "odpsServer": "",
            "partition": [
              ""
            ],
            "project": "",
            "table": "",
            "tunnelServer": ""
          }
        },
        "writer": {
          "name": "oraclewriter",
          "parameter": {
            "username": "",
            "password": "",
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": "",
                "table": [
                  ""
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
```

The `DataX` framework has a `core.json` configuration file that specifies its default behavior. A job configuration may set items that already exist in the framework configuration; these take precedence and override the defaults in `core.json`.
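
The precedence rule above (a job setting overrides the same item in `core.json`) behaves like a recursive merge of two JSON trees. The sketch below illustrates only that rule. It models both configurations as nested `java.util.Map`s, and the class `ConfigMergeSketch` and the keys in `main` are hypothetical. It is not DataX's own implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: models "job config overrides core.json defaults" as a recursive map merge.
final class ConfigMergeSketch {

    private ConfigMergeSketch() {
    }

    /**
     * Returns a new map holding every key from {@code defaults}, overridden by {@code job}
     * wherever the job defines the same key. Nested maps are merged recursively; any other
     * value from {@code job} replaces the default outright. Neither input is modified.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> defaults, Map<String, Object> job) {
        Map<String, Object> result = new HashMap<>(defaults);
        for (Map.Entry<String, Object> entry : job.entrySet()) {
            Object base = result.get(entry.getKey());
            Object override = entry.getValue();
            if (base instanceof Map && override instanceof Map) {
                result.put(entry.getKey(),
                        merge((Map<String, Object>) base, (Map<String, Object>) override));
            } else {
                result.put(entry.getKey(), override);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical keys; they are not taken from DataX's real core.json.
        Map<String, Object> retry = new HashMap<>();
        retry.put("maxTimes", 3);
        retry.put("intervalMs", 1000);
        Map<String, Object> core = new HashMap<>();
        core.put("retry", retry);

        Map<String, Object> jobRetry = new HashMap<>();
        jobRetry.put("maxTimes", 5);
        Map<String, Object> job = new HashMap<>();
        job.put("retry", jobRetry);

        // maxTimes comes from the job; intervalMs falls back to the default
        System.out.println(merge(core, job).get("retry"));
    }
}
```

Merging recursively, instead of replacing whole top-level sections, lets a job override one nested value while still inheriting its siblings from the defaults.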
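**The value of `job.content.reader.parameter` in the configuration is passed to `Reader.Job`, and the value of `job.content.writer.parameter` is passed to `Writer.Job`.** `Reader.Job` and `Writer.Job` retrieve it via `super.getPluginJobConf()`.

The `DataX` framework supports RSA encryption of specific configuration items. Items whose keys start with `*` hold encrypted values. **Encryption and decryption are transparent to the plugin: the plugin still reads and manipulates these items using the key without the `*`.**

#### How to design configuration parameters

> Designing the configuration file is the first step of plugin development!

The `parameter` section under `reader` and `writer` in the job configuration holds the plugin's configuration parameters. These parameters should follow these principles:

- Camel case: all keys use camelCase, starting with a lowercase letter and capitalizing the first letter of each following word.
- Orthogonality: configuration items must be orthogonal, with no overlapping functionality and no hidden rules.
- Rich types: use JSON types sensibly to reduce needless processing logic and the chance of errors.
    - Use the right data type. For example, a boolean value should be `true`/`false`, not `"yes"`/`"true"`/`0`, etc.
    - Use collection types sensibly, for example an array instead of a delimiter-separated string.
- Consistency with similar plugins: follow the conventions of plugins of the same kind. For example, the `connection` parameter of relational-database plugins always has this structure:

```json
{
  "connection": [
    {
      "table": [
        "table_1",
        "table_2"
      ],
      "jdbcUrl": [
        "jdbc:mysql://127.0.0.1:3306/database_1",
        "jdbc:mysql://127.0.0.2:3306/database_1_slave"
      ]
    },
    {
      "table": [
        "table_3",
        "table_4"
      ],
      "jdbcUrl": [
        "jdbc:mysql://127.0.0.3:3306/database_2",
        "jdbc:mysql://127.0.0.4:3306/database_2_slave"
      ]
    }
  ]
}
```

- ...

#### How to use the `Configuration` class

To simplify working with JSON, `DataX` provides a simple DSL that is used together with the `Configuration` class.

`Configuration` provides the usual operations for reading and writing configuration items, such as `get`, typed `get`, `get` with a default value, and `set`, as well as methods such as `clone` and `toJSON`. Every read or write takes a `path` argument, and this `path` is the DSL defined by `DataX`. It has two rules:

1. A child map key is written as `.key`; the leading dot of a `path` is omitted.
2. An array element is written as `[index]`.

For example, given the following JSON:

```json
{
  "a": {
    "b": {
      "c": 2
    },
    "f": [
      1,
      2,
      {
        "g": true,
        "h": false
      },
      4
    ]
  },
  "x": 4
}
```

calling `configuration.get(path)` with the following paths yields:

- `x`: `4`
- `a.b.c`: `2`
- `a.b.c.d`: `null`
- `a.f[0]`: `1`
- `a.f[2].g`: `true`

Note that a plugin sees only part of the full configuration. When using a `Configuration` object, keep track of what its current root path is.

For more `Configuration` operations, see `ConfigurationTest.java`.

### Data transfer between plugins

As in a typical `producer-consumer` pattern, the `Reader` and `Writer` plugins transfer data through a `channel`. The `channel` may be in memory or persistent; plugins don't need to care which. A plugin writes data to the `channel` through a `RecordSender` and reads data from it through a `RecordReceiver`.

Each item in the `channel` is a `Record` object, which can hold multiple `Column` objects. Think of them as a database row and its columns.

`Record` has the following methods:

```java
public interface Record {
    // Append a column at the end
    void addColumn(Column column);

    // Put a column at the given index
    void setColumn(int i, final Column column);

    // Get a column
    Column getColumn(int i);

    // Convert to a JSON string
    String toString();

    // Get the total number of columns
    int getColumnNumber();

    // Compute how many bytes the whole record occupies in memory
    int getByteSize();
}
```

Because `Record` is an interface, a `Reader` plugin first calls `RecordSender.createRecord()` to create a `Record` instance and then adds `Column`s to it one by one.

A `Writer` plugin calls `RecordReceiver.getFromReader()` to obtain a `Record`, iterates over its `Column`s, and writes them to the target storage. While the `Reader` has not yet finished and the transfer is still in progress, `RecordReceiver.getFromReader()` blocks until data arrives if none is currently available. Once the transfer has finished, it returns `null`, which the `Writer` plugin can use to decide when to return from `startWrite`.

Constructing and operating on `Column`s is covered in the *Type conversion* section.

### Type conversion

To standardize type conversion between source and destination and keep data lossless, DataX supports six internal data types:

- `Long`: integers (Int, Short, Long, BigInteger, etc.).
- `Double`: floating-point numbers (Float, Double, BigDecimal (arbitrary precision), etc.).
- `String`: strings, with no underlying length limit, using a universal character set (Unicode).
- `Date`: dates.
- `Bool`: booleans.
- `Bytes`: binary data, which can hold unstructured data such as MP3 files.

Correspondingly, there are six `Column` implementations: `DateColumn`, `LongColumn`, `DoubleColumn`, `BytesColumn`, `StringColumn`, and `BoolColumn`.

Besides data-related methods, `Column` provides a set of type-conversion methods whose names start with `as`.

![Columns](https://github.com/alibaba/DataX/blob/master/images/plugin_dev_guide_3.png)

The internal DataX types are implemented with the following Java types:

| Internal type | Implementation type | Notes |
| ----- | -------- | ----- |
| Date | java.util.Date | |
| Long | java.math.BigInteger | Arbitrary-precision integer, so no precision is lost |
| Double | java.lang.String | Stored as a String, so no precision is lost |
| Bytes | byte[] | |
| String | java.lang.String | |
| Bool | java.lang.Boolean | |

Conversions between types work as follows:

| from\to | Date | Long | Double | Bytes | String | Bool |
| ----- | -------- | ----- | ------ | -------- | ----- | ----- |
| Date | - | Millisecond timestamp | Not supported | Not supported | Formatted with the system-configured date/time/datetime format | Not supported |
| Long | Constructs a Date, treating the value as a millisecond timestamp | - | BigInteger converted to BigDecimal, then BigDecimal.doubleValue() | Not supported | BigInteger.toString() | 0 is false, anything else is true |
| Double | Not supported | BigDecimal constructed from the internal String, then BigDecimal.longValue() | - | Not supported | Returns the internal String directly | |
| Bytes | Not supported | Not supported | Not supported | - | Decoded to a String with the encoding configured in `common.column.encoding`, `utf-8` by default | Not supported |
| String | Parsed with the configured date/time/datetime/extra formats | BigDecimal constructed from the String, then longValue() | BigDecimal constructed from the String, then doubleValue(); `NaN`/`Infinity`/`-Infinity` are handled correctly | Encoded to byte[] with the encoding configured in `common.column.encoding`, `utf-8` by default | - | "true" is `true`, "false" is `false`, case-insensitive; other strings are not supported |
| Bool | Not supported | `true` is `1L`, otherwise `0L` | `true` is `1.0`, otherwise `0.0` | | Not supported | - |

### Dirty data handling

#### What is dirty data?

There are currently three main kinds of dirty data:

1. The Reader reads an unsupported type or an invalid value.
2. An unsupported type conversion, for example converting `Bytes` to `Date`.
3. A failed write to the destination, for example writing an integer that exceeds the column length in MySQL.

#### How to handle dirty data

In `Reader.Task` and `Writer.Task`, `AbstractTaskPlugin.getTaskPluginCollector()` returns a `TaskPluginCollector`, which provides a family of `collectDirtyRecord` methods. When dirty data occurs, call the appropriate `collectDirtyRecord` method and pass in the `Record` that is considered dirty.

In the job configuration, users can cap dirty data by record count or by percentage. When dirty data exceeds the limit, the framework ends the sync job and exits. The plugin only needs to make sure that all dirty data is collected; the framework handles the rest.

### Loading mechanism

1. The framework scans the `plugin/reader` and `plugin/writer` directories and loads each plugin's `plugin.json` file.
2. It indexes all plugin configurations by the `name` in `plugin.json`. If it finds plugins with duplicate names, the framework exits with an exception.
3. The user specifies the plugin name in the `name` field of the `reader`/`writer` section of the job configuration. Based on the plugin type (`reader`/`writer`) and the plugin name, the framework scans all jars under that plugin's directory and adds them to the `classpath`.
4. Using the entry class defined in the plugin configuration, the framework instantiates the corresponding `Job` and `Task` objects via reflection.

### Writing test cases

1. Create a new test module for your plugin under the datax-example project and call `ExampleContainer.start(jobPath)` to check whether your code logic is correct. See [Using datax-example](https://github.com/alibaba/DataX/blob/master/datax-example/doc/README.md).

## 3. Last but not Least

> Documentation is an engineer's conscience.

Every plugin must have a document on the official `DataX` wiki that covers at least the following:

1. **Quick introduction**: the plugin's use cases, features, etc.
2. **Implementation principle**: the underlying mechanism. For example, `mysqlwriter` inserts data with `insert into` and `replace into`, and the `tair` plugin writes through the tair client.
3. **Configuration**
    - A JSON job configuration for a typical scenario.
    - The meaning of each parameter, whether it is required, its default value, its allowed range, and any other constraints.
4. **Type conversion**
    - How the plugin converts between the actual storage types and the `DataX` internal types.
    - Whether any special handling exists.
5. **Performance report**
    - Hardware and software environment, OS version, Java version, CPU, memory, etc.
    - Data characteristics, record size, etc.
    - Test parameter sets (several): system parameters (such as concurrency) and plugin parameters (such as batchSize).
    - Sync speed under different parameters (Rec/s, MB/s), machine load (load, cpu), and the load placed on the data source (load, cpu, mem, etc.).
6. **Limitations**: any other usage restrictions.
7. **FAQ**: questions users frequently run into.

================================================
FILE: dorisreader/doc/dorisreader.md
================================================
# DorisReader Plugin Documentation

___

## 1 Quick introduction

The DorisReader plugin reads data from Doris. Under the hood, DorisReader connects to a remote Doris database over JDBC and executes SQL statements that SELECT the data out of Doris.

## 2 Implementation principle

In short, DorisReader connects to a remote Doris database through a JDBC connector, generates a SELECT query from the user's configuration, and sends it to the remote Doris database. It then assembles the query result into an abstract data set using DataX's own data types and passes it to the downstream Writer.

For a configured Table, Column, and Where, DorisReader assembles them into a SQL statement and sends it to Doris. For a configured querySql, DorisReader sends it to Doris as-is.

## 3 Features

### 3.1 Sample configurations

* A job that extracts data from a Doris database and prints it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "dorisreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "column": [
                            "id",
                            "name"
                        ],
                        "splitPk": "db_id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:9030/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that syncs the result of custom SQL and prints it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "dorisreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10;",
                                    "select db_id,on_line_flag from db_info where db_id >= 10;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:9030/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**

    * Description: JDBC connection information for the remote database, given as a JSON array, so that one database can have several connection addresses. A JSON array is used because Alibaba Group internally supports probing multiple IPs: if several are configured, DorisReader probes them in turn until it finds one that accepts connections. If all connections fail, DorisReader reports an error. Note that jdbcUrl must be placed inside the connection configuration unit. For use outside Alibaba Group, a single JDBC connection in the array is enough.
    * Required: yes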
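    * Default: none

* **username**

    * Description: username for the data source
    * Required: yes
    * Default: none

* **password**

    * Description: password for the specified username
    * Required: yes
    * Default: none

* **table**

    * Description: the tables to sync, given as a JSON array, so several tables can be extracted at once. When multiple tables are configured, the user must make sure they share the same schema; DorisReader does not check whether they form one logical table. Note that table must be placed inside the connection configuration unit.
    * Required: yes
    * Default: none

* **column**

    * Description: the columns of the configured table to sync, given as a JSON array. Use \* to select all columns, e.g. ['\*'].

      Column pruning is supported: you can export only a subset of the columns.

      Column reordering is supported: columns need not follow the order of the table schema.

      Constants are supported and must follow Doris SQL syntax:
      ["id", "\`table\`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"]
      Here id is an ordinary column name, \`table\` is a column name that is a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean.

    * Required: yes
    * Default: none

* **splitPk**

    * Description: When splitPk is specified, DorisReader shards the data by the splitPk column and DataX launches concurrent tasks to sync it, which can greatly improve sync throughput.

      Using the table's primary key as splitPk is recommended, because primary keys are usually evenly distributed, so the resulting shards are less prone to data hotspots.

      Currently splitPk only supports integer columns; `floating-point, string, date, and other types are not supported`. If an unsupported type is specified, DorisReader reports an error.

      If splitPk is omitted (either not provided or empty), DataX syncs the table over a single channel.

    * Required: no
    * Default: empty

* **where**

    * Description: filter condition. DorisReader assembles SQL from the specified column, table, and where, and extracts data with that SQL. In practice, the current day's data is often synced by setting where to gmt_create > $bizdate. Note: where cannot be set to limit 10, because limit is not a valid SQL WHERE clause.

      The where condition enables efficient incremental sync. If where is omitted (either the key or the value is missing), DataX syncs the full data.

    * Required: no
    * Default: none

* **querySql**

    * Description: In some scenarios, the where option is not expressive enough, and this option lets the user define a custom filtering SQL. Once it is configured, DataX ignores the table and column options and selects data directly with this SQL, for example to sync the result of a multi-table join: select a,b from table_a join table_b on table_a.id = table_b.id

      `When querySql is configured, DorisReader ignores the table, column, and where options`; querySql takes precedence over them.

    * Required: no
    * Default: none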
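
As `DorisReader.java` in this repository shows, DorisReader delegates splitting to `CommonRdbmsReader.Job.split` in `plugin-rdbms-util`. The sketch below is a simplified model of how the `table`, `column`, `where` and `splitPk` options described above could become one query per task. It assumes that `MIN`/`MAX` of the integer split key have already been queried, and the class `SplitPkSketch` is hypothetical. The real splitter is not reproduced here and likely handles more edge cases, for instance NULL split-key values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of how column/table/where/splitPk become per-task queries.
final class SplitPkSketch {

    private SplitPkSketch() {
    }

    /** SELECT assembled from column, table and the optional where condition. */
    static String baseQuery(List<String> columns, String table, String where) {
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(",", columns))
                .append(" FROM ")
                .append(table);
        if (hasText(where)) {
            sql.append(" WHERE (").append(where).append(')');
        }
        return sql.toString();
    }

    /**
     * Splits the integer key range [min, max] into at most adviceNumber contiguous slices and
     * returns one query per slice. Every slice except the last is half-open [lower, upper) and the
     * last one is closed, so each key value in [min, max] is covered by exactly one query.
     */
    static List<String> splitQueries(List<String> columns, String table, String where,
                                     String splitPk, long min, long max, int adviceNumber) {
        if (max < min) {
            throw new IllegalArgumentException("max must be >= min");
        }
        String base = baseQuery(columns, table, where);
        String joiner = hasText(where) ? " AND " : " WHERE ";
        long span = max - min + 1;                        // distinct key values (overflow ignored)
        int slices = (int) Math.max(1, Math.min(adviceNumber, span));
        long step = span / slices;                        // the last slice absorbs the remainder

        List<String> queries = new ArrayList<>();
        long lower = min;
        for (int i = 0; i < slices; i++) {
            String range = (i == slices - 1)
                    ? String.format("%s >= %d AND %s <= %d", splitPk, lower, splitPk, max)
                    : String.format("%s >= %d AND %s < %d", splitPk, lower, splitPk, lower + step);
            queries.add(base + joiner + "(" + range + ")");
            lower += step;
        }
        return queries;
    }

    private static boolean hasText(String s) {
        return s != null && !s.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Assumes MIN(id) = 1 and MAX(id) = 10 were already queried from table t.
        for (String sql : splitQueries(Arrays.asList("id", "name"), "t", "gmt_create > 0", "id", 1, 10, 3)) {
            System.out.println(sql);
        }
    }
}
```

With `adviceNumber = 3`, `main` prints three queries. They cover `id` 1 to 3, 4 to 6, and 7 to 10, and each one keeps the user's `where` condition.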
### 3.3 Type conversion

DorisReader supports most Doris types, but a few individual types are not supported, so please check your types.

DorisReader maps Doris types as follows:

| DataX internal type | Doris data type |
| -------- |-------------------------------------------------------|
| Long | tinyint, smallint, int, bigint, largeint |
| Double | float, double, decimal |
| String | varchar, char, text, string, map, json, array, struct |
| Date | date, datetime |
| Bool | boolean |

Note:

* `DataX treats tinyint(1) as an integer`.

================================================
FILE: dorisreader/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT dorisreader dorisreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java ${mysql.driver.version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: dorisreader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/dorisreader target/ dorisreader-0.0.1-SNAPSHOT.jar plugin/reader/dorisreader false plugin/reader/dorisreader/libs runtime

================================================
FILE: dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReader.java
================================================
package com.alibaba.datax.plugin.reader.dorisreader;

import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
import com.alibaba.datax.plugin.rdbms.reader.Constant;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

public class DorisReader extends Reader {

    private static final DataBaseType DATABASE_TYPE = DataBaseType.Doris;

    public static class Job extends Reader.Job {
        private static final Logger LOG = LoggerFactory
                .getLogger(Job.class);

        private Configuration originalConfig = null;
        private CommonRdbmsReader.Job commonRdbmsReaderJob;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();

            Integer fetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE,Integer.MIN_VALUE);
            this.originalConfig.set(Constant.FETCH_SIZE, fetchSize);

            this.commonRdbmsReaderJob = new CommonRdbmsReader.Job(DATABASE_TYPE);
            this.commonRdbmsReaderJob.init(this.originalConfig);
        }

        @Override
        public void preCheck(){
            init();
            this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE);
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            return this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderJob.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderJob.destroy(this.originalConfig);
        }
    }

    public static class Task extends Reader.Task {

        private Configuration readerSliceConfig;
        private CommonRdbmsReader.Task commonRdbmsReaderTask;

        @Override
        public void init() {
            this.readerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsReaderTask = new CommonRdbmsReader.Task(DATABASE_TYPE,super.getTaskGroupId(), super.getTaskId());
            this.commonRdbmsReaderTask.init(this.readerSliceConfig);
        }

        @Override
        public void startRead(RecordSender recordSender) {
            int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE);

            this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender,
                    super.getTaskPluginCollector(), fetchSize);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderTask.post(this.readerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderTask.destroy(this.readerSliceConfig);
        }
    }
}
================================================ FILE: dorisreader/src/main/java/com/alibaba/datax/plugin/reader/dorisreader/DorisReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.dorisreader; import com.alibaba.datax.common.spi.ErrorCode; public enum DorisReaderErrorCode implements ErrorCode { ; private final String code; private final String description; private DorisReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: dorisreader/src/main/resources/plugin.json ================================================ { "name": "dorisreader", "class": "com.alibaba.datax.plugin.reader.dorisreader.DorisReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. 
warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" }

================================================
FILE: dorisreader/src/main/resources/plugin_job_template.json
================================================
{
    "name": "dorisreader",
    "parameter": {
        "username": "",
        "password": "",
        "column": [],
        "connection": [
            {
                "jdbcUrl": [],
                "table": []
            }
        ],
        "where": ""
    }
}

================================================
FILE: doriswriter/doc/doriswriter.md
================================================
# DorisWriter Plugin Documentation

## 1 Quick introduction

DorisWriter writes large volumes of data into Doris.

## 2 Implementation principle

DorisWriter imports data through Doris's native Stream Load. It buffers the data read by the `reader` in memory, serializes it into text batches (CSV by default, or JSON when configured through `loadProps`), and imports each batch into Doris.

## 3 Features

### 3.1 Sample configuration

Here is a configuration that reads data from MySQL and imports it into Doris.

```
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["emp_no", "birth_date", "first_name","last_name","gender","hire_date"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://localhost:3306/demo"],
                                "table": ["employees_1"]
                            }
                        ],
                        "username": "root",
                        "password": "xxxxx",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "doriswriter",
                    "parameter": {
                        "loadUrl": ["172.16.0.13:8030"],
                        "column": ["emp_no", "birth_date", "first_name","last_name","gender","hire_date"],
                        "username": "root",
                        "password": "xxxxxx",
                        "postSql": ["select count(1) from all_employees_info"],
                        "preSql": [],
                        "flushInterval":30000,
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://172.16.0.13:9030/demo",
                                "selectedDatabase": "demo",
                                "table": ["all_employees_info"]
                            }
                        ],
                        "loadProps": {
                            "format": "json",
                            "strip_outer_array": true
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
```

### 3.2 Parameters

* **jdbcUrl**

  - Description: the Doris JDBC connection string, used to execute preSql and postSql.
  - Required: yes
  - Default: none

* **loadUrl**

  - Description: the Stream Load target, in the format "ip:port", where the IP is an FE node's IP and the port is that FE node's http_port. Several addresses can be listed as elements of the array, and doriswriter accesses them in turn.
  - Required: yes
  - Default: none

* **username**

  - Description: username for accessing the Doris database
  - Required: yes
  - Default: none

* **password**

  - Description: password for accessing the Doris database
  - Required: no
  - Default: empty

* **connection.selectedDatabase**

  - Description: name of the Doris database to write to.
  - Required: yes
  - Default: none

* **connection.table**

  - Description: name of the Doris table to write to.
  - Required: yes
  - Default: none

* **column**

  - Description: the columns of the destination table **to be written**. They are also used as the field names of the generated JSON data. Separate fields with commas, for example: "column": ["id","name","age"].
  - Required: yes
  - Default: none

* **preSql**

  - Description: standard statements executed before data is written to the destination table.
  - Required: no
  - Default: none

* **postSql**

  - Description: standard statements executed after data is written to the destination table.
  - Required: no
  - Default: none

* **maxBatchRows**

  - Description: the maximum number of rows per import batch. Together with **batchSize**, it controls the size of each batch: a batch is imported as soon as it reaches either threshold.
  - Required: no
  - Default: 500000

* **batchSize**

  - Description: the maximum amount of data per import batch. Together with **maxBatchRows**, it controls the size of each batch: a batch is imported as soon as it reaches either threshold.
  - Required: no
  - Default: 104857600

* **maxRetries**

  - Description: the number of retries after a batch import fails.
  - Required: no
  - Default: 0

* **labelPrefix**

  - Description: the label prefix for each batch import task. The final label is `labelPrefix + UUID`, which is globally unique and ensures data is not imported twice.
  - Required: no
  - Default: `datax_doris_writer_`

* **loadProps**

  - Description: Stream Load request parameters; for details see the Stream Load page: [Stream load - Apache Doris](https://doris.apache.org/zh-CN/docs/data-operate/import/import-way/stream-load-manual). They include the import data format (`format`) among others. CSV is the default import format, and JSON is also supported. See the Type conversion section below or the official Stream Load documentation above.
  - Required: no
  - Default: none

### Type conversion

By default, all incoming data is converted to strings and assembled into a `csv` file, with `\t` as the column separator and `\n` as the line delimiter, which is then imported with Stream Load.

CSV is the default import format. To change the column separator, configure `loadProps` accordingly:

```json
"loadProps": {
    "column_separator": "\\x01",
    "line_delimiter": "\\x02"
}
```

To switch the import format to `json`, configure `loadProps` accordingly:

```json
"loadProps": {
    "format": "json",
    "strip_outer_array": true
}
```

For more information, see the Doris website: [Stream load - Apache Doris](https://doris.apache.org/zh-CN/docs/data-operate/import/import-way/stream-load-manual)

================================================
FILE: doriswriter/doc/mysql2doris.json
================================================
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["k1", "k2", "k3"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.10.10:3306/db1"],
                                "table": ["t1"]
                            }
                        ],
                        "username": "root",
                        "password": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "doriswriter",
                    "parameter": {
                        "loadUrl": ["192.168.1.1:8030"],
                        "loadProps": {},
                        "database": "db1",
                        "column": ["k1", "k2", "k3"],
                        "username": "root",
                        "password": "",
                        "postSql": [],
                        "preSql": [],
                        "connection": [
                            {
                                "jdbcUrl":"jdbc:mysql://192.168.1.1:9030/",
                                "table":["xxx"],
                                "selectedDatabase":"xxxx"
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}

================================================
FILE: doriswriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 doriswriter doriswriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java ${mysql.driver.version} org.apache.httpcomponents httpclient 4.5.13 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: doriswriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/doriswriter target/ doriswriter-0.0.1-SNAPSHOT.jar plugin/writer/doriswriter false plugin/writer/doriswriter/libs runtime

================================================
FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DelimiterParser.java
================================================
package com.alibaba.datax.plugin.writer.doriswriter;

import com.google.common.base.Strings;

import java.io.StringWriter;

public class DelimiterParser {

    private static final String HEX_STRING = "0123456789ABCDEF";

    public static String parse(String sp, String dSp) throws RuntimeException {
        if ( Strings.isNullOrEmpty(sp)) {
            return dSp;
        }
        if (!sp.toUpperCase().startsWith("\\X")) {
            return sp;
        }
        String hexStr = sp.substring(2);
        // check hex str
        if (hexStr.isEmpty()) {
            throw new RuntimeException("Failed to parse delimiter: `Hex str is empty`");
        }
        if (hexStr.length() % 2 != 0) {
            throw new RuntimeException("Failed to parse delimiter: `Hex str length error`");
        }
        for (char hexChar : hexStr.toUpperCase().toCharArray()) {
            if (HEX_STRING.indexOf(hexChar) == -1) {
                throw new RuntimeException("Failed to parse delimiter: `Hex str format error`");
            }
        }
        // transform to separator
        StringWriter writer = new StringWriter();
        for (byte b : hexStrToBytes(hexStr)) {
            writer.append((char) b);
        }
        return writer.toString();
    }

    private static byte[] hexStrToBytes(String hexStr) {
        String upperHexStr = hexStr.toUpperCase();
        int length = upperHexStr.length() / 2;
        char[] hexChars = upperHexStr.toCharArray();
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            int pos = i * 2;
            bytes[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 1]));
        }
        return bytes;
    }

    private static byte charToByte(char c) {
        return (byte) HEX_STRING.indexOf(c);
    }
}

================================================
FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisBaseCodec.java
================================================
package com.alibaba.datax.plugin.writer.doriswriter;

import com.alibaba.datax.common.element.Column;

public class DorisBaseCodec {
    protected String convertionField( Column col) {
        if (null == col.getRawData() || Column.Type.NULL == col.getType()) {
            return null;
        }
        if ( Column.Type.BOOL == col.getType()) {
            return String.valueOf(col.asLong());
        }
        if ( Column.Type.BYTES == col.getType()) {
            byte[] bts = (byte[])col.getRawData();
            long value = 0;
            for (int i = 0; i < bts.length; i++) {
                value += (bts[bts.length - i - 1] & 0xffL) << (8 * i);
            }
            return String.valueOf(value);
        }
        return col.asString();
    }
}

================================================
FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodec.java
================================================
package com.alibaba.datax.plugin.writer.doriswriter;

import com.alibaba.datax.common.element.Record;
import java.io.Serializable;

public interface DorisCodec extends Serializable {

    String codec( Record row);
}

================================================
FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCodecFactory.java
================================================
package com.alibaba.datax.plugin.writer.doriswriter;

import java.util.Map;

public class DorisCodecFactory {
    public DorisCodecFactory (){

    }
    public static DorisCodec createCodec( Keys writerOptions) {
        if ( Keys.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) {
            Map props = writerOptions.getLoadProps();
            return new DorisCsvCodec (null == props || !props.containsKey("column_separator") ? null : String.valueOf(props.get("column_separator")));
        }
        if ( Keys.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) {
            return new DorisJsonCodec (writerOptions.getColumns());
        }
        throw new RuntimeException("Failed to create row serializer, unsupported `format` from stream load properties.");
    }
}

================================================
FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java
================================================
package com.alibaba.datax.plugin.writer.doriswriter;

import com.alibaba.datax.common.element.Record;

public class DorisCsvCodec extends DorisBaseCodec implements DorisCodec {

    private static final long serialVersionUID = 1L;

    private final String columnSeparator;

    public DorisCsvCodec ( String sp) {
        this.columnSeparator = DelimiterParser.parse(sp, "\t");
    }

    @Override
    public String codec( Record row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.getColumnNumber(); i++) {
            String value = convertionField(row.getColumn(i));
            sb.append(null == value ? "\\N" : value);
            if (i < row.getColumnNumber() - 1) {
                sb.append(columnSeparator);
            }
        }
        return sb.toString();
    }
}

================================================
FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisJsonCodec.java
================================================
package com.alibaba.datax.plugin.writer.doriswriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.fastjson2.JSON;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DorisJsonCodec extends DorisBaseCodec implements DorisCodec {

    private static final long serialVersionUID = 1L;

    private final List<String> fieldNames;

    public DorisJsonCodec ( List<String> fieldNames) {
        this.fieldNames = fieldNames;
    }

    @Override
    public String codec( Record row) {
        if (null == fieldNames) {
            return "";
        }
        Map<String, Object> rowMap = new HashMap<> (fieldNames.size());
        int idx = 0;
        for (String fieldName : fieldNames) {
            rowMap.put(fieldName, convertionField(row.getColumn(idx)));
            idx++;
        }
        return JSON.toJSONString(rowMap);
    }
}

================================================
FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java
================================================
package com.alibaba.datax.plugin.writer.doriswriter; import com.alibaba.fastjson2.JSON; import org.apache.commons.codec.binary.Base64; import org.apache.http.HttpEntity; import org.apache.http.HttpHeaders; import org.apache.http.client.config.RequestConfig; import org.apache.http.client.methods.CloseableHttpResponse; import org.apache.http.client.methods.HttpGet; import org.apache.http.client.methods.HttpPut; import org.apache.http.entity.ByteArrayEntity; import org.apache.http.impl.client.CloseableHttpClient; import org.apache.http.impl.client.DefaultRedirectStrategy; import org.apache.http.impl.client.HttpClientBuilder; import org.apache.http.impl.client.HttpClients; import org.apache.http.util.EntityUtils; import org.slf4j.Logger; import
org.slf4j.LoggerFactory; import java.io.IOException; import java.net.HttpURLConnection; import java.net.URL; import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets; import java.util.Collections; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.concurrent.TimeUnit; import java.util.stream.Collectors; public class DorisStreamLoadObserver { private static final Logger LOG = LoggerFactory.getLogger(DorisStreamLoadObserver.class); private Keys options; private long pos; private static final String RESULT_FAILED = "Fail"; private static final String RESULT_LABEL_EXISTED = "Label Already Exists"; private static final String LAEBL_STATE_VISIBLE = "VISIBLE"; private static final String LAEBL_STATE_COMMITTED = "COMMITTED"; private static final String RESULT_LABEL_PREPARE = "PREPARE"; private static final String RESULT_LABEL_ABORTED = "ABORTED"; private static final String RESULT_LABEL_UNKNOWN = "UNKNOWN"; public DorisStreamLoadObserver ( Keys options){ this.options = options; } public void streamLoad(WriterTuple data) throws Exception { String host = getLoadHost(); if(host == null){ throw new IOException ("load_url cannot be empty, or the host cannot connect.Please check your configuration."); } String loadUrl = new StringBuilder(host) .append("/api/") .append(options.getDatabase()) .append("/") .append(options.getTable()) .append("/_stream_load") .toString(); LOG.info("Start to join batch data: rows[{}] bytes[{}] label[{}].", data.getRows().size(), data.getBytes(), data.getLabel()); Map loadResult = put(loadUrl, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue())); LOG.info("StreamLoad response :{}",JSON.toJSONString(loadResult)); final String keyStatus = "Status"; if (null == loadResult || !loadResult.containsKey(keyStatus)) { throw new IOException("Unable to flush data to Doris: unknown result status."); } LOG.debug("StreamLoad response:{}",JSON.toJSONString(loadResult)); if 
(RESULT_FAILED.equals(loadResult.get(keyStatus))) { throw new IOException( new StringBuilder("Failed to flush data to Doris.\n").append(JSON.toJSONString(loadResult)).toString() ); } else if (RESULT_LABEL_EXISTED.equals(loadResult.get(keyStatus))) { LOG.debug("StreamLoad response:{}",JSON.toJSONString(loadResult)); checkStreamLoadState(host, data.getLabel()); } } private void checkStreamLoadState(String host, String label) throws IOException { int idx = 0; while(true) { try { TimeUnit.SECONDS.sleep(Math.min(++idx, 5)); } catch (InterruptedException ex) { /* restore the interrupt flag and fail instead of silently treating the label as loaded */ Thread.currentThread().interrupt(); throw new IOException(String.format("Interrupted while checking the state of label[%s].", label), ex); } try (CloseableHttpClient httpclient = HttpClients.createDefault()) { HttpGet httpGet = new HttpGet(new StringBuilder(host).append("/api/").append(options.getDatabase()).append("/get_load_state?label=").append(label).toString()); httpGet.setHeader("Authorization", getBasicAuthHeader(options.getUsername(), options.getPassword())); httpGet.setHeader("Connection", "close"); try (CloseableHttpResponse resp = httpclient.execute(httpGet)) { HttpEntity respEntity = getHttpEntity(resp); if (respEntity == null) { throw new IOException(String.format("Failed to flush data to Doris, Error " + "could not get the final state of label[%s].\n", label), null); } /* read the body once: the response stream cannot be consumed a second time */ String respBody = EntityUtils.toString(respEntity); Map result = (Map)JSON.parse(respBody); String labelState = (String)result.get("data"); if (null == labelState) { throw new IOException(String.format("Failed to flush data to Doris, Error " + "could not get the final state of label[%s]. 
response[%s]\n", label, respBody), null); } LOG.info(String.format("Checking label[%s] state[%s]\n", label, labelState)); switch(labelState) { case LAEBL_STATE_VISIBLE: case LAEBL_STATE_COMMITTED: return; case RESULT_LABEL_PREPARE: continue; case RESULT_LABEL_ABORTED: throw new DorisWriterExcetion (String.format("Failed to flush data to Doris, Error " + "label[%s] state[%s]\n", label, labelState), null, true); case RESULT_LABEL_UNKNOWN: default: throw new IOException(String.format("Failed to flush data to Doris, Error " + "label[%s] state[%s]\n", label, labelState), null); } } } } } private byte[] addRows(List rows, int totalBytes) { if (Keys.StreamLoadFormat.CSV.equals(options.getStreamLoadFormat())) { Map props = (options.getLoadProps() == null ? new HashMap<> () : options.getLoadProps()); byte[] lineDelimiter = DelimiterParser.parse((String)props.get("line_delimiter"), "\n").getBytes(StandardCharsets.UTF_8); ByteBuffer bos = ByteBuffer.allocate(totalBytes + rows.size() * lineDelimiter.length); for (byte[] row : rows) { bos.put(row); bos.put(lineDelimiter); } return bos.array(); } if (Keys.StreamLoadFormat.JSON.equals(options.getStreamLoadFormat())) { ByteBuffer bos = ByteBuffer.allocate(totalBytes + (rows.isEmpty() ? 
2 : rows.size() + 1)); bos.put("[".getBytes(StandardCharsets.UTF_8)); byte[] jsonDelimiter = ",".getBytes(StandardCharsets.UTF_8); boolean isFirstElement = true; for (byte[] row : rows) { if (!isFirstElement) { bos.put(jsonDelimiter); } bos.put(row); isFirstElement = false; } bos.put("]".getBytes(StandardCharsets.UTF_8)); return bos.array(); } throw new RuntimeException("Failed to join rows data, unsupported `format` from stream load properties:"); } private Map put(String loadUrl, String label, byte[] data) throws IOException { LOG.info(String.format("Executing stream load to: '%s', size: '%s'", loadUrl, data.length)); final HttpClientBuilder httpClientBuilder = HttpClients.custom() .setRedirectStrategy(new DefaultRedirectStrategy () { @Override protected boolean isRedirectable(String method) { return true; } }); try ( CloseableHttpClient httpclient = httpClientBuilder.build()) { HttpPut httpPut = new HttpPut(loadUrl); httpPut.removeHeaders(HttpHeaders.CONTENT_LENGTH); httpPut.removeHeaders(HttpHeaders.TRANSFER_ENCODING); List cols = options.getColumns(); if (null != cols && !cols.isEmpty() && Keys.StreamLoadFormat.CSV.equals(options.getStreamLoadFormat())) { httpPut.setHeader("columns", String.join(",", cols.stream().map(f -> String.format("`%s`", f)).collect(Collectors.toList()))); } if (null != options.getLoadProps()) { for (Map.Entry entry : options.getLoadProps().entrySet()) { httpPut.setHeader(entry.getKey(), String.valueOf(entry.getValue())); } } httpPut.setHeader("Expect", "100-continue"); httpPut.setHeader("label", label); httpPut.setHeader("two_phase_commit", "false"); httpPut.setHeader("Authorization", getBasicAuthHeader(options.getUsername(), options.getPassword())); httpPut.setEntity(new ByteArrayEntity(data)); httpPut.setConfig(RequestConfig.custom().setRedirectsEnabled(true).build()); try ( CloseableHttpResponse resp = httpclient.execute(httpPut)) { HttpEntity respEntity = getHttpEntity(resp); if (respEntity == null) return null; return 
(Map)JSON.parse(EntityUtils.toString(respEntity)); } } } private String getBasicAuthHeader(String username, String password) { String auth = username + ":" + password; byte[] encodedAuth = Base64.encodeBase64(auth.getBytes(StandardCharsets.UTF_8)); return new StringBuilder("Basic ").append(new String(encodedAuth)).toString(); } private HttpEntity getHttpEntity(CloseableHttpResponse resp) { int code = resp.getStatusLine().getStatusCode(); if (200 != code) { LOG.warn("Request failed with code:{}", code); return null; } HttpEntity respEntity = resp.getEntity(); if (null == respEntity) { LOG.warn("Request failed with empty response."); return null; } return respEntity; } private String getLoadHost() { List hostList = options.getLoadUrlList(); Collections.shuffle(hostList); String host = new StringBuilder("http://").append(hostList.get((0))).toString(); if (checkConnection(host)){ return host; } return null; } private boolean checkConnection(String host) { try { URL url = new URL(host); HttpURLConnection co = (HttpURLConnection) url.openConnection(); co.setConnectTimeout(5000); co.connect(); co.disconnect(); return true; } catch (Exception e1) { e1.printStackTrace(); return false; } } } ================================================ FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisUtil.java ================================================ package com.alibaba.datax.plugin.writer.doriswriter; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.druid.sql.parser.ParserException; import com.google.common.base.Strings; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.ResultSet; import java.sql.Statement; import java.util.ArrayList; import java.util.Collections; import java.util.List; /** * jdbc util */ 
public class DorisUtil { private static final Logger LOG = LoggerFactory.getLogger(DorisUtil.class); private DorisUtil() {} public static List getDorisTableColumns( Connection conn, String databaseName, String tableName) { String currentSql = String.format("SELECT COLUMN_NAME FROM `information_schema`.`COLUMNS` WHERE `TABLE_SCHEMA` = '%s' AND `TABLE_NAME` = '%s' ORDER BY `ORDINAL_POSITION` ASC;", databaseName, tableName); List columns = new ArrayList<> (); ResultSet rs = null; try { rs = DBUtil.query(conn, currentSql); while (DBUtil.asyncResultSetNext(rs)) { String colName = rs.getString("COLUMN_NAME"); columns.add(colName); } return columns; } catch (Exception e) { throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); } finally { DBUtil.closeDBResources(rs, null, null); } } public static List renderPreOrPostSqls(List preOrPostSqls, String tableName) { if (null == preOrPostSqls) { return Collections.emptyList(); } List renderedSqls = new ArrayList<>(); for (String sql : preOrPostSqls) { if (! 
Strings.isNullOrEmpty(sql)) { renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName)); } } return renderedSqls; } public static void executeSqls(Connection conn, List sqls) { Statement stmt = null; String currentSql = null; try { stmt = conn.createStatement(); for (String sql : sqls) { currentSql = sql; DBUtil.executeSqlWithoutResultSet(stmt, sql); } } catch (Exception e) { throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); } finally { DBUtil.closeDBResources(null, stmt, null); } } public static void preCheckPrePareSQL( Keys options) { String table = options.getTable(); List preSqls = options.getPreSqlList(); List renderedPreSqls = DorisUtil.renderPreOrPostSqls(preSqls, table); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { LOG.info("Begin to preCheck preSqls:[{}].", String.join(";", renderedPreSqls)); for (String sql : renderedPreSqls) { try { DBUtil.sqlValid(sql, DataBaseType.MySql); } catch ( ParserException e) { throw RdbmsException.asPreSQLParserException(DataBaseType.MySql,e,sql); } } } } public static void preCheckPostSQL( Keys options) { String table = options.getTable(); List postSqls = options.getPostSqlList(); List renderedPostSqls = DorisUtil.renderPreOrPostSqls(postSqls, table); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { LOG.info("Begin to preCheck postSqls:[{}].", String.join(";", renderedPostSqls)); for(String sql : renderedPostSqls) { try { DBUtil.sqlValid(sql, DataBaseType.MySql); } catch (ParserException e){ throw RdbmsException.asPostSQLParserException(DataBaseType.MySql,e,sql); } } } } } ================================================ FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriter.java ================================================ // Licensed to the Apache Software Foundation (ASF) under one // or more contributor license agreements. 
See the NOTICE file // distributed with this work for additional information // regarding copyright ownership. The ASF licenses this file // to you under the Apache License, Version 2.0 (the // "License"); you may not use this file except in compliance // with the License. You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, // software distributed under the License is distributed on an // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, either express or implied. See the License for the // specific language governing permissions and limitations // under the License. package com.alibaba.datax.plugin.writer.doriswriter; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.List; /** * doris data writer */ public class DorisWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig = null; private Keys options; @Override public void init() { this.originalConfig = super.getPluginJobConf(); options = new Keys (super.getPluginJobConf()); options.doPretreatment(); } @Override public void preCheck(){ this.init(); DorisUtil.preCheckPrePareSQL(options); DorisUtil.preCheckPostSQL(options); } @Override public void prepare() { String username = options.getUsername(); String password = options.getPassword(); String jdbcUrl = options.getJdbcUrl(); List 
renderedPreSqls = DorisUtil.renderPreOrPostSqls(options.getPreSqlList(), options.getTable()); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); LOG.info("Begin to execute preSqls:[{}]. context info:{}.", String.join(";", renderedPreSqls), jdbcUrl); DorisUtil.executeSqls(conn, renderedPreSqls); DBUtil.closeDBResources(null, null, conn); } } @Override public List split(int mandatoryNumber) { List configurations = new ArrayList<>(mandatoryNumber); for (int i = 0; i < mandatoryNumber; i++) { configurations.add(originalConfig); } return configurations; } @Override public void post() { String username = options.getUsername(); String password = options.getPassword(); String jdbcUrl = options.getJdbcUrl(); List renderedPostSqls = DorisUtil.renderPreOrPostSqls(options.getPostSqlList(), options.getTable()); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); LOG.info("Start to execute postSqls:[{}]. 
context info:{}.", String.join(";", renderedPostSqls), jdbcUrl); DorisUtil.executeSqls(conn, renderedPostSqls); DBUtil.closeDBResources(null, null, conn); } } @Override public void destroy() { } } public static class Task extends Writer.Task { private DorisWriterManager writerManager; private Keys options; private DorisCodec rowCodec; @Override public void init() { options = new Keys (super.getPluginJobConf()); if (options.isWildcardColumn()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, options.getJdbcUrl(), options.getUsername(), options.getPassword()); List columns = DorisUtil.getDorisTableColumns(conn, options.getDatabase(), options.getTable()); options.setInfoCchemaColumns(columns); } writerManager = new DorisWriterManager(options); rowCodec = DorisCodecFactory.createCodec(options); } @Override public void prepare() { } public void startWrite(RecordReceiver recordReceiver) { try { Record record; while ((record = recordReceiver.getFromReader()) != null) { if (record.getColumnNumber() != options.getColumns().size()) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "There is an error in the column configuration information. " + "This is because you have configured a task where the number of fields to be read from the source:%s " + "is not equal to the number of fields to be written to the destination table:%s. 
" + "Please check your configuration and make changes.", record.getColumnNumber(), options.getColumns().size())); } writerManager.writeRecord(rowCodec.codec(record)); } } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } } @Override public void post() { try { writerManager.close(); } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } } @Override public void destroy() {} @Override public boolean supportFailOver(){ return false; } } } ================================================ FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterExcetion.java ================================================ package com.alibaba.datax.plugin.writer.doriswriter; import java.io.IOException; import java.util.Map; public class DorisWriterExcetion extends IOException { private final Map response; private boolean reCreateLabel; public DorisWriterExcetion ( String message, Map response) { super(message); this.response = response; } public DorisWriterExcetion ( String message, Map response, boolean reCreateLabel) { super(message); this.response = response; this.reCreateLabel = reCreateLabel; } public Map getFailedResponse() { return response; } public boolean needReCreateLabel() { return reCreateLabel; } } ================================================ FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisWriterManager.java ================================================ package com.alibaba.datax.plugin.writer.doriswriter; import com.google.common.base.Strings; import org.apache.commons.lang3.concurrent.BasicThreadFactory; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.List; import java.util.UUID; import java.util.concurrent.Executors; import java.util.concurrent.LinkedBlockingDeque; import 
java.util.concurrent.ScheduledExecutorService; import java.util.concurrent.ScheduledFuture; import java.util.concurrent.TimeUnit; public class DorisWriterManager { private static final Logger LOG = LoggerFactory.getLogger(DorisWriterManager.class); private final DorisStreamLoadObserver visitor; private final Keys options; private final List buffer = new ArrayList<> (); private int batchCount = 0; private long batchSize = 0; private volatile boolean closed = false; private volatile Exception flushException; private final LinkedBlockingDeque< WriterTuple > flushQueue; private ScheduledExecutorService scheduler; private ScheduledFuture scheduledFuture; public DorisWriterManager( Keys options) { this.options = options; this.visitor = new DorisStreamLoadObserver (options); flushQueue = new LinkedBlockingDeque<>(options.getFlushQueueLength()); this.startScheduler(); this.startAsyncFlushing(); } public void startScheduler() { stopScheduler(); this.scheduler = Executors.newScheduledThreadPool(1, new BasicThreadFactory.Builder().namingPattern("Doris-interval-flush").daemon(true).build()); this.scheduledFuture = this.scheduler.schedule(() -> { synchronized (DorisWriterManager.this) { if (!closed) { try { String label = createBatchLabel(); LOG.info(String.format("Doris interval Sinking triggered: label[%s].", label)); if (batchCount == 0) { startScheduler(); } flush(label, false); } catch (Exception e) { flushException = e; } } } }, options.getFlushInterval(), TimeUnit.MILLISECONDS); } public void stopScheduler() { if (this.scheduledFuture != null) { scheduledFuture.cancel(false); this.scheduler.shutdown(); } } public final synchronized void writeRecord(String record) throws IOException { checkFlushException(); try { byte[] bts = record.getBytes(StandardCharsets.UTF_8); buffer.add(bts); batchCount++; batchSize += bts.length; if (batchCount >= options.getBatchRows() || batchSize >= options.getBatchSize()) { String label = createBatchLabel(); LOG.debug(String.format("Doris 
buffer Sinking triggered: rows[%d] label[%s].", batchCount, label)); flush(label, false); } } catch (Exception e) { throw new IOException("Writing records to Doris failed.", e); } } public synchronized void flush(String label, boolean waitUtilDone) throws Exception { checkFlushException(); if (batchCount == 0) { if (waitUtilDone) { waitAsyncFlushingDone(); } return; } flushQueue.put(new WriterTuple (label, batchSize, new ArrayList<>(buffer))); if (waitUtilDone) { // wait the last flush waitAsyncFlushingDone(); } buffer.clear(); batchCount = 0; batchSize = 0; } public synchronized void close() { if (!closed) { closed = true; try { String label = createBatchLabel(); if (batchCount > 0) LOG.debug(String.format("Doris Sink is about to close: label[%s].", label)); flush(label, true); } catch (Exception e) { throw new RuntimeException("Writing records to Doris failed.", e); } } checkFlushException(); } public String createBatchLabel() { StringBuilder sb = new StringBuilder(); if (! Strings.isNullOrEmpty(options.getLabelPrefix())) { sb.append(options.getLabelPrefix()); } return sb.append(UUID.randomUUID().toString()) .toString(); } private void startAsyncFlushing() { // start flush thread Thread flushThread = new Thread(new Runnable(){ public void run() { while(true) { try { asyncFlush(); } catch (Exception e) { flushException = e; } } } }); flushThread.setDaemon(true); flushThread.start(); } private void waitAsyncFlushingDone() throws InterruptedException { // wait previous flushings for (int i = 0; i <= options.getFlushQueueLength(); i++) { flushQueue.put(new WriterTuple ("", 0l, null)); } checkFlushException(); } private void asyncFlush() throws Exception { WriterTuple flushData = flushQueue.take(); if (Strings.isNullOrEmpty(flushData.getLabel())) { return; } stopScheduler(); LOG.debug(String.format("Async stream load: rows[%d] bytes[%d] label[%s].", flushData.getRows().size(), flushData.getBytes(), flushData.getLabel())); for (int i = 0; i <= options.getMaxRetries(); 
i++) { try { // flush to Doris with stream load visitor.streamLoad(flushData); LOG.info(String.format("Async stream load finished: label[%s].", flushData.getLabel())); startScheduler(); break; } catch (Exception e) { LOG.warn("Failed to flush batch data to Doris, retry times = {}", i, e); if (i >= options.getMaxRetries()) { throw new IOException(e); } if (e instanceof DorisWriterExcetion && (( DorisWriterExcetion )e).needReCreateLabel()) { String newLabel = createBatchLabel(); LOG.warn(String.format("Batch label changed from [%s] to [%s]", flushData.getLabel(), newLabel)); flushData.setLabel(newLabel); } try { Thread.sleep(1000l * Math.min(i + 1, 10)); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); throw new IOException("Unable to flush, interrupted while doing another attempt", e); } } } } private void checkFlushException() { if (flushException != null) { throw new RuntimeException("Writing records to Doris failed.", flushException); } } } ================================================ FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/Keys.java ================================================ package com.alibaba.datax.plugin.writer.doriswriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import java.io.Serializable; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class Keys implements Serializable { private static final long serialVersionUID = 1l; private static final int MAX_RETRIES = 3; private static final int BATCH_ROWS = 500000; private static final long DEFAULT_FLUSH_INTERVAL = 30000; private static final String LOAD_PROPS_FORMAT = "format"; public enum StreamLoadFormat { CSV, JSON; } private static final String USERNAME = "username"; private static final String PASSWORD = "password"; private static final String DATABASE = 
"connection[0].selectedDatabase"; private static final String TABLE = "connection[0].table[0]"; private static final String COLUMN = "column"; private static final String PRE_SQL = "preSql"; private static final String POST_SQL = "postSql"; private static final String JDBC_URL = "connection[0].jdbcUrl"; private static final String LABEL_PREFIX = "labelPrefix"; private static final String MAX_BATCH_ROWS = "maxBatchRows"; private static final String MAX_BATCH_SIZE = "batchSize"; private static final String FLUSH_INTERVAL = "flushInterval"; private static final String LOAD_URL = "loadUrl"; private static final String FLUSH_QUEUE_LENGTH = "flushQueueLength"; private static final String LOAD_PROPS = "loadProps"; private static final String DEFAULT_LABEL_PREFIX = "datax_doris_writer_"; private static final long DEFAULT_MAX_BATCH_SIZE = 90 * 1024 * 1024; //default 90M private final Configuration options; private List infoSchemaColumns; private List userSetColumns; private boolean isWildcardColumn; public Keys ( Configuration options) { this.options = options; this.userSetColumns = options.getList(COLUMN, String.class).stream().map(str -> str.replace("`", "")).collect(Collectors.toList()); if (1 == options.getList(COLUMN, String.class).size() && "*".trim().equals(options.getList(COLUMN, String.class).get(0))) { this.isWildcardColumn = true; } } public void doPretreatment() { validateRequired(); validateStreamLoadUrl(); } public String getJdbcUrl() { return options.getString(JDBC_URL); } public String getDatabase() { return options.getString(DATABASE); } public String getTable() { return options.getString(TABLE); } public String getUsername() { return options.getString(USERNAME); } public String getPassword() { return options.getString(PASSWORD); } public String getLabelPrefix() { String label = options.getString(LABEL_PREFIX); return null == label ? 
DEFAULT_LABEL_PREFIX : label; } public List getLoadUrlList() { return options.getList(LOAD_URL, String.class); } public List getColumns() { if (isWildcardColumn) { return this.infoSchemaColumns; } return this.userSetColumns; } public boolean isWildcardColumn() { return this.isWildcardColumn; } public void setInfoCchemaColumns(List cols) { this.infoSchemaColumns = cols; } public List getPreSqlList() { return options.getList(PRE_SQL, String.class); } public List getPostSqlList() { return options.getList(POST_SQL, String.class); } public Map getLoadProps() { return options.getMap(LOAD_PROPS); } public int getMaxRetries() { return MAX_RETRIES; } public int getBatchRows() { Integer rows = options.getInt(MAX_BATCH_ROWS); return null == rows ? BATCH_ROWS : rows; } public long getBatchSize() { Long size = options.getLong(MAX_BATCH_SIZE); return null == size ? DEFAULT_MAX_BATCH_SIZE : size; } public long getFlushInterval() { Long interval = options.getLong(FLUSH_INTERVAL); return null == interval ? DEFAULT_FLUSH_INTERVAL : interval; } public int getFlushQueueLength() { Integer len = options.getInt(FLUSH_QUEUE_LENGTH); return null == len ? 
1 : len; } public StreamLoadFormat getStreamLoadFormat() { Map loadProps = getLoadProps(); if (null == loadProps) { return StreamLoadFormat.CSV; } if (loadProps.containsKey(LOAD_PROPS_FORMAT) && StreamLoadFormat.JSON.name().equalsIgnoreCase(String.valueOf(loadProps.get(LOAD_PROPS_FORMAT)))) { return StreamLoadFormat.JSON; } return StreamLoadFormat.CSV; } private void validateStreamLoadUrl() { List urlList = getLoadUrlList(); for (String host : urlList) { if (host.split(":").length < 2) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, "The format of loadUrl is not correct, each entry must be:[`fe_ip:fe_http_port`]."); } } } private void validateRequired() { final String[] requiredOptionKeys = new String[]{ USERNAME, DATABASE, TABLE, COLUMN, LOAD_URL }; for (String optionKey : requiredOptionKeys) { options.getNecessaryValue(optionKey, DBUtilErrorCode.REQUIRED_VALUE); } } } ================================================ FILE: doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/WriterTuple.java ================================================ package com.alibaba.datax.plugin.writer.doriswriter; import java.util.List; public class WriterTuple { private String label; private Long bytes; private List rows; public WriterTuple ( String label, Long bytes, List rows){ this.label = label; this.rows = rows; this.bytes = bytes; } public String getLabel() { return label; } public void setLabel(String label) { this.label = label; } public Long getBytes() { return bytes; } public List getRows() { return rows; } } ================================================ FILE: doriswriter/src/main/resources/plugin.json ================================================ { "name": "doriswriter", "class": "com.alibaba.datax.plugin.writer.doriswriter.DorisWriter", "description": "apache doris writer plugin", "developer": "apache doris" } ================================================ FILE: 
doriswriter/src/main/resources/plugin_job_template.json
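================================================ { "name": "doriswriter", "parameter": { "username": "", "password": "", "column": [], "preSql": [], "postSql": [], "beLoadUrl": [], "loadUrl": [], "loadProps": {}, "connection": [ { "jdbcUrl": "", "selectedDatabase": "", "table": [] } ] } } ================================================ FILE: drdsreader/doc/drdsreader.md ================================================
# DrdsReader Plugin Documentation

___

## 1 Quick Introduction

The DrdsReader plugin reads data from DRDS (Distributed RDS). Under the hood, DrdsReader connects to a remote DRDS database via JDBC and executes the corresponding SQL statements to SELECT data out of the DRDS database.

DataX currently supports DRDS only when it runs on the MySQL engine. From DataX's point of view, DRDS is a distributed MySQL database, and most of its communication protocol follows MySQL conventions.

## 2 How It Works

In short, DrdsReader connects to the remote DRDS database through a JDBC connector, generates a SELECT query from the user's configuration and sends it to DRDS, then assembles the returned result set into an abstract dataset made of DataX's own data types and passes it to the downstream Writer.

DrdsReader concatenates the user-configured Table, Column and Where settings into a single SQL statement and sends it to DRDS. Unlike an ordinary MySQL database, DRDS, being a distributed database system, cannot support the full MySQL protocol; complex statements such as JOINs are not supported yet.

## 3 Features

### 3.1 Sample Configurations

* A job that extracts data from a DRDS database and syncs it to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer rate in bytes/s. DataX tries to reach this rate without exceeding it.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Maximum number of error records; the job fails once this is exceeded.
                "record": 0,
                // Maximum ratio of error records: 1.0 means 100%, 0.02 means 2%.
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "drdsreader",
                    "parameter": {
                        // Database user name
                        "username": "root",
                        // Database password
                        "password": "root",
                        "column": [
                            "id","name"
                        ],
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://127.0.0.1:3306/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // Writer type
                    "name": "streamwriter",
                    // Whether to print the records
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that syncs the result of a custom SQL query to local output:

```
{
    "job": {
        "setting": {
        },
        "content": [
            {
                "reader": {
                    "name": "drdsreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:drds://localhost:3306/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the source database, given as a JSON array. Note that jdbcUrl must be placed inside the connection block. For DrdsReader, a single JDBC URL in the array is sufficient. jdbcUrl follows the official MySQL format and may carry additional connection-control parameters; see the [official MySQL documentation](http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html).
    * Required: Yes
    * Default: None

* **username**
    * Description: user name for the data source.
    * Required: Yes
    * Default: None

* **password**
    * Description: password for the specified user.
    * Required: Yes
    * Default: None

* **table**
    * Description: the table to extract. Because DRDS is itself a distributed data source, specifying more than one table is meaningless. The system does not validate multiple tables.
    * Required: Yes
    * Default: None

* **column**
    * Description: the columns of the configured table to sync, given as a JSON array. Use \* to select all columns, for example ['\*'].

      Column pruning is supported: you can export only a subset of the columns.

      Column reordering is supported: columns need not follow the order of the table schema.

      Constants are supported, written in MySQL SQL syntax: ["id", "\`table\`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"]. Here id is an ordinary column name, \`table\` is a column name that is a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number and true is a boolean.

      column must explicitly list the columns to sync and must not be empty.
    * Required: Yes
    * Default: None

* **where**
    * Description: filter condition. DrdsReader combines column, table and where into a SQL statement and extracts data with it. A common case is syncing only the current day's data, for which where can be set to gmt_create > $bizdate. The where condition is an effective way to do incremental business syncs. If where is omitted or empty, the whole table is synced.
    * Required: No
    * Default: None

* **querySql**
    * Description: querySql mode is not supported yet.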
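Section 2 says DrdsReader concatenates table, column and where into one SELECT. In the source, `DrdsReader.Job` hands this work to `CommonRdbmsReader` from plugin-rdbms-util, so the following is only a minimal sketch of that concatenation under stated assumptions, not the plugin's actual code. The class `DrdsQuerySketch`, the method `buildQuerySql` and the table name `t_order` are hypothetical. It shows how constants and reserved-word columns pass through verbatim and how an empty where falls back to a full-table read.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of DrdsReader or plugin-rdbms-util) that
 * combines the column, table and where settings into a single SELECT.
 */
class DrdsQuerySketch {

    /**
     * Builds "select <columns> from <table> [where (<where>)]".
     * Column entries are copied verbatim, so constants ("1", "'abc'"),
     * expressions and back-quoted reserved words pass through unchanged.
     */
    static String buildQuerySql(List<String> columns, String table, String where) {
        if (columns == null || columns.isEmpty()) {
            // The column parameter is mandatory and must not be empty.
            throw new IllegalArgumentException("column must not be empty");
        }
        StringBuilder sql = new StringBuilder("select ")
                .append(String.join(",", columns))
                .append(" from ")
                .append(table);
        // A missing or blank where means a full-table read.
        if (where != null && !where.trim().isEmpty()) {
            // Parentheses keep the user condition intact if further predicates are appended later.
            sql.append(" where (").append(where.trim()).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "`table`", "1", "'bazhen.csy'");
        // Full-table read: prints select id,`table`,1,'bazhen.csy' from t_order
        System.out.println(buildQuerySql(columns, "t_order", null));
        // Incremental read filtered on a modification timestamp
        System.out.println(buildQuerySql(columns, "t_order", "gmt_modified > '2014-06-01 00:00:00'"));
    }
}
```

Wrapping the user condition in parentheses is a defensive choice in this sketch: a where such as `a = 1 or b = 2` stays one logical unit if a splitter later appends a primary-key range with `and`.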
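### 3.3 Type Conversion

DrdsReader currently supports most DRDS types, but a few types are not supported, so check the types you use.

The following table lists how DrdsReader converts DRDS types:

| DataX internal type | DRDS data type |
| -------- | ----- |
| Long |int, tinyint, smallint, mediumint, bigint|
| Double |float, double, decimal|
| String |varchar, char, tinytext, text, mediumtext, longtext |
| Date |date, datetime, timestamp, time, year |
| Boolean |bit, bool |
| Bytes |tinyblob, mediumblob, blob, longblob, varbinary |

Note:

* `Field types other than those listed above are not supported.`
* `As in MySQL, tinyint(1) is treated as an integer.`
* `As in MySQL, reading the bit type is currently undefined.`

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table DDL:

    CREATE TABLE `tc_biz_vertical_test_0000` (
      `biz_order_id` bigint(20) NOT NULL COMMENT 'id',
      `key_value` varchar(4000) NOT NULL COMMENT 'Key-value content',
      `gmt_create` datetime NOT NULL COMMENT 'creation time',
      `gmt_modified` datetime NOT NULL COMMENT 'modification time',
      `attribute_cc` int(11) DEFAULT NULL COMMENT 'flag preventing concurrent modification',
      `value_type` int(11) NOT NULL DEFAULT '0' COMMENT 'type',
      `buyer_id` bigint(20) DEFAULT NULL COMMENT 'buyerid',
      `seller_id` bigint(20) DEFAULT NULL COMMENT 'seller_id',
      PRIMARY KEY (`biz_order_id`,`value_type`),
      KEY `idx_biz_vertical_gmtmodified` (`gmt_modified`)
    ) ENGINE=InnoDB DEFAULT CHARSET=gbk COMMENT='tc_biz_vertical'

A single row looks like:

    biz_order_id: 888888888
       key_value: ;orderIds:20148888888,2014888888813800;
      gmt_create: 2011-09-24 11:07:20
    gmt_modified: 2011-10-24 17:56:34
    attribute_cc: 1
      value_type: 3
        buyer_id: 8888888
       seller_id: 1

#### 4.1.2 Machine Specifications

* Machine running DataX:
    1. cpu: 24 cores, Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
    2. mem: 48GB
    3. net: dual gigabit NICs
    4. disc: DataX data is not written to disk, so this item is not measured

* DRDS database machine:
    1. cpu: 32 cores, Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
    2. mem: 256GB
    3. net: dual gigabit NICs
    4. disc: BTWL419303E2800RGN INTEL SSDSC2BB800G4 D2010370

#### 4.1.3 DataX JVM Parameters

    -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError

### 4.2 Test Results

#### 4.2.1 Single-Table Results

| Channels | Split by primary key | DataX speed (Rec/s) | DataX machine load | DB NIC outbound traffic (MB/s) | DB load |
|--------|--------|--------|--------|--------|--------|

Notes:

1. The single table here has a bigint(20) primary key ranging from 190247559466810 to 570722244711460; judging by the key range, the data is evenly distributed.
2. If a single table is not split by primary key, configuring more channels does not improve speed; the result is the same as with one channel.

#### 4.2.2 Sharded-Table Results (2 database shards with 16 table shards each, 32 table shards in total)

| Channels | DataX speed (Rec/s) | DataX machine load | DB NIC outbound traffic (MB/s) | DB load |
|--------|--------|--------|--------|--------|

## 5 Constraints and Limitations

### 5.1 Consistent Views

DRDS is a distributed database and cannot provide a consistent view across multiple databases and tables. Unlike syncing a single table from a single MySQL database, DrdsReader cannot extract a snapshot of all database and table shards at the same point in time. In other words, DrdsReader reads each underlying table shard at a different snapshot, so strong consistency is not guaranteed.

### 5.2 Database Encoding

DRDS encoding settings are very flexible: an encoding can be set at the database, table and column level, and each of these can even differ. Priority from highest to lowest is column, table, database, instance. We do not recommend such a tangled setup; ideally, standardize on UTF-8 at the database level.

DrdsReader extracts data via JDBC, which natively handles the various encodings and converts between them internally. DrdsReader therefore does not ask users to specify an encoding; it detects and converts encodings automatically.

If the encoding actually used to write data into DRDS does not match the configured encoding, DrdsReader cannot detect this and offers no remedy; in such cases `the exported data may be garbled`.

### 5.3 Incremental Data Sync

DrdsReader extracts data with JDBC SELECT statements, so SELECT...WHERE... can be used for incremental extraction. There are several approaches:

* If the online application fills a modify field with the change timestamp on every write (inserts, updates and logical deletes), DrdsReader only needs a WHERE condition on the timestamp of the previous sync run.
* For append-only (log-style) data, DrdsReader can use a WHERE condition on the largest auto-increment ID seen in the previous run.

If the business data has no field that distinguishes new or modified rows, DrdsReader cannot sync incrementally and can only sync the full data set.

### 5.4 SQL Security

DrdsReader lets users supply their own SELECT statement through querySql and performs no security checks on it. Ensuring its safety is the responsibility of the DataX user.

## 6 FAQ

***

**Q: DrdsReader fails with error message XXX**

A: This is usually a network or permission problem. Test with the DRDS command line:

    mysql -u<username> -p<password> -h<ip> -D<database> -e "select * from <table>"

If this command also fails, the problem lies in the environment; contact your DBA.

***

**Q: How do I configure incremental sync of DRDS data?**

A: DrdsReader can sync incremental data only if the business table has a field that marks changes. For example, most Taobao business tables record each row's last modification time in a gmt_modified field; DrdsReader then only needs a where condition such as:

```
"where": "Date(gmt_modified) = '2014-06-01'"
```

***
================================================ FILE: drdsreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT drdsreader drdsreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java ${mysql.driver.version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin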
src/main/assembly/package.xml datax dwzip package single
================================================ FILE: drdsreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/drdsreader target/ drdsreader-0.0.1-SNAPSHOT.jar plugin/reader/drdsreader false plugin/reader/drdsreader/libs runtime
================================================ FILE: drdsreader/src/main/java/com/alibaba/datax/plugin/reader/drdsreader/DrdsReader.java ================================================
package com.alibaba.datax.plugin.reader.drdsreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
import com.alibaba.datax.plugin.rdbms.reader.Constant;
import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.util.TableExpandUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

public class DrdsReader extends Reader {

    private static final DataBaseType DATABASE_TYPE = DataBaseType.MySql;
    private static final Logger LOG = LoggerFactory.getLogger(DrdsReader.class);

    public static class Job extends Reader.Job {

        private Configuration originalConfig = null;
        private CommonRdbmsReader.Job commonRdbmsReaderJob;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            int fetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE,
                    Integer.MIN_VALUE);
            this.originalConfig.set(Constant.FETCH_SIZE, fetchSize);
            this.validateConfiguration();

            this.commonRdbmsReaderJob = new CommonRdbmsReader.Job(
                    DATABASE_TYPE);
            this.commonRdbmsReaderJob.init(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            return DrdsReaderSplitUtil.doSplit(this.originalConfig,
                    adviceNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderJob.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderJob.destroy(this.originalConfig);
        }

        private void validateConfiguration() {
            // do not splitPk
            String splitPk = originalConfig.getString(Key.SPLIT_PK, null);
            if (null != splitPk) {
                LOG.warn("Because the database you are reading is DRDS, you do not need to configure splitPk. If you do not want to see this reminder, remove the splitPk configured for your source table.");
                this.originalConfig.remove(Key.SPLIT_PK);
            }

            List<Object> conns = this.originalConfig.getList(
                    Constant.CONN_MARK, Object.class);
            if (null == conns || conns.size() != 1) {
                throw DataXException.asDataXException(
                        DBUtilErrorCode.REQUIRED_VALUE,
                        "You have not configured the jdbcUrl of the database to read. Set jdbcUrl to the connection you need to read from. Please check and fix your configuration.");
            }
            Configuration connConf = Configuration
                    .from(conns.get(0).toString());
            connConf.getNecessaryValue(Key.JDBC_URL,
                    DBUtilErrorCode.REQUIRED_VALUE);

            // only one jdbcUrl
            List<String> jdbcUrls = connConf
                    .getList(Key.JDBC_URL, String.class);
            if (null == jdbcUrls || jdbcUrls.size() != 1) {
                throw DataXException.asDataXException(
                        DBUtilErrorCode.ILLEGAL_VALUE,
                        "Your jdbcUrl configuration is wrong: the number of jdbcUrls configured for reading is incorrect. Configure exactly 1 jdbcUrl. Please check and fix your configuration.");
            }

            // if have table, only one
            List<String> tables = connConf.getList(Key.TABLE, String.class);
            if (null != tables && tables.size() != 1) {
                throw DataXException
                        .asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
                                "Your table configuration is wrong. Because you are reading from DRDS, the number of source tables is incorrect. Configure exactly 1 table. Please check and fix your configuration.");
            }
            if (null != tables && tables.size() == 1) {
                List<String> expandedTables = TableExpandUtil.expandTableConf(
                        DATABASE_TYPE, tables);
                if (null == expandedTables || expandedTables.size() != 1) {
                    throw DataXException
                            .asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
                                    "Your table configuration is wrong. Because you are reading from DRDS, the number of source tables is incorrect. Configure exactly 1 table. Please check and fix your configuration.");
                }
            }

            // if have querySql, only one
            List<String> querySqls = connConf.getList(Key.QUERY_SQL, String.class);
            if (null != querySqls && querySqls.size() != 1) {
                throw DataXException
                        .asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
                                "Your querySql configuration is wrong. Because you are reading from DRDS, the number of querySql entries is incorrect. Configure exactly 1 querySql. Please check and fix your configuration.");
            }

            // warn: other checks on table and querySql are done in the common rdbms reader
        }
    }

    public static class Task extends Reader.Task {

        private Configuration readerSliceConfig;
        private CommonRdbmsReader.Task commonRdbmsReaderTask;

        @Override
        public void init() {
            this.readerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsReaderTask = new CommonRdbmsReader.Task(
                    DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId());
            this.commonRdbmsReaderTask.init(this.readerSliceConfig);
        }

        @Override
        public void startRead(RecordSender recordSender) {
            int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE);

            this.commonRdbmsReaderTask.startRead(this.readerSliceConfig,
                    recordSender, super.getTaskPluginCollector(), fetchSize);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderTask.post(this.readerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderTask.destroy(this.readerSliceConfig);
        }
    }
}
================================================ FILE: drdsreader/src/main/java/com/alibaba/datax/plugin/reader/drdsreader/DrdsReaderErrorCode.java ================================================
package com.alibaba.datax.plugin.reader.drdsreader;

import com.alibaba.datax.common.spi.ErrorCode;

public enum DrdsReaderErrorCode implements ErrorCode {
    GET_TOPOLOGY_FAILED("DrdsReader-01", "Failed to get the topology of the DRDS table.");

    private final String code;

    private final String description;

    private DrdsReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code,
                this.description);
    }
}
================================================ FILE: drdsreader/src/main/java/com/alibaba/datax/plugin/reader/drdsreader/DrdsReaderSplitUtil.java ================================================
package com.alibaba.datax.plugin.reader.drdsreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.Constant;
import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.datax.plugin.rdbms.reader.util.SingleTableSplitUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.sql.ResultSet;
import java.util.*;

public class DrdsReaderSplitUtil {

    private static final Logger LOG = LoggerFactory
            .getLogger(DrdsReaderSplitUtil.class);

    public static List<Configuration> doSplit(Configuration originalSliceConfig,
                                              int adviceNumber) {
        boolean isTableMode = originalSliceConfig.getBool(Constant.IS_TABLE_MODE).booleanValue();
        int tableNumber = originalSliceConfig.getInt(Constant.TABLE_NUMBER_MARK);

        if (isTableMode && tableNumber == 1) {
            // First move the inner table and connection settings up to the top level
            String table = originalSliceConfig.getString(String.format("%s[0].%s[0]",
                    Constant.CONN_MARK, Key.TABLE)).trim();
            originalSliceConfig.set(Key.TABLE, table);

            // Note: jdbcUrl is not read from an array here, because the master's init method has already preprocessed it
            String jdbcUrl = originalSliceConfig.getString(String.format("%s[0].%s",
                    Constant.CONN_MARK, Key.JDBC_URL)).trim();

            originalSliceConfig.set(Key.JDBC_URL, DataBaseType.DRDS.appendJDBCSuffixForReader(jdbcUrl));

            originalSliceConfig.remove(Constant.CONN_MARK);
            return doDrdsReaderSplit(originalSliceConfig);
        } else {
            throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,
                    "The table configuration in your settings is wrong: DrdsReader reads exactly one logical table, and the DRDS Proxy automatically fetches the data of the corresponding physical tables. Please check and fix your configuration.");
        }
    }

    private static List<Configuration> doDrdsReaderSplit(Configuration originalSliceConfig) {
        List<Configuration> splittedConfigurations = new ArrayList<Configuration>();

        Map<String, List<String>> topology = getTopology(originalSliceConfig);
        if (null == topology || topology.isEmpty()) {
            throw DataXException.asDataXException(DrdsReaderErrorCode.GET_TOPOLOGY_FAILED,
                    "Failed to get the DRDS table topology: the topology must not be empty.");
        } else {
            String table = originalSliceConfig.getString(Key.TABLE).trim();
            String column = originalSliceConfig.getString(Key.COLUMN).trim();
            String where = originalSliceConfig.getString(Key.WHERE, null);
            // must not end with an English semicolon
            String sql = SingleTableSplitUtil
                    .buildQuerySql(column, table, where);
            // split tasks according to the topology: one task per DRDS group
            for (Map.Entry<String, List<String>> entry : topology.entrySet()) {
                String group = entry.getKey();
                StringBuilder sqlbuilder = new StringBuilder();
                sqlbuilder.append("/*+TDDL({'extra':{'MERGE_UNION':'false'},'type':'direct',");
                sqlbuilder.append("'vtab':'").append(table).append("',");
                sqlbuilder.append("'dbid':'").append(group).append("',");
                sqlbuilder.append("'realtabs':[");
                Iterator<String> it = entry.getValue().iterator();
                while (it.hasNext()) {
                    String realTable = it.next();
                    sqlbuilder.append('\'').append(realTable).append('\'');
                    if (it.hasNext()) {
                        sqlbuilder.append(',');
                    }
                }
                sqlbuilder.append("]})*/");
                sqlbuilder.append(sql);
                Configuration param = originalSliceConfig.clone();
                param.set(Key.QUERY_SQL, sqlbuilder.toString());
                splittedConfigurations.add(param);
            }

            return splittedConfigurations;
        }
    }

    private static Map<String, List<String>> getTopology(Configuration configuration) {
        Map<String, List<String>> topology = new HashMap<String, List<String>>();

        String jdbcURL = configuration.getString(Key.JDBC_URL);
        String username = configuration.getString(Key.USERNAME);
        String password = configuration.getString(Key.PASSWORD);
        String logicTable = configuration.getString(Key.TABLE).trim();

        Connection conn = null;
        ResultSet rs = null;
        try {
            conn = DBUtil.getConnection(DataBaseType.DRDS, jdbcURL, username, password);
            rs = DBUtil.query(conn, "SHOW TOPOLOGY " + logicTable);
            while (DBUtil.asyncResultSetNext(rs)) {
                String groupName = rs.getString("GROUP_NAME");
                String tableName = rs.getString("TABLE_NAME");
                List<String> tables = topology.get(groupName);
                if (tables == null) {
                    tables = new ArrayList<String>();
                    topology.put(groupName, tables);
                }
                tables.add(tableName);
            }

            return topology;
        } catch (Exception e) {
            throw DataXException.asDataXException(DrdsReaderErrorCode.GET_TOPOLOGY_FAILED,
                    String.format("Failed to get the DRDS table topology. With your configuration, DataX could not obtain the topology information. Context: table: %s, jdbcUrl: %s. Please contact your DRDS administrator.",
                            logicTable, jdbcURL), e);
        } finally {
            DBUtil.closeDBResources(rs, null, conn);
        }
    }
}
================================================ FILE: drdsreader/src/main/resources/plugin.json ================================================
{
    "name": "drdsreader",
    "class": "com.alibaba.datax.plugin.reader.drdsreader.DrdsReader",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba"
}
================================================ FILE: drdsreader/src/main/resources/plugin_job_template.json ================================================
{
    "name": "drdsreader",
    "parameter": {
        "jdbcUrl": "",
        "username": "",
        "password": "",
        "table": "",
        "column": [],
        "where": ""
    }
}
================================================ FILE: drdswriter/doc/drdswriter.md ================================================
# DataX DRDSWriter

---

## 1 Quick Introduction

The DRDSWriter plugin writes data into a target table in DRDS. Under the hood, DRDSWriter connects via JDBC to the Proxy of a remote DRDS database and executes the corresponding `replace into ...` SQL statements to write the data into DRDS. Note in particular that the statement executed is `replace into`; to avoid writing duplicate data, your table must have a primary key or a unique index (Unique Key).

DRDSWriter is aimed at ETL developers, who use it to import data from a data warehouse into DRDS. DRDSWriter can also serve DBAs and other users as a data migration tool.

## 2 Implementation Principles

DRDSWriter obtains the protocol data generated by the Reader through the DataX framework and writes it to DRDS with `replace into...` statements (when there is no primary key or unique index conflict, this behaves exactly like insert into; on a conflict, every field of the existing row is replaced by the new row). DRDSWriter accumulates a certain amount of data and submits it to the DRDS Proxy, which decides internally whether the data goes to one table or several, and how to route the data when several tables are written.
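To make the write statement concrete, the sketch below builds the parameterized statement a writer could prepare once and bind for each record. `buildWriteTemplate` is a hypothetical helper, not DataX's actual template builder. The two accepted modes (`replace`, `insert ignore`) are taken from the check in `DrdsWriter.Job.init` in this dump. The placeholder layout is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class WriteSqlSketch {

    // Hypothetical helper: builds the parameterized write statement.
    static String buildWriteTemplate(String writeMode, String table, List<String> columns) {
        String mode = writeMode.trim().toLowerCase(Locale.ROOT);
        // Similar to the check in DrdsWriter.Job.init: only these two modes pass.
        if (!mode.equals("replace") && !mode.equals("insert ignore")) {
            throw new IllegalArgumentException("unsupported writeMode: " + writeMode);
        }
        // One "?" placeholder per target column, e.g. "?,?" for two columns.
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        return String.format("%s into %s (%s) values (%s)",
                mode, table, String.join(",", columns), placeholders);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name");
        // replace: on a primary-key/unique-key conflict the old row is replaced.
        System.out.println(buildWriteTemplate("replace", "test", columns));
        // insert ignore: on a conflict the new row is skipped instead.
        System.out.println(buildWriteTemplate("insert ignore", "test", columns));
    }
}
```

Because `replace` only deduplicates through a primary key or unique index, a table without one would simply accumulate duplicate rows, which is why the section above asks for such a key.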
Note: the job needs at least the privilege to execute replace into...; whether other privileges are needed depends on the statements you specify in preSql and postSql in the job configuration.

## 3 Features

### 3.1 Sample Configuration

* This sample imports data generated in memory into DRDS.

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "value": 19880808,
                                "type": "long"
                            },
                            {
                                "value": "DataX",
                                "type": "string"
                            }
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "drdswriter",
                    "parameter": {
                        "writeMode": "replace",
                        "username": "root",
                        "password": "root",
                        "column": [
                            "id",
                            "name"
                        ],
                        "preSql": [
                            "delete from test"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=gbk",
                                "table": [
                                    "test"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
	* Description: JDBC connection information of the target database. At run time, DataX appends the following properties to the jdbcUrl you provide: yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true. Notes: 1. Only one jdbcUrl value can be configured per database. 2. A DRDS write job can configure only one jdbcUrl. 3. jdbcUrl follows the official MySQL/DRDS conventions and may carry additional connection control properties; for example, to use gbk as the connection encoding, append useUnicode=true&characterEncoding=gbk to the jdbcUrl. For details, see the official MySQL/DRDS documentation or ask your DBA.
	* Required: Yes
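	* Default: none
* **username**
	* Description: username for the target database
	* Required: Yes
	* Default: none
* **password**
	* Description: password for the target database
	* Required: Yes
	* Default: none
* **table**
	* Description: name of the target table. Only one DRDS table name can be configured. Note: table and jdbcUrl must be placed inside the connection configuration unit.
	* Required: Yes
	* Default: none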
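* **column**
	* Description: the fields of the target table to write, separated by English commas, e.g. "column": ["id","name","age"]. To write all columns in order, use *, e.g. "column": ["*"]. **column must be specified and cannot be left empty!** Notes: 1. We strongly discourage the "*" form, because if the number or types of the target table's fields change, your job may run incorrectly or fail. 2. column must not contain any constant values.
	* Required: Yes
	* Default: none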
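* **preSql**
	* Description: standard statements executed before data is written to the target table. For example, to clear the table before importing, configure: `"preSql":["delete from yourTableName"]`
	* Required: No
	* Default: none
* **postSql**
	* Description: standard statements executed after data is written to the target table (works the same way as preSql).
	* Required: No
	* Default: none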
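* **writeMode**
	* Description: the write mode. DRDSWriter accepts `replace` (the default) and `insert ignore`; any other value is rejected with a configuration error. It can be omitted.
	* Required: No
	* Default: replace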
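* **batchSize**
	* Description: the number of records committed in one batch. A larger value can greatly reduce the number of network round trips between DataX and DRDS and improve overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
	* Required: No
	* Default: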
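The trade-off behind `batchSize` can be sketched with plain JDBC: records are grouped into chunks of at most `batchSize`, and each chunk is bound to one `PreparedStatement` and sent with a single `executeBatch()`. The class and method names below are hypothetical; this is not `CommonRdbmsWriter`'s actual code. The comment on `rewriteBatchedStatements` (one of the properties the jdbcUrl description says DataX appends) reflects MySQL Connector/J behavior and is stated as an assumption for DRDS.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSketch {

    /** Splits rows into consecutive chunks of at most batchSize rows. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    /**
     * Binds every row of one batch, then calls executeBatch once. With
     * Connector/J and rewriteBatchedStatements=true the driver can send the
     * batch as one multi-row statement (assumed to hold for the DRDS Proxy).
     * Every buffered row stays in memory until this call, hence the OOM risk.
     */
    static void writeBatch(Connection conn, String sql, List<Object[]> batch) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : batch) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        // Five records with batchSize = 2 become three round trips: 2 + 2 + 1.
        for (List<Integer> batch : partition(Arrays.asList(1, 2, 3, 4, 5), 2)) {
            System.out.println(batch);
        }
    }
}
```

Running `main` prints `[1, 2]`, `[3, 4]` and `[5]`. `writeBatch` is not called there because it needs a live database connection.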
### 3.3 Type Conversion

Like MysqlWriter, DRDSWriter currently supports most MySQL types, but a few types are not supported, so please check the types you use.

The following table lists the type conversions DRDSWriter applies to MySQL types:

| DataX internal type | MySQL data type |
| -------- | ----- |
| Long | int, tinyint, smallint, mediumint, bigint, year |
| Double | float, double, decimal |
| String | varchar, char, tinytext, text, mediumtext, longtext |
| Date | date, datetime, timestamp, time |
| Boolean | bit, bool |
| Bytes | tinyblob, mediumblob, blob, longblob, varbinary |

## 4 Performance Report

## 5 Constraints and Limitations

## FAQ

***

**Q: DRDSWriter failed while executing the postSql statement. Has the data been imported into the target database?**

A: A DataX import consists of three steps: the pre operation, the import, and the post operation. If any of them fails, the DataX job fails. Because DataX cannot guarantee that these operations complete in a single transaction, the data may already have landed in the target.

***

**Q: In that case some dirty data has been imported. What if it affects the online database?**

A: There are currently two solutions. First, configure a pre statement that cleans up the data imported that day, so that each DataX run removes the previous run's data and then imports the complete data. Second, import into a temporary table and rename it to the online table once the import finishes.

***

**Q: The second method avoids affecting online data. How exactly do I do that?**

A: Configure the job to import into a temporary table.

================================================ FILE: drdswriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT drdswriter drdswriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java ${mysql.driver.version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single
================================================ FILE: drdswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/drdswriter target/ drdswriter-0.0.1-SNAPSHOT.jar plugin/writer/drdswriter false plugin/writer/drdswriter/libs runtime
================================================ FILE: drdswriter/src/main/java/com/alibaba/datax/plugin/writer/drdswriter/DrdsWriter.java ================================================
package com.alibaba.datax.plugin.writer.drdswriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.rdbms.writer.Key;

import java.util.List;

public class DrdsWriter extends Writer {
    private static final DataBaseType DATABASE_TYPE = DataBaseType.DRDS;

    public static class Job extends Writer.Job {
        private Configuration originalConfig = null;
        private String DEFAULT_WRITEMODE = "replace";
        private String INSERT_IGNORE_WRITEMODE = "insert ignore";
        private CommonRdbmsWriter.Job commonRdbmsWriterJob;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            String writeMode = this.originalConfig.getString(Key.WRITE_MODE, DEFAULT_WRITEMODE);
            if (!DEFAULT_WRITEMODE.equalsIgnoreCase(writeMode) &&
                    !INSERT_IGNORE_WRITEMODE.equalsIgnoreCase(writeMode)) {
                throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,
                        String.format("Invalid writeMode. DRDSWriter supports only two write modes: [%s, %s], but you configured: %s. Please check and fix your configuration.",
                                DEFAULT_WRITEMODE, INSERT_IGNORE_WRITEMODE, writeMode));
            }

            this.originalConfig.set(Key.WRITE_MODE, writeMode);

            this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE);
            this.commonRdbmsWriterJob.init(this.originalConfig);
        }

        // DRDS exposes only one logical table, so pre and post are executed directly in the master (Job)
        @Override
        public void prepare() {
            this.commonRdbmsWriterJob.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber);
        }

        // In general, post has to be deferred to the tasks (the single-table case is the exception)
        @Override
        public void post() {
            this.commonRdbmsWriterJob.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterJob.destroy(this.originalConfig);
        }
    }

    public static class Task extends Writer.Task {
        private Configuration writerSliceConfig;
        private CommonRdbmsWriter.Task commonRdbmsWriterTask;

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task(DATABASE_TYPE);
            this.commonRdbmsWriterTask.init(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterTask.prepare(this.writerSliceConfig);
        }

        // TODO Switch to a connection pool so every acquired connection is usable (note: a connection may need its session initialized each time)
        @Override
        public void startWrite(RecordReceiver recordReceiver) {
            this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig,
                    super.getTaskPluginCollector());
        }

        @Override
        public void post() {
            this.commonRdbmsWriterTask.post(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterTask.destroy(this.writerSliceConfig);
        }
    }
}
================================================ FILE: drdswriter/src/main/resources/plugin.json ================================================
{ "name": "drdswriter", "class": "com.alibaba.datax.plugin.writer.drdswriter.DrdsWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet.
warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: drdswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "drdswriter", "parameter": { "jdbcUrl": "", "username": "", "password": "", "table": "", "column": [], "writeMode": "", "preSql": [], "postSql": [] } } ================================================ FILE: elasticsearchwriter/README.md ================================================ 本插件仅在Elasticsearch 5.x上测试 ================================================ FILE: elasticsearchwriter/build.sh ================================================ #!/bin/sh SCRIPT_HOME=$(cd $(dirname $0); pwd) cd $SCRIPT_HOME/.. mvn clean package -DskipTests assembly:assembly cd $SCRIPT_HOME/target/datax/plugin/writer/ if [ -d "eswriter" ]; then tar -zcvf eswriter.tgz eswriter cp eswriter.tgz $SCRIPT_HOME cd $SCRIPT_HOME ansible-playbook -i hosts main.yml -u vagrant -k fi ================================================ FILE: elasticsearchwriter/doc/elasticsearchwriter.md ================================================ # DataX ElasticSearchWriter --- ## 1 快速介绍 数据导入elasticsearch的插件 ## 2 实现原理 使用elasticsearch的rest api接口, 批量把从reader读入的数据写入elasticsearch ## 3 功能说明 ### 3.1 配置样例 #### job.json ``` { "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { ... 
}, "writer": { "name": "elasticsearchwriter", "parameter": { "endpoint": "http://xxx:9999", "accessId": "xxxx", "accessKey": "xxxx", "index": "test-1", "type": "default", "cleanup": true, "settings": {"index" :{"number_of_shards": 1, "number_of_replicas": 0}}, "discovery": false, "batchSize": 1000, "splitter": ",", "column": [ {"name": "pk", "type": "id"}, { "name": "col_ip","type": "ip" }, { "name": "col_double","type": "double" }, { "name": "col_long","type": "long" }, { "name": "col_integer","type": "integer" }, { "name": "col_keyword", "type": "keyword" }, { "name": "col_text", "type": "text", "analyzer": "ik_max_word"}, { "name": "col_geo_point", "type": "geo_point" }, { "name": "col_date", "type": "date", "format": "yyyy-MM-dd HH:mm:ss"}, { "name": "col_nested1", "type": "nested" }, { "name": "col_nested2", "type": "nested" }, { "name": "col_object1", "type": "object" }, { "name": "col_object2", "type": "object" }, { "name": "col_integer_array", "type":"integer", "array":true}, { "name": "col_geo_shape", "type":"geo_shape", "tree": "quadtree", "precision": "10m"} ] } } } ] } } ``` #### 3.2 参数说明 * endpoint * 描述:ElasticSearch的连接地址 * 必选:是 * 默认值:无 * accessId * 描述:http auth中的user * 必选:否 * 默认值:空 * accessKey * 描述:http auth中的password * 必选:否 * 默认值:空 * index * 描述:elasticsearch中的index名 * 必选:是 * 默认值:无 * type * 描述:elasticsearch中index的type名 * 必选:否 * 默认值:index名 * cleanup * 描述:是否删除原表 * 必选:否 * 默认值:false * batchSize * 描述:每次批量数据的条数 * 必选:否 * 默认值:1000 * trySize * 描述:失败后重试的次数 * 必选:否 * 默认值:30 * timeout * 描述:客户端超时时间 * 必选:否 * 默认值:600000 * discovery * 描述:启用节点发现将(轮询)并定期更新客户机中的服务器列表。 * 必选:否 * 默认值:false * compression * 描述:http请求,开启压缩 * 必选:否 * 默认值:true * multiThread * 描述:http请求,是否有多线程 * 必选:否 * 默认值:true * ignoreWriteError * 描述:忽略写入错误,不重试,继续写入 * 必选:否 * 默认值:false * ignoreParseError * 描述:忽略解析数据格式错误,继续写入 * 必选:否 * 默认值:true * alias * 描述:数据导入完成后写入别名 * 必选:否 * 默认值:无 * aliasMode * 描述:数据导入完成后增加别名的模式,append(增加模式), exclusive(只留这一个) * 必选:否 * 默认值:append * settings * 描述:创建index时候的settings, 
与elasticsearch官方相同 * 必选:否 * 默认值:无 * splitter * 描述:如果插入数据是array,就使用指定分隔符 * 必选:否 * 默认值:-,- * column * 描述:elasticsearch所支持的字段类型,样例中包含了全部 * 必选:是 * dynamic * 描述: 不使用datax的mappings,使用es自己的自动mappings * 必选: 否 * 默认值: false ================================================ FILE: elasticsearchwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 elasticsearchwriter com.alibaba.datax 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic io.searchbox jest-common 6.3.1 io.searchbox jest 6.3.1 joda-time joda-time 2.9.7 junit junit 4.13.1 test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: elasticsearchwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin/writer/elasticsearchwriter target/ elasticsearchwriter-0.0.1-SNAPSHOT.jar plugin/writer/elasticsearchwriter false plugin/writer/elasticsearchwriter/libs runtime ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchClient.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.ClusterInfo; import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.ClusterInfoResult; import com.alibaba.datax.plugin.writer.elasticsearchwriter.jest.PutMapping7; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONObject; import com.google.gson.Gson; import com.google.gson.JsonElement; import 
com.google.gson.JsonObject; import com.google.gson.JsonParser; import io.searchbox.action.Action; import io.searchbox.client.JestClient; import io.searchbox.client.JestClientFactory; import io.searchbox.client.JestResult; import io.searchbox.client.config.HttpClientConfig; import io.searchbox.client.config.HttpClientConfig.Builder; import io.searchbox.core.Bulk; import io.searchbox.indices.CreateIndex; import io.searchbox.indices.DeleteIndex; import io.searchbox.indices.IndicesExists; import io.searchbox.indices.aliases.*; import io.searchbox.indices.mapping.GetMapping; import io.searchbox.indices.mapping.PutMapping; import io.searchbox.indices.settings.GetSettings; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.concurrent.TimeUnit; /** * Created by xiongfeng.bxf on 17/2/8. */ public class ElasticSearchClient { private static final Logger LOGGER = LoggerFactory.getLogger(ElasticSearchClient.class); private JestClient jestClient; private Configuration conf; public JestClient getClient() { return jestClient; } public ElasticSearchClient(Configuration conf) { this.conf = conf; String endpoint = Key.getEndpoint(conf); //es是支持集群写入的 String[] endpoints = endpoint.split(","); String user = Key.getUsername(conf); String passwd = Key.getPassword(conf); boolean multiThread = Key.isMultiThread(conf); int readTimeout = Key.getTimeout(conf); boolean compression = Key.isCompression(conf); boolean discovery = Key.isDiscovery(conf); String discoveryFilter = Key.getDiscoveryFilter(conf); int totalConnection = this.conf.getInt("maxTotalConnection", 200); JestClientFactory factory = new JestClientFactory(); Builder httpClientConfig = new HttpClientConfig .Builder(Arrays.asList(endpoints)) // .setPreemptiveAuth(new HttpHost(endpoint)) .multiThreaded(multiThread) .connTimeout(readTimeout) 
.readTimeout(readTimeout) .maxTotalConnection(totalConnection) .requestCompressionEnabled(compression) .discoveryEnabled(discovery) .discoveryFrequency(5L, TimeUnit.MINUTES) .discoveryFilter(discoveryFilter); if (!(StringUtils.isBlank(user) || StringUtils.isBlank(passwd))) { // 匿名登录 httpClientConfig.defaultCredentials(user, passwd); } factory.setHttpClientConfig(httpClientConfig.build()); this.jestClient = factory.getObject(); } public boolean indicesExists(String indexName) throws Exception { boolean isIndicesExists = false; JestResult rst = execute(new IndicesExists.Builder(indexName).build()); if (rst.isSucceeded()) { isIndicesExists = true; } else { LOGGER.warn("IndicesExists got ResponseCode: {} ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); switch (rst.getResponseCode()) { case 404: isIndicesExists = false; break; case 401: // 无权访问 default: LOGGER.warn(rst.getErrorMessage()); break; } } return isIndicesExists; } public boolean deleteIndex(String indexName) throws Exception { LOGGER.info("delete index {}", indexName); if (indicesExists(indexName)) { JestResult rst = execute(new DeleteIndex.Builder(indexName).build()); if (!rst.isSucceeded()) { LOGGER.warn("DeleteIndex got ResponseCode: {}, ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); return false; } else { LOGGER.info("delete index {} success", indexName); } } else { LOGGER.info("index cannot found, skip delete index {}", indexName); } return true; } public boolean isGreaterOrEqualThan7() throws Exception { try { ClusterInfoResult result = execute(new ClusterInfo.Builder().build()); LOGGER.info("ClusterInfoResult: {}", result.getJsonString()); return result.isGreaterOrEqualThan7(); }catch(Exception e) { LOGGER.warn(e.getMessage()); return false; } } /** * 获取索引的settings * @param indexName 索引名 * @return 设置 */ public String getIndexSettings(String indexName) { GetSettings.Builder builder = new GetSettings.Builder(); builder.addIndex(indexName); GetSettings getSettings = 
builder.build(); try { LOGGER.info("begin GetSettings for index: {}", indexName); JestResult result = this.execute(getSettings); return result.getJsonString(); } catch (Exception e) { String message = "GetSettings for index error: " + e.getMessage(); LOGGER.warn(message, e); throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_GET_SETTINGS, e.getMessage(), e); } } public boolean createIndexIfNotExists(String indexName, String typeName, Object mappings, String settings, boolean dynamic, boolean isGreaterOrEqualThan7) throws Exception { JestResult rst; if (!indicesExists(indexName)) { LOGGER.info("create index {}", indexName); rst = execute( new CreateIndex.Builder(indexName) .settings(settings) .setParameter("master_timeout", Key.getMasterTimeout(this.conf)) .build() ); //index_already_exists_exception if (!rst.isSucceeded()) { LOGGER.warn("CreateIndex got ResponseCode: {}, ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); if (getStatus(rst) == 400) { LOGGER.info(String.format("index {} already exists", indexName)); return true; } else { return false; } } else { LOGGER.info("create {} index success", indexName); } } if (dynamic) { LOGGER.info("dynamic is true, ignore mappings"); return true; } LOGGER.info("create mappings for {} {}", indexName, mappings); //如果大于7.x,mapping的PUT请求URI中不能带type,并且mapping设置中不能带有嵌套结构 if (isGreaterOrEqualThan7) { rst = execute(new PutMapping7.Builder(indexName, mappings). 
setParameter("master_timeout", Key.getMasterTimeout(this.conf)).build()); } else { rst = execute(new PutMapping.Builder(indexName, typeName, mappings) .setParameter("master_timeout", Key.getMasterTimeout(this.conf)).build()); } if (!rst.isSucceeded()) { LOGGER.error("PutMapping got ResponseCode: {}, ErrorMessage: {}", rst.getResponseCode(), rst.getErrorMessage()); return false; } else { LOGGER.info("index {} put mappings success", indexName); } return true; } public T execute(Action clientRequest) throws IOException { T rst = jestClient.execute(clientRequest); if (!rst.isSucceeded()) { LOGGER.warn(rst.getJsonString()); } return rst; } public Integer getStatus(JestResult rst) { JsonObject jsonObject = rst.getJsonObject(); if (jsonObject.has("status")) { return jsonObject.get("status").getAsInt(); } return 600; } public boolean isBulkResult(JestResult rst) { JsonObject jsonObject = rst.getJsonObject(); return jsonObject.has("items"); } public boolean alias(String indexname, String aliasname, boolean needClean) throws IOException { GetAliases getAliases = new GetAliases.Builder().addIndex(aliasname).build(); AliasMapping addAliasMapping = new AddAliasMapping.Builder(indexname, aliasname).build(); JestResult rst = null; List list = new ArrayList(); if (needClean) { rst = execute(getAliases); if (rst.isSucceeded()) { JsonParser jp = new JsonParser(); JsonObject jo = (JsonObject) jp.parse(rst.getJsonString()); for (Map.Entry entry : jo.entrySet()) { String tindex = entry.getKey(); if (indexname.equals(tindex)) { continue; } AliasMapping m = new RemoveAliasMapping.Builder(tindex, aliasname).build(); String s = new Gson().toJson(m.getData()); LOGGER.info(s); list.add(m); } } } ModifyAliases modifyAliases = new ModifyAliases.Builder(addAliasMapping).addAlias(list).setParameter("master_timeout", Key.getMasterTimeout(this.conf)).build(); rst = execute(modifyAliases); if (!rst.isSucceeded()) { LOGGER.error(rst.getErrorMessage()); throw new IOException(rst.getErrorMessage()); } 
return true; } /** * Get the mapping of an index */ public String getIndexMapping(String indexName) { GetMapping.Builder builder = new GetMapping.Builder(); builder.addIndex(indexName); GetMapping getMapping = builder.build(); try { LOGGER.info("begin GetMapping for index: {}", indexName); JestResult result = this.execute(getMapping); return result.getJsonString(); } catch (Exception e) { String message = "GetMapping for index error: " + e.getMessage(); LOGGER.warn(message, e); throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_MAPPINGS, e.getMessage(), e); } } public String getMappingForIndexType(String indexName, String typeName) { String indexMapping = this.getIndexMapping(indexName); JSONObject indexMappingInJson = JSON.parseObject(indexMapping); List paths = Arrays.asList(indexName, "mappings"); JSONObject properties = JsonPathUtil.getJsonObject(paths, indexMappingInJson); JSONObject propertiesParent = properties; if (StringUtils.isNotBlank(typeName) && properties.containsKey(typeName)) { propertiesParent = (JSONObject) properties.get(typeName); } JSONObject mapping = (JSONObject) propertiesParent.get("properties"); return JSON.toJSONString(mapping); } public JestResult bulkInsert(Bulk.Builder bulk) throws Exception { // es_rejected_execution_exception // illegal_argument_exception // cluster_block_exception JestResult rst = null; rst = execute(bulk.build()); if (!rst.isSucceeded()) { LOGGER.warn(rst.getErrorMessage()); } return rst; } /** * Close the JestClient * */ public void closeJestClient() { if (jestClient != null) { try { // jestClient.shutdownClient(); jestClient.close(); } catch (IOException e) { LOGGER.warn("ignore error: {}", e.getMessage()); } } } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchColumn.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import java.util.List; /** 
* Created by xiongfeng.bxf on 17/3/2. */ public class ElasticSearchColumn { private String name;//: "appkey", private String type;//": "TEXT", private String timezone; /** * Source-side formatting, applied by DataX */ private String format; /** * Target-side format, one natively supported by ES */ private String dstFormat; private boolean array; /** * Whether to use the target-side (native ES) array type * * Defaults to false */ private boolean dstArray = false; private boolean jsonArray; private boolean origin; private List combineFields; private String combineFieldsValueSeparator = "-"; public String getCombineFieldsValueSeparator() { return combineFieldsValueSeparator; } public void setCombineFieldsValueSeparator(String combineFieldsValueSeparator) { this.combineFieldsValueSeparator = combineFieldsValueSeparator; } public List getCombineFields() { return combineFields; } public void setCombineFields(List combineFields) { this.combineFields = combineFields; } public void setName(String name) { this.name = name; } public void setType(String type) { this.type = type; } public void setTimeZone(String timezone) { this.timezone = timezone; } public void setFormat(String format) { this.format = format; } public String getName() { return name; } public String getType() { return type; } public boolean isOrigin() { return origin; } public void setOrigin(boolean origin) { this.origin = origin; } public String getTimezone() { return timezone; } public String getFormat() { return format; } public void setTimezone(String timezone) { this.timezone = timezone; } public boolean isArray() { return array; } public void setArray(boolean array) { this.array = array; } public boolean isJsonArray() {return jsonArray;} public void setJsonArray(boolean jsonArray) {this.jsonArray = jsonArray;} public String getDstFormat() { return dstFormat; } public void setDstFormat(String dstFormat) { this.dstFormat = dstFormat; } public boolean isDstArray() { return dstArray; } public void setDstArray(boolean dstArray) { this.dstArray = dstArray; } } 
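`ElasticSearchColumn` carries `combineFields` and `combineFieldsValueSeparator` (default `"-"`). In `ElasticSearchWriter`, `processIDCombineFields` builds the document `_id` by reading each listed field from the record and joining the string values with that separator. The sketch below mimics that with plain JDK types, assuming a record can be modeled as a column-name to string-value map; `CombinedIdSketch` and `combinedId` are hypothetical names, and `String.join` stands in for the Guava `Joiner` the plugin actually uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinedIdSketch {

    // Hypothetical stand-in for processIDCombineFields: the record is modeled as a
    // column-name -> string-value map rather than a DataX Record.
    static String combinedId(Map<String, String> record, List<String> combineFields, String separator) {
        List<String> values = new ArrayList<>(combineFields.size());
        // Values are taken in combineFields order, not in record column order.
        for (String field : combineFields) {
            if (!record.containsKey(field)) {
                // The plugin raises RECORD_FIELD_NOT_FOUND in this situation.
                throw new IllegalArgumentException("field not found: " + field);
            }
            values.add(record.get(field));
        }
        return String.join(separator, values);
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("shop_id", "42");
        record.put("order_no", "A-1001");
        record.put("amount", "9.90");
        // "-" is the default separator declared in ElasticSearchColumn.
        System.out.println(combinedId(record, List.of("shop_id", "order_no"), "-"));
    }
}
```

Two caveats about the design, hedged since the sketch is not the plugin's code: with the default `"-"` separator, a value that itself contains `-` (such as `A-1001` above) can make different field combinations collide, so a separator that never occurs in the data is safer; and Guava's `Joiner` throws on `null` values, whereas `String.join` writes the text `null`.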
================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchFieldType.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; /** * Created by xiongfeng.bxf on 17/3/1. */ public enum ElasticSearchFieldType { ID, PARENT, ROUTING, VERSION, STRING, TEXT, KEYWORD, LONG, INTEGER, SHORT, BYTE, DOUBLE, FLOAT, DATE, BOOLEAN, BINARY, INTEGER_RANGE, FLOAT_RANGE, LONG_RANGE, DOUBLE_RANGE, DATE_RANGE, GEO_POINT, GEO_SHAPE, IP, IP_RANGE, COMPLETION, TOKEN_COUNT, OBJECT, NESTED; public static ElasticSearchFieldType getESFieldType(String type) { if (type == null) { return null; } for (ElasticSearchFieldType f : ElasticSearchFieldType.values()) { if (f.name().compareTo(type.toUpperCase()) == 0) { return f; } } return null; } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriter.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.DataXCaseEnvUtil; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.writer.elasticsearchwriter.Key.ActionType; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import com.alibaba.fastjson2.TypeReference; import com.alibaba.fastjson2.JSONWriter; import com.google.common.base.Joiner; import io.searchbox.client.JestResult; import io.searchbox.core.*; import io.searchbox.params.Parameters; import 
org.apache.commons.lang3.StringUtils; import org.joda.time.DateTime; import org.joda.time.DateTimeZone; import org.joda.time.format.DateTimeFormat; import org.joda.time.format.DateTimeFormatter; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.util.*; import java.util.concurrent.Callable; public class ElasticSearchWriter extends Writer { private final static String WRITE_COLUMNS = "write_columns"; public static class Job extends Writer.Job { private static final Logger LOGGER = LoggerFactory.getLogger(Job.class); private Configuration conf = null; int retryTimes = 3; long sleepTimeInMilliSecond = 10000L; private String settingsCache; private void setSettings(String settings) { this.settingsCache = JsonUtil.mergeJsonStr(settings, this.settingsCache); } @Override public void init() { this.conf = super.getPluginJobConf(); //LOGGER.info("conf:{}", conf); this.retryTimes = this.conf.getInt("retryTimes", 3); this.sleepTimeInMilliSecond = this.conf.getLong("sleepTimeInMilliSecond", 10000L); } public List getIncludeSettings() { return this.conf.getList("includeSettingKeys", Arrays.asList("number_of_shards", "number_of_replicas"), String.class); } /** * Convert the raw settings fetched from ES into the settings to keep * @param originSettings the raw settings * @return settings */ private String convertSettings(String originSettings) { if(StringUtils.isBlank(originSettings)) { return null; } JSONObject jsonObject = JSON.parseObject(originSettings); for(String key : jsonObject.keySet()) { JSONObject settingsObj = jsonObject.getJSONObject(key); if(settingsObj != null) { JSONObject indexObj = settingsObj.getJSONObject("settings"); JSONObject settings = indexObj.getJSONObject("index"); JSONObject filterSettings = new JSONObject(); if(settings != null) { List includeSettings = getIncludeSettings(); if(includeSettings != null && includeSettings.size() > 0) { for(String includeSetting : includeSettings) { Object fieldValue = settings.get(includeSetting); if(fieldValue != 
null) { filterSettings.put(includeSetting, fieldValue); } } return filterSettings.toJSONString(); } } } } return null; } @Override public void prepare() { /** * Note: this method is executed only once. * Best practice: any processing the Job needs before data synchronization can be done here; if none is needed, this method can be removed. * ES 7.x and later removed mapping types from indices, so prepare() checks whether the cluster is 7.x or later * and, if it is, handles the index type differently. * See: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/removal-of-types.html */ final ElasticSearchClient esClient = new ElasticSearchClient(this.conf); final String indexName = Key.getIndexName(conf); ActionType actionType = Key.getActionType(conf); final String typeName = Key.getTypeName(conf); final boolean dynamic = Key.getDynamic(conf); final String dstDynamic = Key.getDstDynamic(conf); final String newSettings = JSONObject.toJSONString(Key.getSettings(conf)); LOGGER.info("conf settings:{}, settingsCache:{}", newSettings, this.settingsCache); final Integer esVersion = Key.getESVersion(conf); boolean hasId = this.hasID(); this.conf.set("hasId", hasId); if (ActionType.UPDATE.equals(actionType) && !hasId && !hasPrimaryKeyInfo()) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.UPDATE_WITH_ID, "Update mode must specify column type with id or primaryKeyInfo config"); } try { RetryUtil.executeWithRetry(() -> { boolean isGreaterOrEqualThan7 = esClient.isGreaterOrEqualThan7(); if (esVersion != null && esVersion >= 7) { isGreaterOrEqualThan7 = true; } String mappings = genMappings(dstDynamic, typeName, isGreaterOrEqualThan7); conf.set("isGreaterOrEqualThan7", isGreaterOrEqualThan7); LOGGER.info(String.format("index:[%s], type:[%s], mappings:[%s]", indexName, typeName, mappings)); boolean isIndicesExists = esClient.indicesExists(indexName); if (isIndicesExists) { try { // Log the existing mapping to help with troubleshooting String oldMappings = esClient.getMappingForIndexType(indexName, typeName); LOGGER.info("the mappings for old index is: {}", oldMappings); } catch (Exception e) { LOGGER.warn("warn message: {}", e.getMessage()); } } if 
(Key.isTruncate(conf) && isIndicesExists) { // Back up the old index's settings into the cache try { String oldOriginSettings = esClient.getIndexSettings(indexName); if (StringUtils.isNotBlank(oldOriginSettings)) { String includeSettings = convertSettings(oldOriginSettings); LOGGER.info("merge1 settings:{}, settingsCache:{}, includeSettings:{}", oldOriginSettings, this.settingsCache, includeSettings); this.setSettings(includeSettings); } } catch (Exception e) { LOGGER.warn("get old settings fail, indexName:{}", indexName); } esClient.deleteIndex(indexName); } // Update the cached settings this.setSettings(newSettings); LOGGER.info("merge2 settings:{}, settingsCache:{}", newSettings, this.settingsCache); // Always attempt creation; an already-existing index is ignored internally if (!esClient.createIndexIfNotExists(indexName, typeName, mappings, this.settingsCache, dynamic, isGreaterOrEqualThan7)) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_MAPPINGS, ""); } return true; }, DataXCaseEnvUtil.getRetryTimes(this.retryTimes), DataXCaseEnvUtil.getRetryInterval(this.sleepTimeInMilliSecond), DataXCaseEnvUtil.getRetryExponential(false)); } catch (Exception ex) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_MAPPINGS, ex.getMessage(), ex); } finally { try { esClient.closeJestClient(); } catch (Exception e) { LOGGER.warn("ignore close jest client error: {}", e.getMessage()); } } } private boolean hasID() { List column = conf.getList("column"); if (column != null) { for (Object col : column) { JSONObject jo = JSONObject.parseObject(col.toString()); String colTypeStr = jo.getString("type"); ElasticSearchFieldType colType = ElasticSearchFieldType.getESFieldType(colTypeStr); if (ElasticSearchFieldType.ID.equals(colType)) { return true; } } } return false; } private boolean hasPrimaryKeyInfo() { PrimaryKeyInfo primaryKeyInfo = Key.getPrimaryKeyInfo(this.conf); if (null != primaryKeyInfo && null != primaryKeyInfo.getColumn() && !primaryKeyInfo.getColumn().isEmpty()) { return true; } else { return false; } } private String 
genMappings(String dstDynamic, String typeName, boolean isGreaterOrEqualThan7) { String mappings; Map propMap = new HashMap(); List columnList = new ArrayList(); ElasticSearchColumn combineItem = null; List column = conf.getList("column"); if (column != null) { for (Object col : column) { JSONObject jo = JSONObject.parseObject(col.toString()); String colName = jo.getString("name"); String colTypeStr = jo.getString("type"); if (colTypeStr == null) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, col.toString() + " column must have type"); } ElasticSearchFieldType colType = ElasticSearchFieldType.getESFieldType(colTypeStr); if (colType == null) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, col.toString() + " unsupported type"); } ElasticSearchColumn columnItem = new ElasticSearchColumn(); if (Key.PRIMARY_KEY_COLUMN_NAME.equals(colName)) { // Backward compatibility with existing versions colType = ElasticSearchFieldType.ID; colTypeStr = "id"; } columnItem.setName(colName); columnItem.setType(colTypeStr); JSONArray combineFields = jo.getJSONArray("combineFields"); if (combineFields != null && !combineFields.isEmpty() && ElasticSearchFieldType.ID.equals(ElasticSearchFieldType.getESFieldType(colTypeStr))) { List fields = new ArrayList(); for (Object item : combineFields) { fields.add((String) item); } columnItem.setCombineFields(fields); combineItem = columnItem; } String combineFieldsValueSeparator = jo.getString("combineFieldsValueSeparator"); if (StringUtils.isNotBlank(combineFieldsValueSeparator)) { columnItem.setCombineFieldsValueSeparator(combineFieldsValueSeparator); } // id, version and routing columns need no mapping if (colType == ElasticSearchFieldType.ID || colType == ElasticSearchFieldType.VERSION || colType == ElasticSearchFieldType.ROUTING) { columnList.add(columnItem); continue; } // Fields that belong to a combined id need no mapping, // so the combined id column must be defined at the front of the column list if (combineItem != null && combineItem.getCombineFields().contains(colName)) { 
columnList.add(columnItem); continue; } columnItem.setDstArray(false); Boolean array = jo.getBoolean("array"); if (array != null) { columnItem.setArray(array); Boolean dstArray = jo.getBoolean("dstArray"); if(dstArray!=null) { columnItem.setDstArray(dstArray); } } else { columnItem.setArray(false); } Boolean jsonArray = jo.getBoolean("json_array"); if (jsonArray != null) { columnItem.setJsonArray(jsonArray); } else { columnItem.setJsonArray(false); } Map field = new HashMap(); field.put("type", colTypeStr); //https://www.elastic.co/guide/en/elasticsearch/reference/5.2/breaking_50_mapping_changes.html#_literal_index_literal_property // https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_deep_dive_on_doc_values.html#_disabling_doc_values field.put("doc_values", jo.getBoolean("doc_values")); field.put("ignore_above", jo.getInteger("ignore_above")); field.put("index", jo.getBoolean("index")); switch (colType) { case STRING: // string type, kept for compatibility with ES versions before 5 break; case KEYWORD: // https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html#_warm_up_global_ordinals field.put("eager_global_ordinals", jo.getBoolean("eager_global_ordinals")); break; case TEXT: field.put("analyzer", jo.getString("analyzer")); // Reduces disk usage and also improves indexing performance // https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html field.put("norms", jo.getBoolean("norms")); field.put("index_options", jo.getString("index_options")); if(jo.getString("fields") != null) { field.put("fields", jo.getJSONObject("fields")); } break; case DATE: if (Boolean.TRUE.equals(jo.getBoolean("origin"))) { if (jo.getString("format") != null) { field.put("format", jo.getString("format")); } // The native ES format (dstFormat) overrides the source format if (jo.getString("dstFormat") != null) { field.put("format", jo.getString("dstFormat")); } if(jo.getBoolean("origin") != null) { columnItem.setOrigin(jo.getBoolean("origin")); } } else { columnItem.setTimeZone(jo.getString("timezone")); 
columnItem.setFormat(jo.getString("format")); } break; case GEO_SHAPE: field.put("tree", jo.getString("tree")); field.put("precision", jo.getString("precision")); break; case OBJECT: case NESTED: if (jo.getString("dynamic") != null) { field.put("dynamic", jo.getString("dynamic")); } break; default: break; } if (jo.containsKey("other_params")) { field.putAll(jo.getJSONObject("other_params")); } propMap.put(colName, field); columnList.add(columnItem); } } long version = System.currentTimeMillis(); LOGGER.info("unified version: {}", version); conf.set("version", version); conf.set(WRITE_COLUMNS, JSON.toJSONString(columnList)); LOGGER.info(JSON.toJSONString(columnList)); Map rootMappings = new HashMap(); Map typeMappings = new HashMap(); typeMappings.put("properties", propMap); rootMappings.put(typeName, typeMappings); // ES 7.x and later no longer allow a type to be specified in an index, so the mapping can only take the form // { // "properties" : { // "abc" : { // "type" : "text" // } // } // } // properties must not be wrapped in a typeName object if(StringUtils.isNotBlank(dstDynamic)) { typeMappings.put("dynamic", dstDynamic); } if (isGreaterOrEqualThan7) { mappings = JSON.toJSONString(typeMappings); } else { mappings = JSON.toJSONString(rootMappings); } if (StringUtils.isBlank(mappings)) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, "must have mappings"); } return mappings; } @Override public List split(int mandatoryNumber) { List configurations = new ArrayList(mandatoryNumber); for (int i = 0; i < mandatoryNumber; i++) { configurations.add(this.conf.clone()); } return configurations; } @Override public void post() { ElasticSearchClient esClient = new ElasticSearchClient(this.conf); String alias = Key.getAlias(conf); if (!"".equals(alias)) { LOGGER.info(String.format("alias [%s] to [%s]", alias, Key.getIndexName(conf))); try { esClient.alias(Key.getIndexName(conf), alias, Key.isNeedCleanAlias(conf)); } catch (IOException e) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_ALIAS_MODIFY, 
e); } } } @Override public void destroy() { } } public static class Task extends Writer.Task { private static final Logger LOGGER = LoggerFactory.getLogger(Task.class); private Configuration conf; ElasticSearchClient esClient = null; private List typeList; private List columnList; private List<Map<String, Object>> deleteByConditions; private int trySize; private long tryInterval; private int batchSize; private String index; private String type; private String splitter; private ActionType actionType; private ElasticSearchColumn combinedIdColumn; private Map colNameToIndexMap; private Map urlParams; private boolean columnSizeChecked = false; private boolean enableRedundantColumn = false; private boolean enableWriteNull = true; int retryTimes = 3; long sleepTimeInMilliSecond = 10000L; boolean isGreaterOrEqualThan7 = false; private String fieldDelimiter; private boolean hasId; private PrimaryKeyInfo primaryKeyInfo; private boolean hasPrimaryKeyInfo = false; private List esPartitionColumn; private boolean hasEsPartitionColumn = false; @Override public void init() { this.conf = super.getPluginJobConf(); this.index = Key.getIndexName(conf); this.type = Key.getTypeName(conf); this.trySize = Key.getTrySize(conf); this.tryInterval = Key.getTryInterval(conf); this.batchSize = Key.getBatchSize(conf); this.splitter = Key.getSplitter(conf); this.actionType = Key.getActionType(conf); this.urlParams = Key.getUrlParams(conf); this.enableWriteNull = Key.isEnableNullUpdate(conf); this.retryTimes = this.conf.getInt("retryTimes", 3); this.sleepTimeInMilliSecond = this.conf.getLong("sleepTimeInMilliSecond", 10000L); this.isGreaterOrEqualThan7 = this.conf.getBool("isGreaterOrEqualThan7", false); this.parseDeleteCondition(conf); this.columnList = JSON.parseObject(this.conf.getString(WRITE_COLUMNS), new TypeReference<List<ElasticSearchColumn>>() { }); LOGGER.info("columnList: {}", JSON.toJSONString(columnList)); this.hasId = this.conf.getBool("hasId", false); if (hasId) { LOGGER.info("Task has id column, will use it to set _id 
property"); } else { LOGGER.info("Task will use elasticsearch auto generated _id property"); } this.fieldDelimiter = Key.getFieldDelimiter(this.conf); this.enableRedundantColumn = this.conf.getBool("enableRedundantColumn", false); this.typeList = new ArrayList(); for (ElasticSearchColumn esColumn : this.columnList) { this.typeList.add(ElasticSearchFieldType.getESFieldType(esColumn.getType())); if (esColumn.getCombineFields() != null && esColumn.getCombineFields().size() > 0 && ElasticSearchFieldType.getESFieldType(esColumn.getType()).equals(ElasticSearchFieldType.ID)) { combinedIdColumn = esColumn; } } this.primaryKeyInfo = Key.getPrimaryKeyInfo(this.conf); this.esPartitionColumn = Key.getEsPartitionColumn(this.conf); this.colNameToIndexMap = new HashMap(5); this.handleMetaKeys(); this.esClient = new ElasticSearchClient(this.conf); } private void handleMetaKeys() { if (null != this.primaryKeyInfo && null != this.primaryKeyInfo.getColumn() && !this.primaryKeyInfo.getColumn().isEmpty()) { this.hasPrimaryKeyInfo = true; if (null == this.primaryKeyInfo.getFieldDelimiter()) { if (null != this.fieldDelimiter) { this.primaryKeyInfo.setFieldDelimiter(this.fieldDelimiter); } else { this.primaryKeyInfo.setFieldDelimiter(""); } } for (String eachPk : this.primaryKeyInfo.getColumn()) { boolean foundKeyInColumn = false; for (int i = 0; i < columnList.size(); i++) { if (StringUtils.equals(eachPk, columnList.get(i).getName())) { this.colNameToIndexMap.put(eachPk, i); foundKeyInColumn = true; break; } } if (!foundKeyInColumn) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, "primaryKeyInfo has column not exists in column"); } } } if (null != this.esPartitionColumn && !this.esPartitionColumn.isEmpty()) { this.hasEsPartitionColumn = true; for (PartitionColumn eachPartitionCol : this.esPartitionColumn) { boolean foundKeyInColumn = false; for (int i = 0; i < columnList.size(); i++) { if (StringUtils.equals(eachPartitionCol.getName(), 
columnList.get(i).getName())) { this.colNameToIndexMap.put(eachPartitionCol.getName(), i); foundKeyInColumn = true; break; } } if (!foundKeyInColumn) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, "esPartitionColumn has column not exists in column"); } } } } private void parseDeleteCondition(Configuration conf) { List<Map<String, Object>> list = new ArrayList<Map<String, Object>>(); String config = Key.getDeleteBy(conf); if (config != null) { JSONArray array = JSON.parseArray(config); for (Object obj : array) { list.add((Map<String, Object>) obj); } deleteByConditions = list; } } @Override public void prepare() { } /** * Example: { * "deleteBy" : [ * {"product_status" : [-1,-2], "sub_status" : -3}, * {"product_status" : -3} * ] * } * * deletes records of the following two kinds: * 1. product_status is -1 or -2 and sub_status is -3 * 2. product_status is -3 * * Conditions inside one object are ANDed, the objects in the list are ORed, and a list value matches any of its elements. * * Note: [{}] returns true, i.e. an empty condition object matches every record * @param record the record to check * @return true if the record matches any deleteBy condition */ private boolean isDeleteRecord(Record record) { if (deleteByConditions == null) { return false; } Map kv = new HashMap(); for (int i = 0; i < record.getColumnNumber(); i++) { Column column = record.getColumn(i); String columnName = columnList.get(i).getName(); kv.put(columnName, column.asString()); } for (Map delCondition : deleteByConditions) { if (meetAllCondition(kv, delCondition)) { return true; } } return false; } private boolean meetAllCondition(Map kv, Map delCondition) { for (Map.Entry oneCondition : delCondition.entrySet()) { if (!checkOneCondition(kv, oneCondition)) { return false; } } return true; } private boolean checkOneCondition(Map kv, Map.Entry entry) { Object value = kv.get(entry.getKey()); if (entry.getValue() instanceof List) { for (Object obj : (List) entry.getValue()) { if (obj.toString().equals(value)) { return true; } } } else { if (value != null && value.equals(entry.getValue().toString())) { return true; } } return false; } @Override public void startWrite(RecordReceiver recordReceiver) { List writerBuffer = new ArrayList(this.batchSize); Record record = null; while ((record = 
recordReceiver.getFromReader()) != null) { if (!columnSizeChecked) { boolean isInvalid = true; if (enableRedundantColumn) { isInvalid = this.columnList.size() > record.getColumnNumber(); } else { isInvalid = this.columnList.size() != record.getColumnNumber(); } if (isInvalid) { String message = String.format( "column number not equal error, reader column size is %s, but the writer column size is %s", record.getColumnNumber(), this.columnList.size()); throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, message); } columnSizeChecked = true; } writerBuffer.add(record); if (writerBuffer.size() >= this.batchSize) { this.doBatchInsert(writerBuffer); writerBuffer.clear(); } } if (!writerBuffer.isEmpty()) { this.doBatchInsert(writerBuffer); writerBuffer.clear(); } } private String getDateStr(ElasticSearchColumn esColumn, Column column) { // Keep the original value: return it unchanged if (esColumn.isOrigin()) { return column.asString(); } DateTime date = null; DateTimeZone dtz = DateTimeZone.getDefault(); if (esColumn.getTimezone() != null) { // For the full list of time zones see http://www.joda.org/joda-time/timezones.html // TODO: create once and reuse dtz = DateTimeZone.forID(esColumn.getTimezone()); } if (column.getType() != Column.Type.DATE && esColumn.getFormat() != null) { // TODO: create once and reuse DateTimeFormatter formatter = DateTimeFormat.forPattern(esColumn.getFormat()); date = formatter.withZone(dtz).parseDateTime(column.asString()); return date.toString(); } else if (column.getType() == Column.Type.DATE) { if (null == column.getRawData()) { return null; } else { date = new DateTime(column.asLong(), dtz); return date.toString(); } } else { return column.asString(); } } private void doBatchInsert(final List writerBuffer) { Map data = null; Bulk.Builder bulkactionTmp = null; int totalNumber = writerBuffer.size(); int dirtyDataNumber = 0; if (this.isGreaterOrEqualThan7) { bulkactionTmp = new Bulk.Builder().defaultIndex(this.index); } else { bulkactionTmp = new 
Bulk.Builder().defaultIndex(this.index).defaultType(this.type); } final Bulk.Builder bulkaction = bulkactionTmp; // Add the URL parameters for (Map.Entry entry : urlParams.entrySet()) { bulkaction.setParameter(entry.getKey(), entry.getValue()); } for (Record record : writerBuffer) { data = new HashMap(); String id = null; String parent = null; String routing = null; String version = null; String columnName = null; Column column = null; try { for (int i = 0; i < record.getColumnNumber(); i++) { column = record.getColumn(i); columnName = columnList.get(i).getName(); // If a combined id is configured, skip all of its component fields if (combinedIdColumn != null) { if (combinedIdColumn.getCombineFields().contains(columnName)) { continue; } } // Treat a JSON array as an object (NESTED) type ElasticSearchFieldType columnType = columnList.get(i).isJsonArray() ? ElasticSearchFieldType.NESTED : typeList.get(i); Boolean dstArray = columnList.get(i).isDstArray(); // For an array column the input is a string, which may also be null if (columnList.get(i).isArray() && null != column.asString()) { String[] dataList = column.asString().split(splitter); if (!columnType.equals(ElasticSearchFieldType.DATE)) { if (dstArray) { try { // Convert the elements to the type configured by the user switch (columnType) { case BYTE: case KEYWORD: case TEXT: data.put(columnName, dataList); break; case SHORT: case INTEGER: if (StringUtils.isBlank(column.asString().trim())) { data.put(columnName, null); } else { Integer[] intDataList = new Integer[dataList.length]; for (int j = 0; j < dataList.length; j++) { dataList[j] = dataList[j].trim(); if (StringUtils.isNotBlank(dataList[j])) { intDataList[j] = Integer.valueOf(dataList[j]); } } data.put(columnName, intDataList); } break; case LONG: if (StringUtils.isBlank(column.asString().trim())) { data.put(columnName, null); } else { Long[] longDataList = new Long[dataList.length]; for (int j = 0; j < dataList.length; j++) { dataList[j] = dataList[j].trim(); if (StringUtils.isNotBlank(dataList[j])) { longDataList[j] = Long.valueOf(dataList[j]); } } data.put(columnName, longDataList); } break; case FLOAT: 
case DOUBLE: if (StringUtils.isBlank(column.asString().trim())) { data.put(columnName, null); } else { Double[] doubleDataList = new Double[dataList.length]; for (int j = 0; j < dataList.length; j++) { dataList[j] = dataList[j].trim(); if (StringUtils.isNotBlank(dataList[j])) { doubleDataList[j] = Double.valueOf(dataList[j]); } } data.put(columnName, doubleDataList); } break; default: data.put(columnName, dataList); break; } } catch (Exception e) { LOGGER.info("dirty data, record: {}", record.toString()); continue; } } else { data.put(columnName, dataList); } } else { data.put(columnName, dataList); } } else { // LOGGER.info("columnType: {} integer: {}", columnType, column.asString()); switch (columnType) { case ID: if (id != null) { id += record.getColumn(i).asString(); } else { id = record.getColumn(i).asString(); } break; case PARENT: if (parent != null) { parent += record.getColumn(i).asString(); } else { parent = record.getColumn(i).asString(); } break; case ROUTING: if (routing != null) { routing += record.getColumn(i).asString(); } else { routing = record.getColumn(i).asString(); } break; case VERSION: if (version != null) { version += record.getColumn(i).asString(); } else { version = record.getColumn(i).asString(); } break; case DATE: String dateStr = getDateStr(columnList.get(i), column); data.put(columnName, dateStr); break; case KEYWORD: case STRING: case TEXT: case IP: case GEO_POINT: case IP_RANGE: data.put(columnName, column.asString()); break; case BOOLEAN: data.put(columnName, column.asBoolean()); break; case BYTE: case BINARY: // JSON serialization does not support the byte type; ES's binary type must be given as a Base64 string data.put(columnName, column.asString()); break; case LONG: data.put(columnName, column.asLong()); break; case INTEGER: data.put(columnName, column.asLong()); break; case SHORT: data.put(columnName, column.asLong()); break; case FLOAT: case DOUBLE: data.put(columnName, column.asDouble()); break; case GEO_SHAPE: case DATE_RANGE: case INTEGER_RANGE: case FLOAT_RANGE: case LONG_RANGE: case 
DOUBLE_RANGE: if (null == column.asString()) { data.put(columnName, column.asString()); } else { data.put(columnName, JSON.parse(column.asString())); } break; case NESTED: case OBJECT: if (null == column.asString()) { data.put(columnName, column.asString()); } else { // Convert to JSON data.put(columnName, JSON.parse(column.asString())); } break; default: throw DataXException.asDataXException(ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE, String.format( "Type error: unsupported type %s for column %s", columnType, columnName)); } } } if (this.hasPrimaryKeyInfo) { List idData = new ArrayList(); for (String eachCol : this.primaryKeyInfo.getColumn()) { Column recordColumn = record.getColumn(this.colNameToIndexMap.get(eachCol)); idData.add(recordColumn.asString()); } id = StringUtils.join(idData, this.primaryKeyInfo.getFieldDelimiter()); } if (this.hasEsPartitionColumn) { List idData = new ArrayList(); for (PartitionColumn eachCol : this.esPartitionColumn) { Column recordColumn = record.getColumn(this.colNameToIndexMap.get(eachCol.getName())); idData.add(recordColumn.asString()); } routing = StringUtils.join(idData, ""); } } catch (Exception e) { // Dirty data super.getTaskPluginCollector().collectDirtyRecord(record, String.format("parse error for column: %s errorMessage: %s", columnName, e.getMessage())); dirtyDataNumber++; // Move on to the next record continue; } if (LOGGER.isDebugEnabled()) { LOGGER.debug("id: {} routing: {} data: {}", id, routing, JSON.toJSONString(data)); } if (isDeleteRecord(record)) { Delete.Builder builder = new Delete.Builder(id); bulkaction.addAction(builder.build()); } else { // Use the user-defined combined unique key if (combinedIdColumn != null) { try { id = processIDCombineFields(record, combinedIdColumn); // LOGGER.debug("id: {}", id); } catch (Exception e) { // Dirty data super.getTaskPluginCollector().collectDirtyRecord(record, String.format("parse error for column: %s errorMessage: %s", columnName, e.getMessage())); // Move on to the next record dirtyDataNumber++; continue; } } switch (actionType) { case INDEX: 
// Serialize to JSON here first: the Jest client's Gson serializer would HTML-escape the '=' sign Index.Builder builder = null; if (this.enableWriteNull) { builder = new Index.Builder( JSONObject.toJSONString(data, JSONWriter.Feature.WriteMapNullValue, JSONWriter.Feature.WriteEnumUsingToString)); } else { builder = new Index.Builder(JSONObject.toJSONString(data)); } if (id != null) { builder.id(id); } if (parent != null) { builder.setParameter(Parameters.PARENT, parent); } if (routing != null) { builder.setParameter(Parameters.ROUTING, routing); } if (version != null) { builder.setParameter(Parameters.VERSION, version); builder.setParameter(Parameters.VERSION_TYPE, "external"); } bulkaction.addAction(builder.build()); break; case UPDATE: // doc: https://www.cnblogs.com/crystaltu/articles/6992935.html // doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html Map updateDoc = new HashMap(); updateDoc.put("doc", data); updateDoc.put("doc_as_upsert", true); Update.Builder update = null; if (this.enableWriteNull) { // write: {"a":"1","b":null} update = new Update.Builder( JSONObject.toJSONString(updateDoc, JSONWriter.Feature.WriteMapNullValue, JSONWriter.Feature.WriteEnumUsingToString)); // On top of the default features, also write null map values (WriteMapNullValue) and serialize enums via toString() } else { // write: {"a":"1"} update = new Update.Builder(JSONObject.toJSONString(updateDoc)); } if (id != null) { update.id(id); } if (parent != null) { update.setParameter(Parameters.PARENT, parent); } if (routing != null) { update.setParameter(Parameters.ROUTING, routing); } // version type [EXTERNAL] is not supported by the update API if (version != null) { update.setParameter(Parameters.VERSION, version); } bulkaction.addAction(update.build()); break; default: break; } } } if (dirtyDataNumber >= totalNumber) { // all batch is dirty data LOGGER.warn("all this batch is dirty data, dirtyDataNumber: {} totalDataNumber: {}", dirtyDataNumber, totalNumber); return; } BulkResult bulkResult = null; try { bulkResult = 
RetryUtil.executeWithRetry(new Callable() { @Override public BulkResult call() throws Exception { JestResult jestResult = esClient.bulkInsert(bulkaction); if (jestResult.isSucceeded()) { return null; } String msg = String.format("response code: [%d] error :[%s]", jestResult.getResponseCode(), jestResult.getErrorMessage()); LOGGER.warn(msg); if (esClient.isBulkResult(jestResult)) { BulkResult brst = (BulkResult) jestResult; List failedItems = brst.getFailedItems(); for (BulkResult.BulkResultItem item : failedItems) { if (item.status != 400) { // anything other than 400 BAD_REQUEST is a request error rather than a data error and must not be ignored throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, String.format("status:[%d], error: %s", item.status, item.error)); } else { // throw if the user chose not to ignore parse errors; parse errors are ignored by default if (!Key.isIgnoreParseError(conf)) { throw new NoReRunException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, String.format( "status:[%d], error: %s, config not ignoreParseError so throw this error", item.status, item.error)); } } } return brst; } else { Integer status = esClient.getStatus(jestResult); switch (status) { case 429: // TOO_MANY_REQUESTS LOGGER.warn("server response too many requests, so auto reduce speed"); break; default: break; } throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, jestResult.getErrorMessage()); } } }, this.trySize, this.tryInterval, false, Arrays.asList(DataXException.class)); } catch (Exception e) { if (Key.isIgnoreWriteError(this.conf)) { LOGGER.warn(String.format("Retry [%d] write failed, ignore the error, continue to write!", trySize)); } else { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.ES_INDEX_INSERT, e.getMessage(), e); } } if (null != bulkResult) { List items = bulkResult.getItems(); for (int idx = 0; idx < items.size(); ++idx) { BulkResult.BulkResultItem item = items.get(idx); if (item.error != null && !"".equals(item.error)) { super.getTaskPluginCollector().collectDirtyRecord(writerBuffer.get(idx),
String.format("status:[%d], error: %s", item.status, item.error)); } } } } private int getRecordColumnIndex(Record record, String columnName) { if (colNameToIndexMap.containsKey(columnName)) { return colNameToIndexMap.get(columnName); } List columns = new ArrayList(); int index = -1; for (int i=0; i 1) { throw DataXException.asDataXException( ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, "record has multiple columns found by name: " + columnName); } colNameToIndexMap.put(columnName, index); return index; } private String processIDCombineFields(Record record, ElasticSearchColumn esColumn) { List values = new ArrayList(esColumn.getCombineFields().size()); for (String field : esColumn.getCombineFields()) { int colIndex = getRecordColumnIndex(record, field); Column col = record.getColumnNumber() <= colIndex ? null : record.getColumn(colIndex); if (col == null) { throw DataXException.asDataXException(ElasticSearchWriterErrorCode.RECORD_FIELD_NOT_FOUND, field); } values.add(col.asString()); } return Joiner.on(esColumn.getCombineFieldsValueSeparator()).join(values); } @Override public void post() { } @Override public void destroy() { try { this.esClient.closeJestClient(); } catch (Exception e) { LOGGER.warn("ignore close jest client error: {}", e.getMessage()); } } } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/ElasticSearchWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import com.alibaba.datax.common.spi.ErrorCode; public enum ElasticSearchWriterErrorCode implements ErrorCode { BAD_CONFIG_VALUE("ESWriter-00", "The value you configured is not valid."), ES_INDEX_DELETE("ESWriter-01", "Delete index error."), ES_INDEX_CREATE("ESWriter-02", "Index creation error."), ES_MAPPINGS("ESWriter-03", "The mappings error."), ES_INDEX_INSERT("ESWriter-04", "Insert data error."), 
ES_ALIAS_MODIFY("ESWriter-05", "Alias modification error."), JSON_PARSE("ESWrite-06", "Json format parsing error"), UPDATE_WITH_ID("ESWrite-07", "Update mode must specify column type with id"), RECORD_FIELD_NOT_FOUND("ESWrite-08", "Field does not exist in the original table"), ES_GET_SETTINGS("ESWriter-09", "get settings failed"); ; private final String code; private final String description; ElasticSearchWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonPathUtil.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import java.util.List; import com.alibaba.fastjson2.JSONObject; public class JsonPathUtil { public static JSONObject getJsonObject(List paths, JSONObject data) { if (null == paths || paths.isEmpty()) { return data; } if (null == data) { return null; } JSONObject dataTmp = data; for (String each : paths) { if (null != dataTmp) { dataTmp = dataTmp.getJSONObject(each); } else { return null; } } return dataTmp; } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/JsonUtil.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONException; import com.alibaba.fastjson2.JSONObject; /** * @author bozu * @date 2021/01/06 */ public class JsonUtil { /** * Merge two JSON strings. * @param source the source JSON * @param target the target JSON * @return the merged JSON * @throws JSONException
*/ public static String mergeJsonStr(String source, String target) throws JSONException { if(source == null) { return target; } if(target == null) { return source; } return JSON.toJSONString(deepMerge(JSON.parseObject(source), JSON.parseObject(target))); } /** * Deep-merge two JSON objects: merge the values of source into target. * @param source the source JSON * @param target the target JSON * @return the merged JSON * @throws JSONException */ private static JSONObject deepMerge(JSONObject source, JSONObject target) throws JSONException { for (String key: source.keySet()) { Object value = source.get(key); if (target.containsKey(key)) { // existing value for "key" - recursively deep merge: if (value instanceof JSONObject) { JSONObject valueJson = (JSONObject)value; deepMerge(valueJson, target.getJSONObject(key)); } else { target.put(key, value); } } else { target.put(key, value); } } return target; } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import java.util.HashMap; import java.util.List; import java.util.Map; public final class Key { // ---------------------------------------- // Type definitions; primary-key field definition // ---------------------------------------- public static final String PRIMARY_KEY_COLUMN_NAME = "pk"; public static enum ActionType { UNKONW, INDEX, CREATE, DELETE, UPDATE } public static ActionType getActionType(Configuration conf) { String actionType = conf.getString("actionType", "index"); if ("index".equals(actionType)) { return ActionType.INDEX; } else if ("create".equals(actionType)) { return ActionType.CREATE; } else if ("delete".equals(actionType)) { return ActionType.DELETE; } else if ("update".equals(actionType)) { return
ActionType.UPDATE; } else { return ActionType.UNKONW; } } public static String getEndpoint(Configuration conf) { return conf.getNecessaryValue("endpoint", ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE); } public static String getUsername(Configuration conf) { return conf.getString("username", conf.getString("accessId")); } public static String getPassword(Configuration conf) { return conf.getString("password", conf.getString("accessKey")); } public static int getBatchSize(Configuration conf) { return conf.getInt("batchSize", 1024); } public static int getTrySize(Configuration conf) { return conf.getInt("trySize", 30); } public static long getTryInterval(Configuration conf) { return conf.getLong("tryInterval", 60000L); } public static int getTimeout(Configuration conf) { return conf.getInt("timeout", 600000); } public static boolean isTruncate(Configuration conf) { return conf.getBool("truncate", conf.getBool("cleanup", false)); } public static boolean isDiscovery(Configuration conf) { return conf.getBool("discovery", false); } public static boolean isCompression(Configuration conf) { return conf.getBool("compress", conf.getBool("compression", true)); } public static boolean isMultiThread(Configuration conf) { return conf.getBool("multiThread", true); } public static String getIndexName(Configuration conf) { return conf.getNecessaryValue("index", ElasticSearchWriterErrorCode.BAD_CONFIG_VALUE); } public static String getDeleteBy(Configuration conf) { return conf.getString("deleteBy"); } /** * TODO: starting with 7.0, an index can have only a single type, _doc * */ public static String getTypeName(Configuration conf) { String indexType = conf.getString("indexType"); if(StringUtils.isBlank(indexType)){ indexType = conf.getString("type", getIndexName(conf)); } return indexType; } public static boolean isIgnoreWriteError(Configuration conf) { return conf.getBool("ignoreWriteError", false); } public static boolean isIgnoreParseError(Configuration conf) { return conf.getBool("ignoreParseError", true); }
public static boolean isHighSpeedMode(Configuration conf) { if ("highspeed".equals(conf.getString("mode", ""))) { return true; } return false; } public static String getAlias(Configuration conf) { return conf.getString("alias", ""); } public static boolean isNeedCleanAlias(Configuration conf) { String mode = conf.getString("aliasMode", "append"); if ("exclusive".equals(mode)) { return true; } return false; } public static Map getSettings(Configuration conf) { return conf.getMap("settings", new HashMap()); } public static String getSplitter(Configuration conf) { return conf.getString("splitter", "-,-"); } public static boolean getDynamic(Configuration conf) { return conf.getBool("dynamic", false); } public static String getDstDynamic(Configuration conf) { return conf.getString("dstDynamic"); } public static String getDiscoveryFilter(Configuration conf){ return conf.getString("discoveryFilter","_all"); } public static Boolean getVersioning(Configuration conf) { return conf.getBool("versioning", false); } public static Long getUnifiedVersion(Configuration conf) { return conf.getLong("version", System.currentTimeMillis()); } public static Map getUrlParams(Configuration conf) { return conf.getMap("urlParams", new HashMap()); } public static Integer getESVersion(Configuration conf) { return conf.getInt("esVersion"); } public static String getMasterTimeout(Configuration conf) { return conf.getString("masterTimeout", "5m"); } public static boolean isEnableNullUpdate(Configuration conf) { return conf.getBool("enableWriteNull", true); } public static String getFieldDelimiter(Configuration conf) { return conf.getString("fieldDelimiter", ""); } public static PrimaryKeyInfo getPrimaryKeyInfo(Configuration conf) { String primaryKeyInfoString = conf.getString("primaryKeyInfo"); if (StringUtils.isNotBlank(primaryKeyInfoString)) { return JSON.parseObject(primaryKeyInfoString, new TypeReference() {}); } else { return null; } } public static List getEsPartitionColumn(Configuration 
conf) { String esPartitionColumnString = conf.getString("esPartitionColumn"); if (StringUtils.isNotBlank(esPartitionColumnString)) { return JSON.parseObject(esPartitionColumnString, new TypeReference>() {}); } else { return null; } } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/NoReRunException.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.spi.ErrorCode; public class NoReRunException extends DataXException { public NoReRunException(String errorMessage) { super(errorMessage); } public NoReRunException(ErrorCode errorCode, String errorMessage) { super(errorCode, errorMessage); } private static final long serialVersionUID = 1L; } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PartitionColumn.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter; public class PartitionColumn { private String name; // like: DATA private String metaType; private String comment; // like: VARCHAR private String type; public String getName() { return name; } public String getMetaType() { return metaType; } public String getComment() { return comment; } public String getType() { return type; } public void setName(String name) { this.name = name; } public void setMetaType(String metaType) { this.metaType = metaType; } public void setComment(String comment) { this.comment = comment; } public void setType(String type) { this.type = type; } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/PrimaryKeyInfo.java ================================================ package 
com.alibaba.datax.plugin.writer.elasticsearchwriter; import java.util.List; public class PrimaryKeyInfo { /** * Primary-key type: PrimaryKeyTypeEnum * * pk: single (business) primary key; specific: composite primary key */ private String type; /** * User-defined delimiter that joins the columns of a composite primary key */ private String fieldDelimiter; /** * Names of the primary-key columns */ private List column; public String getType() { return type; } public String getFieldDelimiter() { return fieldDelimiter; } public List getColumn() { return column; } public void setType(String type) { this.type = type; } public void setFieldDelimiter(String fieldDelimiter) { this.fieldDelimiter = fieldDelimiter; } public void setColumn(List column) { this.column = column; } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfo.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter.jest; import com.google.gson.Gson; import io.searchbox.action.AbstractAction; import io.searchbox.client.config.ElasticsearchVersion; public class ClusterInfo extends AbstractAction { @Override protected String buildURI(ElasticsearchVersion elasticsearchVersion) { return ""; } @Override public String getRestMethodName() { return "GET"; } @Override public ClusterInfoResult createNewElasticSearchResult(String responseBody, int statusCode, String reasonPhrase, Gson gson) { return createNewElasticSearchResult(new ClusterInfoResult(gson), responseBody, statusCode, reasonPhrase, gson); } public static class Builder extends AbstractAction.Builder { public Builder() { setHeader("accept", "application/json"); setHeader("content-type", "application/json"); } @Override public ClusterInfo build() { return new ClusterInfo(); } } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/ClusterInfoResult.java ================================================ package
com.alibaba.datax.plugin.writer.elasticsearchwriter.jest; import com.google.gson.Gson; import io.searchbox.client.JestResult; import java.util.regex.Matcher; import java.util.regex.Pattern; public class ClusterInfoResult extends JestResult { // match the whole leading version number (e.g. "10" in "10.0.0"), not just its first digit private static final Pattern FIRST_NUMBER = Pattern.compile("\\d+"); private static final int SEVEN = 7; public ClusterInfoResult(Gson gson) { super(gson); } public ClusterInfoResult(JestResult source) { super(source); } /** * Determine whether the deployed ES cluster version is 7.x or later. * ES 7.x and later changed index types substantially, so extra checks are needed. * Tested against ES 7.x and 6.x with the expected results; for versions below 5.x, the exception is caught and false is returned, for backward compatibility. * @return */ public Boolean isGreaterOrEqualThan7() throws Exception { // without permission (403), return false directly, for compatibility with older versions if (responseCode == 403) { return false; } if (!isSucceeded) { throw new Exception(getJsonString()); } try { String version = jsonObject.getAsJsonObject("version").get("number").toString(); Matcher matcher = FIRST_NUMBER.matcher(version); matcher.find(); String number = matcher.group(); Integer versionNum = Integer.valueOf(number); return versionNum >= SEVEN; } catch (Exception e) { // versions below 5.x are not compatibility-tested; if the JSON response cannot be parsed, the cluster is probably one of those older versions, so treat it as below 7.x return false; } } } ================================================ FILE: elasticsearchwriter/src/main/java/com/alibaba/datax/plugin/writer/elasticsearchwriter/jest/PutMapping7.java ================================================ package com.alibaba.datax.plugin.writer.elasticsearchwriter.jest; import io.searchbox.action.GenericResultAbstractAction; import io.searchbox.client.config.ElasticsearchVersion; public class PutMapping7 extends GenericResultAbstractAction { protected PutMapping7(PutMapping7.Builder builder) { super(builder); this.indexName = builder.index; this.payload = builder.source; } @Override protected String buildURI(ElasticsearchVersion elasticsearchVersion) { return super.buildURI(elasticsearchVersion) + "/_mapping"; } @Override public String getRestMethodName() { return "PUT"; } public static class Builder extends
GenericResultAbstractAction.Builder { private String index; private Object source; public Builder(String index, Object source) { this.index = index; this.source = source; } @Override public PutMapping7 build() { return new PutMapping7(this); } } } ================================================ FILE: elasticsearchwriter/src/main/resources/plugin.json ================================================ { "name": "elasticsearchwriter", "class": "com.alibaba.datax.plugin.writer.elasticsearchwriter.ElasticSearchWriter", "description": "Use case: production environments. How it works: TODO", "developer": "alibaba" } ================================================ FILE: ftpreader/doc/ftpreader.md ================================================

# DataX FtpReader Guide

------------

## 1 Quick Introduction

FtpReader reads data stored on a remote FTP file system. Under the hood, FtpReader fetches the remote FTP file data, converts it into the DataX transport protocol, and passes it to the Writer.

**Each remote file holds a logical two-dimensional table, for example CSV-formatted text.**

## 2 Features and Limitations

FtpReader reads data from remote FTP files and converts it into the DataX protocol. Remote FTP files are unstructured storage; from DataX's point of view, FtpReader is implemented much like TxtFileReader and has a lot in common with it. FtpReader currently supports the following:

1. Reads TXT files only, and the schema of each TXT file must be a two-dimensional table.
2. Supports CSV-like files with a custom delimiter.
3. Supports reading data of multiple types (represented as String), column pruning, and constant columns.
4. Supports recursive reading and file-name filtering.
5. Supports compressed text; the currently supported formats are zip, gzip, and bzip2.
6. Supports reading multiple files concurrently.

Not supported yet:

1. Reading a single file with multiple threads, which requires an algorithm for splitting a single file internally. Support is being considered for a second phase.
2. Reading a single compressed file with multiple threads, which is technically impossible.

## 3 Feature Description

### 3.1 Sample Configuration

```json
{
    "setting": {},
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "ftpreader",
                    "parameter": {
                        "protocol": "sftp",
                        "host": "127.0.0.1",
                        "port": 22,
                        "username": "xx",
                        "password": "xxx",
                        "path": [
                            "/home/hanfa.shf/ftpReaderTest/data"
                        ],
                        "column": [
                            {
                                "index": 0,
                                "type": "long"
                            },
                            {
                                "index": 1,
                                "type": "boolean"
                            },
                            {
                                "index": 2,
                                "type": "double"
                            },
                            {
                                "index": 3,
                                "type": "string"
                            },
                            {
                                "index": 4,
                                "type": "date",
                                "format": "yyyy.MM.dd"
                            }
                        ],
                        "encoding": "UTF-8",
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "ftpWriter",
                    "parameter": {
                        "path": "/home/hanfa.shf/ftpReaderTest/result",
                        "fileName": "shihf",
                        "writeMode": "truncate",
                        "format": "yyyy-MM-dd"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameter Description

* **protocol**
    * Description: protocol of the FTP server. The supported transfer protocols are ftp and sftp.
    * Required: yes
    * Default: none
* **host**
    * Description: address of the FTP server.
    * Required: yes
    * Default: none
* **port**
    * Description: port of the FTP server.
    * Required: no
    * Default: 22 if the transfer protocol is sftp; 21 if it is standard ftp
* **timeout**
    * Description: timeout for connecting to the FTP server, in milliseconds.
    * Required: no
    * Default: 60000 (1 minute)
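* **connectPattern**
    * Description: connection mode, active or passive. This parameter is used only when the transfer protocol is standard ftp, and its value must be PORT (active) or PASV (passive). The two modes differ in how the data connection is established: in PORT mode, the client opens a local port and waits for the server to connect to it; in PASV mode, the server opens a port and waits for the client to connect to it.
    * Required: no
    * Default: PASV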
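The connection defaults described above (port, timeout and connectPattern) can be summarized in a small sketch. It is a hypothetical helper, not part of FtpReader: the class `FtpConnectionSettings` and its `resolve` method are invented for illustration, and `null` stands for "not configured". The default values (21, 22, 60000, PASV) and the ftp/sftp and PORT/PASV checks follow this document and the plugin's `Constant` class; checking connectPattern only for plain ftp is an assumption based on the parameter description.

```java
/**
 * Hypothetical helper (not part of DataX) that fills in the documented
 * FtpReader connection defaults. A null argument means "not configured".
 */
final class FtpConnectionSettings {
    static final int DEFAULT_FTP_PORT = 21;
    static final int DEFAULT_SFTP_PORT = 22;
    static final int DEFAULT_TIMEOUT_MS = 60000;
    static final String DEFAULT_CONNECT_PATTERN = "PASV";

    final String protocol;
    final int port;
    final int timeoutMs;
    // Only meaningful for plain ftp; left null for sftp in this sketch.
    final String connectPattern;

    private FtpConnectionSettings(String protocol, int port, int timeoutMs, String connectPattern) {
        this.protocol = protocol;
        this.port = port;
        this.timeoutMs = timeoutMs;
        this.connectPattern = connectPattern;
    }

    static FtpConnectionSettings resolve(String protocol, Integer port,
                                         Integer timeoutMs, String connectPattern) {
        boolean isFtp = "ftp".equals(protocol);
        boolean isSftp = "sftp".equals(protocol);
        if (!isFtp && !isSftp) {
            throw new IllegalArgumentException("Only ftp and sftp are supported, got: " + protocol);
        }
        // The default port depends on the protocol: 22 for sftp, 21 for ftp.
        int resolvedPort = (port != null) ? port : (isSftp ? DEFAULT_SFTP_PORT : DEFAULT_FTP_PORT);
        int resolvedTimeout = (timeoutMs != null) ? timeoutMs : DEFAULT_TIMEOUT_MS;

        String resolvedPattern = null;
        if (isFtp) {
            // PORT = active mode, PASV = passive mode (the default).
            resolvedPattern = (connectPattern != null) ? connectPattern : DEFAULT_CONNECT_PATTERN;
            if (!"PORT".equals(resolvedPattern) && !"PASV".equals(resolvedPattern)) {
                throw new IllegalArgumentException("Unsupported connectPattern: " + resolvedPattern);
            }
        }
        return new FtpConnectionSettings(protocol, resolvedPort, resolvedTimeout, resolvedPattern);
    }
}
```

For example, `resolve("sftp", null, null, null)` yields port 22 and a 60000 ms timeout, while `resolve("ftp", null, null, null)` yields port 21 in PASV mode.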
* **username**
    * Description: user name for accessing the FTP server.
    * Required: yes
    * Default: none
* **password**
    * Description: password for accessing the FTP server.
    * Required: yes
    * Default: none
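* **path**
    * Description: path(s) on the remote FTP file system. Multiple paths can be specified.

      When a single remote FTP file is specified, FtpReader can currently extract it with only one thread; multi-threaded reading of a single uncompressed file is being considered for a second phase. When multiple remote FTP files are specified, FtpReader can extract them with multiple threads, and the degree of concurrency is set by the number of channels.

      When a wildcard is specified, FtpReader tries to enumerate all matching files. For example, `/*` reads all files in the `/` directory, and `/bazhen/*` reads all files in the `bazhen` directory. **FtpReader currently supports only `*` as a file wildcard.**

      **Note that DataX treats all text files synchronized by one job as a single data table. Users must make sure that all files fit the same schema, are in a CSV-like format, and are readable by DataX.**

      **Note that DataX reports an error if no matching file is found under the specified path.**

    * Required: yes
    * Default: none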
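* **column**
    * Description: list of fields to read. `type` specifies the type of the source data, `index` specifies which column of the text the field comes from (counting from 0), and `value` makes the field a constant: it is not read from the source file but generated from the given value.

      By default, users can read all data as String with the following configuration:

      ```json
      "column": ["*"]
      ```

      Users can also specify the column information explicitly:

      ```json
      "column": [
          {
              "type": "long",
              "index": 0
          },
          {
              "type": "string",
              "value": "alibaba"
          }
      ]
      ```

      The first entry reads a long field from the first column (index 0) of the remote FTP text file; the second is generated inside FtpReader as the constant string `alibaba`.

      For explicitly specified columns, `type` is required and exactly one of `index` and `value` must be set.

    * Required: yes
    * Default: all columns are read as string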
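The column rules above (type always required, exactly one of index or value, index counted from 0) can be sketched as follows, together with the conversions listed under "3.3 Type Conversion". `ColumnSpec` and `read` are hypothetical names, not FtpReader's classes; the sketch assumes the line has already been split into fields and parses values with plain JDK calls rather than DataX's own converters. Falling back to `yyyy-MM-dd` when no date format is given is also an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

/**
 * Hypothetical model of one FtpReader "column" entry (not the plugin's class).
 * type is always required; exactly one of index and value must be set.
 */
final class ColumnSpec {
    final String type;    // "long", "double", "string", "boolean" or "date"
    final Integer index;  // 0-based position in the text row, or null
    final String value;   // constant value, or null
    final String format;  // optional date pattern, e.g. "yyyy.MM.dd"

    ColumnSpec(String type, Integer index, String value, String format) {
        if (type == null) {
            throw new IllegalArgumentException("type is required");
        }
        if ((index == null) == (value == null)) {
            throw new IllegalArgumentException("exactly one of index and value must be set");
        }
        this.type = type;
        this.index = index;
        this.value = value;
        this.format = format;
    }

    /** Reads this column from a row that has already been split into fields. */
    Object read(String[] row) {
        // A constant column ignores the row entirely.
        String raw = (index != null) ? row[index] : value;
        switch (type) {
            case "long":
                return Long.valueOf(raw);
            case "double":
                return Double.valueOf(raw);
            case "string":
                return raw;
            case "boolean":
                // Case-insensitive, but anything other than true/false is rejected.
                if ("true".equalsIgnoreCase(raw)) return Boolean.TRUE;
                if ("false".equalsIgnoreCase(raw)) return Boolean.FALSE;
                throw new IllegalArgumentException("not a boolean: " + raw);
            case "date": {
                // Assumption: fall back to yyyy-MM-dd when no format is configured.
                String pattern = (format != null) ? format : "yyyy-MM-dd";
                try {
                    return new SimpleDateFormat(pattern).parse(raw);
                } catch (ParseException e) {
                    throw new IllegalArgumentException("not a date in format " + pattern + ": " + raw, e);
                }
            }
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }
}
```

With the row `{"19901219", "TRUE", "3.1415", "abc", "2014.12.31"}`, a spec of `("date", 4, null, "yyyy.MM.dd")` parses the last field the same way as the `format` entry in the sample configuration.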
* **fieldDelimiter**
    * Description: delimiter between fields when reading.
    * Required: yes
    * Default: `,`
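* **compress**
    * Description: compression type of the text files. Leaving it unset means no compression. Supported types are zip, gzip, and bzip2.
    * Required: no
    * Default: no compression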
* **encoding**
    * Description: encoding used to read the files.
    * Required: no
    * Default: utf-8
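* **skipHeader**
    * Description: CSV-like files may start with a header row of column titles that needs to be skipped. The header is not skipped by default.
    * Required: no
    * Default: false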
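* **nullFormat**
    * Description: a text file has no standard string for null (a null pointer), so DataX provides nullFormat to define which string is treated as null. For example, if the user configures `nullFormat: "\N"`, a source value of `\N` is treated by DataX as a null field.
    * Required: no
    * Default: `\N`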
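A minimal sketch of how nullFormat interacts with fieldDelimiter, assuming a plain delimiter split with no quoting. FtpReader itself parses lines with CsvReader (see csvReaderConfig), so quoted fields and escapes are not handled here; `NullFormatSketch` and `parseLine` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of nullFormat; not FtpReader's parser. */
final class NullFormatSketch {
    /**
     * Splits a line on fieldDelimiter and maps every field that equals nullFormat to null.
     * A split limit of -1 keeps trailing empty fields, so "a,b," yields three fields.
     */
    static List<String> parseLine(String line, String fieldDelimiter, String nullFormat) {
        // Pattern.quote treats the delimiter literally, so "|" or "." work as-is.
        String[] rawFields = line.split(Pattern.quote(fieldDelimiter), -1);
        List<String> fields = new ArrayList<>(rawFields.length);
        for (String field : rawFields) {
            // Only an exact match with nullFormat is mapped to null.
            fields.add(nullFormat != null && nullFormat.equals(field) ? null : field);
        }
        return fields;
    }
}
```

With the default nullFormat, the text line `1,\N,abc` (written `"1,\\N,abc"` in Java source) becomes `["1", null, "abc"]`.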
* **maxTraversalLevel**
    * Description: maximum number of directory levels that may be traversed.
    * Required: no
    * Default: 100
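* **csvReaderConfig**
    * Description: parameters for reading CSV files, as a Map. CSV files are read with CsvReader, which has many settings; any setting left unconfigured uses its default value.
    * Required: no
    * Default: none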
Common settings:

```json
"csvReaderConfig":{
    "safetySwitch": false,
    "skipEmptyRecords": false,
    "useTextQualifier": false
}
```

All settings and their default values are listed below. When configuring the csvReaderConfig map, **use exactly these field names**:

```
boolean caseSensitive = true;
char textQualifier = 34;         // 34 is the double quote "
boolean trimWhitespace = true;
boolean useTextQualifier = true; // whether to use the CSV text qualifier (escape character)
char delimiter = 44;             // field delimiter; 44 is the comma ,
char recordDelimiter = 0;
char comment = 35;               // 35 is #
boolean useComments = false;
int escapeMode = 1;
boolean safetySwitch = true;     // whether a single column is limited to 100000 characters
boolean skipEmptyRecords = true; // whether to skip empty lines
boolean captureRawRecord = true;
```

### 3.3 Type Conversion

Remote FTP files do not carry data types; the types are defined by DataX FtpReader:

| DataX internal type | Remote FTP file data type |
| -------- | ----- |
| Long | Long |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Date | Date |

Where:

* Long in a remote FTP file is an integer written as a string in the file text, for example "19901219".
* Double in a remote FTP file is a double written as a string in the file text, for example "3.1415".
* Boolean in a remote FTP file is a boolean written as a string in the file text, for example "true" or "false". Case-insensitive.
* Date in a remote FTP file is a date written as a string in the file text, for example "2014-12-31". A format can be specified for Date.

## 4 Performance Report

## 5 Constraints

Omitted.

## 6 FAQ

Omitted.

================================================ FILE: ftpreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT ftpreader ftpreader FtpReader reads files from a specified FTP server and converts their types according to the user configuration. Recommended for development and test environments. jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 com.jcraft jsch 0.1.54 commons-net commons-net 3.3 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: ftpreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/ftpreader target/
ftpreader-0.0.1-SNAPSHOT.jar plugin/reader/ftpreader false plugin/reader/ftpreader/libs runtime ================================================ FILE: ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.ftpreader; public class Constant { public static final String SOURCE_FILES = "sourceFiles"; public static final int DEFAULT_FTP_PORT = 21; public static final int DEFAULT_SFTP_PORT = 22; public static final int DEFAULT_TIMEOUT = 60000; public static final int DEFAULT_MAX_TRAVERSAL_LEVEL = 100; public static final String DEFAULT_FTP_CONNECT_PATTERN = "PASV"; } ================================================ FILE: ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/FtpHelper.java ================================================ package com.alibaba.datax.plugin.reader.ftpreader; import java.io.InputStream; import java.util.HashSet; import java.util.List; public abstract class FtpHelper { /** * * @Title: LoginFtpServer * @Description: Establish a connection with the FTP server * @param @param host * @param @param username * @param @param password * @param @param port * @param @param timeout * @param @param connectMode * @return void * @throws */ public abstract void loginFtpServer(String host, String username, String password, int port, int timeout,String connectMode) ; /** * * @Title: LogoutFtpServer * todo: fix the initial letter of the method name * @Description: Disconnect from the FTP server * @param * @return void * @throws */ public abstract void logoutFtpServer(); /** * * @Title: isDirExist * @Description: Check whether the given path is a directory * @param @param directoryPath * @param @return * @return boolean * @throws */ public abstract boolean isDirExist(String directoryPath); /** * * @Title: isFileExist * @Description: Check whether the given path is a file * @param @param filePath * @param @return * @return boolean * @throws */ public abstract boolean isFileExist(String filePath); /** * * @Title: isSymbolicLink * @Description: Check whether the given path is a symbolic link * @param @param filePath *
@param @return * @return boolean * @throws */ public abstract boolean isSymbolicLink(String filePath); /** * * @Title: getListFiles * @Description: Recursively collect the absolute paths of all matching files under the given path * @param @param directoryPath * @param @param parentLevel recursion depth of the parent directory (0 on the first call) * @param @param maxTraversalLevel maximum allowed recursion depth * @param @return * @return HashSet * @throws */ public abstract HashSet getListFiles(String directoryPath, int parentLevel, int maxTraversalLevel); /** * * @Title: getInputStream * @Description: Open an input stream for the given path * @param @param filePath * @param @return * @return InputStream * @throws */ public abstract InputStream getInputStream(String filePath); /** * * @Title: getAllFiles * @Description: Collect the absolute paths of all matching files under the given list of paths * @param @param srcPaths list of paths * @param @param parentLevel recursion depth of the parent directory (0 on the first call) * @param @param maxTraversalLevel maximum allowed recursion depth * @param @return * @return HashSet * @throws */ public HashSet getAllFiles(List srcPaths, int parentLevel, int maxTraversalLevel){ HashSet sourceAllFiles = new HashSet(); if (!srcPaths.isEmpty()) { for (String eachPath : srcPaths) { sourceAllFiles.addAll(getListFiles(eachPath, parentLevel, maxTraversalLevel)); } } return sourceAllFiles; } } ================================================ FILE: ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/FtpReader.java ================================================ package com.alibaba.datax.plugin.reader.ftpreader; import java.io.InputStream; import java.util.ArrayList; import java.util.HashSet; import java.util.List; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; public class FtpReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG =
LoggerFactory.getLogger(Job.class); private Configuration originConfig = null; private List path = null; private HashSet sourceFiles; // FTP connection parameters private String protocol; private String host; private int port; private String username; private String password; private int timeout; private String connectPattern; private int maxTraversalLevel; private FtpHelper ftpHelper = null; @Override public void init() { this.originConfig = this.getPluginJobConf(); this.sourceFiles = new HashSet(); this.validateParameter(); UnstructuredStorageReaderUtil.validateParameter(this.originConfig); if ("sftp".equals(protocol)) { // sftp protocol this.port = originConfig.getInt(Key.PORT, Constant.DEFAULT_SFTP_PORT); this.ftpHelper = new SftpHelper(); } else if ("ftp".equals(protocol)) { // ftp protocol this.port = originConfig.getInt(Key.PORT, Constant.DEFAULT_FTP_PORT); this.ftpHelper = new StandardFtpHelper(); } ftpHelper.loginFtpServer(host, username, password, port, timeout, connectPattern); } private void validateParameter() { // todo: use constants this.protocol = this.originConfig.getNecessaryValue(Key.PROTOCOL, FtpReaderErrorCode.REQUIRED_VALUE); boolean ptrotocolTag = "ftp".equals(this.protocol) || "sftp".equals(this.protocol); if (!ptrotocolTag) { throw DataXException.asDataXException(FtpReaderErrorCode.ILLEGAL_VALUE, String.format("Only the ftp and sftp transfer protocols are supported; the configured protocol is not supported: [%s]", protocol)); } this.host = this.originConfig.getNecessaryValue(Key.HOST, FtpReaderErrorCode.REQUIRED_VALUE); this.username = this.originConfig.getNecessaryValue(Key.USERNAME, FtpReaderErrorCode.REQUIRED_VALUE); this.password = this.originConfig.getNecessaryValue(Key.PASSWORD, FtpReaderErrorCode.REQUIRED_VALUE); this.timeout = originConfig.getInt(Key.TIMEOUT, Constant.DEFAULT_TIMEOUT); this.maxTraversalLevel = originConfig.getInt(Key.MAXTRAVERSALLEVEL, Constant.DEFAULT_MAX_TRAVERSAL_LEVEL); // only support connect pattern this.connectPattern = this.originConfig.getUnnecessaryValue(Key.CONNECTPATTERN, Constant.DEFAULT_FTP_CONNECT_PATTERN,
null); boolean connectPatternTag = "PORT".equals(connectPattern) || "PASV".equals(connectPattern); if (!connectPatternTag) { throw DataXException.asDataXException(FtpReaderErrorCode.ILLEGAL_VALUE, String.format("Unsupported ftp connect pattern: [%s]", connectPattern)); }else{ this.originConfig.set(Key.CONNECTPATTERN, connectPattern); } //path check String pathInString = this.originConfig.getNecessaryValue(Key.PATH, FtpReaderErrorCode.REQUIRED_VALUE); if (!pathInString.startsWith("[") && !pathInString.endsWith("]")) { path = new ArrayList(); path.add(pathInString); } else { path = this.originConfig.getList(Key.PATH, String.class); if (null == path || path.size() == 0) { throw DataXException.asDataXException(FtpReaderErrorCode.REQUIRED_VALUE, "You must specify the source directories or files to read"); } for (String eachPath : path) { if(!eachPath.startsWith("/")){ String message = String.format("Please check the path parameter [%s]: it must be an absolute path", eachPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.ILLEGAL_VALUE, message); } } } } @Override public void prepare() { LOG.debug("prepare() begin..."); this.sourceFiles = ftpHelper.getAllFiles(path, 0, maxTraversalLevel); LOG.info(String.format("Number of files to be read: [%s]", this.sourceFiles.size())); } @Override public void post() { } @Override public void destroy() { try { this.ftpHelper.logoutFtpServer(); } catch (Exception e) { String message = String.format( "Failed to close the connection to the FTP server: [%s] host=%s, username=%s, port=%s", e.getMessage(), host, username, port); LOG.error(message, e); } } // warn: an empty source directory causes an error; to sync an "empty directory", state that intent explicitly with an empty file @Override public List split(int adviceNumber) { LOG.debug("split() begin..."); List readerSplitConfigs = new ArrayList(); // warn: each slice reads exactly one file, // int splitNumber = adviceNumber; int splitNumber = this.sourceFiles.size(); if (0 == splitNumber) { throw DataXException.asDataXException(FtpReaderErrorCode.EMPTY_DIR_EXCEPTION, String.format("No files to read were found; please check your path setting: %s", this.originConfig.getString(Key.PATH))); } List> splitedSourceFiles = this.splitSourceFiles(new
ArrayList(this.sourceFiles), splitNumber); for (List files : splitedSourceFiles) { Configuration splitedConfig = this.originConfig.clone(); splitedConfig.set(Constant.SOURCE_FILES, files); readerSplitConfigs.add(splitedConfig); } LOG.debug("split() ok and end..."); return readerSplitConfigs; } private List> splitSourceFiles(final List sourceList, int adviceNumber) { List> splitedList = new ArrayList>(); int averageLength = sourceList.size() / adviceNumber; averageLength = averageLength == 0 ? 1 : averageLength; for (int begin = 0, end = 0; begin < sourceList.size(); begin = end) { end = begin + averageLength; if (end > sourceList.size()) { end = sourceList.size(); } splitedList.add(sourceList.subList(begin, end)); } return splitedList; } } public static class Task extends Reader.Task { private static Logger LOG = LoggerFactory.getLogger(Task.class); private String host; private int port; private String username; private String password; private String protocol; private int timeout; private String connectPattern; private Configuration readerSliceConfig; private List sourceFiles; private FtpHelper ftpHelper = null; @Override public void init() {// connection retry /* for ftp connection */ this.readerSliceConfig = this.getPluginJobConf(); this.host = readerSliceConfig.getString(Key.HOST); this.protocol = readerSliceConfig.getString(Key.PROTOCOL); this.username = readerSliceConfig.getString(Key.USERNAME); this.password = readerSliceConfig.getString(Key.PASSWORD); this.timeout = readerSliceConfig.getInt(Key.TIMEOUT, Constant.DEFAULT_TIMEOUT); this.sourceFiles = this.readerSliceConfig.getList(Constant.SOURCE_FILES, String.class); if ("sftp".equals(protocol)) { // sftp protocol this.port = readerSliceConfig.getInt(Key.PORT, Constant.DEFAULT_SFTP_PORT); this.ftpHelper = new SftpHelper(); } else if ("ftp".equals(protocol)) { // ftp protocol this.port = readerSliceConfig.getInt(Key.PORT, Constant.DEFAULT_FTP_PORT); this.connectPattern = readerSliceConfig.getString(Key.CONNECTPATTERN,
Constant.DEFAULT_FTP_CONNECT_PATTERN);// 默认为被动模式 this.ftpHelper = new StandardFtpHelper(); } ftpHelper.loginFtpServer(host, username, password, port, timeout, connectPattern); } @Override public void prepare() { } @Override public void post() { } @Override public void destroy() { try { this.ftpHelper.logoutFtpServer(); } catch (Exception e) { String message = String.format( "关闭与ftp服务器连接失败: [%s] host=%s, username=%s, port=%s", e.getMessage(), host, username, port); LOG.error(message, e); } } @Override public void startRead(RecordSender recordSender) { LOG.debug("start read source files..."); for (String fileName : this.sourceFiles) { LOG.info(String.format("reading file : [%s]", fileName)); InputStream inputStream = null; inputStream = ftpHelper.getInputStream(fileName); UnstructuredStorageReaderUtil.readFromStream(inputStream, fileName, this.readerSliceConfig, recordSender, this.getTaskPluginCollector()); recordSender.flush(); } LOG.debug("end read source files..."); } } } ================================================ FILE: ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/FtpReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.ftpreader; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by haiwei.luo on 14-9-20. 
*/ public enum FtpReaderErrorCode implements ErrorCode { REQUIRED_VALUE("FtpReader-00", "您缺失了必须填写的参数值."), ILLEGAL_VALUE("FtpReader-01", "您填写的参数值不合法."), MIXED_INDEX_VALUE("FtpReader-02", "您的列信息配置同时包含了index,value."), NO_INDEX_VALUE("FtpReader-03","您明确的配置列信息,但未填写相应的index,value."), FILE_NOT_EXISTS("FtpReader-04", "您配置的目录文件路径不存在或者没有权限读取."), OPEN_FILE_WITH_CHARSET_ERROR("FtpReader-05", "您配置的文件编码和实际文件编码不符合."), OPEN_FILE_ERROR("FtpReader-06", "您配置的文件在打开时异常."), READ_FILE_IO_ERROR("FtpReader-07", "您配置的文件在读取时出现IO异常."), SECURITY_NOT_ENOUGH("FtpReader-08", "您缺少权限执行相应的文件操作."), CONFIG_INVALID_EXCEPTION("FtpReader-09", "您的参数配置错误."), RUNTIME_EXCEPTION("FtpReader-10", "出现运行时异常, 请联系我们"), EMPTY_DIR_EXCEPTION("FtpReader-11", "您尝试读取的文件目录为空."), FAIL_LOGIN("FtpReader-12", "登录失败,无法与ftp服务器建立连接."), FAIL_DISCONNECT("FtpReader-13", "关闭ftp连接失败,无法与ftp服务器断开连接."), COMMAND_FTP_IO_EXCEPTION("FtpReader-14", "与ftp服务器连接异常."), OUT_MAX_DIRECTORY_LEVEL("FtpReader-15", "超出允许的最大目录层数."), LINK_FILE("FtpReader-16", "您尝试读取的文件为链接文件."),; private final String code; private final String description; private FtpReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/Key.java ================================================ package com.alibaba.datax.plugin.reader.ftpreader; public class Key { public static final String PROTOCOL = "protocol"; public static final String HOST = "host"; public static final String USERNAME = "username"; public static final String PASSWORD = "password"; public static final String PORT = "port"; public static final String TIMEOUT = "timeout"; public static final String 
CONNECTPATTERN = "connectPattern"; public static final String PATH = "path"; public static final String MAXTRAVERSALLEVEL = "maxTraversalLevel"; } ================================================ FILE: ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/SftpHelper.java ================================================ package com.alibaba.datax.plugin.reader.ftpreader; import java.io.InputStream; import java.util.HashSet; import java.util.Properties; import java.util.Vector; import org.apache.commons.io.IOUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; import com.jcraft.jsch.ChannelSftp; import com.jcraft.jsch.JSch; import com.jcraft.jsch.JSchException; import com.jcraft.jsch.Session; import com.jcraft.jsch.SftpATTRS; import com.jcraft.jsch.SftpException; import com.jcraft.jsch.ChannelSftp.LsEntry; public class SftpHelper extends FtpHelper { private static final Logger LOG = LoggerFactory.getLogger(SftpHelper.class); Session session = null; ChannelSftp channelSftp = null; @Override public void loginFtpServer(String host, String username, String password, int port, int timeout, String connectMode) { JSch jsch = new JSch(); // 创建JSch对象 try { session = jsch.getSession(username, host, port); // 根据用户名,主机ip,端口获取一个Session对象 // 如果服务器连接不上,则抛出异常 if (session == null) { throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, "session is null,无法通过sftp与服务器建立链接,请检查主机名和用户名是否正确."); } session.setPassword(password); // 设置密码 Properties config = new Properties(); config.put("StrictHostKeyChecking", "no"); session.setConfig(config); // 为Session对象设置properties session.setTimeout(timeout); // 设置timeout时间 session.connect(); // 通过Session建立链接 channelSftp = (ChannelSftp) session.openChannel("sftp"); // 打开SFTP通道 channelSftp.connect(); // 建立SFTP通道的连接 //设置命令传输编码 //String fileEncoding = 
System.getProperty("file.encoding"); //channelSftp.setFilenameEncoding(fileEncoding); } catch (JSchException e) { if(null != e.getCause()){ String cause = e.getCause().toString(); String unknownHostException = "java.net.UnknownHostException: " + host; String illegalArgumentException = "java.lang.IllegalArgumentException: port out of range:" + port; String wrongPort = "java.net.ConnectException: Connection refused"; if (unknownHostException.equals(cause)) { String message = String.format("请确认ftp服务器地址是否正确,无法连接到地址为: [%s] 的ftp服务器", host); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e); } else if (illegalArgumentException.equals(cause) || wrongPort.equals(cause) ) { String message = String.format("请确认连接ftp服务器端口是否正确,错误的端口: [%s] ", port); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e); }else{ throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, "", e); } }else { if("Auth fail".equals(e.getMessage())){ String message = String.format("与ftp服务器建立连接失败,请检查用户名和密码是否正确: [%s]", "message:host =" + host + ",username = " + username + ",port =" + port); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message); }else{ String message = String.format("与ftp服务器建立连接失败 : [%s]", "message:host =" + host + ",username = " + username + ",port =" + port); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e); } } } } @Override public void logoutFtpServer() { if (channelSftp != null) { channelSftp.disconnect(); } if (session != null) { session.disconnect(); } } @Override public boolean isDirExist(String directoryPath) { try { SftpATTRS sftpATTRS = channelSftp.lstat(directoryPath); return sftpATTRS.isDir(); } catch (SftpException e) { if (e.getMessage().toLowerCase().equals("no such file")) { String message = String.format("请确认您的配置项path:[%s]存在,且配置的用户有权限读取", directoryPath); 
LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } String message = String.format("进入目录:[%s]时发生I/O异常,请确认与ftp服务器的连接正常", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } } @Override public boolean isFileExist(String filePath) { boolean isExitFlag = false; try { SftpATTRS sftpATTRS = channelSftp.lstat(filePath); if(sftpATTRS.getSize() >= 0){ isExitFlag = true; } } catch (SftpException e) { if (e.getMessage().toLowerCase().equals("no such file")) { String message = String.format("请确认您的配置项path:[%s]存在,且配置的用户有权限读取", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } else { String message = String.format("获取文件:[%s] 属性时发生I/O异常,请确认与ftp服务器的连接正常", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } } return isExitFlag; } @Override public boolean isSymbolicLink(String filePath) { try { SftpATTRS sftpATTRS = channelSftp.lstat(filePath); return sftpATTRS.isLink(); } catch (SftpException e) { if (e.getMessage().toLowerCase().equals("no such file")) { String message = String.format("请确认您的配置项path:[%s]存在,且配置的用户有权限读取", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } else { String message = String.format("获取文件:[%s] 属性时发生I/O异常,请确认与ftp服务器的连接正常", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } } } HashSet sourceFiles = new HashSet(); @Override public HashSet getListFiles(String directoryPath, int parentLevel, int maxTraversalLevel) { if(parentLevel < maxTraversalLevel){ String parentPath = null;// 父级目录,以'/'结尾 int pathLen = directoryPath.length(); if (directoryPath.contains("*") || directoryPath.contains("?")) {//*和?的限制 // path是正则表达式 String subPath = 
UnstructuredStorageReaderUtil.getRegexPathParentPath(directoryPath); if (isDirExist(subPath)) { parentPath = subPath; } else { String message = String.format("不能进入目录:[%s]," + "请确认您的配置项path:[%s]存在,且配置的用户有权限进入", subPath, directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } } else if (isDirExist(directoryPath)) { // path是目录 if (directoryPath.charAt(pathLen - 1) == IOUtils.DIR_SEPARATOR) { parentPath = directoryPath; } else { parentPath = directoryPath + IOUtils.DIR_SEPARATOR; } } else if(isSymbolicLink(directoryPath)){ //path是链接文件 String message = String.format("文件:[%s]是链接文件,当前不支持链接文件的读取", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.LINK_FILE, message); }else if (isFileExist(directoryPath)) { // path指向具体文件 sourceFiles.add(directoryPath); return sourceFiles; } else { String message = String.format("请确认您的配置项path:[%s]存在,且配置的用户有权限读取", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } try { Vector vector = channelSftp.ls(directoryPath); for (int i = 0; i < vector.size(); i++) { LsEntry le = (LsEntry) vector.get(i); String strName = le.getFilename(); String filePath = parentPath + strName; if (isDirExist(filePath)) { // 是子目录 if (!(strName.equals(".") || strName.equals(".."))) { //递归处理 getListFiles(filePath, parentLevel+1, maxTraversalLevel); } } else if(isSymbolicLink(filePath)){ //是链接文件 String message = String.format("文件:[%s]是链接文件,当前不支持链接文件的读取", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.LINK_FILE, message); }else if (isFileExist(filePath)) { // 是文件 sourceFiles.add(filePath); } else { String message = String.format("请确认path:[%s]存在,且配置的用户有权限读取", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } } // end for vector } catch (SftpException e) { String message = 
String.format("获取path:[%s] 下文件列表时发生I/O异常,请确认与ftp服务器的连接正常", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } return sourceFiles; }else{ //超出最大递归层数 String message = String.format("获取path:[%s] 下文件列表时超出最大层数,请确认路径[%s]下不存在软连接文件", directoryPath, directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.OUT_MAX_DIRECTORY_LEVEL, message); } } @Override public InputStream getInputStream(String filePath) { try { return channelSftp.get(filePath); } catch (SftpException e) { String message = String.format("读取文件 : [%s] 时出错,请确认文件:[%s]存在且配置的用户有权限读取", filePath, filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.OPEN_FILE_ERROR, message); } } } ================================================ FILE: ftpreader/src/main/java/com/alibaba/datax/plugin/reader/ftpreader/StandardFtpHelper.java ================================================ package com.alibaba.datax.plugin.reader.ftpreader; import java.io.IOException; import java.io.InputStream; import java.net.UnknownHostException; import java.util.HashSet; import org.apache.commons.io.IOUtils; import org.apache.commons.net.ftp.FTP; import org.apache.commons.net.ftp.FTPClient; import org.apache.commons.net.ftp.FTPClientConfig; import org.apache.commons.net.ftp.FTPFile; import org.apache.commons.net.ftp.FTPReply; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; public class StandardFtpHelper extends FtpHelper { private static final Logger LOG = LoggerFactory.getLogger(StandardFtpHelper.class); FTPClient ftpClient = null; @Override public void loginFtpServer(String host, String username, String password, int port, int timeout, String connectMode) { ftpClient = new FTPClient(); try { // 连接 ftpClient.connect(host, port); // 登录 
ftpClient.login(username, password); // 不需要写死ftp server的OS TYPE,FTPClient getSystemType()方法会自动识别 // ftpClient.configure(new FTPClientConfig(FTPClientConfig.SYST_UNIX)); ftpClient.setConnectTimeout(timeout); ftpClient.setDataTimeout(timeout); if ("PASV".equals(connectMode)) { ftpClient.enterRemotePassiveMode(); ftpClient.enterLocalPassiveMode(); } else if ("PORT".equals(connectMode)) { ftpClient.enterLocalActiveMode(); // ftpClient.enterRemoteActiveMode(host, port); } int reply = ftpClient.getReplyCode(); if (!FTPReply.isPositiveCompletion(reply)) { ftpClient.disconnect(); String message = String.format("与ftp服务器建立连接失败,请检查用户名和密码是否正确: [%s]", "message:host =" + host + ",username = " + username + ",port =" + port); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message); } //设置命令传输编码 String fileEncoding = System.getProperty("file.encoding"); ftpClient.setControlEncoding(fileEncoding); } catch (UnknownHostException e) { String message = String.format("请确认ftp服务器地址是否正确,无法连接到地址为: [%s] 的ftp服务器", host); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e); } catch (IllegalArgumentException e) { String message = String.format("请确认连接ftp服务器端口是否正确,错误的端口: [%s] ", port); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e); } catch (Exception e) { String message = String.format("与ftp服务器建立连接失败 : [%s]", "message:host =" + host + ",username = " + username + ",port =" + port); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_LOGIN, message, e); } } @Override public void logoutFtpServer() { if (ftpClient.isConnected()) { try { //todo ftpClient.completePendingCommand();//打开流操作之后必须,原因还需要深究 ftpClient.logout(); } catch (IOException e) { String message = "与ftp服务器断开连接失败"; LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_DISCONNECT, message, e); }finally { if(ftpClient.isConnected()){ try { 
ftpClient.disconnect(); } catch (IOException e) { String message = "与ftp服务器断开连接失败"; LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FAIL_DISCONNECT, message, e); } } } } } @Override public boolean isDirExist(String directoryPath) { try { return ftpClient.changeWorkingDirectory(new String(directoryPath.getBytes(),FTP.DEFAULT_CONTROL_ENCODING)); } catch (IOException e) { String message = String.format("进入目录:[%s]时发生I/O异常,请确认与ftp服务器的连接正常", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } } @Override public boolean isFileExist(String filePath) { boolean isExitFlag = false; try { FTPFile[] ftpFiles = ftpClient.listFiles(new String(filePath.getBytes(),FTP.DEFAULT_CONTROL_ENCODING)); if (ftpFiles.length == 1 && ftpFiles[0].isFile()) { isExitFlag = true; } } catch (IOException e) { String message = String.format("获取文件:[%s] 属性时发生I/O异常,请确认与ftp服务器的连接正常", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } return isExitFlag; } @Override public boolean isSymbolicLink(String filePath) { boolean isExitFlag = false; try { FTPFile[] ftpFiles = ftpClient.listFiles(new String(filePath.getBytes(),FTP.DEFAULT_CONTROL_ENCODING)); if (ftpFiles.length == 1 && ftpFiles[0].isSymbolicLink()) { isExitFlag = true; } } catch (IOException e) { String message = String.format("获取文件:[%s] 属性时发生I/O异常,请确认与ftp服务器的连接正常", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } return isExitFlag; } HashSet sourceFiles = new HashSet(); @Override public HashSet getListFiles(String directoryPath, int parentLevel, int maxTraversalLevel) { if(parentLevel < maxTraversalLevel){ String parentPath = null;// 父级目录,以'/'结尾 int pathLen = directoryPath.length(); if (directoryPath.contains("*") || directoryPath.contains("?")) { // path是正则表达式 String subPath = 
UnstructuredStorageReaderUtil.getRegexPathParentPath(directoryPath); if (isDirExist(subPath)) { parentPath = subPath; } else { String message = String.format("不能进入目录:[%s]," + "请确认您的配置项path:[%s]存在,且配置的用户有权限进入", subPath, directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } } else if (isDirExist(directoryPath)) { // path是目录 if (directoryPath.charAt(pathLen - 1) == IOUtils.DIR_SEPARATOR) { parentPath = directoryPath; } else { parentPath = directoryPath + IOUtils.DIR_SEPARATOR; } } else if (isFileExist(directoryPath)) { // path指向具体文件 sourceFiles.add(directoryPath); return sourceFiles; } else if(isSymbolicLink(directoryPath)){ //path是链接文件 String message = String.format("文件:[%s]是链接文件,当前不支持链接文件的读取", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.LINK_FILE, message); }else { String message = String.format("请确认您的配置项path:[%s]存在,且配置的用户有权限读取", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } try { FTPFile[] fs = ftpClient.listFiles(new String(directoryPath.getBytes(),FTP.DEFAULT_CONTROL_ENCODING)); for (FTPFile ff : fs) { String strName = ff.getName(); String filePath = parentPath + strName; if (ff.isDirectory()) { if (!(strName.equals(".") || strName.equals(".."))) { //递归处理 getListFiles(filePath, parentLevel+1, maxTraversalLevel); } } else if (ff.isFile()) { // 是文件 sourceFiles.add(filePath); } else if(ff.isSymbolicLink()){ //是链接文件 String message = String.format("文件:[%s]是链接文件,当前不支持链接文件的读取", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.LINK_FILE, message); }else { String message = String.format("请确认path:[%s]存在,且配置的用户有权限读取", filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.FILE_NOT_EXISTS, message); } } // end for FTPFile } catch (IOException e) { String message = String.format("获取path:[%s] 
下文件列表时发生I/O异常,请确认与ftp服务器的连接正常", directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e); } return sourceFiles; } else{ //超出最大递归层数 String message = String.format("获取path:[%s] 下文件列表时超出最大层数,请确认路径[%s]下不存在软连接文件", directoryPath, directoryPath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.OUT_MAX_DIRECTORY_LEVEL, message); } } @Override public InputStream getInputStream(String filePath) { try { return ftpClient.retrieveFileStream(new String(filePath.getBytes(),FTP.DEFAULT_CONTROL_ENCODING)); } catch (IOException e) { String message = String.format("读取文件 : [%s] 时出错,请确认文件:[%s]存在且配置的用户有权限读取", filePath, filePath); LOG.error(message); throw DataXException.asDataXException(FtpReaderErrorCode.OPEN_FILE_ERROR, message); } } } ================================================ FILE: ftpreader/src/main/resources/plugin-template.json ================================================ { "name": "ftpreader", "parameter": { "host": "", "port": "", "username": "", "password": "", "protocol": "", "path": [ "" ], "encoding": "UTF-8", "column": [ { "index": 0, "type": "long" }, { "index": 1, "type": "boolean" }, { "index": 2, "type": "double" }, { "index": 3, "type": "string" }, { "index": 4, "type": "date", "format": "yyyy.MM.dd" } ], "fieldDelimiter": "," } } ================================================ FILE: ftpreader/src/main/resources/plugin.json ================================================ { "name": "ftpreader", "class": "com.alibaba.datax.plugin.reader.ftpreader.FtpReader", "description": "useScene: test. mechanism: use datax framework to transport data from txt file. 
warn: The more you know about the data, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: ftpreader/src/main/resources/plugin_job_template.json ================================================ { "name": "ftpreader", "parameter": { "host": "", "protocol": "sftp", "port":"", "username": "", "password": "", "path": [], "column": [ { "index": 0, "type": "" } ], "fieldDelimiter": ",", "encoding": "UTF-8" } } ================================================ FILE: ftpwriter/doc/.gitkeep ================================================ ================================================ FILE: ftpwriter/doc/ftpwriter.md ================================================

# DataX FtpWriter

------------

## 1 Quick Introduction

FtpWriter writes one or more CSV-format files to a remote FTP server. Internally, FtpWriter converts data in the DataX transport protocol into CSV and writes it to the remote server over the FTP or SFTP network protocol.

**The content written to the FTP files is a logical two-dimensional table, for example CSV-formatted text.**

## 2 Features and Limitations

FtpWriter converts data from the DataX protocol into FTP files. Because FTP files are unstructured storage, FtpWriter follows these conventions:

1. It writes text files only (not BLOBs such as video data), and the text must represent a two-dimensional table.
2. It supports CSV-like files with a custom delimiter.
3. It does not support text compression on write.
4. It supports multi-threaded writes; each thread writes a different sub-file.

What it cannot do:

1. Concurrent writes to a single file.

## 3 Function Description

### 3.1 Sample Configuration

```json
{
    "setting": {},
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {},
                "writer": {
                    "name": "ftpwriter",
                    "parameter": {
                        "protocol": "sftp",
                        "host": "***",
                        "port": 22,
                        "username": "xxx",
                        "password": "xxx",
                        "timeout": "60000",
                        "connectPattern": "PASV",
                        "path": "/tmp/data/",
                        "fileName": "yixiao",
                        "writeMode": "truncate|append|nonConflict",
                        "fieldDelimiter": ",",
                        "encoding": "UTF-8",
                        "nullFormat": "null",
                        "dateFormat": "yyyy-MM-dd",
                        "fileFormat": "csv",
                        "suffix": ".csv",
                        "header": []
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **protocol**
    * Description: FTP server protocol. The supported transfer protocols are currently ftp and sftp.
    * Required: yes
    * Default: none
* **host**
    * Description: FTP server address.
    * Required: yes
    * Default: none
* **port**
    * Description: FTP server port.
    * Required: no
    * Default: 22 if the protocol is sftp; 21 if the protocol is standard ftp
* **timeout**
    * Description: Timeout for connecting to the FTP server, in milliseconds.
    * Required: no
    * Default: 60000 (1 minute)
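* **username**
    * Description: Username for accessing the FTP server.
    * Required: yes
    * Default: none
* **password**
    * Description: Password for accessing the FTP server.
    * Required: yes
    * Default: none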
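* **path**
    * Description: Directory path on the FTP file system. FtpWriter writes multiple files under this directory. It must be an absolute path (starting with `/`).
    * Required: yes
    * Default: none
* **fileName**
    * Description: Name of the files FtpWriter writes. A random suffix is appended to this name to form the actual file name written by each thread.
    * Required: yes
    * Default: none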
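* **writeMode**
    * Description: How existing data is handled before FtpWriter writes:
        * truncate: before writing, delete every file in the directory whose name starts with fileName.
        * append: do nothing before writing; FtpWriter writes using fileName and ensures that file names do not conflict.
        * nonConflict: if the directory contains any file whose name starts with fileName, fail with an error.
    * Required: yes
    * Default: none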
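* **fieldDelimiter**
    * Description: Field delimiter used when writing.
    * Required: no
    * Default: ,
* **compress**
    * Description: Text compression type; not currently supported.
    * Required: no
    * Default: no compression
* **encoding**
    * Description: Encoding of the written files.
    * Required: no
    * Default: utf-8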
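* **nullFormat**
    * Description: Text files have no standard string that represents null, so DataX uses nullFormat to define which string stands for null. For example, if you configure nullFormat="\N", a null field is written to the file as \N.
    * Required: no
    * Default: \N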
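* **dateFormat**
    * Description: Format used when serializing date-type data to the file, for example "dateFormat": "yyyy-MM-dd".
    * Required: no
    * Default: none
* **fileFormat**
    * Description: Output file format, either csv (http://zh.wikipedia.org/wiki/%E9%80%97%E5%8F%B7%E5%88%86%E9%9A%94%E5%80%BC) or text. csv is strict CSV: if a value contains the column delimiter, it is escaped using CSV escaping syntax, with the double quote (") as the escape character. text simply separates values with the column delimiter and does not escape values that contain it.
    * Required: no
    * Default: text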
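* **suffix**
    * Description: Suffix of the output files. ".text" and ".csv" are currently supported.
    * Required: no
    * Default: ""
* **header**
    * Description: Header row written to the text output, for example ['id', 'name', 'age'].
    * Required: no
    * Default: none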
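The three writeMode options differ only in what happens to existing files whose names begin with fileName. The following is a minimal sketch of that decision, assuming the names of the files already in path have been fetched from the server. `WriteModeSketch` and its methods are hypothetical illustrations, not FtpWriter's API, and the sketch performs no FTP I/O.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the writeMode semantics described above.
// It only decides which existing files would be affected; it performs no FTP I/O.
class WriteModeSketch {

    /** Returns the existing file names in the target directory that start with fileName. */
    static List<String> matchingFiles(List<String> existing, String fileName) {
        List<String> matches = new ArrayList<>();
        for (String name : existing) {
            if (name.startsWith(fileName)) {
                matches.add(name);
            }
        }
        return matches;
    }

    /**
     * Returns the files to delete before writing.
     * truncate: every file with the fileName prefix.
     * append: nothing.
     * nonConflict: nothing, but fail if any prefixed file already exists.
     */
    static List<String> filesToDelete(String writeMode, List<String> existing, String fileName) {
        List<String> matches = matchingFiles(existing, fileName);
        switch (writeMode) {
            case "truncate":
                return matches;
            case "append":
                return new ArrayList<>();
            case "nonConflict":
                if (!matches.isEmpty()) {
                    throw new IllegalStateException("conflicting files: " + matches);
                }
                return new ArrayList<>();
            default:
                throw new IllegalArgumentException("unsupported writeMode: " + writeMode);
        }
    }
}
```

In FtpWriter itself, the truncate branch deletes the matching files through its FTP helper before any task starts, while append relies on the random per-thread suffix to avoid overwriting existing files.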
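The difference between the csv and text values of fileFormat is easiest to see on a value that contains the delimiter. This is a minimal sketch, assuming RFC 4180-style quoting (wrap the field in double quotes and double any embedded quotes); that quoting rule is an assumption here, and DataX's own writer utility may differ in details. `FileFormatSketch` is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch contrasting the two fileFormat values.
// The csv quoting follows RFC 4180 conventions; this is an assumption,
// not necessarily the exact escaping used by DataX.
class FileFormatSketch {

    /** text format: join fields with the delimiter and never escape. */
    static String toTextLine(List<String> fields, char delimiter) {
        return String.join(String.valueOf(delimiter), fields);
    }

    /** csv format: quote a field if it contains the delimiter, a quote, or a line break. */
    static String toCsvLine(List<String> fields, char delimiter) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            String field = fields.get(i);
            boolean needsQuotes = field.indexOf(delimiter) >= 0
                    || field.indexOf('"') >= 0
                    || field.indexOf('\n') >= 0
                    || field.indexOf('\r') >= 0;
            if (needsQuotes) {
                // Embedded double quotes are escaped by doubling them.
                line.append('"').append(field.replace("\"", "\"\"")).append('"');
            } else {
                line.append(field);
            }
        }
        return line.toString();
    }
}
```

For the three-field row `1`, `a,b`, `say "hi"`, the text line is `1,a,b,say "hi"`, which reads back as four fields, while the csv line `1,"a,b","say ""hi"""` preserves the original three.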
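nullFormat and dateFormat, together with the type conversion rules in section 3.3, decide how each column value becomes text. Below is a minimal sketch of that per-value serialization, assuming null values are replaced by nullFormat and dates are formatted with a `java.text.SimpleDateFormat` pattern. `ColumnSerializerSketch` is a hypothetical name and this is not DataX's actual implementation.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch of serializing one column value using the
// nullFormat and dateFormat parameters; not DataX's actual implementation.
class ColumnSerializerSketch {

    static String serialize(Object value, String nullFormat, String dateFormat) {
        if (value == null) {
            // A null column is written as the configured placeholder string.
            return nullFormat;
        }
        if (value instanceof Date && dateFormat != null) {
            // SimpleDateFormat is not thread-safe, so a new instance is created per call.
            return new SimpleDateFormat(dateFormat).format((Date) value);
        }
        // Long, Double, Boolean and String use their ordinary string form.
        return String.valueOf(value);
    }
}
```

Creating a formatter per call keeps the sketch safe when several writer threads run at once; a real implementation might instead cache one formatter per thread.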
### 3.3 Type Conversion

FTP files carry no data types of their own; the types below are defined by DataX FtpWriter:

| DataX internal type | FTP file data type |
| -------- | ----- |
| Long | Long -> string serialization |
| Double | Double -> string serialization |
| String | String -> string serialization |
| Boolean | Boolean -> string serialization |
| Date | Date -> string serialization |

Where:

* An FTP file Long is the string form of an integer in the file text, for example "19901219".
* An FTP file Double is the string form of a Double in the file text, for example "3.1415".
* An FTP file Boolean is the string form of a Boolean in the file text, for example "true" or "false". Case-insensitive.
* An FTP file Date is the string form of a Date in the file text, for example "2014-12-31". The date format can be specified (see dateFormat).

## 4 Performance Report

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

Omitted.

================================================ FILE: ftpwriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT ftpwriter ftpwriter FtpWriter提供了写数据到指定ftp服务器文件功能。 jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 com.jcraft jsch 0.1.54 commons-net commons-net 3.3 junit junit test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: ftpwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/ftpwriter target/ ftpwriter-0.0.1-SNAPSHOT.jar plugin/writer/ftpwriter false plugin/writer/ftpwriter/libs runtime ================================================ FILE: ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/FtpWriter.java ================================================ package com.alibaba.datax.plugin.writer.ftpwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import
com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.unstructuredstorage.writer.UnstructuredStorageWriterUtil; import com.alibaba.datax.plugin.writer.ftpwriter.util.Constant; import com.alibaba.datax.plugin.writer.ftpwriter.util.IFtpHelper; import com.alibaba.datax.plugin.writer.ftpwriter.util.SftpHelperImpl; import com.alibaba.datax.plugin.writer.ftpwriter.util.StandardFtpHelperImpl; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.OutputStream; import java.util.HashSet; import java.util.List; import java.util.Set; import java.util.concurrent.Callable; public class FtpWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration writerSliceConfig = null; private Set allFileExists = null; private String protocol; private String host; private int port; private String username; private String password; private int timeout; private IFtpHelper ftpHelper = null; @Override public void init() { this.writerSliceConfig = this.getPluginJobConf(); this.validateParameter(); UnstructuredStorageWriterUtil .validateParameter(this.writerSliceConfig); try { RetryUtil.executeWithRetry(new Callable() { @Override public Void call() throws Exception { ftpHelper.loginFtpServer(host, username, password, port, timeout); return null; } }, 3, 4000, true); } catch (Exception e) { String message = String .format("与ftp服务器建立连接失败, host:%s, username:%s, port:%s, errorMessage:%s", host, username, port, e.getMessage()); LOG.error(message); throw DataXException.asDataXException( FtpWriterErrorCode.FAIL_LOGIN, message, e); } } private void validateParameter() { this.writerSliceConfig .getNecessaryValue( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME, FtpWriterErrorCode.REQUIRED_VALUE); String path = this.writerSliceConfig.getNecessaryValue(Key.PATH, 
FtpWriterErrorCode.REQUIRED_VALUE); if (!path.startsWith("/")) { String message = String.format("请检查参数path:%s,需要配置为绝对路径", path); LOG.error(message); throw DataXException.asDataXException( FtpWriterErrorCode.ILLEGAL_VALUE, message); } this.host = this.writerSliceConfig.getNecessaryValue(Key.HOST, FtpWriterErrorCode.REQUIRED_VALUE); this.username = this.writerSliceConfig.getNecessaryValue( Key.USERNAME, FtpWriterErrorCode.REQUIRED_VALUE); this.password = this.writerSliceConfig.getNecessaryValue( Key.PASSWORD, FtpWriterErrorCode.REQUIRED_VALUE); this.timeout = this.writerSliceConfig.getInt(Key.TIMEOUT, Constant.DEFAULT_TIMEOUT); this.protocol = this.writerSliceConfig.getNecessaryValue( Key.PROTOCOL, FtpWriterErrorCode.REQUIRED_VALUE); if ("sftp".equalsIgnoreCase(this.protocol)) { this.port = this.writerSliceConfig.getInt(Key.PORT, Constant.DEFAULT_SFTP_PORT); this.ftpHelper = new SftpHelperImpl(); } else if ("ftp".equalsIgnoreCase(this.protocol)) { this.port = this.writerSliceConfig.getInt(Key.PORT, Constant.DEFAULT_FTP_PORT); this.ftpHelper = new StandardFtpHelperImpl(); } else { throw DataXException.asDataXException( FtpWriterErrorCode.ILLEGAL_VALUE, String.format( "仅支持 ftp和sftp 传输协议 , 不支持您配置的传输协议: [%s]", protocol)); } this.writerSliceConfig.set(Key.PORT, this.port); } @Override public void prepare() { String path = this.writerSliceConfig.getString(Key.PATH); // warn: 这里用户需要配一个目录 this.ftpHelper.mkDirRecursive(path); String fileName = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME); String writeMode = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.WRITE_MODE); Set allFileExists = this.ftpHelper.getAllFilesInDir(path, fileName); this.allFileExists = allFileExists; // truncate option handler if ("truncate".equals(writeMode)) { LOG.info(String.format( "由于您配置了writeMode truncate, 开始清理 [%s] 下面以 [%s] 开头的内容", path, fileName)); Set fullFileNameToDelete = new HashSet(); for (String 
each : allFileExists) {
                    fullFileNameToDelete.add(UnstructuredStorageWriterUtil
                            .buildFilePath(path, each, null));
                }
                LOG.info(String.format(
                        "Files under path:[%s] with prefix fileName:[%s] to be deleted: [%s]",
                        path, fileName,
                        StringUtils.join(fullFileNameToDelete.iterator(), ", ")));
                this.ftpHelper.deleteFiles(fullFileNameToDelete);
            } else if ("append".equals(writeMode)) {
                LOG.info(String.format(
                        "writeMode is append, so nothing is cleaned up before writing. Writing files under [%s] with prefix [%s]",
                        path, fileName));
                LOG.info(String.format(
                        "Existing files under path:[%s] with prefix fileName:[%s]: [%s]",
                        path, fileName,
                        StringUtils.join(allFileExists.iterator(), ", ")));
            } else if ("nonConflict".equals(writeMode)) {
                LOG.info(String.format(
                        "writeMode is nonConflict, checking the contents of [%s]", path));
                if (!allFileExists.isEmpty()) {
                    LOG.info(String.format(
                            "Conflicting files under path:[%s] with prefix fileName:[%s]: [%s]",
                            path, fileName,
                            StringUtils.join(allFileExists.iterator(), ", ")));
                    throw DataXException.asDataXException(
                            FtpWriterErrorCode.ILLEGAL_VALUE,
                            String.format(
                                    "The configured path [%s] is not empty: it already contains files with the configured prefix.",
                                    path));
                }
            } else {
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.ILLEGAL_VALUE,
                        String.format(
                                "Only the truncate, append and nonConflict modes are supported; unsupported writeMode: [%s]",
                                writeMode));
            }
        }

        @Override
        public void post() {

        }

        @Override
        public void destroy() {
            try {
                this.ftpHelper.logoutFtpServer();
            } catch (Exception e) {
                String message = String.format(
                        "Failed to close the connection to the ftp server, host:%s, username:%s, port:%s, errorMessage:%s",
                        host, username, port, e.getMessage());
                LOG.error(message, e);
            }
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return UnstructuredStorageWriterUtil.split(this.writerSliceConfig,
                    this.allFileExists, mandatoryNumber);
        }
    }

    public static class Task extends Writer.Task {
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        private Configuration writerSliceConfig;

        private String path;
        private String fileName;
        private String suffix;

        private String protocol;
        private String host;
        private int port;
        private String username;
        private String password;
        private int timeout;

        private IFtpHelper ftpHelper = null;

        @Override
        public void init() {
            this.writerSliceConfig = this.getPluginJobConf();
            this.path = this.writerSliceConfig.getString(Key.PATH);
            this.fileName = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME);
            this.suffix = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.SUFFIX);

            this.host = this.writerSliceConfig.getString(Key.HOST);
            this.port = this.writerSliceConfig.getInt(Key.PORT);
            this.username = this.writerSliceConfig.getString(Key.USERNAME);
            this.password = this.writerSliceConfig.getString(Key.PASSWORD);
            this.timeout = this.writerSliceConfig.getInt(Key.TIMEOUT,
                    Constant.DEFAULT_TIMEOUT);
            this.protocol = this.writerSliceConfig.getString(Key.PROTOCOL);

            if ("sftp".equalsIgnoreCase(this.protocol)) {
                this.ftpHelper = new SftpHelperImpl();
            } else if ("ftp".equalsIgnoreCase(this.protocol)) {
                this.ftpHelper = new StandardFtpHelperImpl();
            }
            try {
                RetryUtil.executeWithRetry(new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        ftpHelper.loginFtpServer(host, username, password,
                                port, timeout);
                        return null;
                    }
                }, 3, 4000, true);
            } catch (Exception e) {
                String message = String.format(
                        "Failed to connect to the ftp server, host:%s, username:%s, port:%s, errorMessage:%s",
                        host, username, port, e.getMessage());
                LOG.error(message);
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.FAIL_LOGIN, message, e);
            }
        }

        @Override
        public void prepare() {

        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            LOG.info("begin do write...");
            String fileFullPath = UnstructuredStorageWriterUtil.buildFilePath(
                    this.path, this.fileName, this.suffix);
            LOG.info(String.format("write to file : [%s]", fileFullPath));

            OutputStream outputStream = null;
            try {
                outputStream = this.ftpHelper.getOutputStream(fileFullPath);
                UnstructuredStorageWriterUtil.writeToStream(lineReceiver,
                        outputStream, this.writerSliceConfig, this.fileName,
                        this.getTaskPluginCollector());
            } catch (Exception e) {
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.WRITE_FILE_IO_ERROR,
                        String.format("Unable to create or write file: [%s]", fileFullPath), e);
            } finally {
                IOUtils.closeQuietly(outputStream);
            }
            LOG.info("end do write");
        }

        @Override
        public void post() {

        }

        @Override
        public void destroy() {
            try {
                this.ftpHelper.logoutFtpServer();
            } catch (Exception e) {
                String message = String.format(
                        "Failed to close the connection to the ftp server, host:%s, username:%s, port:%s, errorMessage:%s",
                        host, username, port, e.getMessage());
                LOG.error(message, e);
            }
        }
    }
}
================================================
FILE: ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/FtpWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.ftpwriter;

import com.alibaba.datax.common.spi.ErrorCode;

public enum FtpWriterErrorCode implements ErrorCode {

    REQUIRED_VALUE("FtpWriter-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("FtpWriter-01", "An illegal parameter value was provided."),
    MIXED_INDEX_VALUE("FtpWriter-02", "Your column configuration contains both index and value."),
    NO_INDEX_VALUE("FtpWriter-03", "You configured columns explicitly but did not specify index or value."),
    FILE_NOT_EXISTS("FtpWriter-04", "The configured directory or file path does not exist or is not readable."),
    OPEN_FILE_WITH_CHARSET_ERROR("FtpWriter-05", "The configured file encoding does not match the actual file encoding."),
    OPEN_FILE_ERROR("FtpWriter-06", "An error occurred while opening the configured file."),
    WRITE_FILE_IO_ERROR("FtpWriter-07", "An I/O error occurred while writing the configured file."),
    SECURITY_NOT_ENOUGH("FtpWriter-08", "You lack the permission to perform this file operation."),
    CONFIG_INVALID_EXCEPTION("FtpWriter-09", "Invalid parameter configuration."),
    RUNTIME_EXCEPTION("FtpWriter-10", "A runtime exception occurred; please contact us."),
    EMPTY_DIR_EXCEPTION("FtpWriter-11", "The directory you are trying to read is empty."),
    FAIL_LOGIN("FtpWriter-12", "Login failed: unable to connect to the ftp server."),
    FAIL_DISCONNECT("FtpWriter-13", "Failed to close the ftp connection: unable to disconnect from the ftp server."),
    COMMAND_FTP_IO_EXCEPTION("FtpWriter-14", "Error communicating with the ftp server."),
    OUT_MAX_DIRECTORY_LEVEL("FtpWriter-15", "Exceeded the maximum allowed directory depth."),
    LINK_FILE("FtpWriter-16", "The file you are trying to read is a link."),
    COMMAND_FTP_ENCODING_EXCEPTION("FtpWriter-17", "Error using the specified encoding with the ftp server."),
    FAIL_LOGOUT("FtpWriter-18", "Logout failed: could not close the connection to the ftp server, but this does not affect the synchronization job.");

    private final String code;
    private final String description;

    private FtpWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code,
                this.description);
    }
}
================================================
FILE: ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.ftpwriter;

public class Key {
    public static final String PROTOCOL = "protocol";

    public static final String HOST = "host";

    public static final String USERNAME = "username";

    public static final String PASSWORD = "password";

    public static final String PORT = "port";

    public static final String TIMEOUT = "timeout";

    public static final String CONNECTPATTERN = "connectPattern";

    public static final String PATH = "path";
}
================================================
FILE: ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/Constant.java
================================================
package com.alibaba.datax.plugin.writer.ftpwriter.util;

public class Constant {
    public static final int DEFAULT_FTP_PORT = 21;

    public static final int DEFAULT_SFTP_PORT = 22;

    public static final int DEFAULT_TIMEOUT = 60000;

    public static final int DEFAULT_MAX_TRAVERSAL_LEVEL = 100;

    public static final String DEFAULT_FTP_CONNECT_PATTERN = "PASV";

    public static final String CONTROL_ENCODING = "utf8";
}
================================================
FILE: ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/IFtpHelper.java
================================================
package com.alibaba.datax.plugin.writer.ftpwriter.util;

import java.io.OutputStream;
import java.util.Set;

public interface IFtpHelper {

    // Uses passive mode
    public void loginFtpServer(String host, String username, String password, int port, int timeout);

    public void logoutFtpServer();

    /**
     * warn: does not create directories recursively (unlike mkdir -p)
     */
    public void mkdir(String directoryPath);

    /**
     * Creates directories recursively
     */
    public void mkDirRecursive(String directoryPath);

    public OutputStream getOutputStream(String filePath);

    public String getRemoteFileContent(String filePath);

    public Set<String> getAllFilesInDir(String dir, String prefixFileName);

    /**
     * warn: does not delete directories (unlike rm -rf)
     */
    public void deleteFiles(Set<String> filesToDelete);

    public void completePendingCommand();

}
================================================
FILE: ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/SftpHelperImpl.java
================================================
package com.alibaba.datax.plugin.writer.ftpwriter.util;

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.Vector;

import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.plugin.writer.ftpwriter.FtpWriterErrorCode;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONWriter;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.Session;
import com.jcraft.jsch.SftpATTRS;
import com.jcraft.jsch.SftpException;
import com.jcraft.jsch.ChannelSftp.LsEntry;

public class SftpHelperImpl implements IFtpHelper {
    private static final Logger LOG = LoggerFactory
            .getLogger(SftpHelperImpl.class);

    private Session session = null;
    private ChannelSftp channelSftp = null;

    @Override
    public void loginFtpServer(String host, String username, String password, int port,
            int timeout) {
        JSch jsch = new JSch();
        try {
            this.session = jsch.getSession(username,
                    host, port);
            if (this.session == null) {
                throw DataXException.asDataXException(FtpWriterErrorCode.FAIL_LOGIN,
                        "Failed to create the sftp session (this.session), so no sftp connection to the server could be established. Please check that the host name and username are correct.");
            }

            this.session.setPassword(password);
            Properties config = new Properties();
            config.put("StrictHostKeyChecking", "no");
            // config.put("PreferredAuthentications", "password");
            this.session.setConfig(config);
            this.session.setTimeout(timeout);
            this.session.connect();

            this.channelSftp = (ChannelSftp) this.session.openChannel("sftp");
            this.channelSftp.connect();
        } catch (JSchException e) {
            if (null != e.getCause()) {
                String cause = e.getCause().toString();
                String unknownHostException = "java.net.UnknownHostException: " + host;
                String illegalArgumentException = "java.lang.IllegalArgumentException: port out of range:" + port;
                String wrongPort = "java.net.ConnectException: Connection refused";
                if (unknownHostException.equals(cause)) {
                    String message = String.format(
                            "Unable to connect to the ftp server at [%s]; please check that the address is correct, errorMessage:%s",
                            host, e.getMessage());
                    LOG.error(message);
                    throw DataXException.asDataXException(
                            FtpWriterErrorCode.FAIL_LOGIN, message, e);
                } else if (illegalArgumentException.equals(cause) || wrongPort.equals(cause)) {
                    String message = String.format(
                            "Please check the ftp server port; invalid port: [%s], errorMessage:%s",
                            port, e.getMessage());
                    LOG.error(message);
                    throw DataXException.asDataXException(
                            FtpWriterErrorCode.FAIL_LOGIN, message, e);
                }
            }
            // Any other failure, including a cause not recognized above, must also fail the
            // login; otherwise the exception is swallowed and channelSftp is left null.
            String message = String.format(
                    "Failed to connect to the ftp server; please check the host, username and password, host:%s, port:%s, username:%s, errorMessage:%s",
                    host, port, username, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.FAIL_LOGIN, message, e);
        }
    }

    @Override
    public void logoutFtpServer() {
        if (this.channelSftp != null) {
            this.channelSftp.disconnect();
            this.channelSftp = null;
        }
        if (this.session != null) {
            this.session.disconnect();
            this.session = null;
        }
    }

    @Override
    public void mkdir(String directoryPath) {
        boolean isDirExist = false;
        try {
            this.printWorkingDirectory();
            SftpATTRS sftpATTRS = this.channelSftp.lstat(directoryPath);
            isDirExist = sftpATTRS.isDir();
        } catch (SftpException e) {
            if (e.getMessage().toLowerCase().equals("no such file")) {
                LOG.warn(String.format(
                        "The configured path [%s] does not exist; attempting to create it, errorMessage:%s",
                        directoryPath, e.getMessage()), e);
                isDirExist = false;
            }
        }
        if (!isDirExist) {
            try {
                // warn: unlike mkdir -p, this does not create missing parent directories
                this.channelSftp.mkdir(directoryPath);
            } catch (SftpException e) {
                String message = String.format(
                        "I/O error while creating directory %s; please check the connection to the ftp server and that you have permission to create directories, errorMessage:%s",
                        directoryPath, e.getMessage());
                LOG.error(message, e);
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
            }
        }
    }

    @Override
    public void mkDirRecursive(String directoryPath) {
        boolean isDirExist = false;
        try {
            this.printWorkingDirectory();
            SftpATTRS sftpATTRS = this.channelSftp.lstat(directoryPath);
            isDirExist = sftpATTRS.isDir();
        } catch (SftpException e) {
            if (e.getMessage().toLowerCase().equals("no such file")) {
                LOG.warn(String.format(
                        "The configured path [%s] does not exist; attempting to create it, errorMessage:%s",
                        directoryPath, e.getMessage()), e);
                isDirExist = false;
            }
        }
        if (!isDirExist) {
            StringBuilder dirPath = new StringBuilder();
            dirPath.append(IOUtils.DIR_SEPARATOR_UNIX);
            String[] dirSplit = StringUtils.split(directoryPath, IOUtils.DIR_SEPARATOR_UNIX);
            try {
                // The server cannot create directories recursively, so create them one level at a time
                for (String dirName : dirSplit) {
                    dirPath.append(dirName);
                    mkDirSingleHierarchy(dirPath.toString());
                    dirPath.append(IOUtils.DIR_SEPARATOR_UNIX);
                }
            } catch (SftpException e) {
                String message = String.format(
                        "I/O error while creating directory %s; please check the connection to the ftp server and that you have permission to create directories, errorMessage:%s",
                        directoryPath, e.getMessage());
                LOG.error(message, e);
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
            }
        }
    }

    public boolean mkDirSingleHierarchy(String directoryPath) throws SftpException {
        boolean isDirExist = false;
        try {
            SftpATTRS sftpATTRS = this.channelSftp.lstat(directoryPath);
            isDirExist =
                    sftpATTRS.isDir();
        } catch (SftpException e) {
            // lstat failed, so treat the path as missing and create it
            LOG.info(String.format("Creating directory level [%s]", directoryPath));
            this.channelSftp.mkdir(directoryPath);
            return true;
        }
        if (!isDirExist) {
            LOG.info(String.format("Creating directory level [%s]", directoryPath));
            this.channelSftp.mkdir(directoryPath);
        }
        return true;
    }

    @Override
    public OutputStream getOutputStream(String filePath) {
        try {
            this.printWorkingDirectory();
            String parentDir = filePath.substring(0,
                    StringUtils.lastIndexOf(filePath, IOUtils.DIR_SEPARATOR));
            this.channelSftp.cd(parentDir);
            this.printWorkingDirectory();
            OutputStream writeOutputStream = this.channelSftp.put(filePath,
                    ChannelSftp.APPEND);
            String message = String.format(
                    "Failed to open an output stream for FTP file [%s]; please check that %s can be created and written",
                    filePath, filePath);
            if (null == writeOutputStream) {
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.OPEN_FILE_ERROR, message);
            }
            return writeOutputStream;
        } catch (SftpException e) {
            String message = String.format(
                    "Error writing file [%s]; please check that %s is writable, errorMessage:%s",
                    filePath, filePath, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.OPEN_FILE_ERROR, message, e);
        }
    }

    @Override
    public String getRemoteFileContent(String filePath) {
        try {
            this.completePendingCommand();
            this.printWorkingDirectory();
            String parentDir = filePath.substring(0,
                    StringUtils.lastIndexOf(filePath, IOUtils.DIR_SEPARATOR));
            this.channelSftp.cd(parentDir);
            this.printWorkingDirectory();
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream(22);
            this.channelSftp.get(filePath, outputStream);
            String result = outputStream.toString();
            IOUtils.closeQuietly(outputStream);
            return result;
        } catch (SftpException e) {
            String message = String.format(
                    "Error reading file [%s]; please check that %s exists and is readable, errorMessage:%s",
                    filePath, filePath, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.OPEN_FILE_ERROR, message, e);
        }
    }

    @Override
    public Set<String> getAllFilesInDir(String dir, String prefixFileName) {
        Set<String> allFilesWithPointedPrefix = new HashSet<String>();
        try {
            this.printWorkingDirectory();
            @SuppressWarnings("rawtypes")
            Vector allFiles = this.channelSftp.ls(dir);
            LOG.debug(String.format("ls: %s", JSON.toJSONString(allFiles,
                    JSONWriter.Feature.UseSingleQuotes)));
            for (int i = 0; i < allFiles.size(); i++) {
                LsEntry le = (LsEntry) allFiles.get(i);
                String strName = le.getFilename();
                if (strName.startsWith(prefixFileName)) {
                    allFilesWithPointedPrefix.add(strName);
                }
            }
        } catch (SftpException e) {
            String message = String.format(
                    "I/O error while listing files under path:[%s]; please check the connection to the ftp server and that you have permission to list the directory, errorMessage:%s",
                    dir, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
        }
        return allFilesWithPointedPrefix;
    }

    @Override
    public void deleteFiles(Set<String> filesToDelete) {
        String eachFile = null;
        try {
            this.printWorkingDirectory();
            for (String each : filesToDelete) {
                LOG.info(String.format("delete file [%s].", each));
                eachFile = each;
                this.channelSftp.rm(each);
            }
        } catch (SftpException e) {
            String message = String.format(
                    "Error deleting file [%s]; please check that you have permission to delete it and that the network connection is healthy, errorMessage:%s",
                    eachFile, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
        }
    }

    private void printWorkingDirectory() {
        try {
            LOG.info(String.format("current working directory:%s",
                    this.channelSftp.pwd()));
        } catch (Exception e) {
            LOG.warn(String.format("printWorkingDirectory error:%s",
                    e.getMessage()));
        }
    }

    @Override
    public void completePendingCommand() {
    }

}
================================================
FILE: ftpwriter/src/main/java/com/alibaba/datax/plugin/writer/ftpwriter/util/StandardFtpHelperImpl.java
================================================
package com.alibaba.datax.plugin.writer.ftpwriter.util;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPClientConfig;
import org.apache.commons.net.ftp.FTPFile;
import org.apache.commons.net.ftp.FTPReply;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.plugin.writer.ftpwriter.FtpWriterErrorCode;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONWriter;

public class StandardFtpHelperImpl implements IFtpHelper {
    private static final Logger LOG = LoggerFactory
            .getLogger(StandardFtpHelperImpl.class);
    FTPClient ftpClient = null;

    @Override
    public void loginFtpServer(String host, String username, String password, int port,
            int timeout) {
        this.ftpClient = new FTPClient();
        try {
            this.ftpClient.setControlEncoding("UTF-8");
            // No need to hard-code the server's OS type: FTPClient.getSystemType() detects it automatically
            // this.ftpClient.configure(new FTPClientConfig(FTPClientConfig.SYST_UNIX));
            this.ftpClient.setDefaultTimeout(timeout);
            this.ftpClient.setConnectTimeout(timeout);
            this.ftpClient.setDataTimeout(timeout);

            // Connect and log in
            this.ftpClient.connect(host, port);
            this.ftpClient.login(username, password);

            this.ftpClient.enterRemotePassiveMode();
            this.ftpClient.enterLocalPassiveMode();
            int reply = this.ftpClient.getReplyCode();
            if (!FTPReply.isPositiveCompletion(reply)) {
                this.ftpClient.disconnect();
                String message = String.format(
                        "Failed to connect to the ftp server, host:%s, port:%s, username:%s, replyCode:%s",
                        host, port, username, reply);
                LOG.error(message);
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.FAIL_LOGIN, message);
            }
        } catch (UnknownHostException e) {
            String message = String.format(
                    "Unable to connect to the ftp server at [%s]; please check that the address is correct, errorMessage:%s",
                    host, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.FAIL_LOGIN, message, e);
        } catch (IllegalArgumentException e) {
            String message = String.format(
                    "Please check the ftp server port; invalid port: [%s], errorMessage:%s",
                    port, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.FAIL_LOGIN, message, e);
        } catch (Exception e) {
            String message = String.format(
                    "Failed to connect to the ftp server, host:%s, port:%s, username:%s, errorMessage:%s",
                    host, port, username, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.FAIL_LOGIN, message, e);
        }
    }

    @Override
    public void logoutFtpServer() {
        // Guard against a second call after ftpClient has already been released
        if (this.ftpClient != null && this.ftpClient.isConnected()) {
            try {
                this.ftpClient.logout();
            } catch (IOException e) {
                String message = String.format(
                        "Failed to disconnect from the ftp server, errorMessage:%s",
                        e.getMessage());
                LOG.error(message);
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.FAIL_DISCONNECT, message, e);
            } finally {
                if (this.ftpClient.isConnected()) {
                    try {
                        this.ftpClient.disconnect();
                    } catch (IOException e) {
                        String message = String.format(
                                "Failed to disconnect from the ftp server, errorMessage:%s",
                                e.getMessage());
                        LOG.error(message);
                        throw DataXException.asDataXException(
                                FtpWriterErrorCode.FAIL_DISCONNECT, message, e);
                    }
                }
                this.ftpClient = null;
            }
        }
    }

    @Override
    public void mkdir(String directoryPath) {
        String message = String.format(
                "Error creating directory %s; please check the connection to the ftp server and that you have permission to create directories",
                directoryPath);
        try {
            this.printWorkingDirectory();
            boolean isDirExist = this.ftpClient
                    .changeWorkingDirectory(directoryPath);
            if (!isDirExist) {
                int replyCode = this.ftpClient.mkd(directoryPath);
                message = String.format("%s, replyCode:%s", message, replyCode);
                if (replyCode != FTPReply.COMMAND_OK
                        && replyCode != FTPReply.PATHNAME_CREATED) {
                    throw DataXException.asDataXException(
                            FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message);
                }
            }
        } catch (IOException e) {
            message = String.format("%s, errorMessage:%s", message,
                    e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
        }
    }

    @Override
    public void mkDirRecursive(String directoryPath) {
        StringBuilder dirPath = new StringBuilder();
        dirPath.append(IOUtils.DIR_SEPARATOR_UNIX);
        String[] dirSplit = StringUtils.split(directoryPath, IOUtils.DIR_SEPARATOR_UNIX);
        String message = String.format(
                "Error creating directory %s; please check the connection to the ftp server and that you have permission to create directories",
                directoryPath);
        try {
            // The ftp server cannot create directories recursively, so create them one level at a time
            for (String dirName : dirSplit) {
                dirPath.append(dirName);
                boolean mkdirSuccess = mkDirSingleHierarchy(dirPath.toString());
                dirPath.append(IOUtils.DIR_SEPARATOR_UNIX);
                if (!mkdirSuccess) {
                    throw DataXException.asDataXException(
                            FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION,
                            message);
                }
            }
        } catch (IOException e) {
            message = String.format("%s, errorMessage:%s", message,
                    e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
        }
    }

    public boolean mkDirSingleHierarchy(String directoryPath) throws IOException {
        boolean isDirExist = this.ftpClient
                .changeWorkingDirectory(directoryPath);
        // Create directoryPath if it does not exist
        if (!isDirExist) {
            int replyCode = this.ftpClient.mkd(directoryPath);
            if (replyCode != FTPReply.COMMAND_OK
                    && replyCode != FTPReply.PATHNAME_CREATED) {
                return false;
            }
        }
        return true;
    }

    @Override
    public OutputStream getOutputStream(String filePath) {
        try {
            this.printWorkingDirectory();
            String parentDir = filePath.substring(0,
                    StringUtils.lastIndexOf(filePath, IOUtils.DIR_SEPARATOR));
            this.ftpClient.changeWorkingDirectory(parentDir);
            this.printWorkingDirectory();
            OutputStream writeOutputStream = this.ftpClient
                    .appendFileStream(filePath);
            String message = String.format(
                    "Failed to open an output stream for FTP file [%s]; please check that %s can be created and written",
                    filePath, filePath);
            if (null == writeOutputStream) {
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.OPEN_FILE_ERROR, message);
            }

            return writeOutputStream;
        } catch (IOException e) {
            String message = String.format(
                    "Error writing file [%s]; please check that [%s] exists and the configured user has write permission, errorMessage:%s",
                    filePath, filePath, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.OPEN_FILE_ERROR, message, e);
        }
    }

    @Override
    public String getRemoteFileContent(String filePath) {
        try {
            this.completePendingCommand();
            this.printWorkingDirectory();
            String parentDir = filePath.substring(0,
                    StringUtils.lastIndexOf(filePath, IOUtils.DIR_SEPARATOR));
            this.ftpClient.changeWorkingDirectory(parentDir);
            this.printWorkingDirectory();
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream(22);
            this.ftpClient.retrieveFile(filePath, outputStream);
            String result = outputStream.toString();
            IOUtils.closeQuietly(outputStream);
            return result;
        } catch (IOException e) {
            String message = String.format(
                    "Error reading file [%s]; please check that [%s] exists and the configured user has read permission, errorMessage:%s",
                    filePath, filePath, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.OPEN_FILE_ERROR, message, e);
        }
    }

    @Override
    public Set<String> getAllFilesInDir(String dir, String prefixFileName) {
        Set<String> allFilesWithPointedPrefix = new HashSet<String>();
        try {
            boolean isDirExist = this.ftpClient.changeWorkingDirectory(dir);
            if (!isDirExist) {
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION,
                        String.format("Failed to change into directory [%s]", dir));
            }
            this.printWorkingDirectory();
            FTPFile[] fs = this.ftpClient.listFiles(dir);
            // LOG.debug(JSON.toJSONString(this.ftpClient.listNames(dir)));
            LOG.debug(String.format("ls: %s",
                    JSON.toJSONString(fs, JSONWriter.Feature.UseSingleQuotes)));
            for (FTPFile ff : fs) {
                String strName = ff.getName();
                if (strName.startsWith(prefixFileName)) {
                    allFilesWithPointedPrefix.add(strName);
                }
            }
        } catch (IOException e) {
            String message = String.format(
                    "I/O error while listing files under path:[%s]; please check the connection to the ftp server and that you have permission to list the directory, errorMessage:%s",
                    dir, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
        }
        return allFilesWithPointedPrefix;
    }

    @Override
    public void deleteFiles(Set<String> filesToDelete) {
        String eachFile = null;
        boolean deleteOk = false;
        try {
            this.printWorkingDirectory();
            for (String each : filesToDelete) {
                LOG.info(String.format("delete file [%s].", each));
                eachFile = each;
                deleteOk = this.ftpClient.deleteFile(each);
                if (!deleteOk) {
                    String message = String.format(
                            "Failed to delete file [%s]; please check that you have permission to delete it",
                            eachFile);
                    throw DataXException.asDataXException(
                            FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message);
                }
            }
        } catch (IOException e) {
            String message = String.format(
                    "Error deleting file [%s]; please check that you have permission to delete it and that the network connection is healthy, errorMessage:%s",
                    eachFile, e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
        }
    }

    private void printWorkingDirectory() {
        try {
            LOG.info(String.format("current working directory:%s",
                    this.ftpClient.printWorkingDirectory()));
        } catch (Exception e) {
            LOG.warn(String.format("printWorkingDirectory error:%s",
                    e.getMessage()));
        }
    }

    @Override
    public void completePendingCommand() {
        /*
         * Q: After I perform a file transfer to the server,
         * printWorkingDirectory() returns null.
         * A: You need to call completePendingCommand() after transferring the file.
         * wiki: http://wiki.apache.org/commons/Net/FrequentlyAskedQuestions
         */
        try {
            boolean isOk = this.ftpClient.completePendingCommand();
            if (!isOk) {
                throw DataXException.asDataXException(
                        FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION,
                        "ftp completePendingCommand failed");
            }
        } catch (IOException e) {
            String message = String.format(
                    "ftp completePendingCommand failed, errorMessage:%s",
                    e.getMessage());
            LOG.error(message);
            throw DataXException.asDataXException(
                    FtpWriterErrorCode.COMMAND_FTP_IO_EXCEPTION, message, e);
        }
    }
}
================================================
FILE: ftpwriter/src/main/resources/plugin.json
================================================
{
    "name": "ftpwriter",
    "class": "com.alibaba.datax.plugin.writer.ftpwriter.FtpWriter",
    "description": "useScene: test. mechanism: use datax framework to transport data to ftp txt file. warn: The more you know about the data, the fewer problems you encounter.",
    "developer": "alibaba"
}
================================================
FILE: ftpwriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "ftpwriter",
    "parameter": {
        "protocol": "",
        "host": "",
        "port": "",
        "username": "",
        "password": "",
        "timeout": "",
        "connectPattern": "",
        "path": "",
        "fileName": "",
        "writeMode": "",
        "fieldDelimiter": "",
        "encoding": "",
        "nullFormat": "",
        "dateFormat": "",
        "fileFormat": "",
        "header": []
    }
}
================================================
FILE: gaussdbreader/doc/gaussdbreader.md
================================================
# GaussDbReader Plugin Documentation

___

## 1 Quick Introduction

The GaussDbReader plugin reads data from GaussDB. Under the hood, GaussDbReader connects to a remote GaussDB database over JDBC and executes SQL statements to SELECT the data out of it.

## 2 How It Works

In short, GaussDbReader connects to the remote GaussDB database through a JDBC connector, generates a SELECT statement from the user's configuration and sends it to the database. It then assembles the returned result set into an abstract dataset using DataX's own data types and passes it to the downstream Writer.

When the user configures table, column and where, GaussDbReader assembles them into a SQL statement and sends it to GaussDB; when the user configures querySql, GaussDbReader sends that statement to GaussDB unchanged.

## 3 Features

### 3.1 Sample Configurations

* A job that extracts data from a GaussDB database and prints it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer rate limit in bytes/s. DataX tries to reach this rate but never exceeds it.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Maximum number of error records; the job fails when this is exceeded.
                "record": 0,
                // Maximum fraction of error records: 1.0 means 100%, 0.02 means 2%.
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "gaussdbreader",
                    "parameter": {
                        // Database user name
                        "username": "xx",
                        // Database password
                        "password": "xx",
                        "column": [
                            "id", "name"
                        ],
                        // Split key
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:opengauss://host:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // Writer type
                    "name": "streamwriter",
                    "parameter": {
                        // Whether to print the records
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that synchronizes the result of a custom SQL query and prints it locally:

```json
{
    "job": {
        "setting": {
            "speed": {
                "byte": 1048576
            }
        },
        "content": [
            {
                "reader": {
                    "name": "gaussdbreader",
                    "parameter": {
                        "username": "xx",
                        "password": "xx",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:opengauss://host:port/database",
                                    "jdbc:opengauss://host:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**

    * Description: JDBC connection information for the source database, given as a JSON array; several connection addresses may be listed for one database. A JSON array is used because Alibaba Group's internal deployment supports probing multiple IPs: if several are configured, GaussDbReader checks each address's connectivity in turn until it finds a usable one, and reports an error if all of them fail.

      Note that jdbcUrl must be placed inside a connection block. Outside Alibaba Group, a single JDBC URL in the array is sufficient.

      jdbcUrl follows the official GaussDB format and may carry additional connection-control parameters; see the [official GaussDB documentation](https://docs.opengauss.org/zh/docs/3.1.0/docs/Developerguide/java-sql-Connection.html) for details.

    * Required: yes
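    * Default: none

* **username**

    * Description: user name for the data source.

    * Required: yes

    * Default: none

* **password**

    * Description: password for the specified user of the data source.

    * Required: yes

    * Default: none

* **table**

    * Description: the table(s) to synchronize, given as a JSON array, so several tables can be extracted at once. When multiple tables are configured, the user must make sure they share the same schema; GaussDbReader does not check whether they form one logical table. Note that table must be placed inside a connection block.

    * Required: yes

    * Default: none

* **column**

    * Description: the columns of the configured table to synchronize, given as a JSON array. Use \* to select all columns, e.g. ['\*'].

      Column pruning is supported: you may export only a subset of the columns.

      Column reordering is supported: columns need not follow the order of the table schema.

      Constants are supported, written in GaussDB syntax: ["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"]. Here id is an ordinary column name, 'hello'::varchar is a string constant, true is a boolean, 2.5::real is a floating-point number, and power(2,3) is a function call.

      **column must explicitly list the columns to synchronize and must not be empty!**

    * Required: yes

    * Default: none

* **splitPk**

    * Description: when splitPk is specified, GaussDbReader uses that column to shard the data, and DataX launches concurrent tasks to synchronize the shards, which can greatly improve throughput.

      Using the table's primary key as splitPk is recommended, because primary-key values are usually evenly distributed, so the resulting shards are less prone to data hot spots.

      Currently splitPk only supports integer columns; `floating-point, string, date and other types are not supported`. If an unsupported type is specified, GaussDbReader reports an error.

      If splitPk is empty, the table is treated as non-splittable and is extracted through a single channel.

    * Required: no

    * Default: empty

* **where**

    * Description: filter condition. GaussDbReader builds the SQL from the configured column, table and where, and extracts data with that SQL. In practice you often synchronize only the current day's data, for example by setting where to gmt_create > $bizdate. Note: where cannot be set to limit 10, because limit is not a valid clause of a SQL WHERE.

      The where condition enables incremental synchronization. If where is omitted or empty, the whole table is synchronized.

    * Required: no

    * Default: none

* **querySql**

    * Description: in some scenarios where is not expressive enough to describe the filter, so this option lets you supply a custom query. When querySql is configured, DataX ignores table and column and filters the data with this statement directly, for example to synchronize the result of a multi-table join: select a,b from table_a join table_b on table_a.id = table_b.id

      `When querySql is configured, GaussDbReader ignores the table, column and where settings.`

    * Required: no

    * Default: none

* **fetchSize**

    * Description: the number of rows fetched from the database server per batch. It determines the number of network round trips between DataX and the server and can noticeably improve extraction performance.

      `Note: a value that is too large (>2048) may cause the DataX process to run out of memory (OOM).`

    * Required: no

    * Default: 1000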
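The splitPk behaviour described above can be illustrated with a minimal sketch. It assumes the minimum and maximum of the integer key have already been queried, then cuts the closed range into contiguous, non-overlapping sub-ranges and turns each one into a WHERE predicate for one task. `SplitPkSketch` and `splitRange` are hypothetical names, and this is not DataX's actual splitter (that logic lives in plugin-rdbms-util and may size ranges differently); it only shows why the key must be an integer: the split relies on integer arithmetic over the key values.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not DataX's actual splitter): divide an integer splitPk
 * range [min, max] into contiguous sub-ranges, one WHERE predicate per task.
 */
public class SplitPkSketch {

    public static List<String> splitRange(String pk, long min, long max, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<String> predicates = new ArrayList<>();
        if (min <= max) {
            // Number of distinct key values; the *Exact methods throw instead of overflowing.
            long span = Math.addExact(Math.subtractExact(max, min), 1L);
            // Never create more ranges than there are key values.
            int n = (int) Math.min((long) parts, span);
            long step = span / n;
            long remainder = span % n;
            long lower = min;
            for (int i = 0; i < n; i++) {
                // The first `remainder` ranges take one extra value, so sizes differ by at most 1.
                long size = step + (i < remainder ? 1 : 0);
                long upper = lower + size - 1;
                predicates.add(String.format("%s >= %d AND %s <= %d", pk, lower, pk, upper));
                lower = upper + 1;
            }
        }
        // Rows with a NULL key match no range predicate, so read them in a separate task.
        predicates.add(pk + " IS NULL");
        return predicates;
    }

    public static void main(String[] args) {
        for (String predicate : splitRange("id", 1, 10, 3)) {
            System.out.println("SELECT id, name FROM t WHERE " + predicate);
        }
    }
}
```

Running `main` prints four queries, with the predicates `id >= 1 AND id <= 4`, `id >= 5 AND id <= 7`, `id >= 8 AND id <= 10` and `id IS NULL`. Because each task scans a disjoint key range, an evenly distributed key (such as a primary key) yields evenly sized tasks, which is the reason the documentation recommends it.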
### 3.3 Type Conversion

GaussDbReader currently supports most GaussDB types, but a few types are not supported, so please check the types you use.

The following table lists GaussDbReader's type mapping for GaussDB:

| DataX internal type | GaussDB data type |
| -------- | ----- |
| Long | bigint, bigserial, integer, smallint, serial |
| Double | double precision, money, numeric, real |
| String | varchar, char, text, bit, inet |
| Date | date, time, timestamp |
| Boolean | bool |
| Bytes | bytea |

Please note:

* `Column types other than those listed above are not supported. money, inet, and bit columns must be cast by the user with syntax such as a_inet::varchar.`

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table DDL:

```sql
create table pref_test(
    id serial,
    a_bigint bigint,
    a_bit bit(10),
    a_boolean boolean,
    a_char character(5),
    a_date date,
    a_double double precision,
    a_integer integer,
    a_money money,
    a_num numeric(10,2),
    a_real real,
    a_smallint smallint,
    a_text text,
    a_time time,
    a_timestamp timestamp
)
```

#### 4.1.2 Machine Specifications

* Machine running DataX:
    1. cpu: 16 cores, Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
    2. mem: MemTotal: 24676836kB, MemFree: 6365080kB
    3. net: dual 100 Mbps NICs

* GaussDB database machine: D12, 24 logical cores, 192 GB memory, 12 × 480 GB SSD array

### 4.2 Test Report

#### 4.2.1 Single-Table Test Report

| Channels | Split by primary key | DataX speed (Rec/s) | DataX throughput (MB/s) | DataX machine load |
|--------|--------|--------|--------|--------|
| 1 | No | 10211 | 0.63 | 0.2 |
| 1 | Yes | 10211 | 0.63 | 0.2 |
| 4 | No | 10211 | 0.63 | 0.2 |
| 4 | Yes | 40000 | 2.48 | 0.5 |
| 8 | No | 10211 | 0.63 | 0.2 |
| 8 | Yes | 78048 | 4.84 | 0.8 |

Notes:

1. The single table here has a serial primary key, and its data is evenly distributed.
2. If a single table is not split by primary key, configuring more channels does not increase speed; the result is the same as with 1 channel.

================================================
FILE: gaussdbreader/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 gaussdbreader gaussdbreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} org.opengauss opengauss-jdbc 3.0.0 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: gaussdbreader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/gaussdbreader target/ gaussdbreader-0.0.1-SNAPSHOT.jar plugin/reader/gaussdbreader false plugin/reader/gaussdbreader/libs runtime

================================================
FILE: gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/Constant.java
================================================
package com.alibaba.datax.plugin.reader.gaussdbreader;

public class Constant {

    public static final int DEFAULT_FETCH_SIZE = 1000;

}

================================================
FILE: gaussdbreader/src/main/java/com/alibaba/datax/plugin/reader/gaussdbreader/GaussDbReader.java
================================================
package com.alibaba.datax.plugin.reader.gaussdbreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;

import java.util.List;

public class GaussDbReader extends Reader {

    private static final DataBaseType DATABASE_TYPE = DataBaseType.GaussDB;

    public static class Job extends Reader.Job {

        private Configuration originalConfig;
        private CommonRdbmsReader.Job commonRdbmsReaderMaster;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            // fetchSize is optional: fall back to the plugin default and reject values below 1.
            int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE,
                    Constant.DEFAULT_FETCH_SIZE);
            if (fetchSize < 1) {
                throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE,
                        String.format("Invalid fetchSize: [%d]. By DataX design, fetchSize must not be less than 1.", fetchSize));
            }
            this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize);

            this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE);
            this.commonRdbmsReaderMaster.init(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderMaster.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderMaster.destroy(this.originalConfig);
        }

    }

    public static class Task extends Reader.Task {

        private Configuration readerSliceConfig;
        private CommonRdbmsReader.Task commonRdbmsReaderSlave;

        @Override
        public void init() {
            this.readerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId());
            this.commonRdbmsReaderSlave.init(this.readerSliceConfig);
        }

        @Override
        public void startRead(RecordSender recordSender) {
            // The Job has already validated and stored fetchSize in the slice configuration.
            int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE);

            this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender,
                    super.getTaskPluginCollector(), fetchSize);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderSlave.post(this.readerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig);
        }

    }

}

================================================
FILE: gaussdbreader/src/main/resources/plugin.json
================================================
{
    "name": "gaussdbreader",
    "class": "com.alibaba.datax.plugin.reader.gaussdbreader.GaussDbReader",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba"
}

================================================
FILE: gaussdbreader/src/main/resources/plugin_job_template.json
================================================
{
    "name": "gaussdbreader",
    "parameter": {
        "username": "",
        "password": "",
        "connection": [
            {
                "table": [],
                "jdbcUrl": []
            }
        ]
    }
}

================================================
FILE: gaussdbwriter/doc/gaussdbwriter.md
================================================
# DataX GaussDbWriter

---

## 1 Introduction

The GaussDbWriter plugin writes data into a target table on a GaussDB primary database. Under the hood, GaussDbWriter connects to the remote GaussDB database via JDBC and executes the corresponding `insert into ...` SQL statements to write the data into GaussDB, committing it in batches internally.

GaussDbWriter is aimed at ETL developers, who use it to import data from a data warehouse into GaussDB. GaussDbWriter can also serve DBAs and other users as a data migration tool.

## 2 How It Works

GaussDbWriter receives the protocol data generated by the Reader through the DataX framework and, based on your configuration, generates the corresponding SQL insert statement:

* `insert into...` (rows that conflict with a primary key or unique index cannot be written)
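
The insert path described above can be sketched in plain Java. This is a minimal, hypothetical illustration: the class `InsertSqlSketch` and the helper `buildInsertSql` are not part of DataX, and the sketch assumes the target table's column types are already known. It reuses the `?::type` cast rules from the `calcValueHolder` override in `GaussDbWriter.java`; the exact statement text generated by `CommonRdbmsWriter` may differ.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: buildInsertSql is a hypothetical helper, not a DataX API.
class InsertSqlSketch {

    // Same placeholder rules as the calcValueHolder override in GaussDbWriter.Task:
    // serial/bigserial are pseudo-types, so they are cast to int/int8; bit is cast to bit varying.
    static String calcValueHolder(String columnType) {
        if ("serial".equalsIgnoreCase(columnType)) {
            return "?::int";
        } else if ("bigserial".equalsIgnoreCase(columnType)) {
            return "?::int8";
        } else if ("bit".equalsIgnoreCase(columnType)) {
            return "?::bit varying";
        }
        return "?::" + columnType;
    }

    // Builds "insert into <table> (<columns>) values (<typed placeholders>)".
    static String buildInsertSql(String table, List<String> columns, List<String> columnTypes) {
        if (columns.size() != columnTypes.size()) {
            throw new IllegalArgumentException("columns and columnTypes must have the same size");
        }
        StringBuilder holders = new StringBuilder();
        for (int i = 0; i < columnTypes.size(); i++) {
            if (i > 0) {
                holders.append(",");
            }
            holders.append(calcValueHolder(columnTypes.get(i)));
        }
        return "insert into " + table + " (" + String.join(",", columns) + ") values (" + holders + ")";
    }

    public static void main(String[] args) {
        // Prints: insert into test (id,name) values (?::int,?::varchar)
        System.out.println(buildInsertSql("test", Arrays.asList("id", "name"), Arrays.asList("serial", "varchar")));
    }
}
```

In a JDBC-based writer, a template like this would typically be prepared once and then bound with each record's values. Rows that violate a primary key or unique index would still be rejected by the database, as noted above.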
Notes:

1. The database that hosts the target table must be a primary database for data to be written. The job needs at least `insert into...` privileges; whether it needs other privileges depends on the statements you specify in `preSql` and `postSql` in the job configuration.
2. Unlike MysqlWriter, GaussDbWriter does not support the `writeMode` parameter.

## 3 Features

### 3.1 Sample Configuration

* This sample imports data generated in memory into GaussDB with GaussDbWriter. The number of reader columns must match the number of writer columns.

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "value": 19880808,
                                "type": "long"
                            },
                            {
                                "value": "DataX",
                                "type": "string"
                            }
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "gaussdbwriter",
                    "parameter": {
                        "username": "xx",
                        "password": "xx",
                        "column": [
                            "id",
                            "name"
                        ],
                        "preSql": [
                            "delete from test"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:opengauss://127.0.0.1:3002/datax",
                                "table": [
                                    "test"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
  * Description: JDBC connection information for the target database. jdbcUrl must be placed inside a `connection` block.
    Notes: 1. Only one value can be configured per database. 2. jdbcUrl follows the official GaussDB conventions and may include additional connection parameters. For details, see the official GaussDB documentation or consult your DBA.
  * Required: yes
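  * Default: none

* **username**
  * Description: user name for the target database
  * Required: yes
  * Default: none

* **password**
  * Description: password for the target database
  * Required: yes
  * Default: none

* **table**
  * Description: name of the target table. One or more tables can be written; when multiple tables are configured, they must all have the same schema.
    Note: table and jdbcUrl must be placed inside a `connection` block.
  * Required: yes
  * Default: none

* **column**
  * Description: the target-table columns to write, separated by commas. For example: "column": ["id","name","age"]. To write all columns in order, use \*, for example: "column": ["\*"].
    Notes: 1. We strongly advise against using \*, because if the number or types of the target table's columns change, your job may run incorrectly or fail. 2. column must not contain any constant values.
  * Required: yes
  * Default: none

* **preSql**
  * Description: standard statements executed before data is written to the target table. If a statement references the table being written, use `@table`; when the statement runs, this variable is replaced with the actual table name. For example, if your job writes to 100 homogeneous sharded tables on the target side (named datax_00, datax_01, ... datax_98, datax_99) and you want to delete their existing data before importing, configure `"preSql":["delete from @table"]`; before writing to each table, the corresponding `delete from <table name>` is executed first.
  * Required: no
  * Default: none

* **postSql**
  * Description: standard statements executed after data is written to the target table (works the same way as preSql).
  * Required: no
  * Default: none

* **batchSize**
  * Description: number of records committed in one batch. This value can greatly reduce the number of network round trips between DataX and GaussDB and improve overall throughput, but setting it too large may cause the DataX process to run out of memory (OOM).
  * Required: no
  * Default: 1024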
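
Two of the parameters above carry logic of their own: `@table` substitution in preSql/postSql, and batched commits controlled by batchSize. The small Java sketch below illustrates both. The class `WriterParamSketch` and its methods are hypothetical helpers, not DataX APIs. The sketch assumes plain string replacement of the literal `@table` and treats each batch as one commit round trip; it does not reproduce DataX's internal implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: renderSqls and toBatches are hypothetical helpers, not DataX APIs.
class WriterParamSketch {

    // Replaces the @table placeholder in each preSql/postSql statement with one concrete table name.
    static List<String> renderSqls(List<String> sqls, String tableName) {
        List<String> rendered = new ArrayList<>();
        for (String sql : sqls) {
            rendered.add(sql.replace("@table", tableName));
        }
        return rendered;
    }

    // Splits records into consecutive batches of at most batchSize; each batch stands for one commit round trip.
    static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < records.size(); from += batchSize) {
            batches.add(records.subList(from, Math.min(from + batchSize, records.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> preSql = Arrays.asList("delete from @table");
        for (String table : Arrays.asList("datax_00", "datax_01")) {
            // Prints [delete from datax_00], then [delete from datax_01]
            System.out.println(renderSqls(preSql, table));
        }

        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(i);
        }
        System.out.println(toBatches(records, 128).size());  // 8 (7 full batches of 128, plus one of 104)
        System.out.println(toBatches(records, 1024).size()); // 1
    }
}
```

With 1000 records, a batchSize of 128 needs 8 round trips, while 1024 needs only 1. Larger batches reduce network interactions but hold more records in memory at once, which is the OOM risk mentioned for batchSize.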
### 3.3 Type Conversion

GaussDbWriter currently supports most GaussDB types, but some types are not supported, so please check the types you use.

The following table lists GaussDbWriter's type mapping for GaussDB:

| DataX internal type | GaussDB data type |
| -------- | ----- |
| Long | bigint, bigserial, integer, smallint, serial |
| Double | double precision, money, numeric, real |
| String | varchar, char, text, bit |
| Date | date, time, timestamp |
| Boolean | bool |
| Bytes | bytea |

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table DDL:

```sql
create table pref_test(
    id serial,
    a_bigint bigint,
    a_bit bit(10),
    a_boolean boolean,
    a_char character(5),
    a_date date,
    a_double double precision,
    a_integer integer,
    a_money money,
    a_num numeric(10,2),
    a_real real,
    a_smallint smallint,
    a_text text,
    a_time time,
    a_timestamp timestamp
)
```

#### 4.1.2 Machine Specifications

* Machine running DataX:
    1. cpu: 16 cores, Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
    2. mem: MemTotal: 24676836kB, MemFree: 6365080kB
    3. net: dual 100 Mbps NICs

* GaussDB database machine: D12, 24 logical cores, 192 GB memory, 12 × 480 GB SSD array

### 4.2 Test Report

#### 4.2.1 Single-Table Test Report

| Channels | batchSize | DataX speed (Rec/s) | DataX throughput (MB/s) | DataX machine load |
|--------|--------|--------|--------|--------|
| 1 | 128 | 9259 | 0.55 | 0.3 |
| 1 | 512 | 10869 | 0.653 | 0.3 |
| 1 | 2048 | 9803 | 0.589 | 0.8 |
| 4 | 128 | 30303 | 1.82 | 1 |
| 4 | 512 | 36363 | 2.18 | 1 |
| 4 | 2048 | 36363 | 2.18 | 1 |
| 8 | 128 | 57142 | 3.43 | 2 |
| 8 | 512 | 66666 | 4.01 | 1.5 |
| 8 | 2048 | 66666 | 4.01 | 1.1 |
| 16 | 128 | 88888 | 5.34 | 1.8 |
| 16 | 2048 | 94117 | 5.65 | 2.5 |
| 32 | 512 | 76190 | 4.58 | 3 |

#### 4.2.2 Performance Test Summary

1. `The number of channels has a large impact on performance.`
2. `In general, more than 32 channels is not recommended when writing to a database.`

## FAQ

***

**Q: GaussDbWriter failed while executing a postSql statement. Has the data been imported into the target database?**

A: A DataX import consists of three stages: the pre operation, the import itself, and the post operation. If any of them fails, the DataX job fails. Because DataX cannot guarantee that these operations complete in a single transaction, the data may already have landed in the target.

***

**Q: In that case, some dirty data may end up in the database. What if it affects the production database?**

A: There are currently two solutions. First, configure a pre statement that cleans up the data imported that day, so that each DataX run clears what the previous run left behind and then imports the complete data. Second, import the data into a temporary table and, once the import finishes, rename it to the production table.

***

================================================
FILE: gaussdbwriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 gaussdbwriter gaussdbwriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} org.opengauss opengauss-jdbc 3.0.0 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: gaussdbwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/gaussdbwriter target/ gaussdbwriter-0.0.1-SNAPSHOT.jar plugin/writer/gaussdbwriter false plugin/writer/gaussdbwriter/libs runtime

================================================
FILE: gaussdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gaussdbwriter/GaussDbWriter.java
================================================
package com.alibaba.datax.plugin.writer.gaussdbwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.rdbms.writer.Key;

import java.util.List;

public class GaussDbWriter extends Writer {

    private static final DataBaseType DATABASE_TYPE = DataBaseType.GaussDB;

    public static class Job extends Writer.Job {

        private Configuration originalConfig = null;
        private CommonRdbmsWriter.Job commonRdbmsWriterMaster;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();

            // Warning: unlike MySQL, GaussDB only supports insert mode, so writeMode must not be configured.
            String writeMode = this.originalConfig.getString(Key.WRITE_MODE);
            if (null != writeMode) {
                throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,
                        String.format("Invalid writeMode configuration. GaussDB does not support the writeMode parameter: %s; GaussDB only uses insert SQL to write data. Please check and fix your configuration.", writeMode));
            }

            this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE);
            this.commonRdbmsWriterMaster.init(this.originalConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterMaster.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber);
        }

        @Override
        public void post() {
            this.commonRdbmsWriterMaster.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterMaster.destroy(this.originalConfig);
        }

    }

    public static class Task extends Writer.Task {

        private Configuration writerSliceConfig;
        private CommonRdbmsWriter.Task commonRdbmsWriterSlave;

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE) {
                // Cast every placeholder to the target column type. serial and bigserial are
                // pseudo-types that cannot be cast to directly, and bit values are cast to bit varying.
                @Override
                public String calcValueHolder(String columnType) {
                    if ("serial".equalsIgnoreCase(columnType)) {
                        return "?::int";
                    } else if ("bigserial".equalsIgnoreCase(columnType)) {
                        return "?::int8";
                    } else if ("bit".equalsIgnoreCase(columnType)) {
                        return "?::bit varying";
                    }
                    return "?::" + columnType;
                }
            };
            this.commonRdbmsWriterSlave.init(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
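            this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig);
        }

        @Override
        public void startWrite(RecordReceiver recordReceiver) {
            this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector());
        }

        @Override
        public void post() {
            this.commonRdbmsWriterSlave.post(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig);
        }

    }

}

================================================
FILE: gaussdbwriter/src/main/resources/plugin.json
================================================
{
    "name": "gaussdbwriter",
    "class": "com.alibaba.datax.plugin.writer.gaussdbwriter.GaussDbWriter",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba"
}

================================================
FILE: gaussdbwriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "gaussdbwriter",
    "parameter": {
        "username": "",
        "password": "",
        "column": [],
        "connection": [
            {
                "jdbcUrl": "",
                "table": []
            }
        ],
        "preSql": [],
        "postSql": []
    }
}

================================================
FILE: gdbreader/doc/gdbreader.md
================================================
# DataX GDBReader

## 1. Introduction

The GDBReader plugin reads data from a GDB instance. It connects to the remote GDB instance through a `Gremlin Client`, generates query DSL from the configured `label`s, traverses vertex or edge data (including property data), and writes the data into Records for the Writer to consume.

## 2. How It Works

GDBReader connects to the GDB instance with a `Gremlin Client` and fetches vertex or edge data in separate Tasks, split by `label`.

Within a single Task, GDBReader traverses the ids of the vertices or edges of its `label`, splits them into ranges, and issues several requests to query the vertices or edges together with their properties. Finally, it converts the vertex or edge data into records in the configured format and sends them to the downstream writer plugin.

GDBReader splits the job into concurrent Tasks by `label`, and fetches the data of a single `label` in asynchronous batches to speed up reading. If the configured `label` list is empty, GDBReader queries all `label`s from GDB before the job starts and then splits the Tasks.

## 3. Features

Vertices and edges are different in GDB, so reading them requires separate vertex and edge configurations.

### 3.1 Vertex Configuration Sample

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 1
            }
        },

        "content": [
            {
                "reader": {
                    "name": "gdbreader",
                    "parameter": {
                        "host": "10.218.145.24",
                        "port": 8182,
                        "username": "***",
                        "password": "***",
                        "fetchBatchSize": 100,
                        "rangeSplitSize": 1000,
                        "labelType": "VERTEX",
                        "labels": ["label1", "label2"],
                        "column": [
                            {
                                "name": "id",
                                "type": "string",
                                "columnType": "primaryKey"
                            },
                            {
                                "name": "label",
                                "type": "string",
                                "columnType": "primaryLabel"
                            },
                            {
                                "name": "age",
                                "type": "int",
                                "columnType": "vertexProperty"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

### 3.2 Edge Configuration Sample

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 1
            }
        },

        "content": [
            {
                "reader": {
                    "name": "gdbreader",
                    "parameter": {
                        "host": "10.218.145.24",
                        "port": 8182,
                        "username": "***",
                        "password": "***",
                        "fetchBatchSize": 100,
                        "rangeSplitSize": 1000,
                        "labelType": "EDGE",
                        "labels": ["label1", "label2"],
                        "column": [
                            {
                                "name": "id",
                                "type": "string",
                                "columnType": "primaryKey"
                            },
                            {
                                "name": "label",
                                "type": "string",
                                "columnType": "primaryLabel"
                            },
                            {
                                "name": "srcId",
                                "type": "string",
                                "columnType": "srcPrimaryKey"
                            },
                            {
                                "name": "srcLabel",
                                "type": "string",
                                "columnType": "srcPrimaryLabel"
                            },
                            {
                                "name": "dstId",
                                "type": "string",
                                "columnType": "dstPrimaryKey"
                            },
                            {
                                "name": "dstLabel",
                                "type": "string",
                                "columnType": "dstPrimaryLabel"
                            },
                            {
                                "name": "name",
                                "type": "string",
                                "columnType": "edgeProperty"
                            },
                            {
                                "name": "weight",
                                "type": "double",
                                "columnType": "edgeProperty"
                            }
                        ]
                    }
                },

                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

### 3.3 Parameters

* **host**
  * Description: connection address of the GDB instance, i.e. the network address shown on the 'Instance Management' -> 'Basic Information' page
  * Required: yes
  * Default: none

* **port**
  * Description: port of the GDB instance connection address
  * Required: yes
  * Default: 8182

* **username**
  * Description: GDB instance account name
  * Required: yes
  * Default: none

* **password**
  * Description: password of the GDB instance account
  * Required: yes
  * Default: none

* **fetchBatchSize**
  * Description: number of vertices or edges read in one GDB request; the response contains the vertices or edges together with their properties
  * Required: yes
  * Default: 100

* **rangeSplitSize**
  * Description: number of ids scanned by one id-traversal request
  * Required: yes
  * Default: 10 \* fetchBatchSize

* **labels**
  * Description: array of labels, i.e. the vertex or edge labels to export. Multiple labels can be read and are given as an array. Leaving it empty ([]) means all vertex or edge labels in GDB.
  * Required: yes
  * Default: none

* **labelType**
  * Description: label type of the data; two enum values are supported, one for vertices and one for edges:
    * VERTEX: vertex
    * EDGE: edge
  * Required: yes
  * Default: none

* **column**
  * Description: mapping configuration from vertex or edge fields to columns
  * Required: yes
  * Default: none

* **column -> name**
  * Description: field name in the vertex or edge mapping. For a property it is the name of the property to read; for other fields it is ignored.
  * Required: yes
  * Default: none

* **column -> type**
  * Description: field type in the vertex or edge mapping
    * id and label are both string types in GDB; configuring a non-string type may cause the conversion to fail
    * regular properties support basic types: int, long, float, double, boolean, string
    * GDBReader tries to convert the data it reads into the configured type, but a failed conversion marks that record as an error
  * Required: yes
  * Default: none

* **column -> columnType**
  * Description: mapping from GDB vertex or edge data to column data; the following enum values are supported:
    * primaryKey: the field is the id of the vertex or edge
    * primaryLabel: the field is the label of the vertex or edge
    * srcPrimaryKey: the field is the id of the edge's source vertex; only used when reading edges
    * srcPrimaryLabel: the field is the label of the edge's source vertex; only used when reading edges
    * dstPrimaryKey: the field is the id of the edge's destination vertex; only used when reading edges
    * dstPrimaryLabel: the field is the label of the edge's destination vertex; only used when reading edges
    * vertexProperty: the field is a vertex property; only used when reading vertices. For a SET property, only one of its values is read
    * vertexJsonProperty: the field is the vertex's property collection; only used when reading vertices. The collection is output in JSON format and includes all properties; it cannot be combined with other vertexProperty configurations
    * edgeProperty: the field is an edge property; only used when reading edges
    * edgeJsonProperty: the field is the edge's property collection; only used when reading edges. The collection is output in JSON format and includes all properties; it cannot be combined with other edgeProperty configurations
  * Required: yes
  * Default: none

* vertexJsonProperty format example. A `c` field is added to distinguish SET properties, but a SET property that contains only a single value is marked as a regular property:

  ```
  {"properties":[
      {"k":"name","t":"string","v":"Jack","c":"set"},
      {"k":"name","t":"string","v":"Luck","c":"set"},
      {"k":"age","t":"int","v":"20","c":"single"}
  ]}
  ```

* edgeJsonProperty format example; edges do not support multi-valued properties:

  ```
  {"properties":[
      {"k":"created_at","t":"long","v":"153498653"},
      {"k":"weight","t":"double","v":"3.14"}
  ]}
  ```

## 4 Performance Report

(TODO)

## 5 Constraints

None

## 6 FAQ

None

================================================
FILE: gdbreader/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 gdbreader com.alibaba.datax 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax datax-core ${datax-project-version} test slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.tinkerpop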
gremlin-driver 3.4.1 org.projectlombok lombok 1.18.8 org.junit.jupiter junit-jupiter-api 5.4.0 test org.junit.jupiter junit-jupiter-engine 5.4.0 test maven-compiler-plugin 1.6 1.6 ${project-sourceEncoding} org.apache.maven.plugins maven-surefire-plugin 2.22.0 **/*Test*.class maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single org.apache.maven.plugins maven-compiler-plugin 8 8

================================================
FILE: gdbreader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/gdbreader target/ gdbreader-0.0.1-SNAPSHOT.jar plugin/reader/gdbreader false plugin/reader/gdbreader/libs runtime

================================================
FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReader.java
================================================
package com.alibaba.datax.plugin.reader.gdbreader;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.gdbreader.mapping.DefaultGdbMapper;
import com.alibaba.datax.plugin.reader.gdbreader.mapping.MappingRule;
import com.alibaba.datax.plugin.reader.gdbreader.mapping.MappingRuleFactory;
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement;
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbGraph;
import com.alibaba.datax.plugin.reader.gdbreader.model.ScriptGdbGraph;
import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper;
import org.apache.tinkerpop.gremlin.driver.ResultSet;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.LinkedList;
import java.util.List;

public class GdbReader extends Reader {
    private final static int DEFAULT_FETCH_BATCH_SIZE = 200;
    private static GdbGraph graph;
    private static Key.ExportType exportType;

    /**
     * Methods in Job run only once; methods in Task are run in parallel by multiple Task threads started by the framework.
     * <p>
     * The overall Reader execution flow is:
     * <pre>
     * Job: init-->prepare-->split
     *
     *                  Task: init-->prepare-->startRead-->post-->destroy
     *                  Task: init-->prepare-->startRead-->post-->destroy
     *
     *                                                          Job: post-->destroy
     * </pre>
     */
    public static class Job extends Reader.Job {
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);

        private Configuration jobConfig = null;

        @Override
        public void init() {
            this.jobConfig = super.getPluginJobConf();

            /**
             * Note: this method runs only once.
             * Best practice: validate the user's configuration here (missing required items? invalid values?
             * irrelevant items?) and give clear error or warning messages. Validation is best done in a static
             * utility class to keep this class clean.
             */

            ConfigHelper.assertGdbClient(jobConfig);
            ConfigHelper.assertLabels(jobConfig);
            try {
                exportType = Key.ExportType.valueOf(jobConfig.getString(Key.EXPORT_TYPE));
            } catch (NullPointerException | IllegalArgumentException e) {
                throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, Key.EXPORT_TYPE);
            }
        }

        @Override
        public void prepare() {
            /**
             * Note: this method runs only once.
             * Best practice: if the Job needs any processing before data synchronization, do it here;
             * if none is needed, this method can be removed.
             */

            try {
                graph = new ScriptGdbGraph(jobConfig, exportType);
            } catch (RuntimeException e) {
                throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_CLIENT_CONNECT, e.getMessage());
            }
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            /**
             * Note: this method runs only once.
             * Best practice: use a static utility class to split the Job configuration into multiple Task configurations.
             * adviceNumber is the number of splits the framework suggests based on the user's required sync speed;
             * it is only a hint, not a mandatory number of splits.
             */
            List<String> labels = ConfigHelper.assertLabels(jobConfig);

            /**
             * If the configured label list is empty, query all labels in GDB and add them to the read list.
             */
            if (labels.isEmpty()) {
                try {
                    labels.addAll(graph.getLabels().keySet());
                } catch (RuntimeException ex) {
                    throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_LABELS, ex.getMessage());
                }
            }

            if (labels.isEmpty()) {
                throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_LABELS, "none labels to read");
            }

            return ConfigHelper.splitConfig(jobConfig, labels);
        }

        @Override
        public void post() {
            /**
             * Note: this method runs only once.
             * Best practice: if the Job needs any follow-up processing after data synchronization, do it here.
             */
        }

        @Override
        public void destroy() {
            /**
             * Note: this method runs only once.
             * Best practice: usually works together with the Job's post() method to release the Job's resources.
             */
            try {
                graph.close();
            } catch (Exception ex) {
                // Pass the exception as the last argument so SLF4J logs its stack trace.
                LOG.error("Failed to close client", ex);
            }
        }

    }

    public static class Task extends Reader.Task {
        private static final Logger LOG =
                LoggerFactory.getLogger(Task.class);
        private static MappingRule rule;
        private Configuration taskConfig;
        private String fetchLabel = null;

        private int rangeSplitSize;
        private int fetchBatchSize;

        @Override
        public void init() {
            this.taskConfig = super.getPluginJobConf();

            /**
             * Note: this method runs once per Task.
             * Best practice: read taskConfig here and initialize the resources needed by startRead().
             */
            fetchLabel = taskConfig.getString(Key.LABEL);
            fetchBatchSize = taskConfig.getInt(Key.FETCH_BATCH_SIZE, DEFAULT_FETCH_BATCH_SIZE);
            rangeSplitSize = taskConfig.getInt(Key.RANGE_SPLIT_SIZE, fetchBatchSize * 10);
            rule = MappingRuleFactory.getInstance().create(taskConfig, exportType);
        }

        @Override
        public void prepare() {
            /**
             * Note: this method runs once per Task.
             * Best practice: if the Task needs any processing before data synchronization, do it here;
             * if none is needed, this method can be removed.
             */
        }

        @Override
        public void startRead(RecordSender recordSender) {
            /**
             * Note: this method runs once per Task.
             * Best practice: encapsulate the logic appropriately so that data reading stays concise and clear.
             */
            String start = "";
            while (true) {
                List<String> ids;
                try {
                    // Fetch the next range of ids, starting after the last id of the previous range.
                    ids = graph.fetchIds(fetchLabel, start, rangeSplitSize);
                    if (ids.isEmpty()) {
                        break;
                    }
                    start = ids.get(ids.size() - 1);
                } catch (Exception ex) {
                    throw DataXException.asDataXException(GdbReaderErrorCode.FAIL_FETCH_IDS, ex.getMessage());
                }

                // send range fetch async
                int count = ids.size();
                List<ResultSet> resultSets = new LinkedList<>();
                for (int pos = 0; pos < count; pos += fetchBatchSize) {
                    int rangeSize = Math.min(fetchBatchSize, count - pos);
                    String endId = ids.get(pos + rangeSize - 1);
                    String beginId = ids.get(pos);

                    List<String> propNames = rule.isHasProperty() ? rule.getPropertyNames() : null;
                    try {
                        resultSets.add(graph.fetchElementsAsync(fetchLabel, beginId, endId, propNames));
                    } catch (Exception ex) {
                        // just print error logs and continue
                        LOG.error("failed to request label: {}, start: {}, end: {}, e: {}", fetchLabel, beginId, endId, ex);
                    }
                }

                // get range fetch dsl results
                resultSets.forEach(results -> {
                    try {
                        List<GdbElement> elements = graph.getElement(results);
                        elements.forEach(element -> {
                            Record record = recordSender.createRecord();
                            DefaultGdbMapper.getMapper(rule).accept(element, record);
                            recordSender.sendToWriter(record);
                        });
                        recordSender.flush();
                    } catch (Exception ex) {
                        LOG.error("failed to send records", ex);
                    }
                });
            }
        }

        @Override
        public void post() {
            /**
             * Note: this method runs once per Task.
             * Best practice: if the Task needs any follow-up processing after data synchronization, do it here.
             */
        }

        @Override
        public void destroy() {
            /**
             * Note: this method runs once per Task.
             * Best practice: usually works together with the Task's post() method to release the Task's resources.
             */
        }

    }

}

================================================
FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/GdbReaderErrorCode.java
================================================
package com.alibaba.datax.plugin.reader.gdbreader;

import com.alibaba.datax.common.spi.ErrorCode;

public enum GdbReaderErrorCode implements ErrorCode {
    BAD_CONFIG_VALUE("GdbReader-00", "The value you configured is invalid."),
    FAIL_CLIENT_CONNECT("GdbReader-02", "GDB connection is abnormal."),
    UNSUPPORTED_TYPE("GdbReader-03", "Unsupported data type conversion."),
    FAIL_FETCH_LABELS("GdbReader-04", "Error pulling all labels, it is recommended to configure the specified label pull."),
    FAIL_FETCH_IDS("GdbReader-05", "Pull range id error."),
    ;

    private final String code;
    private final String description;

    private GdbReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code, this.description);
    }
}

================================================
FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/Key.java
================================================
package com.alibaba.datax.plugin.reader.gdbreader;

public final class Key {

    /**
     * Configuration items used by this plugin that must be provided by the plugin user.
     */

    public final static String HOST = "host";
    public final static String PORT = "port";
    public final static String USERNAME = "username";
    public static final String PASSWORD = "password";

    public static final String LABEL = "labels";
    public static final String EXPORT_TYPE = "labelType";

    // Must match the documented, case-sensitive configuration key "rangeSplitSize".
    public static final String RANGE_SPLIT_SIZE = "rangeSplitSize";
    public static final String FETCH_BATCH_SIZE = "fetchBatchSize";

    public static final String COLUMN = "column";
    public static final String COLUMN_NAME = "name";
    public static final String COLUMN_TYPE = "type";
    public static final String COLUMN_NODE_TYPE = "columnType";

    public enum ExportType {
        /**
         * Export vertices
         */
        VERTEX,
        /**
         * Export edges
         */
        EDGE
    }

    public enum ColumnType {
        /**
         * vertex or edge id
         */
        primaryKey,

        /**
         * vertex or edge label
         */
        primaryLabel,

        /**
         * vertex property
         */
        vertexProperty,

        /**
         * collects all vertex properties into a JSON list
         */
        vertexJsonProperty,

        /**
         * start vertex id of edge
         */
        srcPrimaryKey,

        /**
         * start vertex label of edge
         */
        srcPrimaryLabel,

        /**
         * end vertex id of edge
         */
        dstPrimaryKey,

        /**
         * end vertex label of edge
         */
        dstPrimaryLabel,

        /**
         * edge property
         */
        edgeProperty,

        /**
         * collects all edge properties into a JSON list
         */
        edgeJsonProperty,
    }
}

================================================
FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/DefaultGdbMapper.java
================================================
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.mapping;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.plugin.reader.gdbreader.model.GdbElement;
import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceProperty;
import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertexProperty;

import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

public class DefaultGdbMapper {

    public static BiConsumer<GdbElement, Record> getMapper(MappingRule rule) {
        return (gdbElement, record) -> rule.getColumns().forEach(columnMappingRule -> {
            Object value = null;
            ValueType type = columnMappingRule.getValueType();
            String name = columnMappingRule.getName();
            Map<String, Object> props = gdbElement.getProperties();

            switch (columnMappingRule.getColumnType()) {
                case dstPrimaryKey:
                    value = gdbElement.getTo();
                    break;
                case srcPrimaryKey:
                    value = gdbElement.getFrom();
                    break;
                case primaryKey:
                    value = gdbElement.getId();
                    break;
                case primaryLabel:
                    value = gdbElement.getLabel();
                    break;
                case dstPrimaryLabel:
                    value = gdbElement.getToLabel();
                    break;
                case srcPrimaryLabel:
                    value = gdbElement.getFromLabel();
                    break;
                case vertexProperty:
                    value = forVertexOnePropertyValue().apply(props.get(name));
                    break;
                case edgeProperty:
                    value = forEdgePropertyValue().apply(props.get(name));
                    break;
                case edgeJsonProperty:
                    value = forEdgeJsonProperties().apply(props);
                    break;
                case vertexJsonProperty:
                    value = forVertexJsonProperties().apply(props);
                    break;
                default:
                    break;
            }
            record.addColumn(type.applyObject(value));
        });
    }

    /**
     * parse the ReferenceProperty value of an edge
     *
     * @return property value
     */
    private static Function<Object, Object> forEdgePropertyValue() {
        return prop -> {
            if (prop instanceof ReferenceProperty) {
                return ((ReferenceProperty) prop).value();
            }
            return null;
        };
    }

    /**
     * parse the ReferenceVertexProperty value of a vertex
     *
     * @return the first property value in the list
     */
    private static Function<Object, Object> forVertexOnePropertyValue() {
        return props -> {
            if (props instanceof List && !((List<?>) props).isEmpty()) {
                // get the first property if there is more than one
                Object o = ((List<?>) props).get(0);
                if (o instanceof ReferenceVertexProperty) {
                    return ((ReferenceVertexProperty) o).value();
                }
            }
            return null;
        };
    }

    /**
     * serialize all edge properties to a JSON string
     *
     * @return json string
     */
    private static Function<Map<String, Object>, String> forEdgeJsonProperties() {
        return props -> "{\"properties\":[" +
                props.entrySet().stream().filter(p -> p.getValue() instanceof ReferenceProperty)
                        .map(p -> "{\"k\":\"" + ((ReferenceProperty) p.getValue()).key() + "\"," +
                                "\"t\":\"" + ((ReferenceProperty) p.getValue()).value().getClass().getSimpleName().toLowerCase() + "\"," +
                                "\"v\":\"" + String.valueOf(((ReferenceProperty) p.getValue()).value()) + "\"}")
                        .collect(Collectors.joining(",")) +
                "]}";
    }

    /**
     * serialize all vertex properties to a JSON string, including set properties
     *
     * @return json string
     */
    private static Function<Map<String, Object>, String> forVertexJsonProperties() {
        return props -> "{\"properties\":[" +
                props.entrySet().stream().filter(p -> p.getValue() instanceof List)
                        .map(p -> forVertexPropertyStr().apply((List<?>) p.getValue()))
                        .collect(Collectors.joining(",")) +
                "]}";
    }

    /**
     * serialize one vertex property to JSON string items, setting the 'cardinality' flag
     *
     * @return json string item
     */
    private static Function<List<?>, String> forVertexPropertyStr() {
        return vp -> {
            // a property with more than one value is a SET property
            final String setFlag = vp.size() > 1 ? "set" : "single";
            return vp.stream().filter(p -> p instanceof ReferenceVertexProperty)
                    .map(p -> "{\"k\":\"" + ((ReferenceVertexProperty) p).key() + "\"," +
                            "\"t\":\"" + ((ReferenceVertexProperty) p).value().getClass().getSimpleName().toLowerCase() + "\"," +
                            "\"v\":\"" + String.valueOf(((ReferenceVertexProperty) p).value()) + "\"," +
                            "\"c\":\"" + setFlag + "\"}")
                    .collect(Collectors.joining(","));
        };
    }
}

================================================
FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRule.java
================================================
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.reader.gdbreader.mapping;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ColumnType;
import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType;
import lombok.Data;

import java.util.ArrayList;
import java.util.List;

/**
 * @author : Liu Jianping
 * @date : 2019/9/6
 */

@Data
public class MappingRule {
    private boolean hasRelation = false;
    private boolean hasProperty = false;
    private ExportType type = ExportType.VERTEX;

    /**
     * property names for property key-value
     */
    private List<String> propertyNames = new ArrayList<>();

    private List<ColumnMappingRule> columns = new ArrayList<>();

    void addColumn(ColumnType columnType, ValueType type, String name) {
        ColumnMappingRule rule = new ColumnMappingRule();
        rule.setColumnType(columnType);
        rule.setName(name);
        rule.setValueType(type);

        if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty) {
            propertyNames.add(name);
            hasProperty = true;
        }

        boolean hasTo = columnType == ColumnType.dstPrimaryKey || columnType ==
ColumnType.dstPrimaryLabel; boolean hasFrom = columnType == ColumnType.srcPrimaryKey || columnType == ColumnType.srcPrimaryLabel; if (hasTo || hasFrom) { hasRelation = true; } columns.add(rule); } void addJsonColumn(ColumnType columnType) { ColumnMappingRule rule = new ColumnMappingRule(); rule.setColumnType(columnType); rule.setName("json"); rule.setValueType(ValueType.STRING); if (!propertyNames.isEmpty()) { throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, "JsonProperties should be only property"); } columns.add(rule); hasProperty = true; } @Data protected static class ColumnMappingRule { private String name = null; private ValueType valueType = null; private ColumnType columnType = null; } } ================================================ FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/MappingRuleFactory.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. 
*/ package com.alibaba.datax.plugin.reader.gdbreader.mapping; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode; import com.alibaba.datax.plugin.reader.gdbreader.Key; import com.alibaba.datax.plugin.reader.gdbreader.Key.ColumnType; import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType; import com.alibaba.datax.plugin.reader.gdbreader.util.ConfigHelper; import java.util.List; /** * @author : Liu Jianping * @date : 2019/9/20 */ public class MappingRuleFactory { private static final MappingRuleFactory instance = new MappingRuleFactory(); public static MappingRuleFactory getInstance() { return instance; } public MappingRule create(Configuration config, ExportType exportType) { MappingRule rule = new MappingRule(); rule.setType(exportType); List<Configuration> configurationList = config.getListConfiguration(Key.COLUMN); for (Configuration column : configurationList) { ColumnType columnType; try { columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE)); } catch (NullPointerException | IllegalArgumentException e) { throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, Key.COLUMN_NODE_TYPE); } if (exportType == ExportType.VERTEX) { // vertices allow only id/label/property columns ConfigHelper.assertConfig(Key.COLUMN_NODE_TYPE, () -> columnType == ColumnType.primaryKey || columnType == ColumnType.primaryLabel || columnType == ColumnType.vertexProperty || columnType == ColumnType.vertexJsonProperty); } else if (exportType == ExportType.EDGE) { // edges additionally allow src/dst id and label columns ConfigHelper.assertConfig(Key.COLUMN_NODE_TYPE, () -> columnType == ColumnType.primaryKey || columnType == ColumnType.primaryLabel || columnType == ColumnType.srcPrimaryKey || columnType == ColumnType.srcPrimaryLabel || columnType == ColumnType.dstPrimaryKey || columnType == 
ColumnType.dstPrimaryLabel || columnType == ColumnType.edgeProperty || columnType == 
ColumnType.edgeJsonProperty); } if (columnType == ColumnType.edgeProperty || columnType == ColumnType.vertexProperty) { String name = column.getString(Key.COLUMN_NAME); ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE)); ConfigHelper.assertConfig(Key.COLUMN_NAME, () -> name != null); if (propType == null) { throw DataXException.asDataXException(GdbReaderErrorCode.UNSUPPORTED_TYPE, Key.COLUMN_TYPE); } rule.addColumn(columnType, propType, name); } else if (columnType == ColumnType.vertexJsonProperty || columnType == ColumnType.edgeJsonProperty) { rule.addJsonColumn(columnType); } else { rule.addColumn(columnType, ValueType.STRING, null); } } return rule; } } ================================================ FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/mapping/ValueType.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ package com.alibaba.datax.plugin.reader.gdbreader.mapping; import com.alibaba.datax.common.element.BoolColumn; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.DoubleColumn; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.StringColumn; import java.util.HashMap; import java.util.Map; import java.util.function.Function; /** * @author : Liu Jianping * @date : 2019/9/6 */ public enum ValueType { /** * transfer gdb element object value to DataX Column data *

* int, long -> LongColumn * float, double -> DoubleColumn * boolean -> BoolColumn * string -> StringColumn */ INT(Integer.class, "int", ValueTypeHolder::longColumnMapper), INTEGER(Integer.class, "integer", ValueTypeHolder::longColumnMapper), LONG(Long.class, "long", ValueTypeHolder::longColumnMapper), DOUBLE(Double.class, "double", ValueTypeHolder::doubleColumnMapper), FLOAT(Float.class, "float", ValueTypeHolder::doubleColumnMapper), BOOLEAN(Boolean.class, "boolean", ValueTypeHolder::boolColumnMapper), STRING(String.class, "string", ValueTypeHolder::stringColumnMapper), ; private Class<?> type = null; private String shortName = null; private Function<Object, Column> columnFunc = null; ValueType(Class<?> type, String name, Function<Object, Column> columnFunc) { this.type = type; this.shortName = name; this.columnFunc = columnFunc; ValueTypeHolder.shortName2type.put(shortName, this); } public static ValueType fromShortName(String name) { return ValueTypeHolder.shortName2type.get(name); } public Column applyObject(Object value) { if (value == null) { return null; } return columnFunc.apply(value); } private static class ValueTypeHolder { private static Map<String, ValueType> shortName2type = new HashMap<>(); private static LongColumn longColumnMapper(Object o) { long v; if (o instanceof Integer) { v = (int) o; } else if (o instanceof Long) { v = (long) o; } else if (o instanceof String) { v = Long.valueOf((String) o); } else { throw new RuntimeException("Failed to cast " + o.getClass() + " to Long"); } return new LongColumn(v); } private static DoubleColumn doubleColumnMapper(Object o) { double v; if (o instanceof Integer) { v = (double) (int) o; } else if (o instanceof Long) { v = (double) (long) o; } else if (o instanceof Float) { v = (double) (float) o; } else if (o instanceof Double) { v = (double) o; } else if (o instanceof String) { v = Double.valueOf((String) o); } else { throw new RuntimeException("Failed to cast " + o.getClass() + " to 
Double"); } return new DoubleColumn(v); } private static BoolColumn 
boolColumnMapper(Object o) { boolean v; if (o instanceof Integer) { v = ((int) o != 0); } else if (o instanceof Long) { v = ((long) o != 0); } else if (o instanceof Boolean) { v = (boolean) o; } else if (o instanceof String) { v = Boolean.valueOf((String) o); } else { throw new RuntimeException("Failed to cast " + o.getClass() + " to Boolean"); } return new BoolColumn(v); } private static StringColumn stringColumnMapper(Object o) { if (o instanceof String) { return new StringColumn((String) o); } else { return new StringColumn(String.valueOf(o)); } } } } ================================================ FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/AbstractGdbGraph.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ package com.alibaba.datax.plugin.reader.gdbreader.model; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.gdbreader.Key; import org.apache.tinkerpop.gremlin.driver.Client; import org.apache.tinkerpop.gremlin.driver.Cluster; import org.apache.tinkerpop.gremlin.driver.RequestOptions; import org.apache.tinkerpop.gremlin.driver.Result; import org.apache.tinkerpop.gremlin.driver.ResultSet; import org.apache.tinkerpop.gremlin.driver.ser.Serializers; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; import java.util.Map; import java.util.concurrent.TimeUnit; /** * @author : Liu Jianping * @date : 2019/9/6 */ public abstract class AbstractGdbGraph implements GdbGraph { final static int DEFAULT_TIMEOUT = 30000; private static final Logger log = LoggerFactory.getLogger(AbstractGdbGraph.class); private Client client; AbstractGdbGraph() { } AbstractGdbGraph(Configuration config) { log.info("init graphdb client"); String host 
= config.getString(Key.HOST); int port = config.getInt(Key.PORT); String username = config.getString(Key.USERNAME); String password = config.getString(Key.PASSWORD); try { Cluster cluster = Cluster.build(host).port(port).credentials(username, password) .serializer(Serializers.GRAPHBINARY_V1D0) .maxContentLength(1024 * 1024) .resultIterationBatchSize(64) .create(); client = cluster.connect().init(); warmClient(); } catch (RuntimeException e) { log.error("Failed to connect to GDB {}:{}, due to {}", host, port, e); throw e; } } protected List<Result> runInternal(String dsl, Map<String, Object> params) throws Exception { return runInternalAsync(dsl, params).all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS); } protected ResultSet runInternalAsync(String dsl, Map<String, Object> params) throws Exception { RequestOptions.Builder options = RequestOptions.build().timeout(DEFAULT_TIMEOUT); if (params != null && !params.isEmpty()) { params.forEach(options::addParameter); } return client.submitAsync(dsl, options.create()).get(DEFAULT_TIMEOUT, TimeUnit.MILLISECONDS); } private void warmClient() { try { runInternal("g.V('test')", null); log.info("warm graphdb client over"); } catch (Exception e) { log.error("warmClient error"); throw new RuntimeException(e); } } @Override public void close() throws Exception { if (client != null) { log.info("close graphdb client"); client.close(); } } } ================================================ FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbElement.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation.
*/ package com.alibaba.datax.plugin.reader.gdbreader.model; import lombok.Data; import java.util.HashMap; import java.util.Map; /** * @author : Liu Jianping * @date : 2019/9/6 */ @Data public class GdbElement { String id = null; String label = null; String to = null; String from = null; String toLabel = null; String fromLabel = null; Map<String, Object> properties = new HashMap<>(); public GdbElement() { } public GdbElement(String id, String label) { this.id = id; this.label = label; } } ================================================ FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/GdbGraph.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ package com.alibaba.datax.plugin.reader.gdbreader.model; import org.apache.tinkerpop.gremlin.driver.ResultSet; import java.util.List; import java.util.Map; /** * @author : Liu Jianping * @date : 2019/9/6 */ public interface GdbGraph extends AutoCloseable { /** * Get all labels of the GraphDB * * @return map from each label to the number of elements carrying it */ Map<String, Long> getLabels(); /** * Get the IDs of elements with the given 'label', up to 'limit' IDs * * @param label label of the vertex or edge * @param start lower bound (excluded) of the ID range to fetch * @param limit maximum number of IDs to return * @return list of IDs */ List<String> fetchIds(String label, String start, long limit); /** * Fetch elements asynchronously; this only sends the query DSL to the server * * @param label node label to filter on * @param start range begin (included) * @param end range end (included) * @param propNames property keys to fetch * @return result set whose results can be retrieved later */ ResultSet fetchElementsAsync(String label, String start, String end, 
List<String> propNames); /** * Get elements from the server response {@link ResultSet} * * @param results response of the server * @return list of elements */ List<GdbElement> getElement(ResultSet results);
/** * close graph client * * @throws Exception if fails */ @Override void close() throws Exception; } ================================================ FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/model/ScriptGdbGraph.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ package com.alibaba.datax.plugin.reader.gdbreader.model; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.gdbreader.Key.ExportType; import org.apache.tinkerpop.gremlin.driver.Result; import org.apache.tinkerpop.gremlin.driver.ResultSet; import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceEdge; import org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.concurrent.TimeUnit; /** * @author : Liu Jianping * @date : 2019/9/6 */ public class ScriptGdbGraph extends AbstractGdbGraph { private static final Logger log = LoggerFactory.getLogger(ScriptGdbGraph.class); private final static String LABEL = "GDB___LABEL"; private final static String START_ID = "GDB___ID"; private final static String END_ID = "GDB___ID_END"; private final static String LIMIT = "GDB___LIMIT"; private final static String FETCH_VERTEX_IDS_DSL = "g.V().hasLabel(" + LABEL + ").has(id, gt(" + START_ID + ")).limit(" + LIMIT + ").id()"; private final static String FETCH_EDGE_IDS_DSL = "g.E().hasLabel(" + LABEL + ").has(id, gt(" + START_ID + ")).limit(" + LIMIT + ").id()"; private final static String FETCH_VERTEX_LABELS_DSL = "g.V().groupCount().by(label)"; private final static String 
FETCH_EDGE_LABELS_DSL = "g.E().groupCount().by(label)"; /** * fetch node range [START_ID, END_ID] */ private final static String FETCH_RANGE_VERTEX_DSL = "g.V().hasLabel(" + LABEL + ").has(id, gte(" + START_ID + ")).has(id, lte(" + END_ID + "))"; private final static String FETCH_RANGE_EDGE_DSL = "g.E().hasLabel(" + LABEL + ").has(id, gte(" + START_ID + ")).has(id, lte(" + END_ID + "))"; private final static String PART_WITH_PROP_DSL = ".as('a').project('node', 'props').by(select('a')).by(select('a').propertyMap("; private final ExportType exportType; public ScriptGdbGraph(ExportType exportType) { super(); this.exportType = exportType; } public ScriptGdbGraph(Configuration config, ExportType exportType) { super(config); this.exportType = exportType; } @Override public List<String> fetchIds(final String label, final String start, long limit) { Map<String, Object> params = new HashMap<>(3); params.put(LABEL, label); params.put(START_ID, start); params.put(LIMIT, limit); String fetchDsl = exportType == ExportType.VERTEX ? FETCH_VERTEX_IDS_DSL : FETCH_EDGE_IDS_DSL; List<String> ids = new ArrayList<>(); try { List<Result> results = runInternal(fetchDsl, params); // convert each result to an id string results.forEach(id -> ids.add(id.getString())); } catch (Exception e) { log.error("fetch range node failed, label {}, start {}", label, start); throw new RuntimeException(e); } return ids; } @Override public ResultSet fetchElementsAsync(final String label, final String start, final String end, final List<String> propNames) { Map<String, Object> params = new HashMap<>(3); params.put(LABEL, label); params.put(START_ID, start); params.put(END_ID, end); String prefixDsl = exportType == ExportType.VERTEX ?
FETCH_RANGE_VERTEX_DSL : FETCH_RANGE_EDGE_DSL; StringBuilder fetchDsl = new StringBuilder(prefixDsl); if (propNames != null) { fetchDsl.append(PART_WITH_PROP_DSL); for (int i = 0; i < propNames.size(); i++) { String propName = "GDB___PK" + String.valueOf(i); params.put(propName, propNames.get(i)); fetchDsl.append(propName); if (i != propNames.size() - 1) { fetchDsl.append(", "); } } fetchDsl.append("))"); } try { return runInternalAsync(fetchDsl.toString(), params); } catch (Exception e) { log.error("Failed to fetch range node, start {}, end {}, e {}", start, end, e); throw new RuntimeException(e); } } @Override @SuppressWarnings("unchecked") public List<GdbElement> getElement(ResultSet results) { List<GdbElement> elements = new LinkedList<>(); try { List<Result> resultList = results.all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS); resultList.forEach(n -> { Object o = n.getObject(); GdbElement element = new GdbElement(); if (o instanceof Map) { // project response Object node = ((Map) o).get("node"); Object props = ((Map) o).get("props"); mapNodeToElement(node, element); mapPropToElement((Map<String, Object>) props, element); } else { // range node response mapNodeToElement(n.getObject(), element); } if (element.getId() != null) { elements.add(element); } }); } catch (Exception e) { log.error("Failed to get node: {}", e); throw new RuntimeException(e); } return elements; } private void mapNodeToElement(Object node, GdbElement element) { if (node instanceof ReferenceVertex) { ReferenceVertex v = (ReferenceVertex) node; element.setId((String) v.id()); element.setLabel(v.label()); } else if (node instanceof ReferenceEdge) { ReferenceEdge e = (ReferenceEdge) node; element.setId((String) e.id()); element.setLabel(e.label()); element.setTo((String) e.inVertex().id()); element.setToLabel(e.inVertex().label()); element.setFrom((String) e.outVertex().id()); element.setFromLabel(e.outVertex().label()); } } private void 
mapPropToElement(Map<String, Object> props, GdbElement element) { element.setProperties(props); } @Override
public Map<String, Long> getLabels() { String dsl = exportType == ExportType.VERTEX ? FETCH_VERTEX_LABELS_DSL : FETCH_EDGE_LABELS_DSL; try { List<Result> results = runInternal(dsl, null); Map<String, Long> labelMap = new HashMap<>(2); Map labels = results.get(0).get(Map.class); labels.forEach((k, v) -> { String label = (String) k; Long count = (Long) v; labelMap.put(label, count); }); return labelMap; } catch (Exception e) { log.error("Failed to fetch label list, please specify the labels explicitly and run again, e {}", e); throw new RuntimeException(e); } } } ================================================ FILE: gdbreader/src/main/java/com/alibaba/datax/plugin/reader/gdbreader/util/ConfigHelper.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ package com.alibaba.datax.plugin.reader.gdbreader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.gdbreader.GdbReaderErrorCode; import com.alibaba.datax.plugin.reader.gdbreader.Key; import org.apache.commons.lang3.StringUtils; import java.io.IOException; import java.io.InputStream; import java.util.ArrayList; import java.util.List; import java.util.function.Supplier; /** * @author : Liu Jianping * @date : 2019/9/6 */ public interface ConfigHelper { static void assertConfig(String key, Supplier<Boolean> f) { if (!f.get()) { throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, key); } } static void assertHasContent(Configuration config, String key) { assertConfig(key, () -> StringUtils.isNotBlank(config.getString(key))); } static void assertGdbClient(Configuration config) { assertHasContent(config, Key.HOST); assertConfig(Key.PORT, () -> 
config.getInt(Key.PORT) > 0); assertHasContent(config, Key.USERNAME);
assertHasContent(config, Key.PASSWORD); } static List<String> assertLabels(Configuration config) { Object labels = config.get(Key.LABEL); if (!(labels instanceof List)) { throw DataXException.asDataXException(GdbReaderErrorCode.BAD_CONFIG_VALUE, "labels should be a List"); } List<?> list = (List<?>) labels; List<String> configLabels = new ArrayList<>(list.size()); list.forEach(n -> configLabels.add(String.valueOf(n))); return configLabels; } static List<Configuration> splitConfig(Configuration config, List<String> labels) { List<Configuration> configs = new ArrayList<>(); for (String label : labels) { Configuration conf = config.clone(); conf.set(Key.LABEL, label); configs.add(conf); } return configs; } static Configuration fromClasspath(String name) { try (InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) { return Configuration.from(is); } catch (IOException e) { throw new IllegalArgumentException("File not found: " + name); } } } ================================================ FILE: gdbreader/src/main/resources/plugin.json ================================================ { "name": "gdbreader", "class": "com.alibaba.datax.plugin.reader.gdbreader.GdbReader", "description": "useScene: prod.
mechanism: connect GDB with gremlin-client, execute 'g.V().propertyMap() or g.E().propertyMap()' to get record", "developer": "alibaba" } ================================================ FILE: gdbreader/src/main/resources/plugin_job_template.json ================================================ { "job": { "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 1 } }, "content": [ { "reader": { "name": "gdbreader", "parameter": { "host": "10.218.145.24", "port": 8182, "username": "***", "password": "***", "labelType": "EDGE", "labels": ["label1", "label2"], "column": [ { "name": "id", "type": "string", "columnType": "primaryKey" }, { "name": "label", "type": "string", "columnType": "primaryLabel" }, { "name": "srcId", "type": "string", "columnType": "srcPrimaryKey" }, { "name": "srcLabel", "type": "string", "columnType": "srcPrimaryLabel" }, { "name": "dstId", "type": "string", "columnType": "dstPrimaryKey" }, { "name": "dstLabel", "type": "string", "columnType": "dstPrimaryLabel" }, { "name": "name", "type": "string", "columnType": "edgeProperty" }, { "name": "weight", "type": "double", "columnType": "edgeProperty" } ] } }, "writer": { "name": "streamwriter", "parameter": { "print": true } } } ] } } ================================================ FILE: gdbwriter/doc/gdbwriter.md ================================================ # DataX GDBWriter ## 1 Quick Introduction The GDBWriter plugin writes data into a GDB instance. GDBWriter connects to a remote GDB instance through the `Gremlin Client`, receives data from the Reader, generates write DSL statements, and writes the data into GDB. ## 2 How It Works GDBWriter obtains the protocol data produced by the Reader through the DataX framework and writes it into the GDB instance with statements of the form `g.addV/E(GDB___label).property(id, GDB___id).property(GDB___PK1, GDB___PV1)...`. 
The `Gremlin Client` can be configured to work in session mode, in which the client controls transactions and writes multiple records as a batch within one transaction. ## 3 Features Because vertices and edges are configured differently in GDB, the import configuration must distinguish between vertices and edges. ### 3.1 Vertex Configuration Example * The following configuration generates vertex data in memory and imports it into a GDB instance ```json { "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column" : [ { "random": "1,100", "type": "double" }, { "random": "1000,1200", "type": "long" }, {
"random": "60,64", "type": "string" }, { "random": "100,1000", "type": "long" }, { "random": "32,48", "type": "string" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "gdbwriter", "parameter": { "host": "gdb-endpoint", "port": 8182, "username": "root", "password": "***", "writeMode": "INSERT", "labelType": "VERTEX", "label": "#{1}", "idTransRule": "none", "session": true, "maxRecordsInBatch": 64, "column": [ { "name": "id", "value": "#{0}", "type": "string", "columnType": "primaryKey" }, { "name": "vertex_propKey", "value": "#{2}", "type": "string", "columnType": "vertexSetProperty" }, { "name": "vertex_propKey", "value": "#{3}", "type": "long", "columnType": "vertexSetProperty" }, { "name": "vertex_propKey2", "value": "#{4}", "type": "string", "columnType": "vertexProperty" } ] } } } ] } } ``` ### 3.2 Edge Configuration Example * The following configuration generates edge data in memory and imports it into a GDB instance > **Note** > Before importing edges with the configuration below, the vertices must already be written to the GDB instance: vertices with the ids `person-{{i}}` and `book-{{i}}` must exist, where i ranges from 0 to 100. 
```json { "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column" : [ { "random": "100,200", "type": "double" }, { "random": "1,100", "type": "long" }, { "random": "1,100", "type": "long" }, { "random": "2000,2200", "type": "long" }, { "random": "60,64", "type": "string" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "gdbwriter", "parameter": { "host": "gdb-endpoint", "port": 8182, "username": "root", "password": "***", "writeMode": "INSERT", "labelType": "EDGE", "label": "#{3}", "idTransRule": "none", "srcIdTransRule": "labelPrefix", "dstIdTransRule": "labelPrefix", "srcLabel":"person-", "dstLabel":"book-", "session":false, "column": [ { "name": "id", "value": "#{0}", "type": "string", "columnType": "primaryKey" }, { "name": "id", "value": "#{1}", "type": "string", "columnType": "srcPrimaryKey" }, { "name": "id", "value": "#{2}", "type": "string", "columnType": "dstPrimaryKey" }, { "name": "edge_propKey", "value": "#{4}", "type": "string", "columnType":
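"edgeProperty" } ] } } } ] } } ``` ### 3.3 Parameters * **host** * Description: connection domain name of the GDB instance, i.e. the "internal network address" shown in the Alibaba Cloud console under "Graph Database GDB" -> "Instance Management" -> "Basic Information"; * Required: yes * Default: none * **port** * Description: connection port of the GDB instance * Required: yes * Default: 8182 * **username** * Description: account name of the GDB instance * Required: yes * Default: none * **password** * Description: password of the graph instance account * Required: yes * Default: none * **label** * Description: type name, i.e. the vertex/edge name; label can be read from a source column, e.g. #{0} uses the value of the first column as the label name. Source column indexes start from 0; * Required: yes * Default: none * **labelType** * Description: label type; * enum value "VERTEX" means vertex * enum value "EDGE" means edge * Required: yes * Default: none * **srcLabel** * Description: when the label is an edge, the vertex label of the source (start) vertex; srcLabel can be read from a source column, e.g. #{0} uses the value of the first column as the label name. Source column indexes start from 0; * Required: may be omitted when labelType is EDGE and srcIdTransRule is none, otherwise required; * Default: none * **dstLabel** * Description: when the label is an edge, the vertex label of the destination (end) vertex; dstLabel can be read from a source column, e.g. #{0} uses the value of the first column as the label name. 
Source column indexes start from 0; * Required: may be omitted when labelType is EDGE and dstIdTransRule is none, otherwise required; * Default: none * **writeMode** * Description: how records with a duplicate id are handled during import; * enum value "INSERT" reports an error and increments the error record count by 1; * enum value "MERGE" updates the property values and is not counted as an error; * enum value "SKIP" skips the record and is not counted as an error * Required: yes * Default: INSERT * **idTransRule** * Description: conversion rule for the primary key id; * enum value "labelPrefix" converts the mapped value to {label name}{source field} * enum value "none" leaves the mapped value unchanged * Required: yes * Default: "none" * **srcIdTransRule** * Description: when the label is an edge, the conversion rule for the primary key id of the source vertex; * enum value "labelPrefix" converts the mapped value to {label name}{source field} * enum value "none" leaves the mapped value unchanged; srcLabel may then be omitted * Required: required when the label is an edge * Default: "none" * **dstIdTransRule** * Description: when the label is an edge, the conversion rule for the primary key id of the destination vertex; * enum value "labelPrefix" converts the mapped value to {label name}{source field} * enum value "none" leaves the mapped value unchanged; dstLabel may then be omitted * Required: required when the label is an edge * Default: "none" * **session** * Description: whether to write data using the session mode of the `Gremlin Client` * Required: no * Default: false * **maxRecordsInBatch** * Description: number of records processed in one transaction when using the session mode of the `Gremlin Client` * Required: no * Default: 16 * **column** * Description: vertex/edge field mapping configuration * Required: yes * Default: none * **column -> name** * Description: field name in the vertex/edge mapping * Required: yes * Default: none * **column -> value** * Description: field 
value in the vertex/edge mapping; * #{N} maps the source value directly, where N is the source column index starting from 0; #{0} maps the first source column; * test-#{0} concatenates the source value with fixed text, which can be added before or after #{0}; * #{0}-#{1} concatenates multiple fields, and fixed text can be added at any position, e.g. test-#{0}-test1-#{1}-test2 * Required: yes * Default: none * **column -> type** * Description: type of the field value in the vertex/edge mapping; * the primary key id only supports the string type; the GDBWriter plugin converts it forcibly, so the source id must be convertible to string; * supported types for ordinary properties: int, long, float, double, boolean, string * Required: yes * Default: none * **column -> columnType** * Description: the kind of GDB vertex/edge data the mapped field corresponds to; the following enum values are supported: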
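* Common values: * primaryKey: the field is the primary key id * Vertex values: * vertexProperty: when labelType is VERTEX, the field is an ordinary vertex property * vertexSetProperty: when labelType is VERTEX, the field is a SET property of the vertex, and value is one of the values in that SET property * vertexJsonProperty: when labelType is VERTEX, the field is a vertex JSON property; for the value structure see **json properties example** in the notes; at most one JSON property is allowed in a vertex configuration; * Edge values: * srcPrimaryKey: when labelType is EDGE, the field is the primary key id of the source vertex * dstPrimaryKey: when labelType is EDGE, the field is the primary key id of the destination vertex * edgeProperty: when labelType is EDGE, the field is an ordinary edge property * edgeJsonProperty: when labelType is EDGE, the field is an edge JSON property; for the value structure see **json properties example** in the notes; at most one JSON property is allowed in an edge configuration; * Required: yes * Default: none * Notes: **json properties example** > ```json > {"properties":[ > {"k":"name","t":"string","v":"tom"}, > {"k":"age","t":"int","v":"20"}, > {"k":"sex","t":"string","v":"male"} > ]} > > # The JSON format also supports adding SET properties to vertices, as follows > {"properties":[ > {"k":"name","t":"string","v":"tom","c":"set"}, > {"k":"name","t":"string","v":"jack","c":"set"}, > {"k":"age","t":"int","v":"20"}, > {"k":"sex","t":"string","v":"male"} > ]} > ``` ## 4 Performance Report ### 4.1 Environment GDB instance specification - 16 cores, 128GB, 1TB SSD DataX benchmark machine - cpu: 4 * Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz - mem: 16GB - net: dual gigabit NICs - os: CentOS 7, 3.10.0-957.5.1.el7.x86_64 - jvm: -Xms4g -Xmx4g ### 4.2 Data Characteristics ``` { id: random double(1~10000) from: random long(1~40000000) to: random long(1~40000000) label: random long(20000000 ~ 20005000) propertyKey: random string(len: 
120~128) propertyName: random string(len: 120~128) } ``` - Vertices and edges each have one property; the property key and value are both random strings of 120~128 bytes - label is a random integer in the range 20000000 ~ 20005000 converted to a string - id is a floating-point number converted to a string, to avoid duplicates - Edges reference source and destination vertices; before testing edges, the vertex data of the twitter dataset (42 million vertices) had already been imported ### 4.3 Job Configuration Vertices and edges are configured separately; the configurations are similar to the examples above, and the key differences are listed below - Increase the number of concurrent channels > "channel": 32 - Use session mode > "session": true - Increase the number of records per transaction batch > "maxRecordsInBatch": 128 ### 4.4 Test Results Vertex import performance: - Average job throughput: 4.07MB/s - Total job time: 412s - Record write speed: 15609rec/s - Total records read: 6400000 Edge import performance: - Average job throughput: 2.76MB/s - Total job time: 1602s - Record write speed: 10000rec/s - Total records read: 16000000 ## 5 Limitations - Before edge records are imported, the source/destination vertices they reference must already exist in GDB - The GDBWriter plugin uses the same GDB instance port as user query DSL, so an import may affect query performance ## FAQ 1. Using SET properties requires upgrading the GDB instance to version `1.0.20` or later. 2.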
Edges only support ordinary single-value properties; SET property data cannot be written to edges. ================================================ FILE: gdbwriter/pom.xml ================================================ 4.0.0 datax-all com.alibaba.datax 0.0.1-SNAPSHOT gdbwriter gdbwriter jar 1.8 1.8 3.4.1 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax datax-core ${datax-project-version} test slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.tinkerpop gremlin-driver ${gremlin.version} org.projectlombok lombok 1.18.8 com.github.ben-manes.caffeine caffeine 2.4.0 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: gdbwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/gdbwriter target/ gdbwriter-0.0.1-SNAPSHOT.jar plugin/writer/gdbwriter false plugin/writer/gdbwriter/libs runtime ================================================ FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriter.java ================================================ package com.alibaba.datax.plugin.writer.gdbwriter; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbGraphManager; import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig; import com.alibaba.datax.plugin.writer.gdbwriter.mapping.DefaultGdbMapper; import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRule; import 
com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRuleFactory;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement; import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph; import groovy.lang.Tuple2; import io.netty.util.concurrent.DefaultThreadFactory; import lombok.extern.slf4j.Slf4j; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; import java.util.concurrent.ExecutorService; import java.util.concurrent.Future; import java.util.concurrent.LinkedBlockingDeque; import java.util.concurrent.ThreadPoolExecutor; import java.util.concurrent.TimeUnit; import java.util.function.Function; public class GdbWriter extends Writer { private static final Logger log = LoggerFactory.getLogger(GdbWriter.class); private static Function<Record, GdbElement> mapper = null; private static GdbGraph globalGraph = null; private static boolean session = false; /** * Methods in Job run only once; methods in Task are executed in parallel by the multiple Task threads that the framework starts. *

* The overall execution flow of the Writer is: *
     * Job: init-->prepare-->split
     *
     *                          Task: init-->prepare-->startWrite-->post-->destroy
     *                          Task: init-->prepare-->startWrite-->post-->destroy
     *
     *                                                                            Job: post-->destroy
     * 
     */
    public static class Job extends Writer.Job {
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);

        private Configuration jobConfig = null;

        @Override
        public void init() {
            LOG.info("GDB datax plugin writer job init begin ...");
            this.jobConfig = getPluginJobConf();
            GdbWriterConfig.of(this.jobConfig);
            LOG.info("GDB datax plugin writer job init end.");

            /**
             * Note: this method is executed only once.
             * Best practice: validate the user's configuration here (missing required items? invalid
             * values? irrelevant items? ...) and report clear errors or warnings. Validation is best
             * done in a static utility class to keep this class well structured.
             */
        }

        @Override
        public void prepare() {
            /**
             * Note: this method is executed only once.
             * Best practice: perform any processing the Job needs before data synchronization here;
             * if none is needed, this method can be removed.
             */
            super.prepare();

            final MappingRule rule = MappingRuleFactory.getInstance().createV2(this.jobConfig);

            mapper = new DefaultGdbMapper(this.jobConfig).getMapper(rule);
            session = this.jobConfig.getBool(Key.SESSION_STATE, false);

            /**
             * client connect check before task
             */
            try {
                globalGraph = GdbGraphManager.instance().getGraph(this.jobConfig, false);
            } catch (final RuntimeException e) {
                throw DataXException.asDataXException(GdbWriterErrorCode.FAIL_CLIENT_CONNECT, e.getMessage());
            }
        }

        @Override
        public List<Configuration> split(final int mandatoryNumber) {
            /**
             * Note: this method is executed only once.
             * Best practice: use a static utility class to split the Job configuration into multiple
             * Task configurations. mandatoryNumber is the number of splits that must be produced.
             */
            LOG.info("split begin...");
            final List<Configuration> configurationList = new ArrayList<Configuration>();
            for (int i = 0; i < mandatoryNumber; i++) {
                configurationList.add(this.jobConfig.clone());
            }
            LOG.info("split end...");
            return configurationList;
        }

        @Override
        public void post() {
            /**
             * Note: this method is executed only once.
             * Best practice: perform any processing the Job needs after data synchronization here.
             */
            globalGraph.close();
        }

        @Override
        public void destroy() {
            /**
             * Note: this method is executed only once.
             * Best practice: works together with the Job's post() method to release the Job's resources.
             */
        }

    }

    @Slf4j
    public static class Task extends Writer.Task {

        private Configuration taskConfig;

        private int failed = 0;
        private int batchRecords;
        private ExecutorService submitService = null;
        private GdbGraph graph;

        @Override
        public void init() {
            /**
             * Note: this method is executed once per Task.
             * Best practice: read taskConfig here and initialize the resources that startWrite() needs.
             */
            this.taskConfig = super.getPluginJobConf();
            this.batchRecords = this.taskConfig.getInt(Key.MAX_RECORDS_IN_BATCH,
                GdbWriterConfig.DEFAULT_RECORD_NUM_IN_BATCH);
            this.submitService = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingDeque<>(),
                new DefaultThreadFactory("submit-dsl"));

            if (!session) {
                this.graph = globalGraph;
            } else {
                /**
                 * Create session clients in staggered groups of 10 (tasks 0-9 start immediately,
                 * tasks 10-19 wait 10 s, and so on) because server-side Groovy compilation
                 * throughput is limited.
                 */
                try {
                    Thread.sleep((getTaskId() / 10) * 10000);
                } catch (final Exception e) {
                    // ...
                }

                this.graph = GdbGraphManager.instance().getGraph(this.taskConfig, session);
            }
        }

        @Override
        public void prepare() {
            /**
             * Note: this method is executed once per Task.
             * Best practice: perform any processing the Task needs before data synchronization here;
             * if none is needed, this method can be removed.
             */
            super.prepare();
        }

        @Override
        public void startWrite(final RecordReceiver recordReceiver) {
            /**
             * Note: this method is executed once per Task.
             * Best practice: wrap the write logic appropriately so that it stays concise and clear.
             */
            Record r;
            Future<Boolean> future = null;
            List<Tuple2<Record, GdbElement>> records = new ArrayList<>(this.batchRecords);

            while ((r = recordReceiver.getFromReader()) != null) {
                try {
                    records.add(new Tuple2<>(r, mapper.apply(r)));
                } catch (final Exception ex) {
                    getTaskPluginCollector().collectDirtyRecord(r, ex);
                    continue;
                }

                if (records.size() >= this.batchRecords) {
                    // Keep at most one batch in flight: wait for the previous commit first.
                    wait4Submit(future);

                    final List<Tuple2<Record, GdbElement>> batch = records;
                    future = this.submitService.submit(() -> batchCommitRecords(batch));
                    records = new ArrayList<>(this.batchRecords);
                }
            }

            wait4Submit(future);
            if (!records.isEmpty()) {
                final List<Tuple2<Record, GdbElement>> batch = records;
                future = this.submitService.submit(() -> batchCommitRecords(batch));
                wait4Submit(future);
            }
        }

        private void wait4Submit(final Future<Boolean> future) {
            if (future == null) {
                return;
            }

            try {
                future.get();
            } catch (final Exception e) {
                log.error("wait for batch commit failed", e);
            }
        }

        private boolean batchCommitRecords(final List<Tuple2<Record, GdbElement>> records) {
            final TaskPluginCollector collector = getTaskPluginCollector();
            try {
                final List<Tuple2<Record, Exception>> errors = this.graph.add(records);
                errors.forEach(t -> collector.collectDirtyRecord(t.getFirst(), t.getSecond()));
                this.failed += errors.size();
            } catch (final Exception e) {
                records.forEach(t -> collector.collectDirtyRecord(t.getFirst(), e));
                this.failed += records.size();
            }

            records.clear();
            return true;
        }

        @Override
        public void post() {
            /**
             * Note: this method is executed once per Task.
             * Best practice: perform any processing the Task needs after data synchronization here.
             */
            log.info("Task done, dirty record count - {}", this.failed);
        }

        @Override
        public void destroy() {
            /**
             * Note: this method is executed once per Task.
             * Best practice: works together with the Task's post() method to release the Task's resources.
             */
            if (session) {
                this.graph.close();
            }
            this.submitService.shutdown();
        }

    }

}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/GdbWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.gdbwriter;

import com.alibaba.datax.common.spi.ErrorCode;

public enum GdbWriterErrorCode implements ErrorCode {
    BAD_CONFIG_VALUE("GdbWriter-00", "The configured value is invalid."),
    CONFIG_ITEM_MISS("GdbWriter-01", "A required configuration item is missing."),
    FAIL_CLIENT_CONNECT("GdbWriter-02", "GDB connection error.");

    private final String code;
    private final String description;

    private GdbWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code, this.description);
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.gdbwriter;

public final class Key {

    /**
     * Configuration items used by the plugin that must be provided by the plugin user.
     */

    public static final String HOST = "host";
    public static final String PORT = "port";
    public static final String USERNAME = "username";
    public static final String PASSWORD = "password";

    /**
     * import type and mode
     */
    public static final String IMPORT_TYPE = "labelType";
    public static final String UPDATE_MODE = "writeMode";

    /**
     * label prefix issue
     */
    public static final String ID_TRANS_RULE = "idTransRule";
    public static final String SRC_ID_TRANS_RULE = "srcIdTransRule";
    public static final String DST_ID_TRANS_RULE = "dstIdTransRule";

    public static final String LABEL = "label";
    public static final String SRC_LABEL = "srcLabel";
    public static final String DST_LABEL = "dstLabel";

    public static final String MAPPING = "mapping";

    /**
     * column definitions in GDB
     */
    public static final String COLUMN = "column";
    public static final String COLUMN_NAME = "name";
    public static final String COLUMN_VALUE = "value";
    public static final String COLUMN_TYPE = "type";
    public static final String COLUMN_NODE_TYPE = "columnType";

    /**
     * GDB vertex/edge elements
     */
    public static final String ID = "id";
    public static final String FROM = "from";
    public static final String TO = "to";
    public static final String PROPERTIES = "properties";
    public static final String PROP_KEY = "name";
    public static final String PROP_VALUE = "value";
    public static final String PROP_TYPE = "type";

    public static final String PROPERTIES_JSON_STR = "propertiesJsonStr";
    public static final String MAX_PROPERTIES_BATCH_NUM = "maxPropertiesBatchNumber";

    /**
     * connection-pool settings for the sessionless client
     */
    public static final String MAX_IN_PROCESS_PER_CONNECTION = "maxInProcessPerConnection";
    public static final String MAX_CONNECTION_POOL_SIZE = "maxConnectionPoolSize";
    public static final String MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = "maxSimultaneousUsagePerConnection";

    public static final String MAX_RECORDS_IN_BATCH = "maxRecordsInBatch";
    public static final String SESSION_STATE = "session";

    /**
     * Request length limit, including GDB element string lengths. Each field's limit can be
     * configured separately; records that exceed a limit are treated as dirty data.
     */
    public static final String MAX_GDB_STRING_LENGTH = "maxStringLengthLimit";
    public static final String MAX_GDB_ID_LENGTH = "maxIdStringLengthLimit";
    public static final String MAX_GDB_LABEL_LENGTH = "maxLabelStringLengthLimit";
    public static final String MAX_GDB_PROP_KEY_LENGTH = "maxPropKeyStringLengthLimit";
    public static final String MAX_GDB_PROP_VALUE_LENGTH = "maxPropValueStringLengthLimit";
    public static final String MAX_GDB_REQUEST_LENGTH = "maxRequestLengthLimit";

    public static enum ImportType {
        /**
         * Import vertices
         */
        VERTEX,
        /**
         * Import edges
         */
        EDGE;
    }

    public static enum UpdateMode {
        /**
         * Insert new records; fail if the record already exists
         */
        INSERT,
        /**
         * Skip this record if it already exists
         */
        SKIP,
        /**
         * Update the properties of this record if it already exists
         */
        MERGE;
    }

    public static enum ColumnType {
        /**
         * vertex or edge id
         */
        primaryKey,

        /**
         * vertex property
         */
        vertexProperty,

        /**
         * vertex set-cardinality property
         */
        vertexSetProperty,

        /**
         * start vertex id of edge
         */
        srcPrimaryKey,

        /**
         * end vertex id of edge
         */
        dstPrimaryKey,

        /**
         * edge property
         */
        edgeProperty,

        /**
         * vertex JSON-style property
         */
        vertexJsonProperty,

        /**
         * edge JSON-style property
         */
        edgeJsonProperty
    }

    public static enum IdTransRule {
        /**
         * vertex or edge id with 'label' prefix
         */
        labelPrefix,

        /**
         * vertex or edge id raw
         */
        none
    }

    public static enum PropertyType {
        /**
         * single Vertex Property
         */
        single,

        /**
         * set Vertex Property
         */
        set
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbGraphManager.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.client;

import java.util.ArrayList;
import java.util.List;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbGraph;
import com.alibaba.datax.plugin.writer.gdbwriter.model.ScriptGdbGraph;

/**
 * @author jerrywang
 *
 */
public class GdbGraphManager implements AutoCloseable {
    private static final GdbGraphManager INSTANCE = new GdbGraphManager();

    // In session mode each Task.init() may call getGraph() from its own writer thread,
    // so access to this list is synchronized.
    private final List<GdbGraph> graphs = new ArrayList<>();

    public static GdbGraphManager instance() {
        return INSTANCE;
    }

    public GdbGraph getGraph(final Configuration config, final boolean session) {
        // Connect outside the lock so that clients can still be created in parallel.
        final GdbGraph graph = new ScriptGdbGraph(config, session);
        synchronized (this.graphs) {
            this.graphs.add(graph);
        }
        return graph;
    }

    @Override
    public void close() {
        synchronized (this.graphs) {
            for (final GdbGraph graph : this.graphs) {
                graph.close();
            }
            this.graphs.clear();
        }
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/client/GdbWriterConfig.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.client;

import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.assertConfig;
import static com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper.assertHasContent;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.Key;

/**
 * @author jerrywang
 *
 */
public class GdbWriterConfig {
    public static final int DEFAULT_MAX_IN_PROCESS_PER_CONNECTION = 4;
    public static final int DEFAULT_MAX_CONNECTION_POOL_SIZE = 8;
    public static final int DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION = 8;
    public static final int DEFAULT_BATCH_PROPERTY_NUM = 30;
    public static final int DEFAULT_RECORD_NUM_IN_BATCH = 16;

    public static final int MAX_STRING_LENGTH = 10240;
    public static final int MAX_REQUEST_LENGTH = 65535 - 1000;

    private Configuration config;

    private GdbWriterConfig(final Configuration config) {
        this.config = config;

        validate();
    }

    public static GdbWriterConfig of(final Configuration config) {
        return new GdbWriterConfig(config);
    }

    private void validate() {
        assertHasContent(this.config, Key.HOST);
        assertConfig(Key.PORT, () -> this.config.getInt(Key.PORT) > 0);

        assertHasContent(this.config, Key.USERNAME);
        assertHasContent(this.config, Key.PASSWORD);
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/DefaultGdbMapper.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;

import static com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType.VERTEX;

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbEdge;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbVertex;
import com.alibaba.fastjson2.JSONArray;
import com.alibaba.fastjson2.JSONObject;

import lombok.extern.slf4j.Slf4j;

/**
 * @author jerrywang
 *
 */
@Slf4j
public class DefaultGdbMapper implements GdbMapper {
    private static final Pattern STR_DOLLAR_PATTERN = Pattern.compile("\\$\\{(\\d+)}");
    private static final Pattern NORMAL_DOLLAR_PATTERN = Pattern.compile("^\\$\\{(\\d+)}$");

    private static final Pattern STR_NUM_PATTERN = Pattern.compile("#\\{(\\d+)}");
    private static final Pattern NORMAL_NUM_PATTERN = Pattern.compile("^#\\{(\\d+)}$");

    public DefaultGdbMapper() {}

    public DefaultGdbMapper(final Configuration config) {
        MapperConfig.getInstance().updateConfig(config);
    }

    private static BiConsumer<Record, GdbElement> forElement(final MappingRule rule) {
        final boolean numPattern = rule.isNumPattern();
        final List<BiConsumer<Record, GdbElement>> properties = new ArrayList<>();
        for (final MappingRule.PropertyMappingRule propRule : rule.getProperties()) {
            final Function<Record, String> keyFunc = forStrColumn(numPattern, propRule.getKey());

            if (propRule.getValueType() == ValueType.STRING) {
                final Function<Record, String> valueFunc = forStrColumn(numPattern, propRule.getValue());
                properties.add((r, e) -> {
                    e.addProperty(keyFunc.apply(r), valueFunc.apply(r), propRule.getPType());
                });
            } else {
                final Function<Record, Object> valueFunc =
                    forObjColumn(numPattern, propRule.getValue(), propRule.getValueType());
                properties.add((r, e) -> {
                    e.addProperty(keyFunc.apply(r), valueFunc.apply(r), propRule.getPType());
                });
            }
        }

        if (rule.getPropertiesJsonStr() != null) {
            final Function<Record, String> jsonFunc = forStrColumn(numPattern, rule.getPropertiesJsonStr());
            properties.add((r, e) -> {
                final String propertiesStr = jsonFunc.apply(r);
                final JSONObject root = (JSONObject)JSONObject.parse(propertiesStr);
                final JSONArray propertiesList = root.getJSONArray("properties");

                for (final Object object : propertiesList) {
                    final JSONObject jsonObject = (JSONObject)object;
                    final String key = jsonObject.getString("k");
                    final String name = jsonObject.getString("v");
                    final String type = jsonObject.getString("t");
                    final String card = jsonObject.getString("c");

                    if (key == null || name == null) {
                        continue;
                    }
                    addToProperties(e, key, name, type, card);
                }
            });
        }

        // Build the id/label/from/to template functions once per rule instead of once per record.
        final Function<Record, String> labelFunc = forStrColumn(numPattern, rule.getLabel());
        final Function<Record, String> idFunc = forStrColumn(numPattern, rule.getId());
        final boolean isEdge = rule.getImportType() == Key.ImportType.EDGE;
        final Function<Record, String> toFunc = isEdge ? forStrColumn(numPattern, rule.getTo()) : null;
        final Function<Record, String> fromFunc = isEdge ? forStrColumn(numPattern, rule.getFrom()) : null;

        return (r, e) -> {
            final String label = labelFunc.apply(r);
            String id = idFunc.apply(r);

            if (isEdge) {
                final String to = toFunc.apply(r);
                final String from = fromFunc.apply(r);
                if (to == null || from == null) {
                    log.error("invalid record to: {} , from: {}", to, from);
                    throw new IllegalArgumentException("to or from missed in edge");
                }
                ((GdbEdge)e).setTo(to);
                ((GdbEdge)e).setFrom(from);

                // generate UUID for edge
                if (id == null) {
                    id = UUID.randomUUID().toString();
                }
            }

            if (id == null || label == null) {
                log.error("invalid record id: {} , label: {}", id, label);
                throw new IllegalArgumentException("id or label missed");
            }

            e.setId(id);
            e.setLabel(label);

            properties.forEach(p -> p.accept(r, e));
        };
    }

    private static Function<Record, Object> forObjColumn(final boolean numPattern, final String rule,
        final ValueType type) {
        final Pattern pattern = numPattern ? NORMAL_NUM_PATTERN : NORMAL_DOLLAR_PATTERN;
        final Matcher m = pattern.matcher(rule);
        if (m.matches()) {
            final int index = Integer.parseInt(m.group(1));
            return r -> type.applyColumn(r.getColumn(index));
        } else {
            return r -> type.fromStrFunc(rule);
        }
    }

    private static Function<Record, String> forStrColumn(final boolean numPattern, final String rule) {
        final List<BiConsumer<StringBuilder, Record>> list = new ArrayList<>();
        final Pattern pattern = numPattern ? STR_NUM_PATTERN : STR_DOLLAR_PATTERN;
        final Matcher m = pattern.matcher(rule);
        int last = 0;
        while (m.find()) {
            final String index = m.group(1);
            // as simple integer index.
            final int i = Integer.parseInt(index);

            final int tmp = last;
            final int start = m.start();
            list.add((sb, record) -> {
                sb.append(rule.subSequence(tmp, start));
                if (record.getColumn(i) != null && record.getColumn(i).getByteSize() > 0) {
                    sb.append(record.getColumn(i).asString());
                }
            });

            last = m.end();
        }

        final int tmp = last;
        list.add((sb, record) -> {
            sb.append(rule.subSequence(tmp, rule.length()));
        });

        return r -> {
            final StringBuilder sb = new StringBuilder();
            list.forEach(c -> c.accept(sb, r));
            final String res = sb.toString();
            return res.isEmpty() ? null : res;
        };
    }

    private static boolean addToProperties(final GdbElement e, final String key, final String value,
        final String type, final String card) {
        final Object pValue;
        final ValueType valueType = ValueType.fromShortName(type);

        if (valueType == ValueType.STRING) {
            pValue = value;
        } else if (valueType == ValueType.INT || valueType == ValueType.INTEGER) {
            pValue = Integer.valueOf(value);
        } else if (valueType == ValueType.LONG) {
            pValue = Long.valueOf(value);
        } else if (valueType == ValueType.DOUBLE) {
            pValue = Double.valueOf(value);
        } else if (valueType == ValueType.FLOAT) {
            pValue = Float.valueOf(value);
        } else if (valueType == ValueType.BOOLEAN) {
            pValue = Boolean.valueOf(value);
        } else {
            log.error("invalid property key {}, value {}, type {}", key, value, type);
            return false;
        }

        // apply vertexSetProperty
        if (Key.PropertyType.set.name().equals(card) && (e instanceof GdbVertex)) {
            e.addProperty(key, pValue, Key.PropertyType.set);
        } else {
            e.addProperty(key, pValue);
        }
        return true;
    }

    @Override
    public Function<Record, GdbElement> getMapper(final MappingRule rule) {
        // Build the element-filling function once and reuse it for every record.
        final BiConsumer<Record, GdbElement> fill = forElement(rule);
        return r -> {
            final GdbElement e = (rule.getImportType() == VERTEX) ? new GdbVertex() : new GdbEdge();
            fill.accept(r, e);
            return e;
        };
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/GdbMapper.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;

import java.util.function.Function;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.plugin.writer.gdbwriter.model.GdbElement;

/**
 * @author jerrywang
 *
 */
public interface GdbMapper {
    Function<Record, GdbElement> getMapper(MappingRule rule);
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MapperConfig.java
================================================
/*
 * (C) 2019-present Alibaba Group Holding Limited.
 *
 * This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public
 * License version 2 as published by the Free Software Foundation.
 */
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig;

/**
 * @author : Liu Jianping
 * @date : 2019/10/15
 */

public class MapperConfig {
    private static MapperConfig instance = new MapperConfig();
    private int maxIdLength;
    private int maxLabelLength;
    private int maxPropKeyLength;
    private int maxPropValueLength;

    private MapperConfig() {
        this.maxIdLength = GdbWriterConfig.MAX_STRING_LENGTH;
        this.maxLabelLength = GdbWriterConfig.MAX_STRING_LENGTH;
        this.maxPropKeyLength = GdbWriterConfig.MAX_STRING_LENGTH;
        this.maxPropValueLength = GdbWriterConfig.MAX_STRING_LENGTH;
    }

    public static MapperConfig getInstance() {
        return instance;
    }

    public void updateConfig(final Configuration config) {
        final int length = config.getInt(Key.MAX_GDB_STRING_LENGTH, GdbWriterConfig.MAX_STRING_LENGTH);

        Integer sLength = config.getInt(Key.MAX_GDB_ID_LENGTH);
        this.maxIdLength = sLength == null ? length : sLength;

        sLength = config.getInt(Key.MAX_GDB_LABEL_LENGTH);
        this.maxLabelLength = sLength == null ? length : sLength;

        sLength = config.getInt(Key.MAX_GDB_PROP_KEY_LENGTH);
        this.maxPropKeyLength = sLength == null ? length : sLength;

        sLength = config.getInt(Key.MAX_GDB_PROP_VALUE_LENGTH);
        this.maxPropValueLength = sLength == null ? length : sLength;
    }

    public int getMaxIdLength() {
        return this.maxIdLength;
    }

    public int getMaxLabelLength() {
        return this.maxLabelLength;
    }

    public int getMaxPropKeyLength() {
        return this.maxPropKeyLength;
    }

    public int getMaxPropValueLength() {
        return this.maxPropValueLength;
    }

}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRule.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;

import java.util.ArrayList;
import java.util.List;

import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType;
import com.alibaba.datax.plugin.writer.gdbwriter.Key.PropertyType;

import lombok.Data;

/**
 * @author jerrywang
 *
 */
@Data
public class MappingRule {
    private String id = null;

    private String label = null;

    private ImportType importType = null;

    private String from = null;

    private String to = null;

    private List<PropertyMappingRule> properties = new ArrayList<>();

    private String propertiesJsonStr = null;

    private boolean numPattern = false;

    @Data
    public static class PropertyMappingRule {
        private String key = null;

        private String value = null;

        private ValueType valueType = null;

        private PropertyType pType = PropertyType.single;
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/MappingRuleFactory.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;

import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode;
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ColumnType;
import com.alibaba.datax.plugin.writer.gdbwriter.Key.IdTransRule;
import com.alibaba.datax.plugin.writer.gdbwriter.Key.ImportType;
import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MappingRule.PropertyMappingRule;
import com.alibaba.datax.plugin.writer.gdbwriter.util.ConfigHelper;

import lombok.extern.slf4j.Slf4j;

/**
 * @author jerrywang
 *
 */
@Slf4j
public class MappingRuleFactory {
    private static final MappingRuleFactory instance = new MappingRuleFactory();
    private static final Pattern STR_PATTERN = Pattern.compile("\\$\\{(\\d+)}");
    private static final Pattern STR_NUM_PATTERN = Pattern.compile("#\\{(\\d+)}");

    public static MappingRuleFactory getInstance() {
        return instance;
    }

    private static boolean isPattern(final String value, final MappingRule rule, final boolean checked) {
        if (checked) {
            return true;
        }

        if (value == null || value.isEmpty()) {
            return false;
        }

        Matcher m = STR_PATTERN.matcher(value);
        if (m.find()) {
            rule.setNumPattern(false);
            return true;
        }

        m = STR_NUM_PATTERN.matcher(value);
        if (m.find()) {
            rule.setNumPattern(true);
            return true;
        }

        return false;
    }

    @Deprecated
    public MappingRule create(final Configuration config, final ImportType type) {
        final MappingRule rule = new MappingRule();
        rule.setId(config.getString(Key.ID));
        rule.setLabel(config.getString(Key.LABEL));
        if (type == ImportType.EDGE) {
            rule.setFrom(config.getString(Key.FROM));
            rule.setTo(config.getString(Key.TO));
        }

        rule.setImportType(type);

        final List<Configuration> configurations = config.getListConfiguration(Key.PROPERTIES);
        if (configurations != null) {
            for (final Configuration prop : config.getListConfiguration(Key.PROPERTIES)) {
                final PropertyMappingRule propRule = new PropertyMappingRule();
                propRule.setKey(prop.getString(Key.PROP_KEY));
                propRule.setValue(prop.getString(Key.PROP_VALUE));
                propRule.setValueType(ValueType.fromShortName(prop.getString(Key.PROP_TYPE).toLowerCase()));
                rule.getProperties().add(propRule);
            }
        }

        final String propertiesJsonStr = config.getString(Key.PROPERTIES_JSON_STR, null);
        if (propertiesJsonStr != null) {
            rule.setPropertiesJsonStr(propertiesJsonStr);
        }

        return rule;
    }

    public MappingRule createV2(final Configuration config) {
        final ImportType type;
        // Only the labelType lookup is guarded here, so that NPEs or bad enum values raised while
        // building the rule are not misreported as a labelType problem.
        try {
            type = ImportType.valueOf(config.getString(Key.IMPORT_TYPE));
        } catch (final NullPointerException e) {
            throw DataXException.asDataXException(GdbWriterErrorCode.CONFIG_ITEM_MISS, Key.IMPORT_TYPE);
        } catch (final IllegalArgumentException e) {
            throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, Key.IMPORT_TYPE);
        }
        return createV2(config, type);
    }

    public MappingRule createV2(final Configuration config, final ImportType type) {
        final MappingRule rule = new MappingRule();
        boolean patternChecked = false;

        ConfigHelper.assertHasContent(config, Key.LABEL);
        rule.setLabel(config.getString(Key.LABEL));
        rule.setImportType(type);
        patternChecked = isPattern(rule.getLabel(), rule, patternChecked);

        IdTransRule srcTransRule = IdTransRule.none;
        IdTransRule dstTransRule = IdTransRule.none;
        if (type == ImportType.EDGE) {
            ConfigHelper.assertHasContent(config, Key.SRC_ID_TRANS_RULE);
            ConfigHelper.assertHasContent(config, Key.DST_ID_TRANS_RULE);

            srcTransRule = IdTransRule.valueOf(config.getString(Key.SRC_ID_TRANS_RULE));
            dstTransRule = IdTransRule.valueOf(config.getString(Key.DST_ID_TRANS_RULE));

            if (srcTransRule == IdTransRule.labelPrefix) {
                ConfigHelper.assertHasContent(config, Key.SRC_LABEL);
            }

            if (dstTransRule == IdTransRule.labelPrefix) {
                ConfigHelper.assertHasContent(config, Key.DST_LABEL);
            }
        }
        ConfigHelper.assertHasContent(config, Key.ID_TRANS_RULE);
        final IdTransRule transRule = IdTransRule.valueOf(config.getString(Key.ID_TRANS_RULE));

        final List<Configuration> configurationList = config.getListConfiguration(Key.COLUMN);
        ConfigHelper.assertConfig(Key.COLUMN, () -> (configurationList != null && !configurationList.isEmpty()));
        for (final Configuration column : configurationList) {
            ConfigHelper.assertHasContent(column, Key.COLUMN_NAME);
            ConfigHelper.assertHasContent(column, Key.COLUMN_VALUE);
            ConfigHelper.assertHasContent(column, Key.COLUMN_TYPE);
            ConfigHelper.assertHasContent(column, Key.COLUMN_NODE_TYPE);

            final String columnValue = column.getString(Key.COLUMN_VALUE);
            final ColumnType columnType = ColumnType.valueOf(column.getString(Key.COLUMN_NODE_TYPE));
            if (columnValue == null || columnValue.isEmpty()) {
                // only allow edge empty id
                ConfigHelper.assertConfig("empty column value",
                    () -> (type == ImportType.EDGE && columnType == ColumnType.primaryKey));
            }
            patternChecked = isPattern(columnValue, rule, patternChecked);

            if (columnType == ColumnType.primaryKey) {
                final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
                ConfigHelper.assertConfig("only string is allowed in primary key",
                    () -> (propType == ValueType.STRING));

                if (transRule == IdTransRule.labelPrefix) {
                    rule.setId(config.getString(Key.LABEL) + columnValue);
                } else {
                    rule.setId(columnValue);
                }
            } else if (columnType == ColumnType.edgeJsonProperty || columnType == ColumnType.vertexJsonProperty) {
                // only support one json property in column
                ConfigHelper.assertConfig("multi JsonProperty", () -> (rule.getPropertiesJsonStr() == null));

                rule.setPropertiesJsonStr(columnValue);
            } else if (columnType == ColumnType.vertexProperty || columnType == ColumnType.edgeProperty
                || columnType == ColumnType.vertexSetProperty) {
                final PropertyMappingRule propertyMappingRule = new PropertyMappingRule();

                propertyMappingRule.setKey(column.getString(Key.COLUMN_NAME));
                propertyMappingRule.setValue(columnValue);
                final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
                ConfigHelper.assertConfig("unsupported property type", () -> propType != null);

                if (columnType == ColumnType.vertexSetProperty) {
                    propertyMappingRule.setPType(Key.PropertyType.set);
                }
                propertyMappingRule.setValueType(propType);
                rule.getProperties().add(propertyMappingRule);
            } else if (columnType == ColumnType.srcPrimaryKey) {
                if (type != ImportType.EDGE) {
                    continue;
                }

                final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
                ConfigHelper.assertConfig("only string is allowed in primary key",
                    () -> (propType == ValueType.STRING));

                if (srcTransRule == IdTransRule.labelPrefix) {
                    rule.setFrom(config.getString(Key.SRC_LABEL) + columnValue);
                } else {
                    rule.setFrom(columnValue);
                }
            } else if (columnType == ColumnType.dstPrimaryKey) {
                if (type != ImportType.EDGE) {
                    continue;
                }

                final ValueType propType = ValueType.fromShortName(column.getString(Key.COLUMN_TYPE));
                ConfigHelper.assertConfig("only string is allowed in primary key",
                    () -> (propType == ValueType.STRING));

                if (dstTransRule == IdTransRule.labelPrefix) {
                    rule.setTo(config.getString(Key.DST_LABEL) + columnValue);
                } else {
                    rule.setTo(columnValue);
                }
            }
        }

        if (rule.getImportType() == ImportType.EDGE) {
            if (rule.getId() == null) {
                rule.setId("");
                log.info("edge id is missing, a random UUID will be used");
            }
            ConfigHelper.assertConfig("to needed in edge", () -> (rule.getTo() != null));
            ConfigHelper.assertConfig("from needed in edge", () -> (rule.getFrom() != null));
        }
        ConfigHelper.assertConfig("id needed", () -> (rule.getId() != null));

        return rule;
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/mapping/ValueType.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.mapping;

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

import com.alibaba.datax.common.element.Column;

import lombok.extern.slf4j.Slf4j;

/**
 * @author jerrywang
 *
 */
@Slf4j
public enum ValueType {
    /**
     * property value type
     */
    INT(Integer.class, "int", Column::asLong, Integer::valueOf),
    INTEGER(Integer.class, "integer", Column::asLong, Integer::valueOf),
    LONG(Long.class, "long", Column::asLong, Long::valueOf),
    DOUBLE(Double.class, "double", Column::asDouble, Double::valueOf),
    FLOAT(Float.class, "float", Column::asDouble, Float::valueOf),
    BOOLEAN(Boolean.class, "boolean", Column::asBoolean, Boolean::valueOf),
    STRING(String.class, "string", Column::asString, String::valueOf);

    private Class<?> type = null;
    private String shortName = null;
    private Function<Column, Object> columnFunc = null;
    private Function<String, Object> fromStrFunc = null;

    private ValueType(final Class<?> type, final String name, final Function<Column, Object> columnFunc,
        final Function<String, Object> fromStrFunc) {
        this.type = type;
        this.shortName = name;
        this.columnFunc = columnFunc;
        this.fromStrFunc = fromStrFunc;

        // The holder class guarantees the map exists before the enum constants register themselves.
        ValueTypeHolder.shortName2type.put(name, this);
    }

    public static ValueType fromShortName(final String name) {
        return ValueTypeHolder.shortName2type.get(name);
    }

    public Class<?> type() {
        return this.type;
    }

    public String shortName() {
        return this.shortName;
    }

    public Object applyColumn(final Column column) {
        try {
            if (column == null) {
                return null;
            }
            return this.columnFunc.apply(column);
        } catch (final Exception e) {
            log.error("applyColumn error {}, column {}", e.toString(), column);
            throw e;
        }
    }

    public Object fromStrFunc(final String str) {
        return this.fromStrFunc.apply(str);
    }

    private static class ValueTypeHolder {
        private static final Map<String, ValueType> shortName2type = new HashMap<>();
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/AbstractGdbGraph.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import static com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig.DEFAULT_BATCH_PROPERTY_NUM;
import static com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig.MAX_REQUEST_LENGTH;

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.RequestOptions;
import org.apache.tinkerpop.gremlin.driver.ResultSet;
import org.apache.tinkerpop.gremlin.driver.ser.Serializers;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.gdbwriter.Key;
import com.alibaba.datax.plugin.writer.gdbwriter.client.GdbWriterConfig;

import lombok.extern.slf4j.Slf4j;

/**
 * @author jerrywang
 *
 */
@Slf4j
public abstract class AbstractGdbGraph implements GdbGraph {
    private static final int DEFAULT_TIMEOUT = 30000;

    protected Client client = null;
    protected Key.UpdateMode updateMode = Key.UpdateMode.INSERT;
    protected int propertiesBatchNum = DEFAULT_BATCH_PROPERTY_NUM;
    protected boolean session = false;
    protected int maxRequestLength = GdbWriterConfig.MAX_REQUEST_LENGTH;

    protected AbstractGdbGraph() {}

    protected AbstractGdbGraph(final Configuration config, final boolean session) {
        initClient(config, session);
    }

    protected void initClient(final Configuration config, final boolean session) {
        this.updateMode = Key.UpdateMode.valueOf(config.getString(Key.UPDATE_MODE, "INSERT"));
        log.info("init graphdb client");
        final String host = config.getString(Key.HOST);
        final int port = config.getInt(Key.PORT);
        final String username = config.getString(Key.USERNAME);
        final String password = config.getString(Key.PASSWORD);
        int maxDepthPerConnection =
            config.getInt(Key.MAX_IN_PROCESS_PER_CONNECTION, GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION);

        int maxConnectionPoolSize =
            config.getInt(Key.MAX_CONNECTION_POOL_SIZE, GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE);

        int maxSimultaneousUsagePerConnection = config.getInt(Key.MAX_SIMULTANEOUS_USAGE_PER_CONNECTION,
            GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION);

        this.session = session;
        if (this.session) {
            // Session clients always use the default pool settings.
            maxConnectionPoolSize = GdbWriterConfig.DEFAULT_MAX_CONNECTION_POOL_SIZE;
            maxDepthPerConnection = GdbWriterConfig.DEFAULT_MAX_IN_PROCESS_PER_CONNECTION;
            maxSimultaneousUsagePerConnection = GdbWriterConfig.DEFAULT_MAX_SIMULTANEOUS_USAGE_PER_CONNECTION;
        }

        try {
            final Cluster cluster = Cluster.build(host).port(port).credentials(username, password)
                .serializer(Serializers.GRAPHBINARY_V1D0).maxContentLength(1048576)
                .maxInProcessPerConnection(maxDepthPerConnection).minInProcessPerConnection(0)
                .maxConnectionPoolSize(maxConnectionPoolSize).minConnectionPoolSize(maxConnectionPoolSize)
                .maxSimultaneousUsagePerConnection(maxSimultaneousUsagePerConnection).resultIterationBatchSize(64)
                .create();
            this.client = session ? cluster.connect(UUID.randomUUID().toString()).init() : cluster.connect().init();
            warmClient(maxConnectionPoolSize * maxDepthPerConnection);
        } catch (final RuntimeException e) {
            // SLF4J treats a trailing Throwable as the exception to log, so it needs no placeholder.
            log.error("Failed to connect to GDB {}:{}", host, port, e);
            throw e;
        }

        this.propertiesBatchNum = config.getInt(Key.MAX_PROPERTIES_BATCH_NUM, DEFAULT_BATCH_PROPERTY_NUM);
        this.maxRequestLength = config.getInt(Key.MAX_GDB_REQUEST_LENGTH, MAX_REQUEST_LENGTH);
    }

    /**
     * @param dsl
     * @param parameters
     */
    protected void runInternal(final String dsl, final Map<String, Object> parameters) throws Exception {
        final RequestOptions.Builder options = RequestOptions.build().timeout(DEFAULT_TIMEOUT);
        if (parameters != null && !parameters.isEmpty()) {
            parameters.forEach(options::addParameter);
        }

        final ResultSet results = this.client.submitAsync(dsl, options.create()).get(DEFAULT_TIMEOUT,
            TimeUnit.MILLISECONDS);
        results.all().get(DEFAULT_TIMEOUT + 1000, TimeUnit.MILLISECONDS);
    }

    void beginTx() {
        if (!this.session) {
            return;
        }

        final String dsl = "g.tx().open()";
        this.client.submit(dsl).all().join();
    }

    void doCommit() {
        if (!this.session) {
            return;
        }

        try {
            final String dsl = "g.tx().commit()";
            this.client.submit(dsl).all().join();
        } catch (final Exception e) {
            throw new RuntimeException(e);
        }
    }

    void doRollback() {
        if (!this.session) {
            return;
        }

        final String dsl = "g.tx().rollback()";
        this.client.submit(dsl).all().join();
    }

    private void warmClient(final int num) {
        try {
            beginTx();
            runInternal("g.V('test')", null);
            doCommit();
            log.info("warm graphdb client over");
        } catch (final Exception e) {
            log.error("warmClient error", e);
            doRollback();
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() {
        if (this.client != null) {
            log.info("close graphdb client");
            this.client.close();
        }
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbEdge.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MapperConfig;

import lombok.EqualsAndHashCode;
import lombok.ToString;

/**
 * @author jerrywang
 *
 */
@EqualsAndHashCode(callSuper = true)
@ToString(callSuper = true)
public class GdbEdge extends GdbElement {
    private String from = null;
    private String to = null;

    public String getFrom() {
        return this.from;
    }

    public void setFrom(final String from) {
        final int maxIdLength = MapperConfig.getInstance().getMaxIdLength();
        if (from.length() > maxIdLength) {
            throw new IllegalArgumentException("from length over limit(" + maxIdLength + ")");
        }
        this.from = from;
    }

    public String getTo() {
        return this.to;
    }

    public void setTo(final String to) {
        final int maxIdLength = MapperConfig.getInstance().getMaxIdLength();
        if (to.length() > maxIdLength) {
            throw new IllegalArgumentException("to length over limit(" + maxIdLength + ")");
        }
        this.to = to;
    }
}

================================================
FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbElement.java
================================================
/**
 *
 */
package com.alibaba.datax.plugin.writer.gdbwriter.model;

import java.util.LinkedList;
import java.util.List;

import com.alibaba.datax.plugin.writer.gdbwriter.Key.PropertyType;
import com.alibaba.datax.plugin.writer.gdbwriter.mapping.MapperConfig;

/**
 * @author jerrywang
 *
 */
public class GdbElement {
    private String id = null;
    private String label = null;
    private List properties = new LinkedList<>();

    public String getId() {
        return this.id;
    }

    public void setId(final String id) {
        final int
maxIdLength = MapperConfig.getInstance().getMaxIdLength(); if (id.length() > maxIdLength) { throw new IllegalArgumentException("id length over limit(" + maxIdLength + ")"); } this.id = id; } public String getLabel() { return this.label; } public void setLabel(final String label) { final int maxLabelLength = MapperConfig.getInstance().getMaxLabelLength(); if (label.length() > maxLabelLength) { throw new IllegalArgumentException("label length over limit(" + maxLabelLength + ")"); } this.label = label; } public List getProperties() { return this.properties; } public void addProperty(final String propKey, final Object propValue, final PropertyType card) { if (propKey == null || propValue == null) { return; } final int maxPropKeyLength = MapperConfig.getInstance().getMaxPropKeyLength(); if (propKey.length() > maxPropKeyLength) { throw new IllegalArgumentException("property key length over limit(" + maxPropKeyLength + ")"); } if (propValue instanceof String) { final int maxPropValueLength = MapperConfig.getInstance().getMaxPropValueLength(); if (((String)propValue).length() > maxPropValueLength) { throw new IllegalArgumentException("property value length over limit(" + maxPropValueLength + ")"); } } this.properties.add(new GdbProperty(propKey, propValue, card)); } public void addProperty(final String propKey, final Object propValue) { addProperty(propKey, propValue, PropertyType.single); } @Override public String toString() { final StringBuilder sb = new StringBuilder(this.id + "[" + this.label + "]{"); this.properties.forEach(n -> { sb.append(n.cardinality.name()); sb.append("["); sb.append(n.key); sb.append(" - "); sb.append(String.valueOf(n.value)); sb.append("]"); }); sb.append("}"); return sb.toString(); } public static class GdbProperty { private String key; private Object value; private PropertyType cardinality; private GdbProperty(final String key, final Object value, final PropertyType card) { this.key = key; this.value = value; this.cardinality = card; } public PropertyType 
getCardinality() { return this.cardinality; } public String getKey() { return this.key; } public Object getValue() { return this.value; } } } ================================================ FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbGraph.java ================================================ /** * */ package com.alibaba.datax.plugin.writer.gdbwriter.model; import java.util.List; import com.alibaba.datax.common.element.Record; import groovy.lang.Tuple2; /** * @author jerrywang * */ public interface GdbGraph extends AutoCloseable { List<Tuple2<Record, Exception>> add(List<Tuple2<Record, GdbElement>> records); @Override void close(); } ================================================ FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/GdbVertex.java ================================================ /** * */ package com.alibaba.datax.plugin.writer.gdbwriter.model; import lombok.EqualsAndHashCode; import lombok.ToString; /** * @author jerrywang * */ @EqualsAndHashCode(callSuper = true) @ToString(callSuper = true) public class GdbVertex extends GdbElement { } ================================================ FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/model/ScriptGdbGraph.java ================================================ /** * */ package com.alibaba.datax.plugin.writer.gdbwriter.model; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Random; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.Key; import com.alibaba.datax.plugin.writer.gdbwriter.util.GdbDuplicateIdException; import groovy.lang.Tuple2; import lombok.extern.slf4j.Slf4j; /** * @author jerrywang * */ @Slf4j public class ScriptGdbGraph extends AbstractGdbGraph { private static final String VAR_PREFIX = "GDB___"; private static final String VAR_ID = VAR_PREFIX + "id"; private static final String VAR_LABEL = 
VAR_PREFIX + "label"; private static final String VAR_FROM = VAR_PREFIX + "from"; private static final String VAR_TO = VAR_PREFIX + "to"; private static final String VAR_PROP_KEY = VAR_PREFIX + "PK"; private static final String VAR_PROP_VALUE = VAR_PREFIX + "PV"; private static final String ADD_V_START = "g.addV(" + VAR_LABEL + ").property(id, " + VAR_ID + ")"; private static final String ADD_E_START = "g.addE(" + VAR_LABEL + ").property(id, " + VAR_ID + ").from(V(" + VAR_FROM + ")).to(V(" + VAR_TO + "))"; private static final String UPDATE_V_START = "g.V(" + VAR_ID + ")"; private static final String UPDATE_E_START = "g.E(" + VAR_ID + ")"; private Random random; public ScriptGdbGraph() { this.random = new Random(); } public ScriptGdbGraph(final Configuration config, final boolean session) { super(config, session); this.random = new Random(); log.info("Init as ScriptGdbGraph."); } /** * Apply a list of {@link GdbElement}s to GDB (inside a transaction when running in session mode) and return the failed records. * * @param records * (record, element) pairs to apply * @return the records that failed, each paired with the exception that caused the failure */ @Override public List<Tuple2<Record, Exception>> add(final List<Tuple2<Record, GdbElement>> records) { final List<Tuple2<Record, Exception>> errors = new ArrayList<>(); try { beginTx(); for (final Tuple2<Record, GdbElement> elementTuple2 : records) { try { addInternal(elementTuple2.getSecond()); } catch (final Exception e) { errors.add(new Tuple2<>(elementTuple2.getFirst(), e)); } } doCommit(); } catch (final Exception ex) { doRollback(); throw new RuntimeException(ex); } return errors; } private void addInternal(final GdbElement element) { try { addInternal(element, false); } catch (final GdbDuplicateIdException e) { if (this.updateMode == Key.UpdateMode.SKIP) { log.debug("Skip duplicate id {}", element.getId()); } else if (this.updateMode == Key.UpdateMode.INSERT) { throw new RuntimeException(e); } else if (this.updateMode == Key.UpdateMode.MERGE) { if (element.getProperties().isEmpty()) { return; } try { addInternal(element, true); } catch (final GdbDuplicateIdException e1) { log.error("duplicate id {} while update...", element.getId()); throw new 
RuntimeException(e1); } } } } private void addInternal(final GdbElement element, final boolean update) throws GdbDuplicateIdException { boolean firstAdd = !update; final boolean isVertex = (element instanceof GdbVertex); final List<GdbElement.GdbProperty> params = element.getProperties(); final List<GdbElement.GdbProperty> subParams = new ArrayList<>(this.propertiesBatchNum); final int idLength = element.getId().length(); int attachLength = element.getLabel().length(); if (element instanceof GdbEdge) { attachLength += ((GdbEdge)element).getFrom().length(); attachLength += ((GdbEdge)element).getTo().length(); } int requestLength = idLength; for (final GdbElement.GdbProperty entry : params) { final String propKey = entry.getKey(); final Object propValue = entry.getValue(); int appendLength = propKey.length(); if (propValue instanceof String) { appendLength += ((String)propValue).length(); } if (checkSplitDsl(firstAdd, requestLength, attachLength, appendLength, subParams.size())) { setGraphDbElement(element, subParams, isVertex, firstAdd); firstAdd = false; subParams.clear(); requestLength = idLength; } requestLength += appendLength; subParams.add(entry); } if (!subParams.isEmpty() || firstAdd) { checkSplitDsl(firstAdd, requestLength, attachLength, 0, 0); setGraphDbElement(element, subParams, isVertex, firstAdd); } } private boolean checkSplitDsl(final boolean firstAdd, final int requestLength, final int attachLength, final int appendLength, final int propNum) { final int length = firstAdd ? requestLength + attachLength : requestLength; if (length > this.maxRequestLength) { throw new IllegalArgumentException("request length over limit(" + this.maxRequestLength + ")"); } return length + appendLength > this.maxRequestLength || propNum >= this.propertiesBatchNum; } private Tuple2<String, Map<String, Object>> buildDsl(final GdbElement element, final List<GdbElement.GdbProperty> properties, final boolean isVertex, final boolean firstAdd) { final Map<String, Object> params = new HashMap<>(); final StringBuilder sb = new StringBuilder(); if (isVertex) { sb.append(firstAdd ? 
ADD_V_START : UPDATE_V_START); } else { sb.append(firstAdd ? ADD_E_START : UPDATE_E_START); } for (int i = 0; i < properties.size(); i++) { final GdbElement.GdbProperty prop = properties.get(i); sb.append(".property("); if (prop.getCardinality() == Key.PropertyType.set) { sb.append("set, "); } sb.append(VAR_PROP_KEY).append(i).append(", ").append(VAR_PROP_VALUE).append(i).append(")"); params.put(VAR_PROP_KEY + i, prop.getKey()); params.put(VAR_PROP_VALUE + i, prop.getValue()); } if (firstAdd) { params.put(VAR_LABEL, element.getLabel()); if (!isVertex) { params.put(VAR_FROM, ((GdbEdge)element).getFrom()); params.put(VAR_TO, ((GdbEdge)element).getTo()); } } params.put(VAR_ID, element.getId()); return new Tuple2<>(sb.toString(), params); } private void setGraphDbElement(final GdbElement element, final List<GdbElement.GdbProperty> properties, final boolean isVertex, final boolean firstAdd) throws GdbDuplicateIdException { int retry = 10; int idleTime = this.random.nextInt(10) + 10; final Tuple2<String, Map<String, Object>> elementDsl = buildDsl(element, properties, isVertex, firstAdd); while (retry > 0) { try { runInternal(elementDsl.getFirst(), elementDsl.getSecond()); log.debug("AddElement {}", element.getId()); return; } catch (final Exception e) { final String cause = e.getCause() == null ? "" : e.getCause().toString(); if (cause.contains("rejected from") || cause.contains("Timeout waiting to lock key")) { retry--; try { Thread.sleep(idleTime); } catch (final InterruptedException e1) { // ... 
} idleTime = Math.min(idleTime * 2, 2000); continue; } else if (firstAdd && cause.contains("GraphDB id exists")) { throw new GdbDuplicateIdException(e); } log.error("Add Failed id {}, dsl {}, params {}, e {}", element.getId(), elementDsl.getFirst(), elementDsl.getSecond(), e); throw new RuntimeException(e); } } log.error("Add Failed id {}, dsl {}, params {}", element.getId(), elementDsl.getFirst(), elementDsl.getSecond()); throw new RuntimeException("failed to queue new element to server"); } } ================================================ FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/ConfigHelper.java ================================================ /** * */ package com.alibaba.datax.plugin.writer.gdbwriter.util; import java.io.IOException; import java.io.InputStream; import java.util.function.Supplier; import org.apache.commons.lang3.StringUtils; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.gdbwriter.GdbWriterErrorCode; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONObject; /** * @author jerrywang * */ public interface ConfigHelper { static void assertConfig(final String key, final Supplier<Boolean> f) { if (!f.get()) { throw DataXException.asDataXException(GdbWriterErrorCode.BAD_CONFIG_VALUE, key); } } static void assertHasContent(final Configuration config, final String key) { assertConfig(key, () -> StringUtils.isNotBlank(config.getString(key))); } /** * NOTE: {@code Configuration::get(String, Class)} doesn't work. * * @param conf * Configuration * @param key * key path to configuration * @param cls * Class of result type * @return the target configuration object of type T */ static <T> T getConfig(final Configuration conf, final String key, final Class<T> cls) { final JSONObject j = (JSONObject)conf.get(key); return JSON.toJavaObject(j, cls); } /** * Create a configuration from the specified file on the classpath.
 * * @param name * file name * @return Configuration instance. */ static Configuration fromClasspath(final String name) { try (final InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) { if (is == null) { throw new IllegalArgumentException("File not found: " + name); } return Configuration.from(is); } catch (final IOException e) { throw new IllegalArgumentException("Failed to read file: " + name, e); } } } ================================================ FILE: gdbwriter/src/main/java/com/alibaba/datax/plugin/writer/gdbwriter/util/GdbDuplicateIdException.java ================================================ /* * (C) 2019-present Alibaba Group Holding Limited. * * This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public * License version 2 as published by the Free Software Foundation. */ package com.alibaba.datax.plugin.writer.gdbwriter.util; /** * @author : Liu Jianping * @date : 2019/8/3 */ public class GdbDuplicateIdException extends Exception { public GdbDuplicateIdException(Exception e) { super(e); } public GdbDuplicateIdException() { super(); } } ================================================ FILE: gdbwriter/src/main/resources/plugin.json ================================================ { "name": "gdbwriter", "class": "com.alibaba.datax.plugin.writer.gdbwriter.GdbWriter", "description": "useScene: prod.
mechanism: connect GDB with gremlin-client, execute DSL as 'g.addV() or g.addE()' to write record", "developer": "alibaba" } ================================================ FILE: gdbwriter/src/main/resources/plugin_job_template.json ================================================ { "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { "name": "odpsreader" }, "writer": { "name": "gdbwriter", "parameter": { "host": "localhost", "port": 8182, "username": "username", "password": "password", "label": "test-label", "srcLabel": "test-srcLabel-", "dstLabel": "test-dstLabel-", "labelType": "EDGE", "writeMode": "INSERT", "idTransRule": "labelPrefix", "srcIdTransRule": "labelPrefix", "dstIdTransRule": "labelPrefix", "column": [ { "name": "id", "value": "-test-${0}", "type": "string", "columnType": "primaryKey" }, { "name": "id", "value": "from-id-${2}", "type": "string", "columnType": "srcPrimaryKey" }, { "name": "id", "value": "to-id-${3}", "type": "string", "columnType": "dstPrimaryKey" }, { "name": "strValue-${2}-key", "value": "strValue-${2}-value", "type": "string", "columnType": "edgeProperty" }, { "name": "intProp", "value": "${3}", "type": "int", "columnType": "edgeProperty" }, { "name": "booleanProp", "value": "${5}", "type": "boolean", "columnType": "edgeProperty" } ] } } } ] } } ================================================ FILE: hbase094xreader/doc/.gitkeep ================================================ ================================================ FILE: hbase094xreader/doc/hbase094xreader.md ================================================
# Hbase094XReader & Hbase11XReader Plugin Documentation

___

## 1 Quick Introduction

The HbaseReader plugin reads data from HBase. Under the hood, HbaseReader connects to a remote HBase service through the HBase Java client, reads the data in the rowkey range you specify with a Scan, assembles the data into an abstract dataset using DataX's own data types, and passes it to the downstream Writer.

### 1.1 Supported Features

1. HbaseReader currently supports two HBase versions: HBase 0.94.x and HBase 1.1.x.

* If your HBase version is 0.94.x, use the reader plugin hbase094xreader:

```
"reader": {
    "name": "hbase094xreader"
}
```

* If your HBase version is 1.1.x, use the reader plugin hbase11xreader:

```
"reader": {
    "name": "hbase11xreader"
}
```

2. HbaseReader currently supports two read modes: normal and multiVersionFixedColumn.

* normal mode: reads the HBase table as an ordinary two-dimensional (wide) table and returns the latest version of each cell. For example:

```
hbase(main):017:0> scan 'users'
ROW                 COLUMN+CELL
 lisi               column=address:city, timestamp=1457101972764, value=beijing
 lisi               column=address:contry, timestamp=1457102773908, value=china
 lisi               column=address:province, timestamp=1457101972736, value=beijing
 lisi               column=info:age, timestamp=1457101972548, value=27
 lisi               column=info:birthday, timestamp=1457101972604, value=1987-06-17
 lisi               column=info:company, timestamp=1457101972653, value=baidu
 xiaoming           column=address:city, timestamp=1457082196082, value=hangzhou
 xiaoming           column=address:contry, timestamp=1457082195729, value=china
 xiaoming           column=address:province, timestamp=1457082195773, value=zhejiang
 xiaoming           column=info:age, timestamp=1457082218735, value=29
 xiaoming           column=info:birthday, timestamp=1457082186830, value=1987-06-17
 xiaoming           column=info:company, timestamp=1457082189826, value=alibaba
2 row(s) in 0.0580 seconds
```

Data after reading:

| rowKey | address:city | address:contry | address:province | info:age | info:birthday | info:company |
| -------- | ---------------- | ----- | ----- | -------- | ---------------- | ----- |
| lisi | beijing | china | beijing | 27 | 1987-06-17 | baidu |
| xiaoming | hangzhou | china | zhejiang | 29 | 1987-06-17 | alibaba |

* multiVersionFixedColumn mode: reads the HBase table as a tall (vertical) table. Every record read has exactly four columns, in this order: rowKey, family:qualifier, timestamp, value. The columns to read must be specified explicitly. Each cell value becomes one record, so a cell with several versions produces several records. For example:

```
hbase(main):018:0> scan 'users',{VERSIONS=>5}
ROW                 COLUMN+CELL
 lisi               column=address:city, timestamp=1457101972764, value=beijing
 lisi               column=address:contry, timestamp=1457102773908, value=china
 lisi               column=address:province, timestamp=1457101972736, value=beijing
 lisi               column=info:age, timestamp=1457101972548, value=27
 lisi               column=info:birthday, timestamp=1457101972604, value=1987-06-17
 lisi               column=info:company, timestamp=1457101972653, value=baidu
 xiaoming           column=address:city, timestamp=1457082196082, value=hangzhou
 xiaoming           column=address:contry, timestamp=1457082195729, value=china
 xiaoming           column=address:province, timestamp=1457082195773, value=zhejiang
 xiaoming           column=info:age, timestamp=1457082218735, value=29
 xiaoming           column=info:age, timestamp=1457082178630, value=24
 xiaoming           column=info:birthday, timestamp=1457082186830, value=1987-06-17
 xiaoming           column=info:company, timestamp=1457082189826, value=alibaba
2 row(s) in 0.0260 seconds
```

Data after reading (4 columns):

| rowKey | family:qualifier | timestamp | value |
| -------- | ---------------- | ----- | ----- |
| lisi | address:city | 1457101972764 | beijing |
| lisi | address:contry | 1457102773908 | china |
| lisi | address:province | 1457101972736 | beijing |
| lisi | info:age | 1457101972548 | 27 |
| lisi | info:birthday | 1457101972604 | 1987-06-17 |
| lisi | info:company | 1457101972653 | baidu |
| xiaoming | address:city | 1457082196082 | hangzhou |
| xiaoming | address:contry | 1457082195729 | china |
| xiaoming | address:province | 1457082195773 | zhejiang |
| xiaoming | info:age | 1457082218735 | 29 |
| xiaoming | info:age | 1457082178630 | 24 |
| xiaoming | info:birthday | 1457082186830 | 1987-06-17 |
| xiaoming | info:company | 1457082189826 | alibaba |

3. HbaseReader has one required setting, hbaseConfig. Ask your HBase PE to extract the HBase connection properties from hbase-site.xml and provide them as JSON. You can also add further HBase client settings, such as the scan cache (hbase.client.scanner.caching) and batch, to tune interaction with the server.

For example, suppose hbase-site.xml contains:

```
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ip:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>***</value>
  </property>
</configuration>
```

The converted JSON is:

```
"hbaseConfig": {
    "hbase.rootdir": "hdfs://ip:9000/hbase",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "***"
}
```

### 1.2 Limitations

1. Dynamic columns are not supported. Supporting them would require reading every column from HBase and then filtering by rule, which costs network traffic, so both read modes require you to list the columns to read explicitly.

2. Job splitting: splitting is currently based on the region distribution of your HBase table. Within the [startRowkey, endRowkey] range you configure, each region becomes one task, and a single region is never split further.

3. multiVersionFixedColumn mode does not support adding constant columns.

## 2 Implementation

In short, HbaseReader uses HBase Java client APIs such as HTable, Scan and ResultScanner to read the data in the rowkey range you specify, assembles the data into an abstract dataset using DataX's own data types, and passes it to the downstream Writer. The main difference between hbase11xreader and hbase094xreader is the client API they call: HBase 1.1.x deprecated many HBase 0.94.x APIs.

## 3 Functionality

### 3.1 Sample Configurations

* A job that extracts data from HBase to local files (normal mode):

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hbase11xreader",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.rootdir": "hdfs://xxx:9000/hbase",
                            "hbase.cluster.distributed": "true",
                            "hbase.zookeeper.quorum": "xxx"
                        },
                        "table": "users",
                        "encoding": "utf-8",
                        "mode": "normal",
                        "column": [
                            { "name": "rowkey", "type": "string" },
                            { "name": "info:age", "type": "string" },
                            { "name": "info:birthday", "type": "date", "format": "yyyy-MM-dd" },
                            { "name": "info:company", "type": "string" },
                            { "name": "address:contry", "type": "string" },
                            { "name": "address:province", "type": "string" },
                            { "name": "address:city", "type": "string" }
                        ],
                        "range": {
                            "startRowkey": "",
                            "endRowkey": "",
                            "isBinaryRowkey": true
                        }
                    }
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/Users/shf/workplace/datax_test/hbase11xreader/result",
                        "fileName": "qiran",
                        "writeMode": "truncate"
                    }
                }
            }
        ]
    }
}
```

* A job that extracts data from HBase to local files (multiVersionFixedColumn mode):

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hbase11xreader",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.rootdir": "hdfs://xxx:9000/hbase",
                            "hbase.cluster.distributed": "true",
                            "hbase.zookeeper.quorum": "xxx"
                        },
                        "table": "users",
                        "encoding": "utf-8",
                        "mode": "multiVersionFixedColumn",
                        "maxVersion": "-1",
                        "column": [
                            { "name": "rowkey", "type": "string" },
                            { "name": "info:age", "type": "string" },
                            { "name": "info:birthday", "type": "date", "format": "yyyy-MM-dd" },
                            { "name": "info:company", "type": "string" },
                            { "name": "address:contry", "type": "string" },
                            { "name": "address:province", "type": "string" },
                            { "name": "address:city", "type": "string" }
                        ],
                        "range": {
                            "startRowkey": "",
                            "endRowkey": ""
                        }
                    }
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/Users/shf/workplace/datax_test/hbase11xreader/result",
                        "fileName": "qiran",
                        "writeMode": "truncate"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **hbaseConfig**
    * Description: The connection settings each HBase cluster provides to DataX clients are stored in hbase-site.xml. Ask your HBase PE for them and convert them to JSON. You can also add further HBase client settings, such as the scan cache and batch, to tune interaction with the server.
    * Required: yes
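    * Default: none
* **mode**
    * Description: The HBase read mode: normal or multiVersionFixedColumn.
    * Required: yes
    * Default: none
* **table**
    * Description: Name of the HBase table to read (case-sensitive).
    * Required: yes
    * Default: none
* **encoding**
    * Description: Encoding, UTF-8 or GBK, used when converting the binary byte[] values stored in HBase to String.
    * Required: no
    * Default: UTF-8
* **column**
    * Description: The HBase columns to read. Required in both normal and multiVersionFixedColumn modes.

        (1) normal mode: `name` is the HBase column to read and, except for rowkey, must use the `family:qualifier` format. `type` is the source data type, and `format` is the date format for date columns. `value` makes the column a constant: it is not read from HBase but generated from the given value. Format:

        ```
        "column": [
            { "name": "rowkey", "type": "string" },
            { "value": "test", "type": "string" }
        ]
        ```

        In normal mode, every column must set `type`, plus one of `name` or `value`.

        (2) multiVersionFixedColumn mode: `name` is the HBase column to read and, except for rowkey, must use the `family:qualifier` format. `type` is the source data type, and `format` is the date format for date columns. Constant columns are not supported in this mode. Format:

        ```
        "column": [
            { "name": "rowkey", "type": "string" },
            { "name": "info:age", "type": "string" }
        ]
        ```

    * Required: yes
    * Default: none
* **maxVersion**
    * Description: The number of versions hbasereader reads in multi-version mode. The value must be -1 or a number greater than 1; -1 means read all versions.
    * Required: yes in multiVersionFixedColumn mode
    * Default: none
* **range**
    * Description: The rowkey range hbasereader reads.
        * startRowkey: the start rowkey.
        * endRowkey: the end rowkey.
        * isBinaryRowkey: how the configured startRowkey and endRowkey are converted to byte[]. Defaults to false. If true, the conversion calls Bytes.toBytesBinary(rowkey); if false, it calls Bytes.toBytes(rowkey).

        Format:

        ```
        "range": {
            "startRowkey": "aaa",
            "endRowkey": "ccc",
            "isBinaryRowkey": false
        }
        ```

    * Required: no
    * Default: none
* **scanCacheSize**
    * Description: Number of rows the HBase client fetches from the server per RPC.
    * Required: no
    * Default: 256
* **scanBatchSize**
    * Description: Number of columns the HBase client fetches from the server per RPC.
    * Required: no
    * Default: 100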
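
Section 1.2 says a job is split by region: within the configured rowkey range, each overlapping region becomes one task, and no region is subdivided. The Java sketch below illustrates that idea under stated assumptions; it is not DataX's splitting code. The class `RegionSplitSketch`, its `KeyRange` type and the region boundary arrays are hypothetical. An empty `byte[]` stands for an unbounded key (as `HConstants.EMPTY_BYTE_ARRAY` does in the helper code), and the end key is treated as exclusive, the way an HBase Scan stop row is. `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: one scan range per region that overlaps the user's rowkey range. */
final class RegionSplitSketch {

    /** A [start, end) rowkey range; an empty array means "unbounded" on that side. */
    static final class KeyRange {
        final byte[] start;
        final byte[] end;

        KeyRange(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + new String(start, StandardCharsets.UTF_8) + ", "
                    + new String(end, StandardCharsets.UTF_8) + ")";
        }
    }

    /** Unsigned lexicographic byte comparison, the order HBase uses for rowkeys. */
    private static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /** regionStarts[i] and regionEnds[i] bound region i; regions are given in key order. */
    static List<KeyRange> split(byte[][] regionStarts, byte[][] regionEnds,
                                byte[] userStart, byte[] userEnd) {
        List<KeyRange> tasks = new ArrayList<>();
        for (int i = 0; i < regionStarts.length; i++) {
            byte[] regionStart = regionStarts[i];
            byte[] regionEnd = regionEnds[i];
            // The region ends at or before the user's start key: no overlap.
            if (userStart.length > 0 && regionEnd.length > 0 && compare(regionEnd, userStart) <= 0) {
                continue;
            }
            // The region starts at or after the user's end key: no overlap.
            if (userEnd.length > 0 && compare(regionStart, userEnd) >= 0) {
                continue;
            }
            // Clip the region to the user range: the later start key and the earlier end key.
            byte[] start = compare(userStart, regionStart) > 0 ? userStart : regionStart;
            byte[] end;
            if (regionEnd.length == 0) {
                end = userEnd;
            } else if (userEnd.length == 0) {
                end = regionEnd;
            } else {
                end = compare(userEnd, regionEnd) < 0 ? userEnd : regionEnd;
            }
            tasks.add(new KeyRange(start, end));
        }
        return tasks;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] starts = {b(""), b("b"), b("d")};
        byte[][] ends = {b("b"), b("d"), b("")};
        // Prints [[c, d), [d, e)]: region ["", "b") lies before "c" and yields no task.
        System.out.println(split(starts, ends, b("c"), b("e")));
    }
}
```

Because every task is clipped to its own region, the tasks never overlap, and the task count is bounded by the number of regions the range touches.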
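
The `isBinaryRowkey` flag under **range** decides how the startRowkey and endRowkey strings become bytes. As a rough illustration, the sketch below re-implements the core idea of `Bytes.toBytesBinary`: a `\xNN` escape (two hex digits) becomes a single byte, and every other character contributes its low 8 bits. It then compares the result with plain UTF-8 encoding, which is what `Bytes.toBytes(String)` produces. `BinaryRowkeySketch` is hypothetical and simplified; for example, it keeps a malformed escape as literal text, which may differ from HBase's own handling.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Simplified, hypothetical re-implementation of the escape handling behind Bytes.toBytesBinary. */
final class BinaryRowkeySketch {

    /** Value of an ASCII hex digit, or -1 if the char is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** A backslash, 'x' and two hex digits become one byte; any other char keeps its low 8 bits. */
    static byte[] toBytesBinary(String in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            boolean escape = ch == '\\' && i + 3 < in.length() && in.charAt(i + 1) == 'x'
                    && hexValue(in.charAt(i + 2)) >= 0 && hexValue(in.charAt(i + 3)) >= 0;
            if (escape) {
                out.write((hexValue(in.charAt(i + 2)) << 4) | hexValue(in.charAt(i + 3)));
                i += 3; // skip the 'x' and the two hex digits
            } else {
                out.write(ch); // write(int) stores only the low 8 bits
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String rowkey = "user\\x00\\xFF"; // 12 characters of text
        // isBinaryRowkey = true: the escapes become the bytes 0x00 and 0xFF, 6 bytes in total.
        System.out.println(toBytesBinary(rowkey).length);
        // isBinaryRowkey = false: UTF-8 keeps the escape text literally, 12 bytes.
        System.out.println(rowkey.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

In practice, binary mode is what lets a JSON job file express rowkeys that contain non-printable bytes.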
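
To make the two read modes in section 1.1 concrete, the sketch below reshapes an in-memory list of cells the way the mode descriptions say. normal mode yields one wide record per rowkey that holds the newest version of each requested column. multiVersionFixedColumn mode yields one four-field record (rowKey, family:qualifier, timestamp, value) per cell version, newest first, capped by maxVersion (-1 means all). `ReadModeSketch` and its `Cell` type are hypothetical stand-ins, not HBase or DataX APIs; a real reader gets its cells from a Scan. `List.of` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: reshapes cells the way the two read modes are described. */
final class ReadModeSketch {

    /** One version of one HBase cell (hypothetical in-memory model). */
    static final class Cell {
        final String row;
        final String column; // "family:qualifier"
        final long ts;
        final String value;

        Cell(String row, String column, long ts, String value) {
            this.row = row;
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    /** normal mode: one wide record per rowkey; each requested column holds its newest version. */
    static List<List<String>> normal(List<Cell> cells, List<String> columns) {
        Map<String, Map<String, Cell>> newest = new TreeMap<>(); // rowkeys in sorted order
        for (Cell c : cells) {
            Map<String, Cell> byColumn = newest.computeIfAbsent(c.row, k -> new HashMap<>());
            Cell seen = byColumn.get(c.column);
            if (seen == null || c.ts > seen.ts) {
                byColumn.put(c.column, c);
            }
        }
        List<List<String>> records = new ArrayList<>();
        for (Map.Entry<String, Map<String, Cell>> row : newest.entrySet()) {
            List<String> record = new ArrayList<>();
            record.add(row.getKey()); // the rowkey comes first
            for (String column : columns) {
                Cell c = row.getValue().get(column);
                record.add(c == null ? null : c.value); // a missing cell becomes null
            }
            records.add(record);
        }
        return records;
    }

    /** multiVersionFixedColumn mode: one 4-field record per version, at most maxVersion per cell (-1 = all). */
    static List<List<String>> multiVersion(List<Cell> cells, List<String> columns, int maxVersion) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(Comparator.comparing((Cell c) -> c.row)
                .thenComparing((Cell c) -> c.column)
                .thenComparing(Comparator.comparingLong((Cell c) -> c.ts).reversed()));
        Map<List<String>, Integer> emitted = new HashMap<>(); // (row, column) -> versions emitted
        List<List<String>> records = new ArrayList<>();
        for (Cell c : sorted) {
            if (!columns.contains(c.column)) {
                continue; // only explicitly configured columns are read
            }
            List<String> cellKey = List.of(c.row, c.column);
            int count = emitted.getOrDefault(cellKey, 0);
            if (maxVersion != -1 && count >= maxVersion) {
                continue;
            }
            emitted.put(cellKey, count + 1);
            records.add(List.of(c.row, c.column, Long.toString(c.ts), c.value));
        }
        return records;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("xiaoming", "info:age", 1457082218735L, "29"),
                new Cell("xiaoming", "info:age", 1457082178630L, "24"),
                new Cell("xiaoming", "address:city", 1457082196082L, "hangzhou"));
        List<String> columns = List.of("address:city", "info:age");
        System.out.println(normal(cells, columns));           // [[xiaoming, hangzhou, 29]]
        System.out.println(multiVersion(cells, columns, -1)); // three 4-field records
    }
}
```

The same three cells give one record in normal mode but three in multiVersionFixedColumn mode, because the older `info:age` version (24) becomes a record of its own.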
### 3.3 Type Conversion

The HBase data types HbaseReader supports, and the DataX internal type each one maps to:

| DataX internal type | HBase data type |
| -------- | ----- |
| Long | int, short, long |
| Double | float, double |
| String | string, binarystring |
| Date | date |
| Boolean | boolean |

Note:

* `Field types other than those listed above are not supported.`

## 4 Performance Report

Omitted.

## 5 Constraints

Omitted.

## 6 FAQ

***

================================================ FILE: hbase094xreader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hbase094xreader hbase094xreader 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.hbase hbase 0.94.27 jdk.tools jdk.tools org.apache.hadoop hadoop-core 0.20.205.0 org.apache.zookeeper zookeeper 3.3.2 com.alibaba.datax datax-core ${datax-project-version} test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: hbase094xreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/hbase094xreader target/ hbase094xreader-0.0.1-SNAPSHOT.jar plugin/reader/hbase094xreader false plugin/reader/hbase094xreader/libs runtime ================================================ FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/ColumnType.java ================================================ package com.alibaba.datax.plugin.reader.hbase094xreader; import com.alibaba.datax.common.exception.DataXException; import org.apache.commons.lang.StringUtils; import java.util.Arrays; /** * Only meaningful for normal-mode reads; multi-version reads have no column type. */ public enum ColumnType { BOOLEAN("boolean"), SHORT("short"), INT("int"), LONG("long"), FLOAT("float"), DOUBLE("double"), DATE("date"), STRING("string"), BINARY_STRING("binarystring") ; private String typeName;
ColumnType(String typeName) { this.typeName = typeName; } public static ColumnType getByTypeName(String typeName) { if(StringUtils.isBlank(typeName)){ throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE, String.format("Hbasereader does not support type: %s; supported types are: %s", typeName, Arrays.asList(values()))); } for (ColumnType columnType : values()) { if (StringUtils.equalsIgnoreCase(columnType.typeName, typeName.trim())) { return columnType; } } throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE, String.format("Hbasereader does not support type: %s; supported types are: %s", typeName, Arrays.asList(values()))); } @Override public String toString() { return this.typeName; } } ================================================ FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.hbase094xreader; public final class Constant { public static final String RANGE = "range"; public static final String ROWKEY_FLAG = "rowkey"; public static final String DEFAULT_ENCODING = "UTF-8"; public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static final int DEFAULT_SCAN_CACHE_SIZE = 256; public static final int DEFAULT_SCAN_BATCH_SIZE = 100; } ================================================ FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Hbase094xHelper.java ================================================ package com.alibaba.datax.plugin.reader.hbase094xreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.fs.Path; import org.apache.hadoop.hbase.HConstants; import org.apache.hadoop.hbase.client.HBaseAdmin; import 
org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.ResultScanner; import org.apache.hadoop.hbase.util.Bytes; import org.apache.hadoop.hbase.util.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.nio.charset.Charset; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; /** * Utility class. * Created by shf on 16/3/7. */ public class Hbase094xHelper { private static final Logger LOG = LoggerFactory.getLogger(Hbase094xHelper.class); public static org.apache.hadoop.conf.Configuration getHbaseConf(String hbaseConf) { if (StringUtils.isBlank(hbaseConf)) { throw DataXException.asDataXException(Hbase094xReaderErrorCode.REQUIRED_VALUE, "Reading HBase requires hbaseConfig, which holds the HBase connection settings. Please ask your HBase PE for them."); } org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration(); try { Map<String, String> map = JSON.parseObject(hbaseConf, new TypeReference<Map<String, String>>() {}); /* the user-configured key-value pairs that make up hbaseConf */ Validate.isTrue(map != null && map.size() !=0, "hbaseConfig must be a non-empty map!"); for (Map.Entry<String, String> entry : map.entrySet()) { conf.set(entry.getKey(), entry.getValue()); } } catch (Exception e) { throw DataXException.asDataXException(Hbase094xReaderErrorCode.GET_HBASE_CONFIGURATION_ERROR, e); } return conf; } /** * Returns a new HTable on every call. Note: HTable itself is not thread-safe. */ public static HTable getTable(com.alibaba.datax.common.util.Configuration configuration) { String hbaseConnConf = configuration.getString(Key.HBASE_CONFIG); String tableName = configuration.getString(Key.TABLE); HBaseAdmin admin = null; try { org.apache.hadoop.conf.Configuration hbaseConf = Hbase094xHelper.getHbaseConf(hbaseConnConf); HTable htable = new HTable(hbaseConf, tableName); admin = new HBaseAdmin(hbaseConf); checkHbaseTable(admin, htable); return htable; } catch (Exception e) { throw DataXException.asDataXException(Hbase094xReaderErrorCode.GET_HBASE_TABLE_ERROR, e); } finally { Hbase094xHelper.closeAdmin(admin); } } private 
static void checkHbaseTable(HBaseAdmin admin, HTable htable) throws DataXException, IOException { if (!admin.isMasterRunning()) { throw new IllegalStateException("HBase master 没有运行, 请检查您的配置 或者 联系 Hbase 管理员."); } if (!admin.tableExists(htable.getTableName())) { throw new IllegalStateException("HBase源头表" + Bytes.toString(htable.getTableName()) + "不存在, 请检查您的配置 或者 联系 Hbase 管理员."); } if (!admin.isTableAvailable(htable.getTableName()) || !admin.isTableEnabled(htable.getTableName())) { throw new IllegalStateException("HBase源头表" + Bytes.toString(htable.getTableName()) + " 不可用, 请检查您的配置 或者 联系 Hbase 管理员."); } if(admin.isTableDisabled(htable.getTableName())){ throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE, "HBase源头表" + Bytes.toString(htable.getTableName()) + "is disabled, 请检查您的配置 或者 联系 Hbase 管理员."); } } public static void closeAdmin(HBaseAdmin admin){ try { if(null != admin) admin.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase094xReaderErrorCode.CLOSE_HBASE_ADMIN_ERROR, e); } } public static void closeTable(HTable table){ try { if(null != table) table.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase094xReaderErrorCode.CLOSE_HBASE_TABLE_ERROR, e); } } public static void closeResultScanner(ResultScanner resultScanner){ if(null != resultScanner) { resultScanner.close(); } } public static byte[] convertUserStartRowkey(Configuration configuration) { String startRowkey = configuration.getString(Key.START_ROWKEY); if (StringUtils.isBlank(startRowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } else { boolean isBinaryRowkey = configuration.getBool(Key.IS_BINARY_ROWKEY); return Hbase094xHelper.stringToBytes(startRowkey, isBinaryRowkey); } } public static byte[] convertUserEndRowkey(Configuration configuration) { String endRowkey = configuration.getString(Key.END_ROWKEY); if (StringUtils.isBlank(endRowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } else { boolean isBinaryRowkey = 
                    configuration.getBool(Key.IS_BINARY_ROWKEY);
            return Hbase094xHelper.stringToBytes(endRowkey, isBinaryRowkey);
        }
    }

    /**
     * Note the difference between convertUserStartRowkey and convertInnerStartRowkey. The former honors
     * isBinaryRowkey and is used only once, to convert the user-configured String rowkey to bytes. The latter
     * is the convention used when the binary rowkeys produced by splitting are written back into the configuration.
     */
    public static byte[] convertInnerStartRowkey(Configuration configuration) {
        String startRowkey = configuration.getString(Key.START_ROWKEY);
        if (StringUtils.isBlank(startRowkey)) {
            return HConstants.EMPTY_BYTE_ARRAY;
        }

        return Bytes.toBytesBinary(startRowkey);
    }

    public static byte[] convertInnerEndRowkey(Configuration configuration) {
        String endRowkey = configuration.getString(Key.END_ROWKEY);
        if (StringUtils.isBlank(endRowkey)) {
            return HConstants.EMPTY_BYTE_ARRAY;
        }

        return Bytes.toBytesBinary(endRowkey);
    }

    private static byte[] stringToBytes(String rowkey, boolean isBinaryRowkey) {
        if (isBinaryRowkey) {
            return Bytes.toBytesBinary(rowkey);
        } else {
            return Bytes.toBytes(rowkey);
        }
    }

    public static boolean isRowkeyColumn(String columnName) {
        return Constant.ROWKEY_FLAG.equalsIgnoreCase(columnName);
    }

    /**
     * Parses the column configuration for normal mode.
     */
    public static List<HbaseColumnCell> parseColumnOfNormalMode(List<Map> column) {
        List<HbaseColumnCell> hbaseColumnCells = new ArrayList<HbaseColumnCell>();

        HbaseColumnCell oneColumnCell;

        for (Map<String, String> aColumn : column) {
            ColumnType type = ColumnType.getByTypeName(aColumn.get(Key.TYPE));
            String columnName = aColumn.get(Key.NAME);
            String columnValue = aColumn.get(Key.VALUE);
            String dateformat = aColumn.get(Key.FORMAT);

            if (type == ColumnType.DATE) {
                if (dateformat == null) {
                    dateformat = Constant.DEFAULT_DATA_FORMAT;
                }
                Validate.isTrue(StringUtils.isNotBlank(columnName) || StringUtils.isNotBlank(columnValue),
                        "In normal mode, a Hbasereader column must be either type + name + format or type + value + format. "
                                + "Your configuration matches neither; please check and fix it.");

                oneColumnCell = new HbaseColumnCell
                        .Builder(type)
                        .columnName(columnName)
                        .columnValue(columnValue)
                        .dateformat(dateformat)
                        .build();
            } else {
                Validate.isTrue(StringUtils.isNotBlank(columnName) || StringUtils.isNotBlank(columnValue),
                        "In normal mode, a non-date Hbasereader column must be either type + name or type + value. "
                                + "Your configuration matches neither; please check and fix it.");
                oneColumnCell = new HbaseColumnCell.Builder(type)
                        .columnName(columnName)
                        .columnValue(columnValue)
                        .build();
            }

            hbaseColumnCells.add(oneColumnCell);
        }

        return hbaseColumnCells;
    }

    // Converts the multi-version (tall-table) column list into a <familyQualifier, <type, format>> map
    public static HashMap<String, HashMap<String, String>> parseColumnOfMultiversionMode(List<Map> column) {

        HashMap<String, HashMap<String, String>> familyQualifierMap = new HashMap<String, HashMap<String, String>>();
        for (Map<String, String> aColumn : column) {
            String type = aColumn.get(Key.TYPE);
            String columnName = aColumn.get(Key.NAME);
            String dateformat = aColumn.get(Key.FORMAT);

            // validates the type name (throws on an unsupported type)
            ColumnType.getByTypeName(type);
            Validate.isTrue(StringUtils.isNotBlank(columnName),
                    "Hbasereader column requires a name in the form family:qualifier, but yours is empty. Please check and fix it.");

            String familyQualifier;
            if (!Hbase094xHelper.isRowkeyColumn(columnName)) {
                String[] cfAndQualifier = columnName.split(":");
                if (cfAndQualifier.length != 2) {
                    throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                            "Hbasereader column must be configured as family:qualifier. Invalid column: " + columnName);
                }
                familyQualifier = StringUtils.join(cfAndQualifier[0].trim(), ":", cfAndQualifier[1].trim());
            } else {
                familyQualifier = columnName.trim();
            }

            HashMap<String, String> typeAndFormat = new HashMap<String, String>();
            typeAndFormat.put(Key.TYPE, type);
            typeAndFormat.put(Key.FORMAT, dateformat);
            familyQualifierMap.put(familyQualifier, typeAndFormat);
        }
        return familyQualifierMap;
    }

    public static List<Configuration> split(Configuration configuration) {
        byte[] startRowkeyByte = Hbase094xHelper.convertUserStartRowkey(configuration);
        byte[] endRowkeyByte = Hbase094xHelper.convertUserEndRowkey(configuration);

        /* If both startRowkey and endRowkey are configured, require startRowkey <= endRowkey */
        if (startRowkeyByte.length != 0 && endRowkeyByte.length != 0
                && Bytes.compareTo(startRowkeyByte, endRowkeyByte) > 0) {
            throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                    "Hbasereader: startRowkey must not be greater than endRowkey.");
        }

        HTable htable = Hbase094xHelper.getTable(configuration);

        List<Configuration> resultConfigurations;

        try {
            Pair<byte[][], byte[][]> regionRanges = htable.getStartEndKeys();
            if (null == regionRanges) {
                throw DataXException.asDataXException(Hbase094xReaderErrorCode.SPLIT_ERROR,
                        "Failed to get the rowkey ranges of the source HBase table.");
            }

            resultConfigurations = Hbase094xHelper.doSplit(configuration, startRowkeyByte, endRowkeyByte,
                    regionRanges);

            LOG.info("HBaseReader split job into {} tasks.", resultConfigurations.size());

            return resultConfigurations;
        } catch (Exception e) {
            throw DataXException.asDataXException(Hbase094xReaderErrorCode.SPLIT_ERROR,
                    "Failed to split the source HBase table.", e);
        } finally {
            Hbase094xHelper.closeTable(htable);
        }
    }

    private static List<Configuration> doSplit(Configuration config, byte[] startRowkeyByte,
                                               byte[] endRowkeyByte, Pair<byte[][], byte[][]> regionRanges) {

        List<Configuration> configurations = new ArrayList<Configuration>();

        for (int i = 0; i < regionRanges.getFirst().length; i++) {

            byte[] regionStartKey = regionRanges.getFirst()[i];
            byte[] regionEndKey = regionRanges.getSecond()[i];

            // The current region is the last region.
            // If the last region's start key is greater than the user-specified userEndKey, skip that region.
            // Note: if userEndKey is "", this check must not apply; "" means read up to the last region.
            if (Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0
                    && (endRowkeyByte.length != 0 && (Bytes.compareTo(
                    regionStartKey, endRowkeyByte) > 0))) {
                continue;
            }

            // If the current region is not the last region and userStartKey >= the region's end key,
            // skip this region.
            if ((Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) != 0)
                    && (Bytes.compareTo(startRowkeyByte, regionEndKey) >= 0)) {
                continue;
            }

            // If userEndKey <= the region's start key, skip this region.
            // Note: if userEndKey is "", this check must not apply; "" means read up to the last region.
            if (endRowkeyByte.length != 0
                    && (Bytes.compareTo(endRowkeyByte, regionStartKey) <= 0)) {
                continue;
            }

            Configuration p = config.clone();

            String thisStartKey = getStartKey(startRowkeyByte, regionStartKey);

            String thisEndKey = getEndKey(endRowkeyByte, regionEndKey);

            p.set(Key.START_ROWKEY, thisStartKey);
            p.set(Key.END_ROWKEY, thisEndKey);

            LOG.debug("startRowkey:[{}], endRowkey:[{}] .", thisStartKey, thisEndKey);

            configurations.add(p);
        }

        return configurations;
    }

    private static String getEndKey(byte[] endRowkeyByte, byte[] regionEndKey) {
        if (endRowkeyByte == null) { // already normalized earlier, so userEndKey can never be null here
            throw new IllegalArgumentException("userEndKey should not be null!");
        }

        byte[] tempEndRowkeyByte;

        if (endRowkeyByte.length == 0) {
            tempEndRowkeyByte = regionEndKey;
        } else if (Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0) {
            // last region
            tempEndRowkeyByte = endRowkeyByte;
        } else {
            if (Bytes.compareTo(endRowkeyByte, regionEndKey) > 0) {
                tempEndRowkeyByte = regionEndKey;
            } else {
                tempEndRowkeyByte = endRowkeyByte;
            }
        }

        return Bytes.toStringBinary(tempEndRowkeyByte);
    }

    private static String getStartKey(byte[] startRowkeyByte, byte[] regionStarKey) {
        if (startRowkeyByte == null) { // already normalized earlier, so userStartKey can never be null here
            throw new IllegalArgumentException(
                    "userStartKey should not be null!");
        }

        byte[] tempStartRowkeyByte;

        if (Bytes.compareTo(startRowkeyByte, regionStarKey) < 0) {
            tempStartRowkeyByte
                    = regionStarKey;
        } else {
            tempStartRowkeyByte = startRowkeyByte;
        }
        return Bytes.toStringBinary(tempStartRowkeyByte);
    }

    public static void validateParameter(Configuration originalConfig) {
        originalConfig.getNecessaryValue(Key.HBASE_CONFIG, Hbase094xReaderErrorCode.REQUIRED_VALUE);
        originalConfig.getNecessaryValue(Key.TABLE, Hbase094xReaderErrorCode.REQUIRED_VALUE);

        Hbase094xHelper.validateMode(originalConfig);

        // optional parameters
        String encoding = originalConfig.getString(Key.ENCODING, Constant.DEFAULT_ENCODING);
        if (!Charset.isSupported(encoding)) {
            throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                    String.format("Hbasereader does not support the configured encoding: [%s]", encoding));
        }
        originalConfig.set(Key.ENCODING, encoding);

        // range settings
        String startRowkey = originalConfig.getString(Constant.RANGE + "." + Key.START_ROWKEY);

        // Careful: if range.startRowkey is present but has no value, startRowkey is an empty string, not null
        if (startRowkey != null && startRowkey.length() != 0) {
            originalConfig.set(Key.START_ROWKEY, startRowkey);
        }

        String endRowkey = originalConfig.getString(Constant.RANGE + "." + Key.END_ROWKEY);
        // Careful: if range.endRowkey is present but has no value, endRowkey is an empty string, not null
        if (endRowkey != null && endRowkey.length() != 0) {
            originalConfig.set(Key.END_ROWKEY, endRowkey);
        }
        Boolean isBinaryRowkey = originalConfig.getBool(Constant.RANGE + "."
                + Key.IS_BINARY_ROWKEY, false);
        originalConfig.set(Key.IS_BINARY_ROWKEY, isBinaryRowkey);

        // scan cache
        int scanCacheSize = originalConfig.getInt(Key.SCAN_CACHE_SIZE, Constant.DEFAULT_SCAN_CACHE_SIZE);
        originalConfig.set(Key.SCAN_CACHE_SIZE, scanCacheSize);

        int scanBatchSize = originalConfig.getInt(Key.SCAN_BATCH_SIZE, Constant.DEFAULT_SCAN_BATCH_SIZE);
        originalConfig.set(Key.SCAN_BATCH_SIZE, scanBatchSize);
    }

    private static String validateMode(Configuration originalConfig) {
        String mode = originalConfig.getNecessaryValue(Key.MODE, Hbase094xReaderErrorCode.REQUIRED_VALUE);
        List<Map> column = originalConfig.getList(Key.COLUMN, Map.class);
        if (column == null || column.isEmpty()) {
            throw DataXException.asDataXException(Hbase094xReaderErrorCode.REQUIRED_VALUE,
                    "column is empty. HBase requires column, in the form: column:[{\"name\": \"cf0:column0\",\"type\": \"string\"},{\"name\": \"cf1:column1\",\"type\": \"long\"}]");
        }
        ModeType modeType = ModeType.getByTypeName(mode);
        switch (modeType) {
            case Normal: {
                // normal mode must not set maxVersion; it requires column in Map form
                String maxVersion = originalConfig.getString(Key.MAX_VERSION);
                Validate.isTrue(maxVersion == null,
                        "You are reading HBase in normal mode, so the unrelated option maxVersion must not be set");
                // parse column to validate its format further
                Hbase094xHelper.parseColumnOfNormalMode(column);
                break;
            }
            case MultiVersionFixedColumn: {
                // multiVersionFixedColumn mode requires maxVersion
                checkMaxVersion(originalConfig, mode);

                Hbase094xHelper.parseColumnOfMultiversionMode(column);
                break;
            }
            default:
                throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                        String.format("Hbase094xReader does not support mode: %s", mode));
        }
        return mode;
    }

    // Checks that maxVersion is present and has a legal value
    private static void checkMaxVersion(Configuration configuration, String mode) {
        Integer maxVersion = configuration.getInt(Key.MAX_VERSION);
        Validate.notNull(maxVersion,
                String.format("You are reading HBase in %s mode, so maxVersion is required", mode));
        boolean isMaxVersionValid = maxVersion == -1 || maxVersion > 1;
        Validate.isTrue(isMaxVersionValid,
                String.format("You are reading HBase in %s mode, but maxVersion is invalid. maxVersion rules: -1 reads all versions; "
                        + "0 and 1 are not allowed (with 0 or 1 we assume you meant normal mode rather than %s mode, which behaves very differently); "
                        + "a value greater than 1 reads that many of the latest versions.", mode, mode));
    }
}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Hbase094xReader.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

/**
 * Hbase094xReader
 * Created by shf on 16/3/7.
 */
public class Hbase094xReader extends Reader {
    public static class Job extends Reader.Job {
        private Configuration originConfig = null;

        @Override
        public void init() {
            this.originConfig = this.getPluginJobConf();
            Hbase094xHelper.validateParameter(this.originConfig);
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            return Hbase094xHelper.split(this.originConfig);
        }

        @Override
        public void destroy() {

        }
    }

    public static class Task extends Reader.Task {
        private Configuration taskConfig;
        private static Logger LOG = LoggerFactory.getLogger(Task.class);
        private HbaseAbstractTask hbaseTaskProxy;

        @Override
        public void init() {
            this.taskConfig = super.getPluginJobConf();
            String mode = this.taskConfig.getString(Key.MODE);
            ModeType modeType = ModeType.getByTypeName(mode);

            switch (modeType) {
                case Normal:
                    this.hbaseTaskProxy = new NormalTask(this.taskConfig);
                    break;
                case MultiVersionFixedColumn:
                    this.hbaseTaskProxy = new MultiVersionFixedColumnTask(this.taskConfig);
                    break;
                default:
                    throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                            "Hbasereader does not support this mode: " + modeType);
            }
        }

        @Override
        public void prepare() {
            try {
                this.hbaseTaskProxy.prepare();
            } catch (Exception e) {
                throw DataXException.asDataXException(Hbase094xReaderErrorCode.PREPAR_READ_ERROR, e);
            }
        }

        @Override
        public void startRead(RecordSender recordSender) {
            Record record = recordSender.createRecord();
            boolean fetchOK;
            while (true) {
                try {
                    fetchOK = this.hbaseTaskProxy.fetchLine(record);
                } catch (Exception e) {
                    LOG.info("Exception", e);
                    super.getTaskPluginCollector().collectDirtyRecord(record, e);
                    record = recordSender.createRecord();
                    continue;
                }
                if (fetchOK) {
                    recordSender.sendToWriter(record);
                    record = recordSender.createRecord();
                } else {
                    break;
                }
            }
            recordSender.flush();
        }

        @Override
        public void post() {
            super.post();
        }

        @Override
        public void destroy() {
            if (this.hbaseTaskProxy != null) {
                this.hbaseTaskProxy.close();
            }
        }
    }
}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Hbase094xReaderErrorCode.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.spi.ErrorCode;

public enum Hbase094xReaderErrorCode implements ErrorCode {
    REQUIRED_VALUE("Hbase094xReader-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("Hbase094xReader-01", "A configured value is invalid."),
    PREPAR_READ_ERROR("Hbase094xReader-02", "Error while preparing to read from HBase."),
    SPLIT_ERROR("Hbase094xReader-03", "Error while splitting the HBase table."),
    GET_HBASE_CONFIGURATION_ERROR("HbaseReader-04", "Error while parsing the HBase configuration."),
    INIT_TABLE_ERROR("Hbase094xReader-04", "Error while initializing the HBase source table."),
    GET_HBASE_TABLE_ERROR("HbaseReader-05", "Error while initializing the HBase source table."),
    CLOSE_HBASE_TABLE_ERROR("HbaseReader-06", "Error while closing the HBase source table."),
    CLOSE_HBASE_ADMIN_ERROR("HbaseReader-07", "Error while closing the HBase admin.")
    ;

    private final String code;
    private final String description;

    private Hbase094xReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code,
                this.description);
    }
}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/HbaseAbstractTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang.ArrayUtils;
import org.apache.commons.lang3.time.DateUtils;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public abstract class HbaseAbstractTask {
    private final static Logger LOG = LoggerFactory.getLogger(HbaseAbstractTask.class);

    private byte[] startKey = null;
    private byte[] endKey = null;

    protected HTable htable;
    protected String encoding;
    protected int scanCacheSize;
    protected int scanBatchSize;

    protected Result lastResult = null;
    protected Scan scan;
    protected ResultScanner resultScanner;

    public HbaseAbstractTask(com.alibaba.datax.common.util.Configuration configuration) {
        this.htable = Hbase094xHelper.getTable(configuration);

        this.encoding = configuration.getString(Key.ENCODING, Constant.DEFAULT_ENCODING);
        this.startKey = Hbase094xHelper.convertInnerStartRowkey(configuration);
        this.endKey = Hbase094xHelper.convertInnerEndRowkey(configuration);
        this.scanCacheSize = configuration.getInt(Key.SCAN_CACHE_SIZE, Constant.DEFAULT_SCAN_CACHE_SIZE);
        this.scanBatchSize = configuration.getInt(Key.SCAN_BATCH_SIZE, Constant.DEFAULT_SCAN_BATCH_SIZE);
    }

    public abstract boolean fetchLine(Record record) throws Exception;

    // Each mode configures the scan differently; e.g. multi-version mode must set the number of versions
    public abstract void initScan(Scan scan);

    public void prepare() throws
            Exception {
        this.scan = new Scan();
        this.scan.setSmall(false);
        this.scan.setStartRow(startKey);
        this.scan.setStopRow(endKey);
        LOG.info("The task set startRowkey=[{}], endRowkey=[{}].", Bytes.toStringBinary(this.startKey), Bytes.toStringBinary(this.endKey));
        // Caching: number of rows fetched from the server per round trip (default 256)
        this.scan.setCaching(this.scanCacheSize);
        // Batch: maximum number of columns per returned Result. HBase has no limit by default
        // (all columns are returned); the default here is 100
        this.scan.setBatch(this.scanBatchSize);
        // Whether to cache blocks. HBase caches by default, but a full sync reads mostly cold data,
        // so block caching is disabled
        this.scan.setCacheBlocks(false);
        initScan(this.scan);

        this.resultScanner = this.htable.getScanner(this.scan);
    }

    public void close() {
        Hbase094xHelper.closeResultScanner(this.resultScanner);
        Hbase094xHelper.closeTable(this.htable);
    }

    protected Result getNextHbaseRow() throws IOException {
        Result result;
        try {
            result = resultScanner.next();
        } catch (IOException e) {
            // reopen the scanner from the last row read, and skip that row if it is returned again
            if (lastResult != null) {
                this.scan.setStartRow(lastResult.getRow());
            }
            resultScanner = this.htable.getScanner(scan);
            result = resultScanner.next();
            if (lastResult != null && result != null && Bytes.equals(lastResult.getRow(), result.getRow())) {
                result = resultScanner.next();
            }
        }
        lastResult = result;
        // may be null
        return result;
    }

    public Column convertBytesToAssignType(ColumnType columnType, byte[] byteArray, String dateformat) throws Exception {
        Column column;
        switch (columnType) {
            case BOOLEAN:
                column = new BoolColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toBoolean(byteArray));
                break;
            case SHORT:
                column = new LongColumn(ArrayUtils.isEmpty(byteArray) ? null : String.valueOf(Bytes.toShort(byteArray)));
                break;
            case INT:
                column = new LongColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toInt(byteArray));
                break;
            case LONG:
                column = new LongColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toLong(byteArray));
                break;
            case FLOAT:
                column = new DoubleColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toFloat(byteArray));
                break;
            case DOUBLE:
                column = new DoubleColumn(ArrayUtils.isEmpty(byteArray) ?
                        null : Bytes.toDouble(byteArray));
                break;
            case STRING:
                column = new StringColumn(ArrayUtils.isEmpty(byteArray) ? null : new String(byteArray, encoding));
                break;
            case BINARY_STRING:
                column = new StringColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toStringBinary(byteArray));
                break;
            case DATE:
                // decode and parse the bytes only when the cell has a value
                column = new DateColumn(ArrayUtils.isEmpty(byteArray) ? null
                        : DateUtils.parseDate(Bytes.toStringBinary(byteArray), new String[]{dateformat}));
                break;
            default:
                throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                        "Hbasereader does not support the configured column type: " + columnType);
        }
        return column;
    }

    public Column convertValueToAssignType(ColumnType columnType, String constantValue, String dateformat) throws Exception {
        Column column;
        switch (columnType) {
            case BOOLEAN:
                column = new BoolColumn(constantValue);
                break;
            case SHORT:
            case INT:
            case LONG:
                column = new LongColumn(constantValue);
                break;
            case FLOAT:
            case DOUBLE:
                column = new DoubleColumn(constantValue);
                break;
            case STRING:
                column = new StringColumn(constantValue);
                break;
            case DATE:
                column = new DateColumn(DateUtils.parseDate(constantValue, new String[]{dateformat}));
                break;
            default:
                throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                        "Hbasereader constant columns do not support the configured column type: " + columnType);
        }
        return column;
    }
}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/HbaseColumnCell.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.base.BaseObject;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Describes one entry of the column configuration in the hbasereader plugin
 */
public class HbaseColumnCell extends BaseObject {
    private ColumnType columnType;

    // columnName has the form family:qualifier
    private String columnName;

    private byte[] columnFamily;
    private byte[] qualifier;
    // For constant columns, the constant value is stored in columnValue
    private String columnValue;

    // isConstant=true when columnValue is set (a convenience flag so callers can tell constant columns apart)
    private boolean isConstant;

    // Set only for date columns; no default. Example: yyyy-MM-dd HH:mm:ss
    private String dateformat;

    private HbaseColumnCell(Builder builder) {
        this.columnType = builder.columnType;

        // at least one of columnName and columnValue must be null
        Validate.isTrue(builder.columnName == null || builder.columnValue == null,
                "Hbasereader column cannot set both a column name and a column value; choose one.");

        // columnName and columnValue cannot both be null
        Validate.isTrue(builder.columnName != null || builder.columnValue != null,
                "Hbasereader column must set either a column name or a column value.");

        if (builder.columnName != null) {
            this.isConstant = false;
            this.columnName = builder.columnName;
            // if columnName is not rowkey, it must be of the form family:qualifier
            if (!Hbase094xHelper.isRowkeyColumn(this.columnName)) {

                String promptInfo = "Hbasereader column must be configured as family:qualifier. Invalid column: " + this.columnName;
                String[] cfAndQualifier = this.columnName.split(":");
                Validate.isTrue(cfAndQualifier != null && cfAndQualifier.length == 2
                        && StringUtils.isNotBlank(cfAndQualifier[0])
                        && StringUtils.isNotBlank(cfAndQualifier[1]), promptInfo);

                this.columnFamily = Bytes.toBytes(cfAndQualifier[0].trim());
                this.qualifier = Bytes.toBytes(cfAndQualifier[1].trim());
            }
        } else {
            this.isConstant = true;
            this.columnValue = builder.columnValue;
        }

        if (builder.dateformat != null) {
            this.dateformat = builder.dateformat;
        }
    }

    public ColumnType getColumnType() {
        return columnType;
    }

    public String getColumnName() {
        return columnName;
    }

    public byte[] getColumnFamily() {
        return columnFamily;
    }

    public byte[] getQualifier() {
        return qualifier;
    }

    public String getDateformat() {
        return dateformat;
    }

    public String getColumnValue() {
        return columnValue;
    }

    public boolean isConstant() {
        return isConstant;
    }

    // inner builder class
    public static class Builder {
        private ColumnType columnType;
        private String columnName;
        private String columnValue;

        private String dateformat;

        public Builder(ColumnType columnType) {
            this.columnType = columnType;
        }

        public Builder columnName(String columnName) {
            this.columnName = columnName;
            return this;
        }

        public Builder columnValue(String columnValue) {
            this.columnValue = columnValue;
            return this;
        }

        public Builder dateformat(String dateformat) {
            this.dateformat = dateformat;
            return this;
        }

        public HbaseColumnCell build() {
            return new HbaseColumnCell(this);
        }
    }
}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/Key.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

public final class Key {

    public final static String HBASE_CONFIG = "hbaseConfig";

    public final static String TABLE = "table";

    /**
     * mode can be normal or multiVersionFixedColumn; there is no default.
     * <p>
     * normal is used together with column (in Map form).
     */
    public final static String MODE = "mode";

    /**
     * Used with mode = multiVersionFixedColumn: the number of versions to read. No default.
     * -1 reads all versions.
     * 0 and 1 are not allowed.
     * A value greater than 1 reads at most that many versions (up to Integer.MAX_VALUE).
     */
    public final static String MAX_VERSION = "maxVersion";

    /**
     * Defaults to utf8
     */
    public final static String ENCODING = "encoding";

    public final static String COLUMN = "column";

    public final static String COLUMN_FAMILY = "columnFamily";

    public static final String NAME = "name";

    public static final String TYPE = "type";

    public static final String FORMAT = "format";

    public static final String VALUE = "value";

    public final static String START_ROWKEY = "startRowkey";

    public final static String END_ROWKEY = "endRowkey";

    public final static String IS_BINARY_ROWKEY = "isBinaryRowkey";

    public final static String SCAN_CACHE_SIZE = "scanCacheSize";

    public final static String SCAN_BATCH_SIZE = "scanBatchSize";

}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/ModeType.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.exception.DataXException;

import java.util.Arrays;

public enum ModeType {
    Normal("normal"),
    MultiVersionFixedColumn("multiVersionFixedColumn")
    ;

    private String mode;

    ModeType(String mode) {
        this.mode = mode.toLowerCase();
    }

    public static ModeType getByTypeName(String modeName) {
        for (ModeType modeType : values()) {
            if (modeType.mode.equalsIgnoreCase(modeName)) {
                return modeType;
            }
        }

        throw DataXException.asDataXException(Hbase094xReaderErrorCode.ILLEGAL_VALUE,
                String.format("HbaseReader does not support mode: %s. Supported modes: %s", modeName, Arrays.asList(values())));
    }
}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/MultiVersionFixedColumnTask.java
================================================
package
com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.util.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.Map;

public class MultiVersionFixedColumnTask extends MultiVersionTask {

    public MultiVersionFixedColumnTask(Configuration configuration) {
        super(configuration);
    }

    @Override
    public void initScan(Scan scan) {
        for (Map<String, String> aColumn : column) {
            String columnName = aColumn.get(Key.NAME);
            if (!Hbase094xHelper.isRowkeyColumn(columnName)) {
                String[] cfAndQualifier = columnName.split(":");
                scan.addColumn(Bytes.toBytes(cfAndQualifier[0].trim()), Bytes.toBytes(cfAndQualifier[1].trim()));
            }
        }
        super.setMaxVersions(scan);
    }
}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/MultiVersionTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public abstract class MultiVersionTask extends HbaseAbstractTask {
    private static byte[] COLON_BYTE;

    private int maxVersion;
    private List<KeyValue> kvList = new ArrayList<KeyValue>();
    private int currentReadPosition = 0;
    public List<Map> column;
    private HashMap<String, HashMap<String, String>> familyQualifierMap = null;

    public MultiVersionTask(Configuration configuration) {
        super(configuration);
        this.maxVersion = configuration.getInt(Key.MAX_VERSION);
        this.column = configuration.getList(Key.COLUMN,
                Map.class);
        this.familyQualifierMap = Hbase094xHelper.parseColumnOfMultiversionMode(this.column);

        try {
            MultiVersionTask.COLON_BYTE = ":".getBytes("utf8");
        } catch (UnsupportedEncodingException e) {
            throw DataXException.asDataXException(Hbase094xReaderErrorCode.PREPAR_READ_ERROR,
                    "Failed to get the bytes of the colon separator between column family and qualifier.", e);
        }
    }

    @Override
    public boolean fetchLine(Record record) throws Exception {
        Result result;
        if (this.kvList == null || this.kvList.size() == this.currentReadPosition) {
            result = super.getNextHbaseRow();
            if (result == null) {
                return false;
            }
            super.lastResult = result;

            this.kvList = result.list();
            if (this.kvList == null) {
                return false;
            }
            this.currentReadPosition = 0;
        }
        try {
            KeyValue keyValue = this.kvList.get(this.currentReadPosition);
            convertCellToLine(keyValue, record);
        } finally {
            // advance even on failure so a bad cell is not retried forever
            this.currentReadPosition++;
        }
        return true;
    }

    private void convertCellToLine(KeyValue keyValue, Record record) throws Exception {
        byte[] rawRowkey = keyValue.getRow();
        long timestamp = keyValue.getTimestamp();
        byte[] cfAndQualifierName = Bytes.add(keyValue.getFamily(), MultiVersionTask.COLON_BYTE, keyValue.getQualifier());
        byte[] columnValue = keyValue.getValue();

        ColumnType rawRowkeyType = ColumnType.getByTypeName(familyQualifierMap.get(Constant.ROWKEY_FLAG).get(Key.TYPE));
        String familyQualifier = new String(cfAndQualifierName, Constant.DEFAULT_ENCODING);
        ColumnType columnValueType = ColumnType.getByTypeName(familyQualifierMap.get(familyQualifier).get(Key.TYPE));
        String columnValueFormat = familyQualifierMap.get(familyQualifier).get(Key.FORMAT);
        if (StringUtils.isBlank(columnValueFormat)) {
            columnValueFormat = Constant.DEFAULT_DATA_FORMAT;
        }

        record.addColumn(convertBytesToAssignType(rawRowkeyType, rawRowkey, columnValueFormat));
        record.addColumn(convertBytesToAssignType(ColumnType.STRING, cfAndQualifierName, columnValueFormat));
        // the user-configured type for timestamp is ignored
        record.addColumn(new LongColumn(timestamp));
        record.addColumn(convertBytesToAssignType(columnValueType, columnValue, columnValueFormat));
    }

    public void setMaxVersions(Scan scan) {
        if (this.maxVersion == -1 || this.maxVersion == Integer.MAX_VALUE) {
            scan.setMaxVersions();
        } else {
            scan.setMaxVersions(this.maxVersion);
        }
    }

}

================================================
FILE: hbase094xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase094xreader/NormalTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase094xreader;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.util.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.List;
import java.util.Map;

public class NormalTask extends HbaseAbstractTask {
    private List<Map> column;
    private List<HbaseColumnCell> hbaseColumnCells;

    public NormalTask(Configuration configuration) {
        super(configuration);
        this.column = configuration.getList(Key.COLUMN, Map.class);
        this.hbaseColumnCells = Hbase094xHelper.parseColumnOfNormalMode(this.column);
    }

    /**
     * In normal mode, adds the user-configured columns to the scan
     */
    @Override
    public void initScan(Scan scan) {
        boolean isConstant;
        boolean isRowkeyColumn;
        for (HbaseColumnCell cell : this.hbaseColumnCells) {
            isConstant = cell.isConstant();
            isRowkeyColumn = Hbase094xHelper.isRowkeyColumn(cell.getColumnName());
            if (!isConstant && !isRowkeyColumn) {
                scan.addColumn(cell.getColumnFamily(), cell.getQualifier());
            }
        }
    }

    @Override
    public boolean fetchLine(Record record) throws Exception {
        Result result = super.getNextHbaseRow();

        if (null == result) {
            return false;
        }
        super.lastResult = result;

        try {
            byte[] hbaseColumnValue;
            String columnName;
            ColumnType columnType;

            byte[] columnFamily;
            byte[] qualifier;

            for (HbaseColumnCell cell : this.hbaseColumnCells) {
                columnType = cell.getColumnType();
if (cell.isConstant()) { /* constant column: use the configured value */ String constantValue = cell.getColumnValue(); Column constantColumn = super.convertValueToAssignType(columnType,constantValue,cell.getDateformat()); record.addColumn(constantColumn); } else { /* look up the value by column name */ columnName = cell.getColumnName(); if (Hbase094xHelper.isRowkeyColumn(columnName)) { hbaseColumnValue = result.getRow(); } else { columnFamily = cell.getColumnFamily(); qualifier = cell.getQualifier(); hbaseColumnValue = result.getValue(columnFamily, qualifier); } Column hbaseColumn = super.convertBytesToAssignType(columnType,hbaseColumnValue,cell.getDateformat()); record.addColumn(hbaseColumn); } } } catch (Exception e) { /* Note: this catch is meant for byte[] conversion failures. In practice, a string's bytes rarely fail to convert to an integer type, but they often fail to convert to double. */ record.setColumn(0, new StringColumn(Bytes.toStringBinary(result.getRow()))); throw e; } return true; } } ================================================ FILE: hbase094xreader/src/main/resources/plugin.json ================================================ { "name": "hbase094xreader", "class": "com.alibaba.datax.plugin.reader.hbase094xreader.Hbase094xReader", "description": "useScene: prod. 
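mechanism: Scan to read data.",
  "developer": "alibaba"
}

================================================
FILE: hbase094xreader/src/main/resources/plugin_job_template.json
================================================
{
  "name": "hbase094xreader",
  "parameter": {
    "hbaseConfig": {},
    "table": "",
    "encoding": "",
    "mode": "",
    "column": [],
    "range": {
      "startRowkey": "",
      "endRowkey": "",
      "isBinaryRowkey": true
    }
  }
}

================================================
FILE: hbase094xwriter/doc/.gitkeep
================================================

================================================
FILE: hbase094xwriter/doc/hbase094xwriter.md
================================================
# Hbase094XWriter & Hbase11XWriter Plugin Documentation

___

## 1 Quick Introduction

The HbaseWriter plugin writes data into HBase. Under the hood, HbaseWriter connects to the remote HBase service through the HBase Java client and writes data with put operations.

### 1.1 Supported Features

1. HbaseWriter currently supports two HBase versions: HBase 0.94.x and HBase 1.1.x.

    * If your HBase version is 0.94.x, choose hbase094xwriter as the writer plugin:

        ```
        "writer": {
            "name": "hbase094xwriter"
        }
        ```

    * If your HBase version is 1.1.x, choose hbase11xwriter as the writer plugin:

        ```
        "writer": {
            "name": "hbase11xwriter"
        }
        ```

2. HbaseWriter can concatenate multiple source fields into the rowkey of the HBase table; see the rowkeyColumn parameter for details.

3. The timestamp (version) written to HBase can come from one of three sources: the current time, a specified source column, or a fixed time.

4. hbaseConfig is a required parameter. Ask your HBase PE for the connection-related settings in hbase-site.xml and enter them in JSON format; you can also add other HBase client settings to tune the client's interaction with the server.

    For example, given the following hbase-site.xml:

    ```
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://ip:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>***</value>
      </property>
    </configuration>
    ```

    the converted JSON is:

    ```
    "hbaseConfig": {
        "hbase.rootdir": "hdfs://ip:9000/hbase",
        "hbase.cluster.distributed": "true",
        "hbase.zookeeper.quorum": "***"
    }
    ```

### 1.2 Limitations

1. Only "wide" source tables are supported. "Tall" tables, where the source emits 4-tuples (rowKey, family:qualifier, timestamp, value), cannot be written yet. The main goal of this release is to replace hbasewriter from DataX2; tall-table support may come in a later iteration.

2. Clearing the table's data before writing is not currently supported. To clear data, contact your HBase PE.

## 2 How It Works

In short, HbaseWriter uses HBase Java client APIs such as HTable and Put to write the data read from the upstream Reader into HBase. hbase11xwriter and hbase094xwriter differ mainly in which APIs they call, because HBase 1.1.x deprecated many of the HBase 0.94.x APIs.

## 3 Features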
### 3.1 Sample Configuration

* A job that writes data from a local file into HBase 1.1.x:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "txtfilereader",
                    "parameter": {
                        "path": "/Users/shf/workplace/datax_test/hbase11xwriter/txt/normal.txt",
                        "charset": "UTF-8",
                        "column": [
                            { "index": 0, "type": "string" },
                            { "index": 1, "type": "string" },
                            { "index": 2, "type": "string" },
                            { "index": 3, "type": "string" },
                            { "index": 4, "type": "string" },
                            { "index": 5, "type": "string" },
                            { "index": 6, "type": "string" }
                        ],
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "hbase11xwriter",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.rootdir": "hdfs://ip:9000/hbase",
                            "hbase.cluster.distributed": "true",
                            "hbase.zookeeper.quorum": "***"
                        },
                        "table": "writer",
                        "mode": "normal",
                        "rowkeyColumn": [
                            { "index": 0, "type": "string" },
                            { "index": -1, "type": "string", "value": "_" }
                        ],
                        "column": [
                            { "index": 1, "name": "cf1:q1", "type": "string" },
                            { "index": 2, "name": "cf1:q2", "type": "string" },
                            { "index": 3, "name": "cf1:q3", "type": "string" },
                            { "index": 4, "name": "cf2:q1", "type": "string" },
                            { "index": 5, "name": "cf2:q2", "type": "string" },
                            { "index": 6, "name": "cf2:q3", "type": "string" }
                        ],
                        "versionColumn": {
                            "index": -1,
                            "value": "123456789"
                        },
                        "encoding": "utf-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **hbaseConfig**
	* Description: The connection settings that each HBase cluster provides to DataX clients are stored in hbase-site.xml. Ask your HBase PE for them and convert them to JSON. You can also add more HBase client settings, such as scan caching and batch size, to tune the interaction with the server.
	* Required: Yes
	* Default: none
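The hbase-site.xml to hbaseConfig conversion described above can be scripted. Below is a minimal sketch that uses only the JDK's DOM parser; `HbaseSiteToJson` is a hypothetical helper, not part of DataX, and it assumes the standard Hadoop layout of `<property>` elements that each contain a `<name>` and a `<value>` child.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of DataX): turns hbase-site.xml properties into hbaseConfig JSON. */
class HbaseSiteToJson {

    /** Collects each property's name/value pair, keeping the order of the file. */
    static Map<String, String> parseProperties(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid hbase-site.xml document", e);
        }
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            if (names.getLength() == 0 || values.getLength() == 0) {
                continue; // skip entries that lack a name or a value
            }
            props.put(names.item(0).getTextContent().trim(), values.item(0).getTextContent().trim());
        }
        return props;
    }

    /** Renders the map as a JSON object; every value stays a string, as in the hbaseConfig example. */
    static String toJson(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(quote(e.getKey())).append(": ").append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Minimal escaping for backslashes and double quotes only; control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Printing `HbaseSiteToJson.toJson(HbaseSiteToJson.parseProperties(xmlText))` yields an object that can be pasted as the value of `hbaseConfig`; review it before use, since the sketch copies every property, not only the connection-related ones.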
* **mode**
	* Description: The write mode. Only normal mode is currently supported; a dynamic-column mode may be added later.
	* Required: Yes
	* Default: none
* **table**
	* Description: The name of the HBase table to write to (case-sensitive).
	* Required: Yes
	* Default: none
* **encoding**
	* Description: The character encoding, UTF-8 or GBK, used when converting String values to HBase byte[].
	* Required: No
	* Default: UTF-8
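How the encoding setting is applied can be sketched as follows. Like the plugin's parameter validation, the sketch falls back to UTF-8 when the value is unset and rejects charsets the JVM does not support via `Charset.isSupported`; `EncodingSketch` is a hypothetical name used for illustration only. Whether GBK is available depends on the charsets installed in the JVM.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch of how the encoding option is resolved and applied to STRING values. */
class EncodingSketch {
    static final String DEFAULT_ENCODING = "UTF-8"; // the documented default

    /** Falls back to UTF-8 when unset; rejects charsets the JVM does not know. */
    static Charset resolve(String encoding) {
        String name = (encoding == null || encoding.trim().isEmpty()) ? DEFAULT_ENCODING : encoding.trim();
        if (!Charset.isSupported(name)) {
            throw new IllegalArgumentException("Unsupported encoding: " + name);
        }
        return Charset.forName(name);
    }

    /** Converts a STRING column value into the bytes written to HBase. */
    static byte[] encode(String value, String encoding) {
        return value.getBytes(resolve(encoding));
    }
}
```

The same text can produce different byte lengths under different encodings (a CJK character is 3 bytes in UTF-8 but typically 2 in GBK), so readers of the table must decode with the same encoding.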
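* **column**
	* Description: The HBase columns to write. index: the index of the corresponding reader-side column, starting from 0. name: the column in the HBase table, which must be in `columnFamily:qualifier` format. type: the data type to write, which determines how the value is converted to HBase byte[]. Format:

	```
	"column": [
	    {
	        "index": 1,
	        "name": "cf1:q1",
	        "type": "string"
	    },
	    {
	        "index": 2,
	        "name": "cf1:q2",
	        "type": "string"
	    }
	]
	```

	* Required: Yes
	* Default: none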
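The `type` field decides how each value becomes HBase byte[]. The sketch below is modeled on `getValueByte` in the plugin's `HbaseAbstractTask`, but replaces HBase's `Bytes.toBytes` with JDK `ByteBuffer` calls. It assumes, to our understanding of `Bytes`, big-endian fixed-width encodings for numbers and a single byte (-1 for true, 0 for false) for booleans; `ColumnEncodingSketch` is hypothetical and hard-codes UTF-8 for strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical sketch of the type-to-bytes step. Assumes HBase's Bytes.toBytes layout:
 * big-endian fixed-width numbers, one byte for booleans, UTF-8 text for strings.
 */
class ColumnEncodingSketch {
    static byte[] toBytes(String type, String value) {
        switch (type.trim().toLowerCase(Locale.ROOT)) { // type names are case-insensitive
            case "int":
                return ByteBuffer.allocate(4).putInt(Integer.parseInt(value)).array();
            case "long":
                return ByteBuffer.allocate(8).putLong(Long.parseLong(value)).array();
            case "short":
                return ByteBuffer.allocate(2).putShort(Short.parseShort(value)).array();
            case "float":
                return ByteBuffer.allocate(4).putFloat(Float.parseFloat(value)).array();
            case "double":
                return ByteBuffer.allocate(8).putDouble(Double.parseDouble(value)).array();
            case "boolean":
                // -1 (0xFF) for true, 0 for false
                return new byte[] {Boolean.parseBoolean(value) ? (byte) -1 : (byte) 0};
            case "string":
                return value.getBytes(StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unsupported column type: " + type);
        }
    }
}
```

Because numbers are stored in binary rather than as text, a value written with `"type": "long"` must also be read back as a long; reading it as a string yields unreadable bytes.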
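* **rowkeyColumn**
	* Description: The columns that make up the HBase rowkey. index: the index of the corresponding reader-side column, starting from 0; use -1 for a constant. type: the data type, which determines how the value is converted to HBase byte[]. value: the constant value, commonly used as a separator between fields. hbasewriter concatenates all rowkeyColumn entries in the configured order to form the rowkey; the entries cannot all be constants. Format:

	```
	"rowkeyColumn": [
	    {
	        "index": 0,
	        "type": "string"
	    },
	    {
	        "index": -1,
	        "type": "string",
	        "value": "_"
	    }
	]
	```

	* Required: Yes
	* Default: none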
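The rowkey rule amounts to appending each entry's bytes in the configured order. A minimal sketch, assuming every source field is a non-null string encoded as UTF-8; `RowkeySketch` and its `Part` type are hypothetical. It also rejects configurations that contain no source column, per the rule that the entries cannot all be constants.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch of rowkey assembly: entries are appended in configured order. */
class RowkeySketch {
    /** One rowkeyColumn entry: index >= 0 reads a source field, index == -1 uses the constant value. */
    static final class Part {
        final int index;
        final String value;

        Part(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    static byte[] buildRowkey(List<Part> parts, List<String> record) {
        if (parts.stream().noneMatch(p -> p.index >= 0)) {
            throw new IllegalArgumentException("rowkeyColumn needs at least one source column");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Part p : parts) {
            String piece;
            if (p.index == -1) {
                piece = p.value; // constant, e.g. a "_" separator
            } else if (p.index >= 0 && p.index < record.size()) {
                piece = record.get(p.index);
            } else {
                throw new IllegalArgumentException("rowkeyColumn index out of range: " + p.index);
            }
            byte[] bytes = piece.getBytes(StandardCharsets.UTF_8); // assumes non-null values
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }
}
```

With the sample configuration (`index 0`, then the constant `"_"`), a record whose field 0 is `user1` gets the rowkey `user1_`; adding a third entry such as `index 2` would append that field after the separator.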
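* **versionColumn**
	* Description: The timestamp written to HBase. Choose one of: the current time, a time column, or a fixed time. If omitted, the current time is used. index: the index of the corresponding reader-side column, starting from 0; its value must be convertible to long. Long and Double columns are used directly; any other value (for example a Date) is parsed as text with yyyy-MM-dd HH:mm:ss SSS and then yyyy-MM-dd HH:mm:ss. For a fixed time, set index to -1 and give the time in value as a long. Format:

	```
	"versionColumn": {
	    "index": 1
	}
	```

	or

	```
	"versionColumn": {
	    "index": -1,
	    "value": 123456789
	}
	```

	* Required: No
	* Default: none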
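The version rules can be sketched as a small function that yields an HBase timestamp in milliseconds since the epoch. Numeric values pass through unchanged; other values are parsed as text with the millisecond pattern first and the seconds pattern second, in the JVM's default time zone. `VersionSketch` is hypothetical and takes plain Java objects instead of DataX `Column`s.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

/** Hypothetical sketch: turning a versionColumn setting into an HBase timestamp (epoch millis). */
class VersionSketch {
    /** index == -1: the configured constant, which must not be negative. */
    static long fromConstant(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("version must not be negative: " + value);
        }
        return value;
    }

    /** index >= 0: numbers are used directly; anything else is parsed as text (default time zone). */
    static long fromColumn(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("version value is null");
        }
        if (value instanceof Long || value instanceof Double) {
            return ((Number) value).longValue();
        }
        String text = value.toString();
        try {
            return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS").parse(text).getTime();
        } catch (ParseException withoutMillis) {
            try {
                return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(text).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("cannot parse version: " + text, e);
            }
        }
    }
}
```

Since text values are interpreted in the default time zone, the same string can map to different timestamps on machines with different zone settings.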
* **nullMode**
	* Description: How to handle null values read from the source. Two options: (1) skip: do not write the column to HBase; (2) empty: write HConstants.EMPTY_BYTE_ARRAY, i.e. new byte[0].
	* Required: No
	* Default: skip
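The two nullMode behaviours can be sketched per record as below. Under skip, a null value produces no cell at all; under empty, it produces a zero-length value. `NullModeSketch` is hypothetical and handles UTF-8 strings only.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the skip and empty null-handling modes. */
class NullModeSketch {
    enum NullMode { SKIP, EMPTY }

    /** Returns the cell value, or null when the column should be left out of the Put. */
    static byte[] cellBytes(String value, NullMode mode) {
        if (value != null) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
        return mode == NullMode.SKIP ? null : new byte[0]; // EMPTY: zero-length value
    }

    /** Builds family:qualifier -> value cells for one record; skipped columns simply do not appear. */
    static Map<String, byte[]> cells(Map<String, String> record, NullMode mode) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            byte[] bytes = cellBytes(e.getValue(), mode);
            if (bytes != null) {
                cells.put(e.getKey(), bytes);
            }
        }
        return cells;
    }
}
```

If every column of a record is skipped, the resulting Put has no cells. HBase rejects such a Put with "No columns to insert", and the plugin's HbaseAbstractTask ignores that record when nullMode is skip.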
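* **walFlag**
	* Description: When the HBase client submits data (Put/Delete) to a RegionServer in the cluster, the data is first written to the WAL (Write-Ahead Log, also called the HLog; all Regions on one RegionServer share one HLog). Only after the WAL write succeeds is the data written to the MemStore and the client told that the submission succeeded; if the WAL write fails, the client is told the submission failed. Setting this to false skips the WAL, which improves write performance, but edits held only in the MemStore are lost if the RegionServer crashes.
	* Required: No
	* Default: false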
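* **writeBufferSize**
	* Description: The size of the HBase client write buffer, in bytes. It works together with autoflush: with autoflush enabled (true), the HBase client sends an update for every put; with autoflush disabled (false), the client sends write requests to the HBase server only when the client-side write buffer is full. In the current version autoflush is always forced to false, so writeBufferSize determines how much data each flush sends.
	* Required: No
	* Default: 8M (8 * 1024 * 1024 bytes)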
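With autoflush off, the client behaves roughly like the buffer below: puts accumulate and are sent as one batch once their size passes writeBufferSize, and whatever remains must be flushed at the end (in HBase, closing an HTable flushes its buffer). `WriteBufferSketch` is a hypothetical model, not HBase code; in particular it counts raw payload bytes, whereas the real client uses its own per-Put size estimate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical model of a client write buffer with autoFlush off: mutations accumulate
 * until their total size exceeds writeBufferSize, then go out as one batch.
 */
class WriteBufferSketch {
    private final long writeBufferSize;
    private final Consumer<List<byte[]>> sender; // stands in for the RPC to the RegionServers
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;

    WriteBufferSketch(long writeBufferSize, Consumer<List<byte[]>> sender) {
        this.writeBufferSize = writeBufferSize;
        this.sender = sender;
    }

    void put(byte[] mutation) {
        pending.add(mutation);
        pendingBytes += mutation.length;
        if (pendingBytes > writeBufferSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; also needed at the end so the last partial batch is not lost. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }
}
```

A larger buffer means fewer, bigger requests (higher throughput) at the cost of more client memory and more data at risk if the writer process dies before flushing.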
### 3.3 Column Types Supported for HBase

* BOOLEAN
* SHORT
* INT
* LONG
* FLOAT
* DOUBLE
* STRING

Note:

* `Types other than those listed above are not supported.`

## 4 Performance Report

Omitted.

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

***

================================================
FILE: hbase094xwriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hbase094xwriter hbase094xwriter 0.0.1-SNAPSHOT 1.8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.hbase hbase 0.94.27 jdk.tools jdk.tools org.apache.hadoop hadoop-core 0.20.205.0 org.apache.zookeeper zookeeper 3.3.2 commons-codec commons-codec ${commons-codec.version} com.alibaba.datax datax-core ${datax-project-version} test com.alibaba.datax datax-common 0.0.1-SNAPSHOT maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: hbase094xwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/hbase094xwriter target/ hbase094xwriter-0.0.1-SNAPSHOT.jar plugin/writer/hbase094xwriter false plugin/writer/hbase094xwriter/libs runtime

================================================
FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/ColumnType.java
================================================
package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.exception.DataXException; import org.apache.commons.lang.StringUtils; import java.util.Arrays; /** * Column types; only used in normal mode (multi-version mode has no column types). */ public enum ColumnType { STRING("string"), BOOLEAN("boolean"), SHORT("short"), INT("int"), LONG("long"), FLOAT("float"), DOUBLE("double"); private String typeName; ColumnType(String typeName) { this.typeName = typeName; } public static ColumnType 
getByTypeName(String typeName) { if(StringUtils.isBlank(typeName)){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter 不支持该类型:%s, 目前支持的类型是:%s", typeName, Arrays.asList(values()))); } for (ColumnType columnType : values()) { if (StringUtils.equalsIgnoreCase(columnType.typeName, typeName.trim())) { return columnType; } } throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter 不支持该类型:%s, 目前支持的类型是:%s", typeName, Arrays.asList(values()))); } @Override public String toString() { return this.typeName; } } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Constant.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; public final class Constant { public static final String DEFAULT_ENCODING = "UTF-8"; public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static final String DEFAULT_NULL_MODE = "skip"; public static final long DEFAULT_WRITE_BUFFER_SIZE = 8 * 1024 * 1024; } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Hbase094xHelper.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.fs.Path; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.HTableDescriptor; import org.apache.hadoop.hbase.client.*; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import 
java.nio.charset.Charset; import java.util.List; import java.util.Map; /** * Created by shf on 16/3/7. */ public class Hbase094xHelper { private static final Logger LOG = LoggerFactory.getLogger(Hbase094xHelper.class); /** * Builds an HBase client configuration from the user's hbaseConfig JSON string. * * @param hbaseConfig JSON object of HBase client settings * @return the HBase client configuration */ public static org.apache.hadoop.conf.Configuration getHbaseConfiguration(String hbaseConfig) { if (StringUtils.isBlank(hbaseConfig)) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, "Writing to HBase requires hbaseConfig, which holds the HBase connection settings. Please ask your HBase PE for them."); } org.apache.hadoop.conf.Configuration hConfiguration = HBaseConfiguration.create(); try { Map<String, String> hbaseConfigMap = JSON.parseObject(hbaseConfig, new TypeReference<Map<String, String>>() {}); /* hbaseConfig is given as user-configured key-value pairs */ Validate.isTrue(hbaseConfigMap != null, "hbaseConfig must be a non-null map!"); for (Map.Entry<String, String> entry : hbaseConfigMap.entrySet()) { hConfiguration.set(entry.getKey(), entry.getValue()); } } catch (Exception e) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.GET_HBASE_CONFIG_ERROR, e); } return hConfiguration; } public static HTable getTable(com.alibaba.datax.common.util.Configuration configuration){ String hbaseConfig = configuration.getString(Key.HBASE_CONFIG); String userTable = configuration.getString(Key.TABLE); org.apache.hadoop.conf.Configuration hConfiguration = Hbase094xHelper.getHbaseConfiguration(hbaseConfig); Boolean autoFlush = configuration.getBool(Key.AUTO_FLUSH, false); long writeBufferSize = configuration.getLong(Key.WRITE_BUFFER_SIZE, Constant.DEFAULT_WRITE_BUFFER_SIZE); HTable htable = null; HBaseAdmin admin = null; try { htable = new HTable(hConfiguration, userTable); admin = new HBaseAdmin(hConfiguration); Hbase094xHelper.checkHbaseTable(admin,htable); /* In this release autoFlush is always false; writeBufferSize controls how much is sent per flush */ htable.setAutoFlush(false); htable.setWriteBufferSize(writeBufferSize); return htable; } catch (Exception e) { Hbase094xHelper.closeTable(htable); throw 
DataXException.asDataXException(Hbase094xWriterErrorCode.GET_HBASE_TABLE_ERROR, e); }finally { Hbase094xHelper.closeAdmin(admin); } } public static void deleteTable(com.alibaba.datax.common.util.Configuration configuration) { String userTable = configuration.getString(Key.TABLE); LOG.info(String.format("由于您配置了deleteType delete,HBasWriter begins to delete table %s .", userTable)); Scan scan = new Scan(); HTable hTable =Hbase094xHelper.getTable(configuration); ResultScanner scanner = null; try { scanner = hTable.getScanner(scan); for (Result rr = scanner.next(); rr != null; rr = scanner.next()) { hTable.delete(new Delete(rr.getRow())); } } catch (Exception e) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.DELETE_HBASE_ERROR, e); }finally { if(scanner != null){ scanner.close(); } Hbase094xHelper.closeTable(hTable); } } public static void truncateTable(com.alibaba.datax.common.util.Configuration configuration) { String hbaseConfig = configuration.getString(Key.HBASE_CONFIG); String userTable = configuration.getString(Key.TABLE); org.apache.hadoop.conf.Configuration hConfiguration = Hbase094xHelper.getHbaseConfiguration(hbaseConfig); HTable htable = null; HBaseAdmin admin = null; LOG.info(String.format("由于您配置了deleteType truncate,HBasWriter begins to truncate table %s .", userTable)); try{ htable = new HTable(hConfiguration, userTable); admin = new HBaseAdmin(hConfiguration); HTableDescriptor descriptor = htable.getTableDescriptor(); Hbase094xHelper.checkHbaseTable(admin,htable); admin.disableTable(htable.getTableName()); admin.deleteTable(htable.getTableName()); admin.createTable(descriptor); }catch (Exception e) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.TRUNCATE_HBASE_ERROR, e); }finally { Hbase094xHelper.closeAdmin(admin); Hbase094xHelper.closeTable(htable); } } public static void closeAdmin(HBaseAdmin admin){ try { if(null != admin) admin.close(); } catch (IOException e) { throw 
DataXException.asDataXException(Hbase094xWriterErrorCode.CLOSE_HBASE_AMIN_ERROR, e); } } public static void closeTable(HTable table){ try { if(null != table) table.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.CLOSE_HBASE_TABLE_ERROR, e); } } public static void checkHbaseTable(HBaseAdmin admin, HTable hTable) throws IOException { if (!admin.isMasterRunning()) { throw new IllegalStateException("HBase master is not running. Please check your configuration or contact your HBase administrator."); } if (!admin.tableExists(hTable.getTableName())) { throw new IllegalStateException("HBase target table " + Bytes.toString(hTable.getTableName()) + " does not exist. Please check your configuration or contact your HBase administrator."); } if (!admin.isTableAvailable(hTable.getTableName()) || !admin.isTableEnabled(hTable.getTableName())) { throw new IllegalStateException("HBase target table " + Bytes.toString(hTable.getTableName()) + " is not available. Please check your configuration or contact your HBase administrator."); } if(admin.isTableDisabled(hTable.getTableName())){ throw new IllegalStateException("HBase target table " + Bytes.toString(hTable.getTableName()) + " is not available. Please check your configuration or contact your HBase administrator."); } } public static void validateParameter(com.alibaba.datax.common.util.Configuration originalConfig) { originalConfig.getNecessaryValue(Key.HBASE_CONFIG, Hbase094xWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.TABLE, Hbase094xWriterErrorCode.REQUIRED_VALUE); Hbase094xHelper.validateMode(originalConfig); String encoding = originalConfig.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); if (!Charset.isSupported(encoding)) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter does not support the configured encoding: [%s]", encoding)); } originalConfig.set(Key.ENCODING, encoding); Boolean autoFlush = originalConfig.getBool(Key.AUTO_FLUSH, false); /* In this release autoFlush is always false; writeBufferSize controls how much is sent per flush */ originalConfig.set(Key.AUTO_FLUSH,false); Boolean walFlag = originalConfig.getBool(Key.WAL_FLAG, false); originalConfig.set(Key.WAL_FLAG, walFlag); long writeBufferSize = 
originalConfig.getLong(Key.WRITE_BUFFER_SIZE,Constant.DEFAULT_WRITE_BUFFER_SIZE); originalConfig.set(Key.WRITE_BUFFER_SIZE, writeBufferSize); } public static void validateMode(com.alibaba.datax.common.util.Configuration originalConfig){ String mode = originalConfig.getNecessaryValue(Key.MODE, Hbase094xWriterErrorCode.REQUIRED_VALUE); ModeType modeType = ModeType.getByTypeName(mode); switch (modeType) { case Normal: { validateRowkeyColumn(originalConfig); validateColumn(originalConfig); validateVersionColumn(originalConfig); break; } default: throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbase094xWriter does not support mode: %s", mode)); } } public static void validateColumn(com.alibaba.datax.common.util.Configuration originalConfig){ List<Configuration> columns = originalConfig.getListConfiguration(Key.COLUMN); if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, "column is required, in the form: column:[{\"index\": 0,\"name\": \"cf0:column0\",\"type\": \"string\"},{\"index\": 1,\"name\": \"cf1:column1\",\"type\": \"long\"}]"); } for (Configuration aColumn : columns) { Integer index = aColumn.getInt(Key.INDEX); String type = aColumn.getNecessaryValue(Key.TYPE, Hbase094xWriterErrorCode.REQUIRED_VALUE); String name = aColumn.getNecessaryValue(Key.NAME, Hbase094xWriterErrorCode.REQUIRED_VALUE); ColumnType.getByTypeName(type); if(name.split(":").length != 2){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("The name [%s] in your column configuration is malformed; name must be in columnFamily:qualifier form, e.g. {\"index\": 1,\"name\": \"cf1:q1\",\"type\": \"long\"}", name)); } if(index == null || index < 0){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "Your column configuration is invalid: index is required and must be non-negative. Please check and fix it."); } } } public static void validateRowkeyColumn(com.alibaba.datax.common.util.Configuration originalConfig){ List<Configuration> rowkeyColumn = 
originalConfig.getListConfiguration(Key.ROWKEY_COLUMN); if (rowkeyColumn == null || rowkeyColumn.isEmpty()) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, "rowkeyColumn is required, in the form: rowkeyColumn:[{\"index\": 0,\"type\": \"string\"},{\"index\": -1,\"type\": \"string\",\"value\": \"_\"}]"); } /* each entry is either a source column, e.g. {"index":0,"type":"string"}, or a constant, e.g. {"index":-1,"type":"string","value":"_"} */ int sourceColumnCount = 0; for (Configuration aRowkeyColumn : rowkeyColumn) { Integer index = aRowkeyColumn.getInt(Key.INDEX); String type = aRowkeyColumn.getNecessaryValue(Key.TYPE, Hbase094xWriterErrorCode.REQUIRED_VALUE); ColumnType.getByTypeName(type); if(index == null ){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, "index is required in every rowkeyColumn entry"); } if(index == -1){ aRowkeyColumn.getNecessaryValue(Key.VALUE, Hbase094xWriterErrorCode.REQUIRED_VALUE); }else if(index < 0){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "index in rowkeyColumn must be -1 (constant) or non-negative"); }else{ sourceColumnCount++; } } /* the rowkey cannot be built from constants (separators) alone */ if(sourceColumnCount == 0){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "rowkeyColumn cannot consist only of constant columns; specify at least one rowkey column"); } } public static void validateVersionColumn(com.alibaba.datax.common.util.Configuration originalConfig){ Configuration versionColumn = originalConfig.getConfiguration(Key.VERSION_COLUMN); /* null means the current time is used; a column-based version needs index */ if(versionColumn != null){ Integer index = versionColumn.getInt(Key.INDEX); if(index == null ){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, "index is required in versionColumn"); } if(index == -1){ /* a fixed time needs index=-1 and value */ versionColumn.getNecessaryValue(Key.VALUE, Hbase094xWriterErrorCode.REQUIRED_VALUE); }else if(index < 0){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "index in versionColumn is invalid; it must be -1 or non-negative"); } } } } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Hbase094xWriter.java ================================================ 
package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; /** * Created by shf on 16/3/17. */ public class Hbase094xWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originConfig = null; @Override public void init() { this.originConfig = this.getPluginJobConf(); Hbase094xHelper.validateParameter(this.originConfig); } @Override public void prepare(){ Boolean truncate = originConfig.getBool(Key.TRUNCATE,false); if(truncate){ Hbase094xHelper.truncateTable(this.originConfig); } } @Override public List split(int mandatoryNumber) { List splitResultConfigs = new ArrayList(); for (int j = 0; j < mandatoryNumber; j++) { splitResultConfigs.add(originConfig.clone()); } return splitResultConfigs; } @Override public void destroy() { } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration taskConfig; private HbaseAbstractTask hbaseTaskProxy; @Override public void init() { this.taskConfig = super.getPluginJobConf(); String mode = this.taskConfig.getString(Key.MODE); ModeType modeType = ModeType.getByTypeName(mode); switch (modeType) { case Normal: this.hbaseTaskProxy = new NormalTask(this.taskConfig); break; default: throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "Hbasewriter 不支持此类模式:" + modeType); } } @Override public void startWrite(RecordReceiver lineReceiver) { this.hbaseTaskProxy.startWriter(lineReceiver,super.getTaskPluginCollector()); } @Override public void destroy() { if (this.hbaseTaskProxy 
!= null) { this.hbaseTaskProxy.close(); } } } } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Hbase094xWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by shf on 16/3/8. */ public enum Hbase094xWriterErrorCode implements ErrorCode { REQUIRED_VALUE("Hbasewriter-00", "您缺失了必须填写的参数值."), ILLEGAL_VALUE("Hbasewriter-01", "您填写的参数值不合法."), GET_HBASE_CONFIG_ERROR("Hbasewriter-02", "获取Hbase config时出错."), GET_HBASE_TABLE_ERROR("Hbasewriter-03", "初始化 Hbase 抽取表时出错."), CLOSE_HBASE_AMIN_ERROR("Hbasewriter-05", "关闭Hbase admin时出错."), CLOSE_HBASE_TABLE_ERROR("Hbasewriter-06", "关闭Hbase table时时出错."), PUT_HBASE_ERROR("Hbasewriter-07", "写入hbase时发生IO异常."), DELETE_HBASE_ERROR("Hbasewriter-08", "delete hbase表时发生异常."), TRUNCATE_HBASE_ERROR("Hbasewriter-09", "truncate hbase表时发生异常"), CONSTRUCT_ROWKEY_ERROR("Hbasewriter-10", "构建rowkey时发生异常."), CONSTRUCT_VERSION_ERROR("Hbasewriter-11", "构建version时发生异常.") ; private final String code; private final String description; private Hbase094xWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/HbaseAbstractTask.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import 
com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import org.apache.hadoop.hbase.HConstants; import org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.nio.charset.Charset; import java.util.List; public abstract class HbaseAbstractTask { private final static Logger LOG = LoggerFactory.getLogger(HbaseAbstractTask.class); public NullModeType nullMode = null; public List<Configuration> columns; public List<Configuration> rowkeyColumn; public Configuration versionColumn; public HTable htable; public String encoding; public Boolean walFlag; public HbaseAbstractTask(com.alibaba.datax.common.util.Configuration configuration) { this.htable = Hbase094xHelper.getTable(configuration); this.columns = configuration.getListConfiguration(Key.COLUMN); this.rowkeyColumn = configuration.getListConfiguration(Key.ROWKEY_COLUMN); this.versionColumn = configuration.getConfiguration(Key.VERSION_COLUMN); this.encoding = configuration.getString(Key.ENCODING,Constant.DEFAULT_ENCODING); this.nullMode = NullModeType.getByTypeName(configuration.getString(Key.NULL_MODE,Constant.DEFAULT_NULL_MODE)); this.walFlag = configuration.getBool(Key.WAL_FLAG, false); } public void startWriter(RecordReceiver lineReceiver,TaskPluginCollector taskPluginCollector){ Record record; try { while ((record = lineReceiver.getFromReader()) != null) { Put put; try { put = convertRecordToPut(record); } catch (Exception e) { taskPluginCollector.collectDirtyRecord(record, e); continue; } try { this.htable.put(put); } catch (IllegalArgumentException e) { /* an empty Put (every column skipped) is rejected with "No columns to insert"; getMessage() may be null, so compare from the literal side */ if("No columns to insert".equals(e.getMessage()) && nullMode.equals(NullModeType.Skip)){ LOG.info(String.format("record is empty; nullMode is [skip], so this record is ignored, record[%s]", record.toString())); continue; }else { 
taskPluginCollector.collectDirtyRecord(record, e); continue; } } } }catch (IOException e){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.PUT_HBASE_ERROR,e); }finally { Hbase094xHelper.closeTable(this.htable); } } public abstract Put convertRecordToPut(Record record); public void close() { Hbase094xHelper.closeTable(this.htable); } public byte[] getColumnByte(ColumnType columnType, Column column){ byte[] bytes; if(column.getRawData() != null){ switch (columnType) { case INT: bytes = Bytes.toBytes(column.asLong().intValue()); break; case LONG: bytes = Bytes.toBytes(column.asLong()); break; case DOUBLE: bytes = Bytes.toBytes(column.asDouble()); break; case FLOAT: bytes = Bytes.toBytes(column.asDouble().floatValue()); break; case SHORT: bytes = Bytes.toBytes(column.asLong().shortValue()); break; case BOOLEAN: bytes = Bytes.toBytes(column.asBoolean()); break; case STRING: bytes = this.getValueByte(columnType,column.asString()); break; default: throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "HbaseWriter列不支持您配置的列类型:" + columnType); } }else{ switch (nullMode){ case Skip: bytes = null; break; case Empty: bytes = HConstants.EMPTY_BYTE_ARRAY; break; default: throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "HbaseWriter nullMode不支持您配置的类型,只支持skip或者empty"); } } return bytes; } public byte[] getValueByte(ColumnType columnType, String value){ byte[] bytes; if(value != null){ switch (columnType) { case INT: bytes = Bytes.toBytes(Integer.parseInt(value)); break; case LONG: bytes = Bytes.toBytes(Long.parseLong(value)); break; case DOUBLE: bytes = Bytes.toBytes(Double.parseDouble(value)); break; case FLOAT: bytes = Bytes.toBytes(Float.parseFloat(value)); break; case SHORT: bytes = Bytes.toBytes(Short.parseShort(value)); break; case BOOLEAN: bytes = Bytes.toBytes(Boolean.parseBoolean(value)); break; case STRING: bytes = value.getBytes(Charset.forName(encoding)); break; default: throw 
DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "HbaseWriter列不支持您配置的列类型:" + columnType); } }else{ bytes = HConstants.EMPTY_BYTE_ARRAY; } return bytes; } } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; public final class Key { public final static String HBASE_CONFIG = "hbaseConfig"; public final static String TABLE = "table"; /** * mode 可以取 normal 或者 multiVersionFixedColumn 或者 multiVersionDynamicColumn 三个值,无默认值。 *

* normal 配合 column(Map 结构的)使用 *

* multiVersion */ public final static String MODE = "mode"; public final static String ROWKEY_COLUMN = "rowkeyColumn"; public final static String VERSION_COLUMN = "versionColumn"; /** * 默认为 utf8 */ public final static String ENCODING = "encoding"; public final static String COLUMN = "column"; public static final String INDEX = "index"; public static final String NAME = "name"; public static final String TYPE = "type"; public static final String VALUE = "value"; public static final String FORMAT = "format"; /** * 默认为 EMPTY_BYTES */ public static final String NULL_MODE = "nullMode"; public static final String TRUNCATE = "truncate"; public static final String AUTO_FLUSH = "autoFlush"; public static final String WAL_FLAG = "walFlag"; public static final String WRITE_BUFFER_SIZE = "writeBufferSize"; } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/ModeType.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.exception.DataXException; import java.util.Arrays; public enum ModeType { Normal("normal"), MultiVersion("multiVersion") ; private String mode; ModeType(String mode) { this.mode = mode.toLowerCase(); } public String getMode() { return mode; } public static ModeType getByTypeName(String modeName) { for (ModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter 不支持该 mode 类型:%s, 目前支持的 mode 类型是:%s", modeName, Arrays.asList(values()))); } } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/NormalTask.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.element.DoubleColumn; import 
com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.commons.lang3.time.DateUtils; import org.apache.hadoop.hbase.KeyValue; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Timestamp; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; public class NormalTask extends HbaseAbstractTask { private static final Logger LOG = LoggerFactory.getLogger(NormalTask.class); public NormalTask(Configuration configuration) { super(configuration); } @Override public Put convertRecordToPut(Record record){ byte[] rowkey = getRowkey(record); Put put; if(this.versionColumn == null){ put = new Put(rowkey); }else { long timestamp = getVersion(record); put = new Put(rowkey,timestamp); } /* apply walFlag whether or not a version column is configured */ put.setWriteToWAL(super.walFlag); for (Configuration aColumn : columns) { Integer index = aColumn.getInt(Key.INDEX); String type = aColumn.getString(Key.TYPE); ColumnType columnType = ColumnType.getByTypeName(type); String name = aColumn.getString(Key.NAME); String promptInfo = "In Hbasewriter, a column name must be configured as columnFamily:qualifier. Invalid column: " + name; String[] cfAndQualifier = name.split(":"); Validate.isTrue(cfAndQualifier != null && cfAndQualifier.length == 2 && StringUtils.isNotBlank(cfAndQualifier[0]) && StringUtils.isNotBlank(cfAndQualifier[1]), promptInfo); if(index >= record.getColumnNumber()){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("The index in your column configuration is out of range: based on the reader configuration it must be less than %s, but you configured %s. Please check and fix it.",record.getColumnNumber(),index)); } byte[] columnBytes = getColumnByte(columnType,record.getColumn(index)); /* a null columnBytes means this column is skipped */ if(null != columnBytes){ put.add(Bytes.toBytes( cfAndQualifier[0]), Bytes.toBytes(cfAndQualifier[1]), columnBytes); }else{ continue; } } return put; } public byte[] getRowkey(Record record){ byte[] rowkeyBuffer = {}; for (Configuration aRowkeyColumn : rowkeyColumn) { Integer index = aRowkeyColumn.getInt(Key.INDEX); String type = aRowkeyColumn.getString(Key.TYPE); ColumnType columnType = ColumnType.getByTypeName(type); if(index == -1){ String value = aRowkeyColumn.getString(Key.VALUE); rowkeyBuffer = Bytes.add(rowkeyBuffer,getValueByte(columnType,value)); }else{ if(index >= record.getColumnNumber()){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_ROWKEY_ERROR, String.format("The index in your rowkeyColumn configuration is out of range: based on the reader configuration it must be less than %s, but you configured %s. Please check and fix it.",record.getColumnNumber(),index)); } byte[] value = getColumnByte(columnType,record.getColumn(index)); rowkeyBuffer = Bytes.add(rowkeyBuffer, value); } } return rowkeyBuffer; } public long getVersion(Record record){ int index = versionColumn.getInt(Key.INDEX); long timestamp; if(index == -1){ /* a fixed time is used as the version */ timestamp = versionColumn.getLong(Key.VALUE); if(timestamp < 0){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, "The version you specified is invalid!"); } }else{ /* a source column is used as the version: Long/Double columns are read via asLong(); other types are parsed with yyyy-MM-dd HH:mm:ss SSS, then yyyy-MM-dd HH:mm:ss */ if(index >= record.getColumnNumber()){ throw 
DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, String.format("The index in your versionColumn configuration is out of range: based on the reader configuration, index must be less than %s, but you configured %s. Please check and fix it.",record.getColumnNumber(),index)); } if(record.getColumn(index).getRawData() == null){ throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, "The specified version is null!"); } SimpleDateFormat df_senconds = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); SimpleDateFormat df_ms = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS"); if(record.getColumn(index) instanceof LongColumn || record.getColumn(index) instanceof DoubleColumn){ timestamp = record.getColumn(index).asLong(); }else { Date date; try{ date = df_ms.parse(record.getColumn(index).asString()); }catch (ParseException e){ try { date = df_senconds.parse(record.getColumn(index).asString()); } catch (ParseException e1) { LOG.info(String.format("Column [%s] is configured as the HBase write version, but it could not be parsed as a Date with either yyyy-MM-dd HH:mm:ss or yyyy-MM-dd HH:mm:ss SSS. Please check and fix it.",index)); throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, e1); } } timestamp = date.getTime(); } } return timestamp; } } ================================================ FILE: hbase094xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase094xwriter/NullModeType.java ================================================ package com.alibaba.datax.plugin.writer.hbase094xwriter; import com.alibaba.datax.common.exception.DataXException; import java.util.Arrays; public enum NullModeType { Skip("skip"), Empty("empty") ; private String mode; NullModeType(String mode) { this.mode = mode.toLowerCase(); } public String getMode() { return mode; } public static NullModeType getByTypeName(String modeName) { for (NullModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw 
DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter does not support nullMode type: %s. Supported nullMode types: %s", modeName, 
Arrays.asList(values()))); } } ================================================ FILE: hbase094xwriter/src/main/resources/plugin.json ================================================ { "name": "hbase094xwriter", "class": "com.alibaba.datax.plugin.writer.hbase094xwriter.Hbase094xWriter", "description": "use put: prod. mechanism: use hbase java api put data.", "developer": "alibaba" } ================================================ FILE: hbase094xwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "hbase094xwriter", "parameter": { "hbaseConfig": {}, "table": "", "mode": "", "rowkeyColumn": [ ], "column": [ ], "versionColumn":{ "index": "", "value":"" }, "encoding": "" } } ================================================ FILE: hbase11xreader/doc/.gitkeep ================================================ ================================================ FILE: hbase11xreader/doc/hbase11xreader.md ================================================

# Hbase094XReader & Hbase11XReader Plugin Documentation

___

## 1 Quick Introduction

The HbaseReader plugin reads data from HBase. Internally, HbaseReader connects to a remote HBase service through the HBase Java client and reads the data in the rowkey range you specify with a Scan. It then assembles that data into an abstract dataset using DataX's own data types and passes it to the downstream Writer.

### 1.1 Supported Features

1. HbaseReader currently supports two HBase versions: HBase 0.94.x and HBase 1.1.x.

* If your HBase version is 0.94.x, use the `hbase094xreader` reader plugin:

  ```
  "reader": {
      "name": "hbase094xreader"
  }
  ```

* If your HBase version is 1.1.x, use the `hbase11xreader` reader plugin:

  ```
  "reader": {
      "name": "hbase11xreader"
  }
  ```

2. HbaseReader currently supports two read modes: normal and multiVersionFixedColumn.

* normal mode: reads the HBase table as an ordinary two-dimensional (wide) table and returns the latest version of each cell. For example:

  ```
  hbase(main):017:0> scan 'users'
  ROW        COLUMN+CELL
   lisi      column=address:city, timestamp=1457101972764, value=beijing
   lisi      column=address:contry, timestamp=1457102773908, value=china
   lisi      column=address:province, timestamp=1457101972736, value=beijing
   lisi      column=info:age, timestamp=1457101972548, value=27
   lisi      column=info:birthday, timestamp=1457101972604, value=1987-06-17
   lisi      column=info:company, timestamp=1457101972653, value=baidu
   xiaoming  column=address:city, timestamp=1457082196082, value=hangzhou
   xiaoming  column=address:contry, timestamp=1457082195729, value=china
   xiaoming  column=address:province, timestamp=1457082195773, value=zhejiang
   xiaoming  column=info:age, timestamp=1457082218735, value=29
   xiaoming  column=info:birthday, timestamp=1457082186830, value=1987-06-17
   xiaoming  column=info:company, timestamp=1457082189826, value=alibaba
  2 row(s) in 0.0580 seconds
  ```

  Data after reading:

  | rowKey | address:city | address:contry | address:province | info:age | info:birthday | info:company |
  | -------- | ---------------- | ----- | ----- | -------- | ---------------- | ----- |
  | lisi | beijing | china | beijing | 27 | 1987-06-17 | baidu |
  | xiaoming | hangzhou | china | zhejiang | 29 | 1987-06-17 | alibaba |

* multiVersionFixedColumn mode: reads the HBase table as a vertical (tall) table. Every output record has exactly four columns, in this order: rowKey, family:qualifier, timestamp, value. The columns to read must be listed explicitly. Each cell value becomes one record, so a cell with several versions produces several records. For example:

  ```
  hbase(main):018:0> scan 'users',{VERSIONS=>5}
  ROW        COLUMN+CELL
   lisi      column=address:city, timestamp=1457101972764, value=beijing
   lisi      column=address:contry, timestamp=1457102773908, value=china
   lisi      column=address:province, timestamp=1457101972736, value=beijing
   lisi      column=info:age, timestamp=1457101972548, value=27
   lisi      column=info:birthday, timestamp=1457101972604, value=1987-06-17
   lisi      column=info:company, timestamp=1457101972653, value=baidu
   xiaoming  column=address:city, timestamp=1457082196082, value=hangzhou
   xiaoming  column=address:contry, timestamp=1457082195729, value=china
   xiaoming  column=address:province, timestamp=1457082195773, value=zhejiang
   xiaoming  column=info:age, timestamp=1457082218735, value=29
   xiaoming  column=info:age, timestamp=1457082178630, value=24
   xiaoming  column=info:birthday, timestamp=1457082186830, value=1987-06-17
   xiaoming  column=info:company, timestamp=1457082189826, value=alibaba
  2 row(s) in 0.0260 seconds
  ```

  Data after reading (4 columns):

  | rowKey | family:qualifier | timestamp | value |
  | -------- | ---------------- | ----- | ----- |
  | lisi | address:city | 1457101972764 | beijing |
  | lisi | address:contry | 1457102773908 | china |
  | lisi | address:province | 1457101972736 | beijing |
  | lisi | info:age | 1457101972548 | 27 |
  | lisi | info:birthday | 1457101972604 | 1987-06-17 |
  | lisi | info:company | 1457101972653 | baidu |
  | xiaoming | address:city | 1457082196082 | hangzhou |
  | xiaoming | address:contry | 1457082195729 | china |
  | xiaoming | address:province | 1457082195773 | zhejiang |
  | xiaoming | info:age | 1457082218735 | 29 |
  | xiaoming | info:age | 1457082178630 | 24 |
  | xiaoming | info:birthday | 1457082186830 | 1987-06-17 |
  | xiaoming | info:company | 1457082189826 | alibaba |

### 1.2 Limitations

1. Dynamic columns cannot be read. Supporting them would mean reading every column from HBase and then filtering by rule, which costs network traffic. For that reason both read modes require you to list the columns to read explicitly.

2. Job splitting follows the region distribution of the HBase table. Within the [startRowkey, endRowkey] range you configure, each region becomes one task, and a single region is never split further.

3. multiVersionFixedColumn mode does not support constant columns.

## 2 How It Works

In short, HbaseReader uses the HBase Java client APIs (HTable, or Table in 1.1.x, together with Scan, ResultScanner, etc.) to read the data in the rowkey range you specify. It assembles that data into an abstract dataset using DataX's own data types and passes it to the downstream Writer. hbase11xreader and hbase094xreader differ mainly in the client APIs they call, because HBase 1.1.x deprecated many HBase 0.94.x APIs.

## 3 Features

### 3.1 Sample Configurations

* A job that extracts data from HBase to local files (normal mode):

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hbase11xreader",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.zookeeper.quorum": "xxx"
                        },
                        "table": "users",
                        "encoding": "utf-8",
                        "mode": "normal",
                        "column": [
                            { "name": "rowkey", "type": "string" },
                            { "name": "info: age", "type": "string" },
                            { "name": "info: birthday", "type": "date", "format": "yyyy-MM-dd" },
                            { "name": "info: company", "type": "string" },
                            { "name": "address: contry", "type": "string" },
                            { "name": "address: province", "type": "string" },
                            { "name": "address: city", "type": "string" }
                        ],
                        "range": {
                            "startRowkey": "",
                            "endRowkey": "",
                            "isBinaryRowkey": true
                        }
                    }
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/Users/shf/workplace/datax_test/hbase11xreader/result",
                        "fileName": "qiran",
                        "writeMode": "truncate"
                    }
                }
            }
        ]
    }
}
```

* A job that extracts data from HBase to local files (multiVersionFixedColumn mode):

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hbase11xreader",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.zookeeper.quorum": "xxx"
                        },
                        "table": "users",
                        "encoding": "utf-8",
                        "mode": "multiVersionFixedColumn",
                        "maxVersion": "-1",
                        "column": [
                            { "name": "rowkey", "type": "string" },
                            { "name": "info: age", "type": "string" },
                            { "name": "info: birthday", "type": "date", "format": "yyyy-MM-dd" },
                            { "name": "info: company", "type": "string" },
                            { "name": "address: contry", "type": "string" },
                            { "name": "address: province", "type": "string" },
                            { "name": "address: city", "type": "string" }
                        ],
                        "range": {
                            "startRowkey": "",
                            "endRowkey": ""
                        }
                    }
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/Users/shf/workplace/datax_test/hbase11xreader/result",
                        "fileName": "qiran",
                        "writeMode": "truncate"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **hbaseConfig**
  * Description: the settings needed to connect to the HBase cluster, in JSON format. `hbase.zookeeper.quorum`, the ZooKeeper address of the HBase cluster, is required. You can add further HBase client settings, for example scan cache and batch settings, to tune interaction with the server.
  * Required: yes
  * Default: none
* **mode**
  * Description: the HBase read mode, either `normal` or `multiVersionFixedColumn`.
  * Required: yes
  * Default: none
* **table**
  * Description: the name of the HBase table to read (case-sensitive).
  * Required: yes
  * Default: none
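* **encoding**
  * Description: the character encoding, such as UTF-8 or GBK, used to convert the binary byte[] values stored in HBase to String. Any charset supported by the JVM is accepted.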
  * Required: no
  * Default: UTF-8
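* **column**
  * Description: the HBase columns to read. Required in both normal and multiVersionFixedColumn modes.

    (1) In normal mode, `name` specifies the HBase column to read and, except for `rowkey`, must use the `family:qualifier` format. `type` specifies the source data type, and `format` specifies the date format of date columns (defaults to `yyyy-MM-dd HH:mm:ss`). `value` declares a constant column: instead of reading from HBase, a column holding that value is generated. Format:

    ```
    "column": [
        {
            "name": "rowkey",
            "type": "string"
        },
        {
            "value": "test",
            "type": "string"
        }
    ]
    ```

    In normal mode, `type` is required for every column, and each column must set either `name` or `value`.

    (2) In multiVersionFixedColumn mode, `name` specifies the HBase column to read and, except for `rowkey`, must use the `family:qualifier` format. `type` specifies the source data type, and `format` specifies the date format of date columns. Constant columns are not supported in this mode. Format:

    ```
    "column": [
        {
            "name": "rowkey",
            "type": "string"
        },
        {
            "name": "info: age",
            "type": "string"
        }
    ]
    ```
  * Required: yes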
  * Default: none
* **maxVersion**
  * Description: the number of versions hbasereader reads in multi-version mode. Must be -1 or a number greater than 1; -1 reads all versions.
  * Required: yes in multiVersionFixedColumn mode (and it must not be set in normal mode)
  * Default: none
* **range**
  * Description: the rowkey range hbasereader reads.
    * `startRowkey`: the start rowkey;
    * `endRowkey`: the end rowkey;
    * `isBinaryRowkey`: how the configured startRowkey and endRowkey are converted to byte[]. Defaults to false. If true, `Bytes.toBytesBinary(rowkey)` is used; if false, `Bytes.toBytes(rowkey)` is used.
    Format:

    ```
    "range": {
        "startRowkey": "aaa",
        "endRowkey": "ccc",
        "isBinaryRowkey": false
    }
    ```
  * Required: no
  * Default: none
* **scanCacheSize**
  * Description: the number of rows the HBase client fetches from the server per RPC.
  * Required: no
  * Default: 256
* **scanBatchSize**
  * Description: the number of columns the HBase client fetches from the server per RPC.
  * Required: no
  * Default: 100
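The `isBinaryRowkey` flag in `range` above decides how `startRowkey`/`endRowkey` become bytes. With `true`, HBase's `Bytes.toBytesBinary` treats `\xNN` as one hex-encoded byte. With `false`, `Bytes.toBytes` simply encodes the string as UTF-8. The sketch below mimics both paths using only the JDK, so it runs without HBase on the classpath. `RowkeyCodecSketch` and its methods are hypothetical names, and the escape handling is a simplified approximation of `toBytesBinary`, not its actual source.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only stand-in for the two rowkey conversions selected by isBinaryRowkey. */
final class RowkeyCodecSketch {
    private RowkeyCodecSketch() {}

    /** isBinaryRowkey = false: encode the string as UTF-8, as Bytes.toBytes(String) does. */
    static byte[] toBytes(String rowkey) {
        return rowkey.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * isBinaryRowkey = true: an escape of backslash, 'x' and two hex digits becomes that single byte;
     * any other char is kept as its low 8 bits (simplified approximation of Bytes.toBytesBinary).
     */
    static byte[] toBytesBinary(String rowkey) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < rowkey.length()) {
            char ch = rowkey.charAt(i);
            boolean escape = ch == '\\'
                    && i + 3 < rowkey.length()
                    && rowkey.charAt(i + 1) == 'x'
                    && Character.digit(rowkey.charAt(i + 2), 16) >= 0
                    && Character.digit(rowkey.charAt(i + 3), 16) >= 0;
            if (escape) {
                int high = Character.digit(rowkey.charAt(i + 2), 16);
                int low = Character.digit(rowkey.charAt(i + 3), 16);
                out.write((high << 4) | low);
                i += 4; // consume the backslash, 'x' and both hex digits
            } else {
                out.write(ch & 0xFF);
                i += 1;
            }
        }
        return out.toByteArray();
    }
}
```

For example, the 11-character string `row\x00\xFF` becomes 5 bytes in binary mode but stays 11 bytes in plain mode. In a JSON job file the backslash itself must be escaped, e.g. `"startRowkey": "\\x00\\x01"`.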
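Section 1.2 says a job is split along region boundaries inside the configured rowkey range, one task per region. The sketch below mirrors the clipping rules of `Hbase11xHelper.doSplit`/`getStartKey`/`getEndKey` with plain byte arrays. An empty array stands for an open end: the first region's start key, the last region's end key, or an unset user key. `RegionSplitSketch` and `KeyRange` are hypothetical names. Rowkeys are compared as unsigned bytes, which is the order HBase uses. `Arrays.compareUnsigned` requires Java 9+.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical JDK-only sketch of splitting a user rowkey range along region boundaries. */
final class RegionSplitSketch {
    /** One task's scan range; an empty array means "unbounded" on that side. */
    static final class KeyRange {
        final byte[] start;
        final byte[] end;

        KeyRange(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return new String(start, StandardCharsets.UTF_8) + " .. "
                    + new String(end, StandardCharsets.UTF_8);
        }
    }

    private RegionSplitSketch() {}

    /** Lexicographic comparison of unsigned bytes. */
    private static int cmp(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    static List<KeyRange> split(byte[] userStart, byte[] userEnd,
                                byte[][] regionStarts, byte[][] regionEnds) {
        if (userStart.length != 0 && userEnd.length != 0 && cmp(userStart, userEnd) > 0) {
            throw new IllegalArgumentException("startRowkey must not be greater than endRowkey");
        }
        List<KeyRange> tasks = new ArrayList<>();
        for (int i = 0; i < regionStarts.length; i++) {
            byte[] regionStart = regionStarts[i];
            byte[] regionEnd = regionEnds[i];
            boolean lastRegion = regionEnd.length == 0;
            // Skip a region that ends at or before the user's start key.
            if (!lastRegion && cmp(userStart, regionEnd) >= 0) {
                continue;
            }
            // Skip a region that starts at or after the user's (non-empty) end key.
            if (userEnd.length != 0 && cmp(userEnd, regionStart) <= 0) {
                continue;
            }
            // Clip the start to the later of the two start keys.
            byte[] start = cmp(userStart, regionStart) < 0 ? regionStart : userStart;
            // Clip the end to the earlier of the two end keys; empty means +infinity.
            byte[] end;
            if (userEnd.length == 0) {
                end = regionEnd;
            } else if (lastRegion) {
                end = userEnd;
            } else {
                end = cmp(userEnd, regionEnd) > 0 ? regionEnd : userEnd;
            }
            tasks.add(new KeyRange(start, end));
        }
        return tasks;
    }
}
```

Consider three regions split at `g` and `p`, and a configured range from `c` to `m`. This yields two tasks, `c .. g` and `g .. m`. The third region is dropped because it starts after the end key, so no task ever spans two regions.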
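The `mode` and `maxVersion` rules above reduce to a small check. In normal mode, `maxVersion` must be absent. In multiVersionFixedColumn mode it must be present and equal to -1 or greater than 1. A JDK-only sketch follows. `MaxVersionRuleSketch` is a hypothetical name, a `null` `maxVersion` stands for "not configured", and case-insensitive mode matching is an assumption.

```java
/** Hypothetical sketch of the mode/maxVersion rules; a null maxVersion means "not configured". */
final class MaxVersionRuleSketch {
    private MaxVersionRuleSketch() {}

    /** Returns null when the combination is valid, otherwise an error message. */
    static String check(String mode, Integer maxVersion) {
        if ("normal".equalsIgnoreCase(mode)) {
            // normal mode reads only the latest version, so maxVersion must be absent
            return maxVersion == null ? null : "maxVersion must not be set in normal mode";
        }
        if ("multiVersionFixedColumn".equalsIgnoreCase(mode)) {
            if (maxVersion == null) {
                return "maxVersion is required in multiVersionFixedColumn mode";
            }
            // -1 = all versions; 0 and 1 are rejected because they would amount to normal mode
            boolean valid = maxVersion == -1 || maxVersion > 1;
            return valid ? null : "maxVersion must be -1 or greater than 1";
        }
        return "unsupported mode: " + mode;
    }
}
```

Returning a message instead of throwing keeps the sketch easy to test. The plugin itself fails the job through `Validate`/`DataXException` instead.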
### 3.3 Type Conversion

The HBase data types HbaseReader can read, and the DataX internal types they map to:

| DataX internal type | HBase data type |
| -------- | ----- |
| Long | int, short, long |
| Double | float, double |
| String | string, binarystring |
| Date | date |
| Boolean | boolean |

Note:

* `Field types other than those listed above are not supported.`

## 4 Performance Report

Omitted.

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

***

================================================ FILE: hbase11xreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT hbase11xreader hbase11xreader 0.0.1-SNAPSHOT jar 1.1.3 2.5.0 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.apache.hadoop hadoop-hdfs ${hadoop.version} org.apache.hbase hbase-client ${hbase.version} jdk.tools jdk.tools org.apache.hbase hbase-common ${hbase.version} com.alibaba.hbase alihbase-connector 1.0.4 org.apache.hbase hbase-client com.google.guava guava 12.0.1 junit junit test org.mockito mockito-core 2.0.44-beta test com.alibaba.datax datax-core ${datax-project-version} test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: hbase11xreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/hbase11xreader target/ hbase11xreader-0.0.1-SNAPSHOT.jar plugin/reader/hbase11xreader false plugin/reader/hbase11xreader/libs runtime ================================================ FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/ColumnType.java ================================================ package com.alibaba.datax.plugin.reader.hbase11xreader; import com.alibaba.datax.common.exception.DataXException; import org.apache.commons.lang.StringUtils; import java.util.Arrays; /** * Only used for normal-mode reads; multi-version reads have no column types */ public enum ColumnType { BOOLEAN("boolean"), SHORT("short"),
INT("int"), LONG("long"), FLOAT("float"), DOUBLE("double"), DATE("date"), STRING("string"), BINARY_STRING("binarystring") ; private String typeName; ColumnType(String typeName) { this.typeName = typeName; } public static ColumnType getByTypeName(String typeName) { if(StringUtils.isBlank(typeName)){ throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, String.format("Hbasereader does not support type: %s. Supported types: %s", typeName, Arrays.asList(values()))); } for (ColumnType columnType : values()) { if (StringUtils.equalsIgnoreCase(columnType.typeName, typeName.trim())) { return columnType; } } throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, String.format("Hbasereader does not support type: %s. Supported types: %s", typeName, Arrays.asList(values()))); } @Override public String toString() { return this.typeName; } } ================================================ FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.hbase11xreader; public final class Constant { public static final String RANGE = "range"; public static final String ROWKEY_FLAG = "rowkey"; public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static final String DEFAULT_ENCODING = "UTF-8"; public static final int DEFAULT_SCAN_CACHE_SIZE = 256; public static final int DEFAULT_SCAN_BATCH_SIZE = 100; } ================================================ FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Hbase11xHelper.java ================================================ package com.alibaba.datax.plugin.reader.hbase11xreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import
org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.HConstants; import org.apache.hadoop.hbase.TableName; import org.apache.hadoop.hbase.client.*; import org.apache.hadoop.hbase.util.Bytes; import org.apache.hadoop.hbase.util.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.nio.charset.Charset; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; /** * Utility class * Created by shf on 16/3/7. */ public class Hbase11xHelper { private static final Logger LOG = LoggerFactory.getLogger(Hbase11xHelper.class); public static org.apache.hadoop.hbase.client.Connection getHbaseConnection(String hbaseConfig) { if (StringUtils.isBlank(hbaseConfig)) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.REQUIRED_VALUE, "Reading from HBase requires hbaseConfig, which holds the HBase connection settings. Please contact your HBase PE to obtain it."); } org.apache.hadoop.conf.Configuration hConfiguration = HBaseConfiguration.create(); try { Map<String, String> hbaseConfigMap = JSON.parseObject(hbaseConfig, new TypeReference<Map<String, String>>() {}); /* hbaseConfig is expressed as user-configured key-value pairs */ Validate.isTrue(hbaseConfigMap != null && hbaseConfigMap.size() !=0, "hbaseConfig must not be an empty map!"); for (Map.Entry<String, String> entry : hbaseConfigMap.entrySet()) { hConfiguration.set(entry.getKey(), entry.getValue()); } } catch (Exception e) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.GET_HBASE_CONNECTION_ERROR, e); } org.apache.hadoop.hbase.client.Connection hConnection = null; try { hConnection = ConnectionFactory.createConnection(hConfiguration); } catch (Exception e) { Hbase11xHelper.closeConnection(hConnection); throw DataXException.asDataXException(Hbase11xReaderErrorCode.GET_HBASE_CONNECTION_ERROR, e); } return hConnection; } public static Table getTable(com.alibaba.datax.common.util.Configuration configuration){ String hbaseConfig = 
configuration.getString(Key.HBASE_CONFIG); String userTable = configuration.getString(Key.TABLE); 
org.apache.hadoop.hbase.client.Connection hConnection = Hbase11xHelper.getHbaseConnection(hbaseConfig); TableName hTableName = TableName.valueOf(userTable); org.apache.hadoop.hbase.client.Admin admin = null; org.apache.hadoop.hbase.client.Table hTable = null; try { admin = hConnection.getAdmin(); Hbase11xHelper.checkHbaseTable(admin,hTableName); hTable = hConnection.getTable(hTableName); } catch (Exception e) { Hbase11xHelper.closeTable(hTable); Hbase11xHelper.closeAdmin(admin); Hbase11xHelper.closeConnection(hConnection); throw DataXException.asDataXException(Hbase11xReaderErrorCode.GET_HBASE_TABLE_ERROR, e); } return hTable; } public static RegionLocator getRegionLocator(com.alibaba.datax.common.util.Configuration configuration){ String hbaseConfig = configuration.getString(Key.HBASE_CONFIG); String userTable = configuration.getString(Key.TABLE); org.apache.hadoop.hbase.client.Connection hConnection = Hbase11xHelper.getHbaseConnection(hbaseConfig); TableName hTableName = TableName.valueOf(userTable); org.apache.hadoop.hbase.client.Admin admin = null; RegionLocator regionLocator = null; try { admin = hConnection.getAdmin(); Hbase11xHelper.checkHbaseTable(admin,hTableName); regionLocator = hConnection.getRegionLocator(hTableName); } catch (Exception e) { Hbase11xHelper.closeRegionLocator(regionLocator); Hbase11xHelper.closeAdmin(admin); Hbase11xHelper.closeConnection(hConnection); throw DataXException.asDataXException(Hbase11xReaderErrorCode.GET_HBASE_REGINLOCTOR_ERROR, e); } return regionLocator; } public static void closeConnection(Connection hConnection){ try { if(null != hConnection) hConnection.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.CLOSE_HBASE_CONNECTION_ERROR, e); } } public static void closeAdmin(Admin admin){ try { if(null != admin) admin.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.CLOSE_HBASE_ADMIN_ERROR, e); } } public static void 
closeTable(Table table){ try { if(null != table) table.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.CLOSE_HBASE_TABLE_ERROR, e); } } public static void closeResultScanner(ResultScanner resultScanner){ if(null != resultScanner) { resultScanner.close(); } } public static void closeRegionLocator(RegionLocator regionLocator){ try { if(null != regionLocator) regionLocator.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.CLOSE_HBASE_REGINLOCTOR_ERROR, e); } } public static void checkHbaseTable(Admin admin, TableName hTableName) throws IOException { if(!admin.tableExists(hTableName)){ throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, "HBase source table " + hTableName.toString() + " does not exist. Please check your configuration or contact your HBase administrator."); } if(!admin.isTableAvailable(hTableName)){ throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, "HBase source table " +hTableName.toString() + " is not available. Please check your configuration or contact your HBase administrator."); } if(admin.isTableDisabled(hTableName)){ throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, "HBase source table " +hTableName.toString() + " is disabled. Please check your configuration or contact your HBase administrator."); } } public static byte[] convertUserStartRowkey(com.alibaba.datax.common.util.Configuration configuration) { String startRowkey = configuration.getString(Key.START_ROWKEY); if (StringUtils.isBlank(startRowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } else { boolean isBinaryRowkey = configuration.getBool(Key.IS_BINARY_ROWKEY); return Hbase11xHelper.stringToBytes(startRowkey, isBinaryRowkey); } } public static byte[] convertUserEndRowkey(com.alibaba.datax.common.util.Configuration configuration) { String endRowkey = configuration.getString(Key.END_ROWKEY); if (StringUtils.isBlank(endRowkey)) { return 
HConstants.EMPTY_BYTE_ARRAY; } else { boolean isBinaryRowkey = configuration.getBool(Key.IS_BINARY_ROWKEY); return Hbase11xHelper.stringToBytes(endRowkey, isBinaryRowkey); } 
} /** * Note: convertUserStartRowkey is affected by isBinaryRowkey and is used only once, to convert the user-configured String rowkey to binary. convertInnerStartRowkey, by convention, is used for the binary rowkeys that the split step writes back into the configuration. */ public static byte[] convertInnerStartRowkey(Configuration configuration) { String startRowkey = configuration.getString(Key.START_ROWKEY); if (StringUtils.isBlank(startRowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } return Bytes.toBytesBinary(startRowkey); } public static byte[] convertInnerEndRowkey(Configuration configuration) { String endRowkey = configuration.getString(Key.END_ROWKEY); if (StringUtils.isBlank(endRowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } return Bytes.toBytesBinary(endRowkey); } private static byte[] stringToBytes(String rowkey, boolean isBinaryRowkey) { if (isBinaryRowkey) { return Bytes.toBytesBinary(rowkey); } else { return Bytes.toBytes(rowkey); } } public static boolean isRowkeyColumn(String columnName) { return Constant.ROWKEY_FLAG.equalsIgnoreCase(columnName); } /** * Parses the column configuration for normal mode */ public static List<HbaseColumnCell> parseColumnOfNormalMode(List<Map> column) { List<HbaseColumnCell> hbaseColumnCells = new ArrayList<HbaseColumnCell>(); HbaseColumnCell oneColumnCell; for (Map<String, String> aColumn : column) { ColumnType type = ColumnType.getByTypeName(aColumn.get(Key.TYPE)); String columnName = aColumn.get(Key.NAME); String columnValue = aColumn.get(Key.VALUE); String dateformat = aColumn.get(Key.FORMAT); if (type == ColumnType.DATE) { if(dateformat == null){ dateformat = Constant.DEFAULT_DATA_FORMAT; } Validate.isTrue(StringUtils.isNotBlank(columnName) || StringUtils.isNotBlank(columnValue), "In normal mode, each Hbasereader column must be either a type + name + format combination or a type + value + format combination. 
Your configuration is neither; please check and fix it."); oneColumnCell = new HbaseColumnCell .Builder(type) .columnName(columnName) .columnValue(columnValue) .dateformat(dateformat) .build(); } else { Validate.isTrue(StringUtils.isNotBlank(columnName) || StringUtils.isNotBlank(columnValue), "In normal mode, each Hbasereader column whose type is not date must be either a type + name combination or a type + value combination. Your configuration is neither; please check and fix it."); oneColumnCell = new HbaseColumnCell.Builder(type) .columnName(columnName) .columnValue(columnValue) .build(); } hbaseColumnCells.add(oneColumnCell); } return hbaseColumnCells; } /* convert the multi-version (vertical-table) column list into a <familyQualifier, <type, format>> map */ public static HashMap<String, HashMap<String, String>> parseColumnOfMultiversionMode(List<Map> column){ HashMap<String, HashMap<String, String>> familyQualifierMap = new HashMap<String, HashMap<String, String>>(); for (Map<String, String> aColumn : column) { String type = aColumn.get(Key.TYPE); String columnName = aColumn.get(Key.NAME); String dateformat = aColumn.get(Key.FORMAT); ColumnType.getByTypeName(type); Validate.isTrue(StringUtils.isNotBlank(columnName), "In Hbasereader, each column needs a name in the family:qualifier format, but yours is empty. Please check and fix it."); String familyQualifier; if( !Hbase11xHelper.isRowkeyColumn(columnName)){ String[] cfAndQualifier = columnName.split(":"); if ( cfAndQualifier.length != 2) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, "In Hbasereader, each column must be configured as family:qualifier. 
Invalid column: " + columnName); } familyQualifier = StringUtils.join(cfAndQualifier[0].trim(),":",cfAndQualifier[1].trim()); }else{ familyQualifier = columnName.trim(); } HashMap<String, String> typeAndFormat = new HashMap<String, String>(); typeAndFormat.put(Key.TYPE,type); typeAndFormat.put(Key.FORMAT,dateformat); familyQualifierMap.put(familyQualifier,typeAndFormat); } return familyQualifierMap; } public static List<Configuration> split(Configuration configuration) { byte[] startRowkeyByte = Hbase11xHelper.convertUserStartRowkey(configuration); byte[] endRowkeyByte = Hbase11xHelper.convertUserEndRowkey(configuration); /* if the user configured both startRowkey and endRowkey, ensure startRowkey <= endRowkey */ if (startRowkeyByte.length != 0 && endRowkeyByte.length != 0 && Bytes.compareTo(startRowkeyByte, endRowkeyByte) > 0) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, "In Hbasereader, startRowkey must not be greater than endRowkey."); } RegionLocator regionLocator = Hbase11xHelper.getRegionLocator(configuration); List<Configuration> resultConfigurations ; try { Pair<byte[][], byte[][]> regionRanges = regionLocator.getStartEndKeys(); if (null == regionRanges) { throw 
DataXException.asDataXException(Hbase11xReaderErrorCode.SPLIT_ERROR, "Failed to get the rowkey ranges of the source HBase table."); } resultConfigurations = Hbase11xHelper.doSplit(configuration, startRowkeyByte, endRowkeyByte, regionRanges); LOG.info("HBaseReader split job into {} tasks.", resultConfigurations.size()); return resultConfigurations; } catch (Exception e) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.SPLIT_ERROR, "Failed to split the source HBase table.", e); }finally { Hbase11xHelper.closeRegionLocator(regionLocator); } } private static List<Configuration> doSplit(Configuration config, byte[] startRowkeyByte, byte[] endRowkeyByte, Pair<byte[][], byte[][]> regionRanges) { List<Configuration> configurations = new ArrayList<Configuration>(); for (int i = 0; i < regionRanges.getFirst().length; i++) { byte[] regionStartKey = regionRanges.getFirst()[i]; byte[] regionEndKey = regionRanges.getSecond()[i]; /* The current region is the last region. If its start key is greater than the user-specified userEndKey, exclude it. Note: if the user specified an empty userEndKey, this check must not match, because an empty userEndKey means reading up to the last region. */ if (Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0 && (endRowkeyByte.length != 0 && (Bytes.compareTo( regionStartKey, endRowkeyByte) > 0))) { continue; } /* If the current region is not the last region and the user-configured userStartKey is greater than or equal to the region's end key, exclude this region. */ if ((Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) != 0) && (Bytes.compareTo(startRowkeyByte, regionEndKey) >= 0)) { continue; } /* If the user-configured userEndKey is less than or equal to the region's start key, exclude this region. Note: if the user specified an empty userEndKey, this check must not match, because an empty userEndKey means reading up to the last region. 
*/ if (endRowkeyByte.length != 0 && (Bytes.compareTo(endRowkeyByte, regionStartKey) <= 0)) { continue; } Configuration p = config.clone(); String thisStartKey = getStartKey(startRowkeyByte, regionStartKey); String thisEndKey = getEndKey(endRowkeyByte, regionEndKey); p.set(Key.START_ROWKEY, thisStartKey); p.set(Key.END_ROWKEY, thisEndKey); LOG.debug("startRowkey:[{}], endRowkey:[{}] .", thisStartKey, thisEndKey); configurations.add(p); } return configurations; } private static String getEndKey(byte[] 
endRowkeyByte, byte[] regionEndKey) { if (endRowkeyByte == null) { /* already handled earlier, so the passed-in userEndKey cannot be null */ throw new IllegalArgumentException("userEndKey should not be null!"); } byte[] tempEndRowkeyByte; if (endRowkeyByte.length == 0) { tempEndRowkeyByte = regionEndKey; } else if (Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0) { /* this is the last region */ tempEndRowkeyByte = endRowkeyByte; } else { if (Bytes.compareTo(endRowkeyByte, regionEndKey) > 0) { tempEndRowkeyByte = regionEndKey; } else { tempEndRowkeyByte = endRowkeyByte; } } return Bytes.toStringBinary(tempEndRowkeyByte); } private static String getStartKey(byte[] startRowkeyByte, byte[] regionStarKey) { if (startRowkeyByte == null) { /* already handled earlier, so the passed-in userStartKey cannot be null */ throw new IllegalArgumentException( "userStartKey should not be null!"); } byte[] tempStartRowkeyByte; if (Bytes.compareTo(startRowkeyByte, regionStarKey) < 0) { tempStartRowkeyByte = regionStarKey; } else { tempStartRowkeyByte = startRowkeyByte; } return Bytes.toStringBinary(tempStartRowkeyByte); } public static void validateParameter(com.alibaba.datax.common.util.Configuration originalConfig) { originalConfig.getNecessaryValue(Key.HBASE_CONFIG, Hbase11xReaderErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.TABLE, Hbase11xReaderErrorCode.REQUIRED_VALUE); Hbase11xHelper.validateMode(originalConfig); /* handle optional parameters */ String encoding = originalConfig.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); if (!Charset.isSupported(encoding)) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, String.format("Hbasereader does not support the configured encoding: [%s]", encoding)); } originalConfig.set(Key.ENCODING, encoding); /* handle the range configuration */ String startRowkey = originalConfig.getString(Constant.RANGE + "." 
+ Key.START_ROWKEY); /* careful: if the key range.startRowkey exists but has no value, startRowkey is an empty string rather than null */ if (startRowkey != null && startRowkey.length() != 0) { originalConfig.set(Key.START_ROWKEY, startRowkey); } String endRowkey = originalConfig.getString(Constant.RANGE + "." + Key.END_ROWKEY); /* careful: if the key range.endRowkey exists but has no value, endRowkey is an empty string rather than null */ if (endRowkey != null && endRowkey.length() != 0) { originalConfig.set(Key.END_ROWKEY, endRowkey); } Boolean isBinaryRowkey = originalConfig.getBool(Constant.RANGE + "." + Key.IS_BINARY_ROWKEY,false); originalConfig.set(Key.IS_BINARY_ROWKEY, isBinaryRowkey); /* scan cache */ int scanCacheSize = originalConfig.getInt(Key.SCAN_CACHE_SIZE,Constant.DEFAULT_SCAN_CACHE_SIZE); originalConfig.set(Key.SCAN_CACHE_SIZE,scanCacheSize); int scanBatchSize = originalConfig.getInt(Key.SCAN_BATCH_SIZE,Constant.DEFAULT_SCAN_BATCH_SIZE); originalConfig.set(Key.SCAN_BATCH_SIZE,scanBatchSize); } private static String validateMode(com.alibaba.datax.common.util.Configuration originalConfig) { String mode = originalConfig.getNecessaryValue(Key.MODE,Hbase11xReaderErrorCode.REQUIRED_VALUE); List<Map> column = originalConfig.getList(Key.COLUMN, Map.class); if (column == null || column.isEmpty()) { throw DataXException.asDataXException(Hbase11xReaderErrorCode.REQUIRED_VALUE, "The configured column is empty. 
HBase requires column, in the form: column:[{\"name\": \"cf0:column0\",\"type\": \"string\"},{\"name\": \"cf1:column1\",\"type\": \"long\"}]"); } ModeType modeType = ModeType.getByTypeName(mode); switch (modeType) { case Normal: { /* normal mode must not set maxVersion; column is required and must be a list of maps */ String maxVersion = originalConfig.getString(Key.MAX_VERSION); Validate.isTrue(maxVersion == null, "You are reading HBase data in normal mode, so the unrelated option maxVersion must not be set"); /* further validate the column format by parsing it */ Hbase11xHelper.parseColumnOfNormalMode(column); break; } case MultiVersionFixedColumn:{ /* multiVersionFixedColumn mode requires maxVersion */ checkMaxVersion(originalConfig, mode); 
Hbase11xHelper.parseColumnOfMultiversionMode(column); break; } default: throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE, String.format("HbaseReader不支持该 mode 类型:%s", mode)); } return mode; } // 检查 maxVersion 是否存在,并且值是否合法 private static void checkMaxVersion(Configuration configuration, String mode) { Integer maxVersion = configuration.getInt(Key.MAX_VERSION); Validate.notNull(maxVersion, String.format("您配置的是 %s 模式读取 hbase 中的数据,所以必须配置:maxVersion", mode)); boolean isMaxVersionValid = maxVersion == -1 || maxVersion > 1; Validate.isTrue(isMaxVersionValid, String.format("您配置的是 %s 模式读取 hbase 中的数据,但是配置的 maxVersion 值错误. maxVersion规定:-1为读取全部版本,不能配置为0或者1(因为0或者1,我们认为用户是想用 normal 模式读取数据,而非 %s 模式读取,二者差别大),大于1则表示读取最新的对应个数的版本", mode, mode)); } } ================================================ FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Hbase11xReader.java ================================================ package com.alibaba.datax.plugin.reader.hbase11xreader; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; /** * Hbase11xReader * Created by shf on 16/3/7. 
 */
public class Hbase11xReader extends Reader {
    public static class Job extends Reader.Job {
        private Configuration originConfig = null;

        @Override
        public void init() {
            this.originConfig = this.getPluginJobConf();
            Hbase11xHelper.validateParameter(this.originConfig);
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            return Hbase11xHelper.split(this.originConfig);
        }

        @Override
        public void destroy() {
        }
    }

    public static class Task extends Reader.Task {
        private Configuration taskConfig;
        private static Logger LOG = LoggerFactory.getLogger(Task.class);
        private HbaseAbstractTask hbaseTaskProxy;

        @Override
        public void init() {
            this.taskConfig = super.getPluginJobConf();
            String mode = this.taskConfig.getString(Key.MODE);
            ModeType modeType = ModeType.getByTypeName(mode);

            switch (modeType) {
                case Normal:
                    this.hbaseTaskProxy = new NormalTask(this.taskConfig);
                    break;
                case MultiVersionFixedColumn:
                    this.hbaseTaskProxy = new MultiVersionFixedColumnTask(this.taskConfig);
                    break;
                default:
                    throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE,
                            "HbaseReader does not support this mode: " + modeType);
            }
        }

        @Override
        public void prepare() {
            try {
                this.hbaseTaskProxy.prepare();
            } catch (Exception e) {
                throw DataXException.asDataXException(Hbase11xReaderErrorCode.PREPAR_READ_ERROR, e);
            }
        }

        @Override
        public void startRead(RecordSender recordSender) {
            Record record = recordSender.createRecord();
            boolean fetchOK;
            while (true) {
                try {
                    fetchOK = this.hbaseTaskProxy.fetchLine(record);
                } catch (Exception e) {
                    // Report the row as dirty and carry on with a fresh record
                    LOG.warn("Failed to read a row; collecting it as a dirty record.", e);
                    super.getTaskPluginCollector().collectDirtyRecord(record, e);
                    record = recordSender.createRecord();
                    continue;
                }
                if (fetchOK) {
                    recordSender.sendToWriter(record);
                    record = recordSender.createRecord();
                } else {
                    break;
                }
            }
            recordSender.flush();
        }

        @Override
        public void post() {
            super.post();
        }

        @Override
        public void destroy() {
            if (this.hbaseTaskProxy != null) {
                this.hbaseTaskProxy.close();
            }
        }
    }
}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Hbase11xReaderErrorCode.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.spi.ErrorCode;

/**
 * Created by shf on 16/3/8.
 */
public enum Hbase11xReaderErrorCode implements ErrorCode {
    REQUIRED_VALUE("Hbase11xReader-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("Hbase11xReader-01", "A parameter value is illegal."),
    PREPAR_READ_ERROR("HbaseReader-02", "Error while preparing to read HBase."),
    SPLIT_ERROR("HbaseReader-03", "Error while splitting the HBase table."),
    GET_HBASE_CONNECTION_ERROR("HbaseReader-04", "Error while getting the HBase connection."),
    GET_HBASE_TABLE_ERROR("HbaseReader-05", "Error while initializing the HBase table to read."),
    GET_HBASE_REGINLOCTOR_ERROR("HbaseReader-06", "Error while getting the HBase RegionLocator."),
    CLOSE_HBASE_CONNECTION_ERROR("HbaseReader-07", "Error while closing the HBase connection."),
    CLOSE_HBASE_TABLE_ERROR("HbaseReader-08", "Error while closing the HBase table being read."),
    CLOSE_HBASE_REGINLOCTOR_ERROR("HbaseReader-09", "Error while closing the HBase RegionLocator."),
    CLOSE_HBASE_ADMIN_ERROR("HbaseReader-10", "Error while closing the HBase Admin.");

    private final String code;
    private final String description;

    private Hbase11xReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/HbaseAbstractTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang.ArrayUtils;
import org.apache.commons.lang3.time.DateUtils;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public abstract class HbaseAbstractTask {
    private final static Logger LOG = LoggerFactory.getLogger(HbaseAbstractTask.class);

    private byte[] startKey = null;
    private byte[] endKey = null;

    protected Table htable;
    protected String encoding;
    protected int scanCacheSize;
    protected int scanBatchSize;

    protected Result lastResult = null;
    protected Scan scan;
    protected ResultScanner resultScanner;

    public HbaseAbstractTask(com.alibaba.datax.common.util.Configuration configuration) {
        this.htable = Hbase11xHelper.getTable(configuration);

        this.encoding = configuration.getString(Key.ENCODING, Constant.DEFAULT_ENCODING);
        this.startKey = Hbase11xHelper.convertInnerStartRowkey(configuration);
        this.endKey = Hbase11xHelper.convertInnerEndRowkey(configuration);
        this.scanCacheSize = configuration.getInt(Key.SCAN_CACHE_SIZE, Constant.DEFAULT_SCAN_CACHE_SIZE);
        this.scanBatchSize = configuration.getInt(Key.SCAN_BATCH_SIZE, Constant.DEFAULT_SCAN_BATCH_SIZE);
    }

    public abstract boolean fetchLine(Record record) throws Exception;

    // Each mode configures the scan differently; for example, multi-version modes set the number of versions
    public abstract void initScan(Scan scan);

    public void prepare() throws Exception {
        this.scan = new Scan();
        this.scan.setSmall(false);
        this.scan.setStartRow(startKey);
        this.scan.setStopRow(endKey);
        LOG.info("The task set startRowkey=[{}], endRowkey=[{}].",
                Bytes.toStringBinary(this.startKey), Bytes.toStringBinary(this.endKey));
        // Caching: number of rows fetched from the server per round trip (scanCacheSize, default 256)
        this.scan.setCaching(this.scanCacheSize);
        // Batch: maximum number of columns returned per Result. HBase has no limit by default
        // and returns every column; here it comes from scanBatchSize (default 100)
        this.scan.setBatch(this.scanBatchSize);
        // HBase caches blocks by default, but a full sync reads mostly cold data, so block caching is disabled
        this.scan.setCacheBlocks(false);
        initScan(this.scan);

        this.resultScanner = this.htable.getScanner(this.scan);
    }

    public void close() {
        Hbase11xHelper.closeResultScanner(this.resultScanner);
        Hbase11xHelper.closeTable(this.htable);
    }

    protected Result getNextHbaseRow() throws IOException {
        Result result;
        try {
            result = resultScanner.next();
        } catch (IOException e) {
            // Reopen the scanner at the last row read successfully and retry once
            if (lastResult != null) {
                this.scan.setStartRow(lastResult.getRow());
            }
            resultScanner = this.htable.getScanner(scan);
            result = resultScanner.next();
            // The start row is inclusive, so skip the row that was already returned.
            // next() returns null at the end of the range, so guard against that before comparing.
            if (lastResult != null && result != null && Bytes.equals(lastResult.getRow(), result.getRow())) {
                result = resultScanner.next();
            }
        }
        lastResult = result;
        // may be null
        return result;
    }

    public Column convertBytesToAssignType(ColumnType columnType, byte[] byteArray, String dateformat) throws Exception {
        Column column;
        switch (columnType) {
            case BOOLEAN:
                column = new BoolColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toBoolean(byteArray));
                break;
            case SHORT:
                column = new LongColumn(ArrayUtils.isEmpty(byteArray) ? null : String.valueOf(Bytes.toShort(byteArray)));
                break;
            case INT:
                column = new LongColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toInt(byteArray));
                break;
            case LONG:
                column = new LongColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toLong(byteArray));
                break;
            case FLOAT:
                column = new DoubleColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toFloat(byteArray));
                break;
            case DOUBLE:
                column = new DoubleColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toDouble(byteArray));
                break;
            case STRING:
                column = new StringColumn(ArrayUtils.isEmpty(byteArray) ? null : new String(byteArray, encoding));
                break;
            case BINARY_STRING:
                column = new StringColumn(ArrayUtils.isEmpty(byteArray) ? null : Bytes.toStringBinary(byteArray));
                break;
            case DATE:
                String dateValue = Bytes.toStringBinary(byteArray);
                column = new DateColumn(ArrayUtils.isEmpty(byteArray) ? null : DateUtils.parseDate(dateValue, new String[]{dateformat}));
                break;
            default:
                throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE,
                        "HbaseReader does not support the configured column type: " + columnType);
        }
        return column;
    }

    public Column convertValueToAssignType(ColumnType columnType, String constantValue, String dateformat) throws Exception {
        Column column;
        switch (columnType) {
            case BOOLEAN:
                column = new BoolColumn(constantValue);
                break;
            case SHORT:
            case INT:
            case LONG:
                column = new LongColumn(constantValue);
                break;
            case FLOAT:
            case DOUBLE:
                column = new DoubleColumn(constantValue);
                break;
            case STRING:
                column = new StringColumn(constantValue);
                break;
            case DATE:
                column = new DateColumn(DateUtils.parseDate(constantValue, new String[]{dateformat}));
                break;
            default:
                throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE,
                        "HbaseReader does not support the configured type for a constant column: " + columnType);
        }
        return column;
    }
}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/HbaseColumnCell.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.base.BaseObject;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Describes a single entry of the column configuration in the hbasereader plugin
 */
public class HbaseColumnCell extends BaseObject {
    private ColumnType columnType;

    // columnName has the form family:qualifier
    private String columnName;

    private byte[] columnFamily;
    private byte[] qualifier;

    // For constant columns, the constant value is stored in columnValue
    private String columnValue;

    // true when columnValue is configured (a convenience flag so callers can check for constant columns)
    private boolean isConstant;

    // Only set for date/time columns; no default. Example: yyyy-MM-dd HH:mm:ss
    private String dateformat;

    private HbaseColumnCell(Builder builder) {
        this.columnType = builder.columnType;

        // At least one of columnName and columnValue must be null
        Validate.isTrue(builder.columnName == null || builder.columnValue == null,
                "In HbaseReader, a column cannot set both a column name and a column value; configure exactly one.");

        // columnName and columnValue cannot both be null
        Validate.isTrue(builder.columnName != null || builder.columnValue != null,
                "In HbaseReader, a column must set either a column name or a column value.");

        if (builder.columnName != null) {
            this.isConstant = false;
            this.columnName = builder.columnName;
            // Unless columnName is the rowkey, it must have the form family:qualifier
            if (!Hbase11xHelper.isRowkeyColumn(this.columnName)) {

                String promptInfo = "In HbaseReader, a column must be configured as family:qualifier. Invalid column: " + this.columnName;
                String[] cfAndQualifier = this.columnName.split(":");
                Validate.isTrue(cfAndQualifier != null && cfAndQualifier.length == 2
                        && StringUtils.isNotBlank(cfAndQualifier[0])
                        && StringUtils.isNotBlank(cfAndQualifier[1]), promptInfo);

                this.columnFamily = Bytes.toBytes(cfAndQualifier[0].trim());
                this.qualifier = Bytes.toBytes(cfAndQualifier[1].trim());
            }
        } else {
            this.isConstant = true;
            this.columnValue = builder.columnValue;
        }

        if (builder.dateformat != null) {
            this.dateformat = builder.dateformat;
        }
    }

    public ColumnType getColumnType() {
        return columnType;
    }

    public String getColumnName() {
        return columnName;
    }

    public byte[] getColumnFamily() {
        return columnFamily;
    }

    public byte[] getQualifier() {
        return qualifier;
    }

    public String getDateformat() {
        return dateformat;
    }

    public String getColumnValue() {
        return columnValue;
    }

    public boolean isConstant() {
        return isConstant;
    }

    // Inner builder class
    public static class Builder {
        private ColumnType columnType;
        private String columnName;
        private String columnValue;

        private String dateformat;

        public Builder(ColumnType columnType) {
            this.columnType = columnType;
        }

        public Builder columnName(String columnName) {
            this.columnName = columnName;
            return this;
        }

        public Builder columnValue(String columnValue) {
            this.columnValue = columnValue;
            return this;
        }

        public Builder dateformat(String dateformat) {
            this.dateformat = dateformat;
            return this;
        }

        public HbaseColumnCell build() {
            return new HbaseColumnCell(this);
        }
    }
}
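The `HbaseColumnCell` constructor enforces two rules on every `column` entry. First, exactly one of a name or a constant value must be set. Second, any name other than the rowkey must split on `:` into exactly two non-blank parts, a family and a qualifier. The dependency-free sketch below restates those rules. It assumes the rowkey pseudo-column is the case-insensitive name `rowkey`, which stands in for `Hbase11xHelper.isRowkeyColumn` (that check is not shown in this file). `ColumnSpecSketch` is a hypothetical class and not part of DataX.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical restatement of HbaseColumnCell's validation rules; not a DataX API.
final class ColumnSpecSketch {
    // Assumption: the rowkey pseudo-column is named "rowkey" (stand-in for Hbase11xHelper.isRowkeyColumn).
    static boolean isRowkeyColumn(String name) {
        return "rowkey".equalsIgnoreCase(name);
    }

    /**
     * Returns {family, qualifier} as UTF-8 bytes for a regular column,
     * or null for a constant column or the rowkey (nothing to add to the Scan).
     */
    static byte[][] parse(String name, String value) {
        // Exactly one of name and value must be set.
        if ((name == null) == (value == null)) {
            throw new IllegalArgumentException("set either a column name or a constant value, not both or neither");
        }
        if (name == null || isRowkeyColumn(name)) {
            return null;
        }
        // A regular column must look like family:qualifier with both parts non-blank.
        String[] parts = name.split(":");
        if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException("expected family:qualifier, got: " + name);
        }
        return new byte[][]{
                parts[0].trim().getBytes(StandardCharsets.UTF_8),
                parts[1].trim().getBytes(StandardCharsets.UTF_8)
        };
    }
}
```

Returning `null` for constants and the rowkey mirrors how `NormalTask.initScan` skips those entries when it adds columns to the `Scan`.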
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/Key.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

public final class Key {

    public final static String HBASE_CONFIG = "hbaseConfig";

    public final static String TABLE = "table";

    /**
     * mode accepts normal or multiVersionFixedColumn (the two values ModeType recognizes); there is no default.
     *
     * normal is used together with a Map-style column list.
     */
    public final static String MODE = "mode";

    /**
     * Used with mode = multiVersionFixedColumn to set how many versions to read. No default.
     * -1 reads all versions.
     * 0 and 1 are not allowed.
     * A value greater than 1 reads at most that many versions (up to Integer.MAX_VALUE).
     */
    public final static String MAX_VERSION = "maxVersion";

    /**
     * Defaults to utf8
     */
    public final static String ENCODING = "encoding";

    public final static String COLUMN = "column";

    public final static String COLUMN_FAMILY = "columnFamily";

    public static final String NAME = "name";

    public static final String TYPE = "type";

    public static final String FORMAT = "format";

    public static final String VALUE = "value";

    public final static String START_ROWKEY = "startRowkey";

    public final static String END_ROWKEY = "endRowkey";

    public final static String IS_BINARY_ROWKEY = "isBinaryRowkey";

    public final static String SCAN_CACHE_SIZE = "scanCacheSize";

    public final static String SCAN_BATCH_SIZE = "scanBatchSize";

}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/ModeType.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.exception.DataXException;

import java.util.Arrays;

public enum ModeType {
    Normal("normal"),
    MultiVersionFixedColumn("multiVersionFixedColumn")
    ;

    private String mode;

    ModeType(String mode) {
        this.mode = mode.toLowerCase();
    }

    public String getMode() {
        return mode;
    }

    public static ModeType getByTypeName(String modeName) {
        for (ModeType modeType : values()) {
            if (modeType.mode.equalsIgnoreCase(modeName)) {
                return modeType;
            }
        }
        throw DataXException.asDataXException(Hbase11xReaderErrorCode.ILLEGAL_VALUE,
                String.format("HbaseReader does not support mode: %s. Supported modes: %s", modeName, Arrays.asList(values())));
    }
}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/MultiVersionDynamicColumnTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.util.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.List;

public class MultiVersionDynamicColumnTask extends MultiVersionTask {
    private List<String> columnFamilies = null;

    public MultiVersionDynamicColumnTask(Configuration configuration) {
        super(configuration);

        this.columnFamilies = configuration.getList(Key.COLUMN_FAMILY, String.class);
    }

    @Override
    public void initScan(Scan scan) {
        for (String columnFamily : columnFamilies) {
            scan.addFamily(Bytes.toBytes(columnFamily.trim()));
        }

        super.setMaxVersions(scan);
    }
}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/MultiVersionFixedColumnTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.util.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.List;
import java.util.Map;

public class MultiVersionFixedColumnTask extends MultiVersionTask {

    public MultiVersionFixedColumnTask(Configuration configuration) {
        super(configuration);
    }

    @Override
    public void initScan(Scan scan) {
        for (Map<String, String> aColumn : column) {
            String columnName = aColumn.get(Key.NAME);
            if (!Hbase11xHelper.isRowkeyColumn(columnName)) {
                String[] cfAndQualifier = columnName.split(":");
                scan.addColumn(Bytes.toBytes(cfAndQualifier[0].trim()), Bytes.toBytes(cfAndQualifier[1].trim()));
            }
        }
        super.setMaxVersions(scan);
    }
}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/MultiVersionTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.UnsupportedEncodingException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public abstract class MultiVersionTask extends HbaseAbstractTask {
    private static byte[] COLON_BYTE;

    private int maxVersion;
    private Cell cellArr[] = null;
    private int currentReadPosition = 0;
    public List<Map> column;
    private HashMap<String, HashMap<String, String>> familyQualifierMap = null;

    public MultiVersionTask(Configuration configuration) {
        super(configuration);
        this.maxVersion = configuration.getInt(Key.MAX_VERSION);
        this.column = configuration.getList(Key.COLUMN, Map.class);
        this.familyQualifierMap = Hbase11xHelper.parseColumnOfMultiversionMode(this.column);

        try {
            MultiVersionTask.COLON_BYTE = ":".getBytes("utf8");
        } catch (UnsupportedEncodingException e) {
            throw DataXException.asDataXException(Hbase11xReaderErrorCode.PREPAR_READ_ERROR,
                    "Internal error: failed to encode the colon separator between column family and qualifier.", e);
        }
    }

    @Override
    public boolean fetchLine(Record record) throws Exception {
        Result result;
        if (this.cellArr == null || this.cellArr.length == this.currentReadPosition) {
            result = super.getNextHbaseRow();
            if (result == null) {
                return false;
            }
            super.lastResult = result;

            this.cellArr = result.rawCells();
            if (this.cellArr == null || this.cellArr.length == 0) {
                return false;
            }
            this.currentReadPosition = 0;
        }
        try {
            // Each cell (one version of one column) becomes one record
            Cell cell = this.cellArr[this.currentReadPosition];
            convertCellToLine(cell, record);
        } finally {
            this.currentReadPosition++;
        }
        return true;
    }

    private void convertCellToLine(Cell cell, Record record) throws Exception {
        byte[] rawRowkey = CellUtil.cloneRow(cell);
        long timestamp = cell.getTimestamp();
        byte[] cfAndQualifierName = Bytes.add(CellUtil.cloneFamily(cell),
                MultiVersionTask.COLON_BYTE, CellUtil.cloneQualifier(cell));
        byte[] columnValue = CellUtil.cloneValue(cell);

        ColumnType rawRowkeyType = ColumnType.getByTypeName(familyQualifierMap.get(Constant.ROWKEY_FLAG).get(Key.TYPE));
        String familyQualifier = new String(cfAndQualifierName, Constant.DEFAULT_ENCODING);
        ColumnType columnValueType = ColumnType.getByTypeName(familyQualifierMap.get(familyQualifier).get(Key.TYPE));
        String columnValueFormat = familyQualifierMap.get(familyQualifier).get(Key.FORMAT);
        if (StringUtils.isBlank(columnValueFormat)) {
            columnValueFormat = Constant.DEFAULT_DATA_FORMAT;
        }

        record.addColumn(convertBytesToAssignType(rawRowkeyType, rawRowkey, columnValueFormat));
        record.addColumn(convertBytesToAssignType(ColumnType.STRING, cfAndQualifierName, columnValueFormat));
        // The user-configured type for timestamp is ignored; it is always emitted as a long
        record.addColumn(new LongColumn(timestamp));
        record.addColumn(convertBytesToAssignType(columnValueType, columnValue, columnValueFormat));
    }

    public void setMaxVersions(Scan scan) {
        if (this.maxVersion == -1 || this.maxVersion == Integer.MAX_VALUE) {
            scan.setMaxVersions();
        } else {
            scan.setMaxVersions(this.maxVersion);
        }
    }
}
================================================
FILE: hbase11xreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xreader/NormalTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xreader;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.util.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.List;
import java.util.Map;

public class NormalTask extends HbaseAbstractTask {
    private List<Map> column;
    private List<HbaseColumnCell> hbaseColumnCells;

    public NormalTask(Configuration configuration) {
        super(configuration);
        this.column = configuration.getList(Key.COLUMN,
                Map.class);
        this.hbaseColumnCells = Hbase11xHelper.parseColumnOfNormalMode(this.column);
    }

    /**
     * In normal mode, add the user-configured columns to the scan
     */
    @Override
    public void initScan(Scan scan) {
        boolean isConstant;
        boolean isRowkeyColumn;
        for (HbaseColumnCell cell : this.hbaseColumnCells) {
            isConstant = cell.isConstant();
            isRowkeyColumn = Hbase11xHelper.isRowkeyColumn(cell.getColumnName());
            if (!isConstant && !isRowkeyColumn) {
                scan.addColumn(cell.getColumnFamily(), cell.getQualifier());
            }
        }
    }

    @Override
    public boolean fetchLine(Record record) throws Exception {
        Result result = super.getNextHbaseRow();

        if (null == result) {
            return false;
        }
        super.lastResult = result;

        try {
            byte[] hbaseColumnValue;
            String columnName;
            ColumnType columnType;

            byte[] columnFamily;
            byte[] qualifier;

            for (HbaseColumnCell cell : this.hbaseColumnCells) {
                columnType = cell.getColumnType();
                if (cell.isConstant()) {
                    // Constant column
                    String constantValue = cell.getColumnValue();

                    Column constantColumn = super.convertValueToAssignType(columnType, constantValue, cell.getDateformat());
                    record.addColumn(constantColumn);
                } else {
                    // Look up the value by column name
                    columnName = cell.getColumnName();
                    if (Hbase11xHelper.isRowkeyColumn(columnName)) {
                        hbaseColumnValue = result.getRow();
                    } else {
                        columnFamily = cell.getColumnFamily();
                        qualifier = cell.getQualifier();
                        hbaseColumnValue = result.getValue(columnFamily, qualifier);
                    }

                    Column hbaseColumn = super.convertBytesToAssignType(columnType, hbaseColumnValue, cell.getDateformat());
                    record.addColumn(hbaseColumn);
                }
            }
        } catch (Exception e) {
            // The exception expected here is a failed byte[] conversion. In practice, the bytes of a string
            // rarely fail to convert to an integer type, but they often fail to convert to a double.
            record.setColumn(0, new StringColumn(Bytes.toStringBinary(result.getRow())));

            throw e;
        }

        return true;
    }
}
================================================
FILE: hbase11xreader/src/main/resources/plugin.json
================================================
{
    "name": "hbase11xreader",
    "class": "com.alibaba.datax.plugin.reader.hbase11xreader.Hbase11xReader",
    "description": "useScene: prod. mechanism: Scan to read data.",
    "developer": "alibaba"
}
================================================
FILE: hbase11xreader/src/main/resources/plugin_job_template.json
================================================
{
    "name": "hbase11xreader",
    "parameter": {
        "hbaseConfig": {},
        "table": "",
        "encoding": "",
        "mode": "",
        "column": [],
        "range": {
            "startRowkey": "",
            "endRowkey": "",
            "isBinaryRowkey": true
        }
    }
}
================================================
FILE: hbase11xsqlreader/doc/hbase11xsqlreader.md
================================================
# hbase11xsqlreader Plugin Documentation

___

## 1 Quick Introduction

The hbase11xsqlreader plugin reads data from Phoenix (SQL on HBase). Under the hood, hbase11xsqlreader connects to the remote HBase cluster through the Phoenix client and runs the corresponding SQL statement to SELECT the data out of Phoenix.

## 2 How It Works

In short, hbase11xsqlreader connects to the remote HBase cluster through the Phoenix client and generates a SELECT statement from the user's configuration. It sends the statement to the HBase cluster, assembles the returned results into an abstract dataset using DataX's own data types, and passes the dataset to the downstream Writer.

## 3 Features

### 3.1 Sample Configuration

* A job that extracts data from Phoenix and prints it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed in byte/s. DataX tries to reach this speed but never exceeds it.
                "byte": 10485760
            },
            // Error limits
            "errorLimit": {
                // Maximum number of error records; the job fails once this is exceeded.
                "record": 0,
                // Maximum fraction of error records: 1.0 means 100%, 0.02 means 2%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    // Use the hbase11xsqlreader plugin
                    "name": "hbase11xsqlreader",
                    "parameter": {
                        // ZooKeeper quorum of the HBase cluster that Phoenix connects to
                        "hbaseConfig": {
                            "hbase.zookeeper.quorum": "hb-proxy-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-xxx-003.hbase.rds.aliyuncs.com"
                        },
                        // Phoenix namespace (schema) to read from
                        "schema": "TAG",
                        // Phoenix table to read
                        "table": "US_POPULATION",
                        // Columns to read; leave empty to read all columns
                        "column": [
                        ],
                        // Query condition
                        "where": "id="
                    }
                },
                "writer": {
                    // Writer type
                    "name": "streamwriter",
                    // Whether to print the content
                    "parameter": {
                        "print": true,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **hbaseConfig**
  * Description: hbase11xsqlreader connects to the HBase cluster through the Phoenix client, so set this to the ZooKeeper address (zkUrl) of that cluster. Do not append the port 2181.
  * Required: yes
  * Default: none

* **schema**
  * Description: the Phoenix namespace that contains the table, for example `TAG` in the sample above.
  * Required: yes
  * Default: none

* **table**
  * Description: the Phoenix table name, for example `US_POPULATION`. Give only the table name; the reader prefixes it with `schema`.
  * Required: yes
  * Default: none

* **column**
  * Description: the column names to read from the Phoenix table, as a JSON array. An empty array reads all columns.
  * Required: yes
  * Default: none

* **where**
  * Description: a filter condition applied when reading from the Phoenix table.
  * Required: no
  * Default: none
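The following sketch shows how `schema`, `table`, `column`, and `where` combine into the kind of SELECT the reader asks Phoenix to run. It is an approximation. `HbaseSQLHelper.generatePhoenixConf` passes these values to Phoenix's `PhoenixConfigurationUtil`, with the table name given as `schema + "." + table`, and Phoenix builds the actual statement itself, possibly adding quoting or hints that this sketch omits. `PhoenixQuerySketch` is a hypothetical helper, not part of hbase11xsqlreader.

```java
import java.util.List;

// Hypothetical helper, not part of hbase11xsqlreader: approximates the shape of the query only.
final class PhoenixQuerySketch {
    static String buildSelect(String schema, String table, List<String> columns, String where) {
        // The reader hands Phoenix schema + "." + table as the input table name
        String fullTableName = schema + "." + table;
        // An empty column list means "read every column", shown here as *
        String selectList = columns.isEmpty() ? "*" : String.join(", ", columns);
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(selectList)
                .append(" FROM ")
                .append(fullTableName);
        // where is optional and is appended verbatim as the condition
        if (where != null && !where.trim().isEmpty()) {
            sql.append(" WHERE ").append(where);
        }
        return sql.toString();
    }
}
```

With `schema` set to `TAG`, `table` set to `US_POPULATION`, an empty `column` list, and no `where`, the sketch returns `SELECT * FROM TAG.US_POPULATION`.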
### 3.3 Type Conversion

hbase11xsqlreader supports most Phoenix types, but a few types are not supported. Please check the types of your columns.

The following table lists how hbase11xsqlreader maps Phoenix types to DataX types:

| DataX internal type | Phoenix data type |
| -------- | ----- |
| String | CHAR, VARCHAR |
| Bytes | BINARY, VARBINARY |
| Bool | BOOLEAN |
| Long | INTEGER, TINYINT, SMALLINT, BIGINT |
| Double | FLOAT, DECIMAL, DOUBLE |
| Date | DATE, TIME, TIMESTAMP |

## 4 Performance Report

Omitted.

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

***
================================================
FILE: hbase11xsqlreader/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT hbase11xsqlreader hbase11xsqlreader 0.0.1-SNAPSHOT jar 4.12.0-AliHBase-1.1-0.5 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.aliyun.phoenix ali-phoenix-core ${phoenix.version} servlet-api javax.servlet jdk.tools jdk.tools junit junit test org.mockito mockito-core 2.0.44-beta test com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-service-face test src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single
================================================
FILE: hbase11xsqlreader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/hbase11xsqlreader target/ hbase11xsqlreader-0.0.1-SNAPSHOT.jar plugin/reader/hbase11xsqlreader false plugin/reader/hbase11xsqlreader/libs runtime
================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HadoopSerializationUtil.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class HadoopSerializationUtil {

    // Serializes a Hadoop Writable (e.g. a PhoenixInputSplit) into a byte array
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream dataout = new DataOutputStream(out)) {
            writable.write(dataout);
        }
        return out.toByteArray();
    }

    // Restores the state of a Writable from bytes produced by serialize()
    public static void deserialize(Writable writable, byte[] bytes) throws Exception {
        try (DataInputStream datain = new DataInputStream(new ByteArrayInputStream(bytes))) {
            writable.readFields(datain);
        }
    }
}
================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelper.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.task.JobContextImpl;
import org.apache.phoenix.jdbc.PhoenixConnection;
import org.apache.phoenix.jdbc.PhoenixEmbeddedDriver;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.PhoenixInputSplit;
import org.apache.phoenix.mapreduce.PhoenixRecordWritable;
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
import org.apache.phoenix.schema.MetaDataClient;
import org.apache.phoenix.schema.PColumn;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.schema.SaltingUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.*;

public class HbaseSQLHelper {
    private static final Logger LOG = LoggerFactory.getLogger(HbaseSQLHelper.class);

    static {
        try {
            Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        } catch (Throwable t) {
            throw new RuntimeException("failed to load org.apache.phoenix.jdbc.PhoenixDriver", t);
        }
    }

    public static org.apache.hadoop.conf.Configuration generatePhoenixConf(HbaseSQLReaderConfig readerConfig) {
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();

        String table = readerConfig.getTableName();
        List<String> columns = readerConfig.getColumns();
        String zkUrl = readerConfig.getZkUrl();

        PhoenixConfigurationUtil.setInputClass(conf, PhoenixRecordWritable.class);
        // The input table is schema-qualified: schema.table
        PhoenixConfigurationUtil.setInputTableName(conf, readerConfig.getSchema() + "." + table);

        if (!columns.isEmpty()) {
            PhoenixConfigurationUtil.setSelectColumnNames(conf, columns.toArray(new String[columns.size()]));
        }
        if (Objects.nonNull(readerConfig.getWhere())) {
            PhoenixConfigurationUtil.setInputTableConditions(conf, readerConfig.getWhere());
        }
        PhoenixEmbeddedDriver.ConnectionInfo info = null;
        try {
            info = PhoenixEmbeddedDriver.ConnectionInfo.create(zkUrl);
        } catch (SQLException e) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.GET_PHOENIX_CONNECTIONINFO_ERROR,
                    "Failed to get the Phoenix ConnectionInfo from zkURL; please check that the HBase cluster is healthy", e);
        }
        conf.set(HConstants.ZOOKEEPER_QUORUM, info.getZookeeperQuorum());
        if (info.getPort() != null) {
            conf.setInt(HConstants.ZOOKEEPER_CLIENT_PORT, info.getPort());
        }
        if (info.getRootNode() != null) {
            conf.set(HConstants.ZOOKEEPER_ZNODE_PARENT, info.getRootNode());
        }
        conf.set(Key.NAME_SPACE_MAPPING_ENABLED, "true");
        conf.set(Key.SYSTEM_TABLES_TO_NAMESPACE, "true");
        return conf;
    }

    public static List<String> getPColumnNames(String connectionString, String tableName, String schema) throws SQLException {
        Properties pro = new Properties();
        pro.put(Key.NAME_SPACE_MAPPING_ENABLED, true);
        pro.put(Key.SYSTEM_TABLES_TO_NAMESPACE, true);
        // Close the connection once the table metadata has been read
        try (Connection con = DriverManager.getConnection(connectionString, pro)) {
            PhoenixConnection phoenixConnection = con.unwrap(PhoenixConnection.class);
            MetaDataClient metaDataClient = new MetaDataClient(phoenixConnection);
            PTable table = metaDataClient.updateCache(schema, tableName).getTable();
            List<String> columnNames = new ArrayList<String>();
            for (PColumn pColumn : table.getColumns()) {
                if (!pColumn.getName().getString().equals(SaltingUtil.SALTING_COLUMN_NAME)) {
                    columnNames.add(pColumn.getName().getString());
                } else {
                    LOG.info(tableName + " is salt table");
                }
            }
            return columnNames;
        }
    }

    public static List<Configuration> split(HbaseSQLReaderConfig readerConfig) {
        PhoenixInputFormat inputFormat = new PhoenixInputFormat();
        org.apache.hadoop.conf.Configuration conf = generatePhoenixConf(readerConfig);
        JobID jobId = new JobID(Key.MOCK_JOBID_IDENTIFIER, Key.MOCK_JOBID);
        JobContextImpl jobContext = new JobContextImpl(conf, jobId);
        List<Configuration> resultConfigurations = new ArrayList<Configuration>();
        List<InputSplit> rawSplits = null;
        try {
            rawSplits = inputFormat.getSplits(jobContext);
            LOG.info("split size is " + rawSplits.size());
            for (InputSplit split : rawSplits) {
                Configuration cfg = readerConfig.getOriginalConfig().clone();

                // Ship each Phoenix split to its task as a Base64-encoded Writable
                byte[] splitSer = HadoopSerializationUtil.serialize((PhoenixInputSplit) split);
                String splitBase64Str = org.apache.commons.codec.binary.Base64.encodeBase64String(splitSer);
                cfg.set(Key.SPLIT_KEY, splitBase64Str);
                resultConfigurations.add(cfg);
            }
        } catch (IOException e) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.GET_PHOENIX_SPLITS_ERROR,
                    "Failed to get the table's splits; please check that the HBase cluster is healthy, " + e.getMessage(), e);
        } catch (InterruptedException e) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.GET_PHOENIX_SPLITS_ERROR,
                    "Interrupted while getting the table's splits; please retry, and contact the DataX administrator if the problem persists, " + e.getMessage(), e);
        }

        return resultConfigurations;
    }

    public static HbaseSQLReaderConfig parseConfig(Configuration cfg) {
        return HbaseSQLReaderConfig.parse(cfg);
    }

    public static Pair<String, String> getHbaseConfig(String hbaseCfgString) {
        assert hbaseCfgString != null;
        Map<String, String> hbaseConfigMap = JSON.parseObject(hbaseCfgString, new TypeReference<Map<String, String>>() {
        });
        String
zkQuorum = hbaseConfigMap.get(Key.HBASE_ZK_QUORUM); String znode = hbaseConfigMap.get(Key.HBASE_ZNODE_PARENT); if(znode == null) znode = ""; return new Pair(zkQuorum, znode); } } ================================================ FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReader.java ================================================ package com.alibaba.datax.plugin.reader.hbase11xsqlreader; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public class HbaseSQLReader extends Reader { public static class Job extends Reader.Job { private HbaseSQLReaderConfig readerConfig; @Override public void init() { readerConfig = HbaseSQLHelper.parseConfig(this.getPluginJobConf()); } @Override public List split(int adviceNumber) { return HbaseSQLHelper.split(readerConfig); } @Override public void destroy() { } } public static class Task extends Reader.Task { private static Logger LOG = LoggerFactory.getLogger(Task.class); private HbaseSQLReaderTask hbase11SQLReaderTask; @Override public void init() { hbase11SQLReaderTask = new HbaseSQLReaderTask(this.getPluginJobConf()); this.hbase11SQLReaderTask.init(); } @Override public void prepare() { hbase11SQLReaderTask.prepare(); } @Override public void startRead(RecordSender recordSender) { Long recordNum = 0L; Record record = recordSender.createRecord(); boolean fetchOK; while (true) { try { fetchOK = this.hbase11SQLReaderTask.readRecord(record); } catch (Exception e) { LOG.info("Read record exception", e); e.printStackTrace(); super.getTaskPluginCollector().collectDirtyRecord(record, e); record = recordSender.createRecord(); continue; } if (fetchOK) { recordSender.sendToWriter(record); recordNum++; if (recordNum % 10000 == 0) LOG.info("already read record num is " + 
                                recordNum);
                    record = recordSender.createRecord();
                } else {
                    break;
                }
            }
            recordSender.flush();
        }

        @Override
        public void post() {
            super.post();
        }

        @Override
        public void destroy() {
            this.hbase11SQLReaderTask.destroy();
        }
    }
}

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderConfig.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hbase.util.Pair;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.SQLException;
import java.util.List;
import java.util.StringJoiner;

public class HbaseSQLReaderConfig {
    private final static Logger LOG = LoggerFactory.getLogger(HbaseSQLReaderConfig.class);
    private Configuration originalConfig; // the original configuration data

    // Cluster configuration
    private String connectionString;

    public String getZkUrl() {
        return zkUrl;
    }

    private String zkUrl;

    // Table configuration
    private String tableName;
    private List<String> columns; // names of all columns of the table, primary-key and non-primary-key, excluding the timestamp column
    private String where; // filter condition
    private String schema; // Phoenix schema of the table

    /**
     * @return the original DataX configuration
     */
    public Configuration getOriginalConfig() {
        return originalConfig;
    }

    /**
     * @return the JDBC connection string (ZooKeeper mode)
     */
    public String getConnectionString() {
        return connectionString;
    }

    /**
     * @return the table name
     */
    public String getTableName() {
        return tableName;
    }

    /**
     * @return all columns, primary-key and non-primary-key, excluding the version column
     */
    public List<String> getColumns() {
        return columns;
    }

    /**
     * Parses the DataX job configuration into a reader configuration.
     *
     * @param dataxCfg the DataX job configuration
     * @return the parsed reader configuration
     */
    public static HbaseSQLReaderConfig parse(Configuration dataxCfg) {
        assert dataxCfg != null;
        HbaseSQLReaderConfig cfg = new HbaseSQLReaderConfig();
        cfg.originalConfig = dataxCfg;

        // 1. Parse the cluster configuration
        parseClusterConfig(cfg, dataxCfg);

        // 2. Parse the table and column configuration
        parseTableConfig(cfg, dataxCfg);

        // 3. Log the parsed configuration
        LOG.info("HBase SQL reader config parsed:" + cfg.toString());

        return cfg;
    }

    private static void parseClusterConfig(HbaseSQLReaderConfig cfg, Configuration dataxCfg) {
        // Read the HBase cluster connection information
        String hbaseCfg = dataxCfg.getString(Key.HBASE_CONFIG);
        if (StringUtils.isBlank(hbaseCfg)) {
            // The cluster configuration must be present and non-empty
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.REQUIRED_VALUE,
                    "hbaseConfig is required when reading from HBase; it holds the HBase connection information. Please check your HBase cluster information.");
        }

        // Parse the ZooKeeper quorum and znode
        Pair<String, String> zkCfg;
        try {
            zkCfg = HbaseSQLHelper.getHbaseConfig(hbaseCfg);
        } catch (Throwable t) {
            // The HBase configuration could not be parsed
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.REQUIRED_VALUE,
                    "Failed to parse hbaseConfig. Please make sure hbaseConfig is valid JSON with correct content.", t);
        }
        String zkQuorum = zkCfg.getFirst();
        String znode = zkCfg.getSecond();
        if (zkQuorum == null || zkQuorum.isEmpty()) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.ILLEGAL_VALUE,
                    "hbase.zookeeper.quorum must not be empty.");
        }

        // Build the JDBC connection string, format: jdbc:phoenix:zk_quorum[:znode_parent]
        StringBuilder connectionString = new StringBuilder("jdbc:phoenix:");
        connectionString.append(zkQuorum);
        cfg.connectionString = connectionString.toString();
        // The ZooKeeper URL hard-codes the default client port 2181
        StringBuilder zkUrl = new StringBuilder(zkQuorum);
        cfg.zkUrl = zkUrl.append(":2181").toString();
        if (!znode.isEmpty()) {
            cfg.connectionString = connectionString.append(":").append(znode).toString();
            cfg.zkUrl = zkUrl.append(":").append(znode).toString();
        }
    }

    private static void parseTableConfig(HbaseSQLReaderConfig cfg, Configuration dataxCfg) {
        // Parse and validate the table name
        cfg.tableName = dataxCfg.getString(Key.TABLE);
        cfg.schema = dataxCfg.getString(Key.SCHEMA);
        if (cfg.tableName == null || cfg.tableName.isEmpty()) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.ILLEGAL_VALUE,
                    "The HBase table name must not be empty. Please check and fix the configuration."
            );
        }

        // Parse the column configuration; an empty list means all columns of the table are read
        cfg.columns = dataxCfg.getList(Key.COLUMN, String.class);
        if (cfg.columns == null) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.REQUIRED_VALUE,
                    "The column parameter is not configured. Please list the columns to read, or use [] to read all columns.");
        } else if (cfg.columns.isEmpty()) {
            try {
                cfg.columns = HbaseSQLHelper.getPColumnNames(cfg.connectionString, cfg.tableName, cfg.schema);
                dataxCfg.set(Key.COLUMN, cfg.columns);
            } catch (SQLException e) {
                throw DataXException.asDataXException(
                        HbaseSQLReaderErrorCode.GET_PHOENIX_COLUMN_ERROR,
                        "Failed to get the columns of the table from Phoenix. Please check the HBase cluster status, " + e.getMessage(), e);
            }
        }

        cfg.where = dataxCfg.getString(Key.WHERE);
    }

    @Override
    public String toString() {
        StringBuilder ret = new StringBuilder();
        // Cluster configuration
        ret.append("\n[jdbc]");
        ret.append(connectionString);
        ret.append("\n");

        // Table configuration
        ret.append("[tableName]");
        ret.append(tableName);
        ret.append("\n");
        ret.append("[column]");
        for (String col : columns) {
            ret.append(col);
            ret.append(",");
        }
        ret.setLength(ret.length() - 1);
        ret.append("[where=]").append(getWhere());
        ret.append("[schema=]").append(getSchema());
        ret.append("\n");

        return ret.toString();
    }

    /**
     * Direct instantiation is not allowed; use {@link #parse} to create an instance.
     */
    private HbaseSQLReaderConfig() {
    }

    public String getWhere() {
        return where;
    }

    public void setWhere(String where) {
        this.where = where;
    }

    public String getSchema() {
        return schema;
    }

    public void setSchema(String schema) {
        this.schema = schema;
    }
}

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderErrorCode.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import com.alibaba.datax.common.spi.ErrorCode;

public enum HbaseSQLReaderErrorCode implements ErrorCode {
    REQUIRED_VALUE("Hbasewriter-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("Hbasewriter-01", "A parameter value is invalid."),
    GET_PHOENIX_COLUMN_ERROR("Hbasewriter-02", "Failed to get the columns of the Phoenix table."),
    GET_PHOENIX_CONNECTIONINFO_ERROR("Hbasewriter-03", "Failed to get the zkUrl of the Phoenix service."),
    GET_PHOENIX_SPLITS_ERROR("Hbasewriter-04", "Failed to get the Phoenix split information."),
    PHOENIX_CREATEREADER_ERROR("Hbasewriter-05", "Failed to get the Phoenix reader."),
    PHOENIX_READERINIT_ERROR("Hbasewriter-06", "Failed to initialize the Phoenix reader."),
    PHOENIX_COLUMN_TYPE_CONVERT_ERROR("Hbasewriter-07", "Failed to convert a Phoenix column type."),
    PHOENIX_RECORD_READ_ERROR("Hbasewriter-08", "Failed to read a Phoenix record."),
    PHOENIX_READER_CLOSE_ERROR("Hbasewriter-09", "Failed to close the Phoenix reader.")
    ;

    private final String code;
    private final String description;

    private HbaseSQLReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.apache.phoenix.jdbc.PhoenixConnection;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.PhoenixInputSplit;
import org.apache.phoenix.mapreduce.PhoenixRecordReader;
import org.apache.phoenix.mapreduce.PhoenixRecordWritable;
import org.apache.phoenix.schema.MetaDataClient;
import org.apache.phoenix.schema.PColumn;
import org.apache.phoenix.schema.PTable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.math.BigDecimal;
import java.sql.*;
import java.sql.Date;
import java.util.*;

/**
 * Created by admin on 1/3/18.
 */
public class HbaseSQLReaderTask {
    private static Logger LOG = LoggerFactory.getLogger(HbaseSQLReaderTask.class);
    private PhoenixInputFormat phoenixInputFormat;
    PhoenixInputSplit phoenixInputSplit;
    private PhoenixRecordReader phoenixRecordReader;
    private Map<String, PColumn> pColumns;
    private HbaseSQLReaderConfig readerConfig;
    private TaskAttemptContextImpl hadoopAttemptContext;

    public HbaseSQLReaderTask(Configuration config) {
        this.readerConfig = HbaseSQLHelper.parseConfig(config);
        pColumns = new LinkedHashMap<String, PColumn>();
    }

    private void getPColumns() throws SQLException {
        Properties pro = new Properties();
        pro.put(Key.NAME_SPACE_MAPPING_ENABLED, true);
        pro.put(Key.SYSTEM_TABLES_TO_NAMESPACE, true);
        // Close the JDBC connection once the table metadata has been read.
        try (Connection con = DriverManager.getConnection(this.readerConfig.getConnectionString(), pro)) {
            PhoenixConnection phoenixConnection = con.unwrap(PhoenixConnection.class);
            MetaDataClient metaDataClient = new MetaDataClient(phoenixConnection);
            PTable table = metaDataClient.updateCache(this.readerConfig.getSchema(), this.readerConfig.getTableName()).getTable();
            List<String> columnNames = this.readerConfig.getColumns();
            for (PColumn pColumn : table.getColumns()) {
                if (columnNames.contains(pColumn.getName().getString())) {
                    pColumns.put(pColumn.getName().getString(), pColumn);
                }
            }
        }
    }

    public void init() {
        LOG.info("reader table info: " + this.readerConfig.toString());
        try {
            this.getPColumns();
        } catch (SQLException e) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.GET_PHOENIX_COLUMN_ERROR,
                    "Failed to get the columns of the table. Please retry; if the problem persists, check the HBase cluster status, " + e.getMessage(), e);
        }
        this.phoenixInputFormat = new PhoenixInputFormat();
        // Restore the split that the Job serialized into this task's configuration.
        String splitBase64Str = this.readerConfig.getOriginalConfig().getString(Key.SPLIT_KEY);
        byte[] splitBytes = org.apache.commons.codec.binary.Base64.decodeBase64(splitBase64Str);
        TaskAttemptID attemptId = new TaskAttemptID();
        org.apache.hadoop.conf.Configuration conf = HbaseSQLHelper.generatePhoenixConf(this.readerConfig);
        this.hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId);
        this.phoenixInputSplit = new PhoenixInputSplit();
        try {
            HadoopSerializationUtil.deserialize(phoenixInputSplit, splitBytes);
            this.phoenixRecordReader = (PhoenixRecordReader) phoenixInputFormat.createRecordReader(phoenixInputSplit, hadoopAttemptContext);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.PHOENIX_CREATEREADER_ERROR,
                    "Failed to create the Phoenix reader. Please retry; if the problem persists, check the HBase cluster status, " + e.getMessage(), e);
        }
    }

    public void prepare() {
        try {
            this.phoenixRecordReader.initialize(this.phoenixInputSplit, hadoopAttemptContext);
        } catch (IOException e) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.PHOENIX_READERINIT_ERROR,
                    "Failed to initialize the Phoenix reader. Please retry; if the problem persists, check the HBase cluster status, " + e.getMessage(), e);
        } catch (InterruptedException e) {
            throw DataXException.asDataXException(
                    HbaseSQLReaderErrorCode.PHOENIX_READERINIT_ERROR,
                    "Phoenix reader initialization was interrupted. Please retry, " + e.getMessage(), e);
        }
    }

    private Column convertPhoenixValueToDataxColumn(int sqlType, Object value) throws IOException {
        Column column;
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
                column = new StringColumn((String) value);
                break;
            case Types.BINARY:
            case Types.VARBINARY:
                column = new BytesColumn((byte[]) value);
                break;
            case Types.BOOLEAN:
                column = new BoolColumn((Boolean) value);
                break;
            case Types.INTEGER:
                column = new LongColumn((Integer) value);
                break;
            case Types.TINYINT:
            case Types.SMALLINT:
                // Byte / Short values are unboxed here, so guard against SQL NULL explicitly.
                column = new LongColumn(value == null ? null : Long.valueOf(((Number) value).longValue()));
                break;
            case Types.BIGINT:
                column = new LongColumn((Long) value);
                break;
            case Types.FLOAT:
                // Float values are unboxed here, so guard against SQL NULL explicitly.
                column = new DoubleColumn(value == null ? null : Double.valueOf(((Float) value).doubleValue()));
                break;
            case Types.DECIMAL:
                column = new DoubleColumn((BigDecimal) value);
                break;
            case Types.DOUBLE:
                column = new DoubleColumn((Double) value);
                break;
            case Types.DATE:
                column = new DateColumn((Date) value);
                break;
            case Types.TIME:
                column = new DateColumn((Time) value);
                break;
            case Types.TIMESTAMP:
                column = new DateColumn((Timestamp) value);
                break;
            default:
                throw DataXException.asDataXException(
                        HbaseSQLReaderErrorCode.PHOENIX_COLUMN_TYPE_CONVERT_ERROR,
                        "Encountered an unrecognized Phoenix type, sqlType: " + sqlType);
        }
        return column;
    }

    private void constructRecordFromPhoenix(Record record, Map<String, Object> phoenixRecord) throws IOException {
        for (Map.Entry<String, PColumn> pColumnItem : this.pColumns.entrySet()) {
            Column column = this.convertPhoenixValueToDataxColumn(
                    pColumnItem.getValue().getDataType().getSqlType(),
                    phoenixRecord.get(pColumnItem.getKey()));
            record.addColumn(column);
        }
    }

    public boolean readRecord(Record record) throws IOException, InterruptedException {
        boolean hasNext = this.phoenixRecordReader.nextKeyValue();
        if (!hasNext)
            return false;
        PhoenixRecordWritable phoenixRecordWritable = (PhoenixRecordWritable) this.phoenixRecordReader.getCurrentValue();
        Map<String, Object> phoenixRecord = phoenixRecordWritable.getResultMap();
        this.constructRecordFromPhoenix(record, phoenixRecord);
        return true;
    }

    public void destroy() {
        if (this.phoenixRecordReader != null) {
            try {
                this.phoenixRecordReader.close();
            } catch (IOException e) {
                throw DataXException.asDataXException(
                        HbaseSQLReaderErrorCode.PHOENIX_READER_CLOSE_ERROR,
                        "Failed to close the Phoenix reader. Please retry; if the problem persists, check the HBase cluster status, " + e.getMessage(), e);
            }
        }
    }
}

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/Key.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import org.apache.hadoop.hbase.HConstants;

public final class Key {

    public final static String MOCK_JOBID_IDENTIFIER = "phoenixreader";
    public final static int MOCK_JOBID = 1;
    public final static String SPLIT_KEY = "phoenixsplit";

    /**
     * [Required] HBase cluster configuration. The minimum needed to connect to an HBase cluster
     * is two settings: the ZooKeeper quorum and the znode.
     */
    public final static String HBASE_CONFIG = "hbaseConfig";
    public final static String HBASE_ZK_QUORUM = HConstants.ZOOKEEPER_QUORUM;
    public final static String HBASE_ZNODE_PARENT = HConstants.ZOOKEEPER_ZNODE_PARENT;

    /**
     * [Required] Name of the table to read.
     */
    public final static String TABLE = "table";

    /**
     * [Required] Column configuration; an empty list reads all columns of the table.
     */
    public final static String COLUMN = "column";

    /**
     * [Optional] Filter condition applied to the Phoenix query.
     */
    public static final String WHERE = "where";

    /**
     * [Optional] Schema that the Phoenix table belongs to; empty by default.
     */
    public static final String SCHEMA = "schema";

    public static final String NAME_SPACE_MAPPING_ENABLED = "phoenix.schema.isNamespaceMappingEnabled";
    public static final String SYSTEM_TABLES_TO_NAMESPACE = "phoenix.schema.mapSystemTablesToNamespace";
}

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/LocalStrings.properties
================================================
errorcode.required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
errorcode.illegal_value=\u60A8\u586B\u5199\u7684\u53C2\u6570\u503C\u4E0D\u5408\u6CD5.
errorcode.get_phoenix_table_columns_error=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u9519.
errorcode.get_phoenix_connectioninfo_error=\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519.
errorcode.get_phoenix_splits_error=\u83B7\u53D6phoenix\u7684split\u4FE1\u606F\u65F6\u51FA\u9519.
errorcode.get_phoenix_createreader_error=\u521B\u5EFAphoenix\u7684split\u7684reader\u65F6\u51FA\u9519.
errorcode.get_phoenix_readerinit_error=phoenix\u7684split\u7684reader\u521D\u59CB\u5316\u65F6\u51FA\u9519.
errorcode.get_phoenix_column_typeconvert_error=\u5C06phoenix\u5217\u7684\u7C7B\u578B\u8F6C\u6362\u4E3Adatax\u7684\u7C7B\u578B\u65F6\u51FA\u9519.
errorcode.get_phoenix_record_read_error=\u8BFB\u53D6phoenix\u5177\u4F53\u7684\u4E00\u884C\u65F6\u51FA\u9519.
errorcode.get_phoenix_reader_close_error=\u5173\u95EDphoenix\u3000reader\u65F6\u51FA\u9519.
sqlhelper.1=\u901A\u8FC7zkURL\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.2=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u51FA\u73B0\u4E86\u5F02\u5E38\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.3=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u88AB\u4E2D\u65AD\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u8054\u7CFBdatax\u7BA1\u7406\u5458
sqlreadertask.1=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u95EE\u9898\uFF0C\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.2=\u521B\u5EFAphoenix\u7684reader\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.3=phoenix\u7684reader\u521D\u59CB\u5316\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.4=phoenix\u7684reader\u521D\u59CB\u5316\u88AB\u4E2D\u65AD,\u8BF7\u91CD\u8BD5
sqlreadertask.5=\u9047\u5230\u4E0D\u53EF\u8BC6\u522B\u7684phoenix\u7C7B\u578B\uFF0C\u8BF7\u8054\u7CFBhbase\u7BA1\u7406\u5458
sqlreadertask.6=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.7=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.8=phoenix\u7684reader close\u5931\u8D25,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
hbaseconfig.1=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.2=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u6709\u95EE\u9898\uFF0C\u8BF7\u53C2\u8003\u6587\u6863\u68C0\u67E5\u4E0B
hbaseconfig.3=zkquorum\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.5=table\u7684\u540D\u5B57\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.6=column\u53C2\u6570\u6CA1\u6709\u914D\u7F6E
hbaseconfig.7=\u4ECEphoenix\u83B7\u53D6column\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/LocalStrings_en_US.properties
================================================
errorcode.required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
errorcode.illegal_value=\u60A8\u586B\u5199\u7684\u53C2\u6570\u503C\u4E0D\u5408\u6CD5.
errorcode.get_phoenix_table_columns_error=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u9519.
errorcode.get_phoenix_connectioninfo_error=\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519.
errorcode.get_phoenix_splits_error=\u83B7\u53D6phoenix\u7684split\u4FE1\u606F\u65F6\u51FA\u9519.
errorcode.get_phoenix_createreader_error=\u521B\u5EFAphoenix\u7684split\u7684reader\u65F6\u51FA\u9519.
errorcode.get_phoenix_readerinit_error=phoenix\u7684split\u7684reader\u521D\u59CB\u5316\u65F6\u51FA\u9519.
errorcode.get_phoenix_column_typeconvert_error=\u5C06phoenix\u5217\u7684\u7C7B\u578B\u8F6C\u6362\u4E3Adatax\u7684\u7C7B\u578B\u65F6\u51FA\u9519.
errorcode.get_phoenix_record_read_error=\u8BFB\u53D6phoenix\u5177\u4F53\u7684\u4E00\u884C\u65F6\u51FA\u9519.
errorcode.get_phoenix_reader_close_error=\u5173\u95EDphoenix\u3000reader\u65F6\u51FA\u9519.
sqlhelper.1=\u901A\u8FC7zkURL\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.2=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u51FA\u73B0\u4E86\u5F02\u5E38\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.3=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u88AB\u4E2D\u65AD\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u8054\u7CFBdatax\u7BA1\u7406\u5458
sqlreadertask.1=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u95EE\u9898\uFF0C\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.2=\u521B\u5EFAphoenix\u7684reader\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.3=phoenix\u7684reader\u521D\u59CB\u5316\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.4=phoenix\u7684reader\u521D\u59CB\u5316\u88AB\u4E2D\u65AD,\u8BF7\u91CD\u8BD5
sqlreadertask.5=\u9047\u5230\u4E0D\u53EF\u8BC6\u522B\u7684phoenix\u7C7B\u578B\uFF0C\u8BF7\u8054\u7CFBhbase\u7BA1\u7406\u5458
sqlreadertask.6=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.7=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.8=phoenix\u7684reader close\u5931\u8D25,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
hbaseconfig.1=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.2=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u6709\u95EE\u9898\uFF0C\u8BF7\u53C2\u8003\u6587\u6863\u68C0\u67E5\u4E0B
hbaseconfig.3=zkquorum\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.5=table\u7684\u540D\u5B57\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.6=column\u53C2\u6570\u6CA1\u6709\u914D\u7F6E
hbaseconfig.7=\u4ECEphoenix\u83B7\u53D6column\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/LocalStrings_ja_JP.properties
================================================
errorcode.required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
errorcode.illegal_value=\u60A8\u586B\u5199\u7684\u53C2\u6570\u503C\u4E0D\u5408\u6CD5.
errorcode.get_phoenix_table_columns_error=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u9519.
errorcode.get_phoenix_connectioninfo_error=\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519.
errorcode.get_phoenix_splits_error=\u83B7\u53D6phoenix\u7684split\u4FE1\u606F\u65F6\u51FA\u9519.
errorcode.get_phoenix_createreader_error=\u521B\u5EFAphoenix\u7684split\u7684reader\u65F6\u51FA\u9519.
errorcode.get_phoenix_readerinit_error=phoenix\u7684split\u7684reader\u521D\u59CB\u5316\u65F6\u51FA\u9519.
errorcode.get_phoenix_column_typeconvert_error=\u5C06phoenix\u5217\u7684\u7C7B\u578B\u8F6C\u6362\u4E3Adatax\u7684\u7C7B\u578B\u65F6\u51FA\u9519.
errorcode.get_phoenix_record_read_error=\u8BFB\u53D6phoenix\u5177\u4F53\u7684\u4E00\u884C\u65F6\u51FA\u9519.
errorcode.get_phoenix_reader_close_error=\u5173\u95EDphoenix\u3000reader\u65F6\u51FA\u9519.
sqlhelper.1=\u901A\u8FC7zkURL\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.2=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u51FA\u73B0\u4E86\u5F02\u5E38\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.3=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u88AB\u4E2D\u65AD\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u8054\u7CFBdatax\u7BA1\u7406\u5458
sqlreadertask.1=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u95EE\u9898\uFF0C\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.2=\u521B\u5EFAphoenix\u7684reader\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.3=phoenix\u7684reader\u521D\u59CB\u5316\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.4=phoenix\u7684reader\u521D\u59CB\u5316\u88AB\u4E2D\u65AD,\u8BF7\u91CD\u8BD5
sqlreadertask.5=\u9047\u5230\u4E0D\u53EF\u8BC6\u522B\u7684phoenix\u7C7B\u578B\uFF0C\u8BF7\u8054\u7CFBhbase\u7BA1\u7406\u5458
sqlreadertask.6=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.7=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.8=phoenix\u7684reader close\u5931\u8D25,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
hbaseconfig.1=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.2=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u6709\u95EE\u9898\uFF0C\u8BF7\u53C2\u8003\u6587\u6863\u68C0\u67E5\u4E0B
hbaseconfig.3=zkquorum\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.5=table\u7684\u540D\u5B57\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.6=column\u53C2\u6570\u6CA1\u6709\u914D\u7F6E
hbaseconfig.7=\u4ECEphoenix\u83B7\u53D6column\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001

================================================
FILE: hbase11xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/LocalStrings_zh_CN.properties
================================================
errorcode.required_value=\u60A8\u7F3A\u5931\u4E86\u5FC5\u987B\u586B\u5199\u7684\u53C2\u6570\u503C.
errorcode.illegal_value=\u60A8\u586B\u5199\u7684\u53C2\u6570\u503C\u4E0D\u5408\u6CD5.
errorcode.get_phoenix_table_columns_error=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u9519.
errorcode.get_phoenix_connectioninfo_error=\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519.
errorcode.get_phoenix_splits_error=\u83B7\u53D6phoenix\u7684split\u4FE1\u606F\u65F6\u51FA\u9519.
errorcode.get_phoenix_createreader_error=\u521B\u5EFAphoenix\u7684split\u7684reader\u65F6\u51FA\u9519.
errorcode.get_phoenix_readerinit_error=phoenix\u7684split\u7684reader\u521D\u59CB\u5316\u65F6\u51FA\u9519.
errorcode.get_phoenix_column_typeconvert_error=\u5C06phoenix\u5217\u7684\u7C7B\u578B\u8F6C\u6362\u4E3Adatax\u7684\u7C7B\u578B\u65F6\u51FA\u9519.
errorcode.get_phoenix_record_read_error=\u8BFB\u53D6phoenix\u5177\u4F53\u7684\u4E00\u884C\u65F6\u51FA\u9519.
errorcode.get_phoenix_reader_close_error=\u5173\u95EDphoenix\u3000reader\u65F6\u51FA\u9519.
sqlhelper.1=\u901A\u8FC7zkURL\u83B7\u53D6phoenix\u7684connectioninfo\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.2=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u51FA\u73B0\u4E86\u5F02\u5E38\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u670D\u52A1\u662F\u5426\u6B63\u5E38
sqlhelper.3=\u83B7\u53D6\u8868\u7684split\u4FE1\u606F\u65F6\u88AB\u4E2D\u65AD\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u8054\u7CFBdatax\u7BA1\u7406\u5458
sqlreadertask.1=\u83B7\u53D6\u8868\u7684\u5217\u51FA\u95EE\u9898\uFF0C\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.2=\u521B\u5EFAphoenix\u7684reader\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.3=phoenix\u7684reader\u521D\u59CB\u5316\u51FA\u73B0\u95EE\u9898,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.4=phoenix\u7684reader\u521D\u59CB\u5316\u88AB\u4E2D\u65AD,\u8BF7\u91CD\u8BD5
sqlreadertask.5=\u9047\u5230\u4E0D\u53EF\u8BC6\u522B\u7684phoenix\u7C7B\u578B\uFF0C\u8BF7\u8054\u7CFBhbase\u7BA1\u7406\u5458
sqlreadertask.6=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.7=\u8BFB\u53D6phoenix\u7684record\u65F6\u51FA\u73B0\u95EE\u9898\uFF0C\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
sqlreadertask.8=phoenix\u7684reader close\u5931\u8D25,\u8BF7\u91CD\u8BD5\uFF0C\u82E5\u8FD8\u6709\u95EE\u9898\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001
hbaseconfig.1=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.2=hbase\u7684\u914D\u7F6E\u4FE1\u606F\u6709\u95EE\u9898\uFF0C\u8BF7\u53C2\u8003\u6587\u6863\u68C0\u67E5\u4E0B
hbaseconfig.3=zkquorum\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.5=table\u7684\u540D\u5B57\u4E0D\u80FD\u4E3A\u7A7A
hbaseconfig.6=column\u53C2\u6570\u6CA1\u6709\u914D\u7F6E
hbaseconfig.7=\u4ECEphoenix\u83B7\u53D6column\u51FA\u9519\uFF0C\u8BF7\u68C0\u67E5hbase\u96C6\u7FA4\u72B6\u6001

================================================
FILE: hbase11xsqlreader/src/main/resources/plugin.json
================================================
{
  "name": "hbase11xsqlreader",
  "class": "com.alibaba.datax.plugin.reader.hbase11xsqlreader.HbaseSQLReader",
  "description": "useScene: prod. mechanism: Scan to read data.",
  "developer": "alibaba"
}

================================================
FILE: hbase11xsqlreader/src/main/resources/plugin_job_template.json
================================================
{
  "name": "hbase11xsqlreader",
  "parameter": {
    "hbaseConfig": {
      "hbase.zookeeper.quorum": "hb-proxy-pub-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-003.hbase.rds.aliyuncs.com"
    },
    "table": "TABLE1",
    "column": [
      "ID",
      "COL1"
    ]
  }
}

================================================
FILE: hbase11xsqlreader/src/test/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLHelperTest.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import com.alibaba.datax.common.util.Configuration;
import org.junit.Test;

import java.util.List;

import static junit.framework.Assert.assertEquals;

/**
 * Created by shf on 16/7/20.
 */
public class HbaseSQLHelperTest {

    private String jsonStr = "{\n" +
            " \"hbaseConfig\": {\n" +
            " \"hbase.zookeeper.quorum\": \"hb-proxy-pub-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-003.hbase.rds.aliyuncs.com\"\n" +
            " },\n" +
            " \"table\": \"TABLE1\",\n" +
            " \"column\": []\n" +
            " }";

    @Test
    public void testParseConfig() {
        Configuration config = Configuration.from(jsonStr);
        HbaseSQLReaderConfig readerConfig = HbaseSQLHelper.parseConfig(config);
        System.out.println("tableName = " + readerConfig.getTableName() + ", zk = " + readerConfig.getZkUrl());
        assertEquals("TABLE1", readerConfig.getTableName());
        assertEquals("hb-proxy-pub-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-003.hbase.rds.aliyuncs.com:2181", readerConfig.getZkUrl());
    }

    @Test
    public void testSplit() {
        Configuration config = Configuration.from(jsonStr);
        HbaseSQLReaderConfig readerConfig = HbaseSQLHelper.parseConfig(config);
        List<Configuration> splits = HbaseSQLHelper.split(readerConfig);
        System.out.println("split size = " + splits.size());
    }
}

================================================
FILE: hbase11xsqlreader/src/test/java/com/alibaba/datax/plugin/reader/hbase11xsqlreader/HbaseSQLReaderTaskTest.java
================================================
package com.alibaba.datax.plugin.reader.hbase11xsqlreader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.transport.record.DefaultRecord;
import org.junit.Test;

import java.io.IOException;
import java.util.List;

import static junit.framework.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class HbaseSQLReaderTaskTest {

    private String jsonStr = "{\n" +
            " \"hbaseConfig\": {\n" +
            " \"hbase.zookeeper.quorum\": \"hb-proxy-pub-xxx-001.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-002.hbase.rds.aliyuncs.com,hb-proxy-pub-xxx-003.hbase.rds.aliyuncs.com\"\n" +
            " },\n" +
            " \"table\": \"TABLE1\",\n" +
            " \"column\": []\n" +
            " }";

    private List<Configuration> generateSplitConfig() throws IOException, InterruptedException {
        Configuration config = Configuration.from(jsonStr);
        HbaseSQLReaderConfig readerConfig = HbaseSQLHelper.parseConfig(config);
        List<Configuration> splits = HbaseSQLHelper.split(readerConfig);
        System.out.println("split size = " + splits.size());
        return splits;
    }

    @Test
    public void testReadRecord() throws Exception {
        List<Configuration> splits = this.generateSplitConfig();

        int allRecordNum = 0;
        for (int i = 0; i < splits.size(); i++) {
            RecordSender recordSender = mock(RecordSender.class);
            when(recordSender.createRecord()).thenReturn(new DefaultRecord());
            Record record = recordSender.createRecord();

            HbaseSQLReaderTask hbase11SQLReaderTask = new HbaseSQLReaderTask(splits.get(i));
            hbase11SQLReaderTask.init();
            hbase11SQLReaderTask.prepare();

            int num = 0;
            while (true) {
                boolean hasLine = false;
                try {
                    hasLine = hbase11SQLReaderTask.readRecord(record);
                } catch (Exception e) {
                    e.printStackTrace();
                    throw e;
                }
                if (!hasLine)
                    break;
                num++;
                if (num % 100 == 0)
                    System.out.println("record num is :" + num + ",record is " + record.toString());
                when(recordSender.createRecord()).thenReturn(new DefaultRecord());
                String recordStr = "";
                for (int j = 0; j < record.getColumnNumber(); j++) {
                    recordStr += record.getColumn(j).asString() + ",";
                }
                recordSender.sendToWriter(record);
                record = recordSender.createRecord();
            }
            System.out.println("split id is " + i + ",record num = " + num);
            allRecordNum += num;
            recordSender.flush();
            hbase11SQLReaderTask.destroy();
        }
        System.out.println("all record num = " + allRecordNum);
        assertEquals(10000, allRecordNum);
    }
}

================================================
FILE: hbase11xsqlwriter/doc/hbase11xsqlwriter.md
================================================
# HBase11xsqlwriter Plugin Documentation

## 1. Quick Introduction

HBase11xsqlwriter bulk-imports data into SQL tables (Phoenix tables) in HBase. Because Phoenix encodes the rowkey, writing directly through the HBase API would require converting the data by hand, which is tedious and error-prone. This plugin provides a simple way to import data into SQL tables.

Under the hood, it uses the Phoenix JDBC driver to execute UPSERT statements that write data into HBase.

### 1.1 Supported Features

* Importing into tables that have indexes; all index tables are updated in sync

### 1.2 Limitations

* Only HBase 1.x is supported
* Only tables created through Phoenix are supported; native HBase tables are not
* Importing data with timestamps is not supported

## 2. How It Works

The plugin uses the Phoenix JDBC driver to execute UPSERT statements that write data into the table in batches. Because it goes through this higher-level interface, index tables are updated in sync.

## 3. Configuration

### 3.1 Sample Configuration

```json
{
  "job": {
    "entry": {
      "jvm": "-Xms2048m -Xmx2048m"
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": "/Users/shf/workplace/datax_test/hbase11xsqlwriter/txt/normal.txt",
            "charset": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              },
              {
                "index": 3,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "hbase11xsqlwriter",
          "parameter": {
            "batchSize": "256",
            "column": [
              "UID",
              "TS",
              "EVENTID",
              "CONTENT"
            ],
            "hbaseConfig": {
              "hbase.zookeeper.quorum": "ZK server addresses of the target HBase cluster (ask your PE)",
              "zookeeper.znode.parent": "znode of the target HBase cluster (ask your PE)"
            },
            "nullMode": "skip",
            "table": "target HBase table name (case-sensitive)"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
      }
    }
  }
}
```

### 3.2 Parameters

* **name**
  * Description: the plugin name; must be `hbase11xsqlwriter`
  * Required: yes
  * Default: none

* **table**
  * Description: the name of the table to import into. Case-sensitive; Phoenix table names are usually **uppercase**.
  * Required: yes
  * Default: none

* **column**
  * Description: column names. Case-sensitive; Phoenix column names are usually **uppercase**.
  * The column order matters: it must correspond one-to-one with the order of the columns output by the reader.
  * No data types are needed; column metadata is fetched from Phoenix automatically.
  * Required: yes
  * Default: none

* **hbaseConfig**
  * Description: the HBase cluster address. The ZK quorum is required, in the format ip1,ip2,ip3; separate multiple IPs with ASCII commas. The znode is optional and defaults to /hbase.
  * Required: yes
  * Default: none

* **batchSize**
  * Description: the maximum number of rows written per batch
  * Required: no
  * Default: 256

* **nullMode**
  * Description: how to handle a column whose value is null. Two modes are currently supported:
    * skip: skip the column, i.e. do not insert it (if the row already has a value in this column, that value is deleted)
    * empty: insert an empty value; for numeric types this is 0, for varchar it is the empty string
  * Required: no
  * Default: skip

## 4. Performance Report

None.

## 5. Constraints

The column order defined in the writer must match the reader's column order. The reader's column order defines how the columns in each output row are arranged; the writer's column order defines the order in which the writer expects the columns in the data it receives. For example:

reader column order: c1, c2, c3, c4

writer column order: x1, x2, x3, x4

Then reader column c1 is assigned to writer column x1. If the writer's column order is x1, x2, x4, x3, then c3 is assigned to x4 and c4 is assigned to x3.

## 6. FAQ

1. How much concurrency is appropriate? Does increasing concurrency help when the import is slow?

   The default JVM heap size of the import process is 2GB. Concurrency (the number of channels) is implemented with multiple threads; opening too many threads does not necessarily speed up the import and can even hurt performance because of overly frequent GC. A concurrency (channel count) of 5-10 is generally recommended.

2. What is a good batchSize?

   The default is 256, but the best batchSize should be derived from the size of each row. Typically one operation should carry about 2MB-4MB of data; divide that amount by the row size to get the batchSize.

================================================
FILE: hbase11xsqlwriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hbase11xsqlwriter hbase11xsqlwriter 0.0.1-SNAPSHOT jar 4.11.0-HBase-1.1 2.7.1 1.8 3.2.0 4.4.1 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.apache.hadoop hadoop-hdfs ${hadoop.version} org.apache.hadoop hadoop-common ${hadoop.version} org.apache.phoenix phoenix-core ${phoenix.version} jdk.tools jdk.tools org.apache.phoenix phoenix-queryserver-client ${phoenix.version} com.google.guava guava 12.0.1 commons-codec commons-codec ${commons-codec.version} org.apache.httpcomponents httpclient ${httpclient.version} com.google.protobuf protobuf-java ${protobuf.version} junit junit test com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-service-face test org.mockito mockito-all 1.9.5 test src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: hbase11xsqlwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin/writer/hbase11xsqlwriter target/ hbase11xsqlwriter-0.0.1-SNAPSHOT.jar plugin/writer/hbase11xsqlwriter false plugin/writer/hbase11xsqlwriter/libs runtime

================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/Constant.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

public final class Constant {
    public static
final String DEFAULT_ENCODING = "UTF-8";
    public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss";
    public static final String DEFAULT_NULL_MODE = "skip";
    public static final String DEFAULT_ZNODE = "/hbase";
    public static final boolean DEFAULT_LAST_COLUMN_IS_VERSION = false;   // by default, the last column is not a version column
    public static final int DEFAULT_BATCH_ROW_COUNT = 256;   // write 256 rows per batch by default
    public static final boolean DEFAULT_TRUNCATE = false;    // do not truncate the table before writing by default
    public static final boolean DEFAULT_USE_THIN_CLIENT = false;    // do not use the thin client by default

    public static final int TYPE_UNSIGNED_TINYINT = 11;
    public static final int TYPE_UNSIGNED_SMALLINT = 13;
    public static final int TYPE_UNSIGNED_INTEGER = 9;
    public static final int TYPE_UNSIGNED_LONG = 10;
    public static final int TYPE_UNSIGNED_FLOAT = 14;
    public static final int TYPE_UNSIGNED_DOUBLE = 15;
    public static final int TYPE_UNSIGNED_DATE = 19;
    public static final int TYPE_UNSIGNED_TIME = 18;
    public static final int TYPE_UNSIGNED_TIMESTAMP = 20;
}

================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLHelper.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.jdbc.PhoenixConnection;
import org.apache.phoenix.schema.ColumnNotFoundException;
import org.apache.phoenix.schema.MetaDataClient;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.schema.types.PDataType;
import org.apache.phoenix.util.SchemaUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * @author yanghan.y
 */
public class HbaseSQLHelper {
    private static final Logger LOG = LoggerFactory.getLogger(HbaseSQLHelper.class);

    public static ThinClientPTable ptable;

    /**
     * Parses the DataX configuration into the SQL writer configuration.
     */
    public static HbaseSQLWriterConfig parseConfig(Configuration cfg) {
        return HbaseSQLWriterConfig.parse(cfg);
    }

    /**
     * Parses the hbase config string into the zk quorum and the znode.
     * HBase config names of the form xxx.xxxx.xxx would be interpreted by {@link Configuration#from(String)} as JSON paths
     * rather than as single config keys, so the HBase config must be parsed by calling the JSON API directly.
     * @param hbaseCfgString the value of {@link Key#HBASE_CONFIG} in the configuration
     * @return two strings: the first is the zk quorum, the second is the znode
     */
    public static Pair<String, String> getHbaseConfig(String hbaseCfgString) {
        assert hbaseCfgString != null;
        Map<String, String> hbaseConfigMap = JSON.parseObject(hbaseCfgString, new TypeReference<Map<String, String>>() {});
        String zkQuorum = hbaseConfigMap.get(Key.HBASE_ZK_QUORUM);
        String znode = hbaseConfigMap.get(Key.HBASE_ZNODE_PARENT);
        if (znode == null || znode.isEmpty()) {
            znode = Constant.DEFAULT_ZNODE;
        }
        return new Pair<String, String>(zkQuorum, znode);
    }

    public static Map<String, String> getThinConnectConfig(String hbaseCfgString) {
        assert hbaseCfgString != null;
        return JSON.parseObject(hbaseCfgString, new TypeReference<Map<String, String>>() {});
    }

    /**
     * Validates the configuration.
     */
    public static void validateConfig(HbaseSQLWriterConfig cfg) {
        // Validate the cluster address: try to connect; if that fails the configuration is wrong, so throw and exit
        Connection conn = getJdbcConnection(cfg);
        try {
            // Check the table: it must exist and be available
            checkTable(conn, cfg.getNamespace(), cfg.getTableName(), cfg.isThinClient());

            // Validate the metadata: every configured column must already exist in the target table
            PTable schema = null;
            try {
                schema = getTableSchema(conn, cfg.getNamespace(), cfg.getTableName(), cfg.isThinClient());
            } catch (SQLException e) {
                throw DataXException.asDataXException(HbaseSQLWriterErrorCode.GET_HBASE_CONNECTION_ERROR,
                        "Unable to get the metadata of target table " + cfg.getTableName() + ". The table may not be a SQL table or the table name may be misconfigured. Please check your configuration or contact your HBase administrator.", e);
            }

            try {
                List<String> columnNames = cfg.getColumns();
                for (String colName : columnNames) {
                    schema.getColumnForColumnName(colName);
                }
            } catch (ColumnNotFoundException e) {
                // A configured column name does not exist in the metadata
                throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                        "The configured column " + e.getColumnName() + " does not exist in the metadata of target table " + cfg.getTableName() + ". Please check your configuration or contact your HBase administrator.", e);
            } catch (SQLException e) {
                // Ambiguous column name or some other problem
                throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                        "Column validation failed for target table " + cfg.getTableName() + ". Please check your configuration or contact your HBase administrator.", e);
            }
        } finally {
            // getJdbcConnection() requires callers to close the connection explicitly
            try {
                conn.close();
            } catch (SQLException e) {
                LOG.warn("Failed closing the JDBC connection used for config validation", e);
            }
        }
    }

    /**
     * Gets a JDBC connection. This is a lightweight connection and must be closed explicitly after use.
     */
    public static Connection getJdbcConnection(HbaseSQLWriterConfig cfg) {
        String connStr = cfg.getConnectionString();
        LOG.debug("Connecting to HBase cluster [" + connStr + "] ...");
        Connection conn;
        try {
            Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
            if (cfg.isThinClient()) {
                conn = getThinClientJdbcConnection(cfg);
            } else {
                conn = DriverManager.getConnection(connStr);
            }
            conn.setAutoCommit(false);
        } catch (Throwable e) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.GET_HBASE_CONNECTION_ERROR,
                    "Unable to connect to the HBase cluster: the configuration is incorrect or the target cluster is unavailable. Please check the configuration and cluster status or contact your HBase administrator.", e);
        }
        LOG.debug("Connected to HBase cluster successfully.");
        return conn;
    }

    /**
     * Creates a thin client JDBC connection.
     * @param cfg
     * @return
     * @throws SQLException
     */
    public static Connection getThinClientJdbcConnection(HbaseSQLWriterConfig cfg) throws SQLException {
        String connStr = cfg.getConnectionString();
        LOG.info("Connecting to HBase cluster [" + connStr + "] using thin client ...");
        Connection conn = DriverManager.getConnection(connStr, cfg.getUsername(), cfg.getPassword());
        String userNamespaceQuery = "use " + cfg.getNamespace();
        Statement statement = null;
        try {
            statement = conn.createStatement();
            statement.executeUpdate(userNamespaceQuery);
            return conn;
        } catch (Exception e) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.GET_HBASE_CONNECTION_ERROR,
                    "Unable to connect to the configured namespace. Please check the configuration or contact your HBase administrator.", e);
        } finally {
            if (statement != null) {
                statement.close();
            }
        }
    }

    /**
     * Gets the metadata of a table.
     * @param conn JDBC connection to HBase SQL (Phoenix)
     * @param fullTableName full name of the target table
     * @return metadata of the table
     */
    public static PTable getTableSchema(Connection conn, String fullTableName) throws SQLException {
        PhoenixConnection hconn = conn.unwrap(PhoenixConnection.class);
        MetaDataClient mdc = new MetaDataClient(hconn);
        String schemaName = SchemaUtil.getSchemaNameFromFullName(fullTableName);
        String tableName = SchemaUtil.getTableNameFromFullName(fullTableName);
        return mdc.updateCache(schemaName, tableName).getTable();
    }

    /**
     * Gets the metadata of a table.
     * @param conn
     * @param namespace
     * @param fullTableName
     * @param isThinClient whether the thin client is used
     * @return metadata of the table
     * @throws SQLException
     */
    public static PTable getTableSchema(Connection conn, String namespace, String fullTableName, boolean isThinClient)
            throws SQLException {
        LOG.info("Start to get table schema of namespace=" + namespace + " , fullTableName=" + fullTableName);
        if (!isThinClient) {
            return getTableSchema(conn, fullTableName);
        } else {
            if (ptable == null) {
                ResultSet result = conn.getMetaData().getColumns(null, namespace, fullTableName, null);
                try {
                    ThinClientPTable retTable = new ThinClientPTable();
                    retTable.setColTypeMap(parseColType(result));
                    ptable = retTable;
                } finally {
                    if (result != null) {
                        result.close();
                    }
                }
            }
            return ptable;
        }
    }

    /**
     * Parses the column names and types.
     * @param rs
     * @return
     * @throws SQLException
     */
    public static Map<String, ThinClientPTable.ThinClientPColumn> parseColType(ResultSet rs) throws SQLException {
        Map<String, ThinClientPTable.ThinClientPColumn> cols = new HashMap<String, ThinClientPTable.ThinClientPColumn>();
        ResultSetMetaData md = rs.getMetaData();
        int columnCount = md.getColumnCount();

        while (rs.next()) {
            String colName = null;
            PDataType colType = null;
            for (int i = 1; i <= columnCount; i++) {
                if (md.getColumnLabel(i).equals("TYPE_NAME")) {
                    colType = PDataType.fromSqlTypeName((String) rs.getObject(i));
                } else if (md.getColumnLabel(i).equals("COLUMN_NAME")) {
                    colName = (String) rs.getObject(i);
                }
            }
            if (colType == null || colName == null) {
                throw new SQLException("ColType or colName is null, colType : " + colType + " , colName : " + colName);
            }
            cols.put(colName,
                    new ThinClientPTable.ThinClientPColumn(colName, colType));
        }
        return cols;
    }

    /**
     * Truncates the table.
     */
    public static void truncateTable(Connection conn, String tableName) {
        PhoenixConnection sqlConn = null;
        Admin admin = null;
        try {
            sqlConn = conn.unwrap(PhoenixConnection.class);
            admin = sqlConn.getQueryServices().getAdmin();
            TableName hTableName = TableName.valueOf(tableName);
            // Make sure the table exists and is available
            checkTable(admin, hTableName);
            // Truncate the table
            admin.disableTable(hTableName);
            admin.truncateTable(hTableName, true);
            LOG.debug("Table " + tableName + " has been truncated.");
        } catch (Throwable t) {
            // Truncating the table failed
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.TRUNCATE_HBASE_ERROR,
                    "Failed to truncate target table " + tableName + ". Please contact your HBase administrator.", t);
        } finally {
            if (admin != null) {
                closeAdmin(admin);
            }
        }
    }

    /**
     * Checks the table.
     * @param conn
     * @param namespace
     * @param tableName
     * @param isThinClient
     * @throws DataXException
     */
    public static void checkTable(Connection conn, String namespace, String tableName, boolean isThinClient)
            throws DataXException {
        if (!isThinClient) {
            checkTable(conn, tableName);
        } else {
            // The table check is skipped when using the thin client
        }
    }

    /**
     * Checks the table: it must exist and be enabled.
     */
    public static void checkTable(Connection conn, String tableName) throws DataXException {
        PhoenixConnection sqlConn = null;
        Admin admin = null;
        try {
            sqlConn = conn.unwrap(PhoenixConnection.class);
            admin = sqlConn.getQueryServices().getAdmin();
            TableName hTableName = TableName.valueOf(tableName);
            checkTable(admin, hTableName);
        } catch (SQLException t) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.TRUNCATE_HBASE_ERROR,
                    "Status check failed for table " + tableName + ". Please check your cluster and table status or contact your HBase administrator.", t);
        } catch (IOException t) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.TRUNCATE_HBASE_ERROR,
                    "Status check failed for table " + tableName + ". Please check your cluster and table status or contact your HBase administrator.", t);
        } finally {
            if (admin != null) {
                closeAdmin(admin);
            }
        }
    }

    private static void checkTable(Admin admin, TableName tableName) throws IOException {
        if (!admin.tableExists(tableName)) {
            throw
                    DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                    "HBase target table " + tableName.toString() + " does not exist. Please check your configuration or contact your HBase administrator.");
        }
        if (!admin.isTableAvailable(tableName)) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                    "HBase target table " + tableName.toString() + " is not available. Please check your configuration or contact your HBase administrator.");
        }
        if (admin.isTableDisabled(tableName)) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                    "HBase target table " + tableName.toString() + " is disabled. Please check your configuration or contact your HBase administrator.");
        }
    }

    private static void closeAdmin(Admin admin) {
        try {
            if (null != admin)
                admin.close();
        } catch (IOException e) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.CLOSE_HBASE_AMIN_ERROR, e);
        }
    }
}

================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLWriter.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * @author yanghan.y
 */
public class HbaseSQLWriter extends Writer {
    public static class Job extends Writer.Job {
        private HbaseSQLWriterConfig config;

        @Override
        public void init() {
            // Parse the configuration
            config = HbaseSQLHelper.parseConfig(this.getPluginJobConf());

            // Validate the configuration; this accesses the cluster to check the table
            HbaseSQLHelper.validateConfig(config);
        }

        @Override
        public void prepare() {
            // Whether to truncate the target table before writing; not truncated by default
            if (config.truncate()) {
                // getJdbcConnection() requires the caller to close the connection explicitly
                try (Connection conn = HbaseSQLHelper.getJdbcConnection(config)) {
                    HbaseSQLHelper.truncateTable(conn, config.getTableName());
                } catch (SQLException e) {
                    // Only Connection.close() throws SQLException here; truncateTable reports its own errors as DataXException
                    throw DataXException.asDataXException(HbaseSQLWriterErrorCode.CLOSE_HBASE_CONNECTION_ERROR, e);
                }
            }
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            List<Configuration> splitResultConfigs = new ArrayList<Configuration>();
            for (int j = 0; j < mandatoryNumber; j++) {
                splitResultConfigs.add(config.getOriginalConfig().clone());
            }
            return splitResultConfigs;
        }

        @Override
        public void destroy()
        {
            // NOOP
        }
    }

    public static class Task extends Writer.Task {
        private Configuration taskConfig;
        private HbaseSQLWriterTask hbaseSQLWriterTask;

        @Override
        public void init() {
            this.taskConfig = super.getPluginJobConf();
            this.hbaseSQLWriterTask = new HbaseSQLWriterTask(this.taskConfig);
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            this.hbaseSQLWriterTask.startWriter(lineReceiver, super.getTaskPluginCollector());
        }

        @Override
        public void destroy() {
            // hbaseSQLWriterTask does not need to be closed here: startWriter() closes its own resources when it finishes
        }
    }
}

================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLWriterConfig.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.google.common.base.Strings;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.util.Pair;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.Map;

/**
 * HBase SQL writer config
 *
 * @author yanghan.y
 */
public class HbaseSQLWriterConfig {
    private final static Logger LOG = LoggerFactory.getLogger(HbaseSQLWriterConfig.class);
    private Configuration originalConfig;   // the original configuration data

    // Cluster configuration
    private String connectionString;

    // Table configuration
    private String tableName;
    private List<String> columns;   // names of all columns of the target table, including primary-key and non-primary-key columns but not the timestamp column

    // Other configuration
    private NullModeType nullMode;
    private int batchSize;  // maximum number of rows written per batch
    private boolean truncate;   // whether to truncate the target table before the import starts
    private boolean isThinClient;
    private String namespace;
    private String username;
    private String password;

    /**
     * @return the original DataX configuration
     */
    public Configuration getOriginalConfig() {
        return originalConfig;
    }

    /**
     * @return the connection string (ZK mode; the thin client URL when thinClient is enabled)
     */
    public String getConnectionString() {
        return connectionString;
    }

    /**
     * @return the table name
     */
    public String getTableName() {
        return tableName;
    }

    /**
     * @return all columns, including primary-key and non-primary-key columns, but not the version column
     */
    public List<String> getColumns() {
        return columns;
    }

    /**
     * @return how null values are handled
     */
    public NullModeType getNullMode() {
        return nullMode;
    }

    /**
     * @return the maximum number of rows written per batch
     */
    public int getBatchSize() {
        return batchSize;
    }

    /**
     * @return whether the target table should be truncated when the writer is initialized
     */
    public boolean truncate() {
        return truncate;
    }

    public boolean isThinClient() {
        return isThinClient;
    }

    public String getNamespace() {
        return namespace;
    }

    public String getPassword() {
        return password;
    }

    public String getUsername() {
        return username;
    }

    /**
     * @param dataxCfg
     * @return
     */
    public static HbaseSQLWriterConfig parse(Configuration dataxCfg) {
        assert dataxCfg != null;
        HbaseSQLWriterConfig cfg = new HbaseSQLWriterConfig();
        cfg.originalConfig = dataxCfg;

        // 1. Parse the cluster configuration
        parseClusterConfig(cfg, dataxCfg);

        // 2. Parse the column configuration
        parseTableConfig(cfg, dataxCfg);

        // 3. Parse the remaining configuration
        cfg.nullMode = NullModeType.getByTypeName(dataxCfg.getString(Key.NULL_MODE, Constant.DEFAULT_NULL_MODE));
        cfg.batchSize = dataxCfg.getInt(Key.BATCH_SIZE, Constant.DEFAULT_BATCH_ROW_COUNT);
        cfg.truncate = dataxCfg.getBool(Key.TRUNCATE, Constant.DEFAULT_TRUNCATE);
        cfg.isThinClient = dataxCfg.getBool(Key.THIN_CLIENT, Constant.DEFAULT_USE_THIN_CLIENT);

        // 4. Print the parsed configuration
        LOG.info("HBase SQL writer config parsed:" + cfg.toString());

        return cfg;
    }

    private static void parseClusterConfig(HbaseSQLWriterConfig cfg, Configuration dataxCfg) {
        // Get the HBase cluster connection info string
        String hbaseCfg = dataxCfg.getString(Key.HBASE_CONFIG);
        if (StringUtils.isBlank(hbaseCfg)) {
            // The cluster configuration must be present and non-empty
            throw DataXException.asDataXException(
                    HbaseSQLWriterErrorCode.REQUIRED_VALUE,
                    "Writing to HBase requires hbaseConfig, which holds the HBase connection information. Please contact your HBase PE to obtain it.");
        }

        if (dataxCfg.getBool(Key.THIN_CLIENT, Constant.DEFAULT_USE_THIN_CLIENT)) {
            Map<String, String> thinConnectConfig = HbaseSQLHelper.getThinConnectConfig(hbaseCfg);
            String thinConnectStr = thinConnectConfig.get(Key.HBASE_THIN_CONNECT_URL);
            cfg.namespace = thinConnectConfig.get(Key.HBASE_THIN_CONNECT_NAMESPACE);
            cfg.username = thinConnectConfig.get(Key.HBASE_THIN_CONNECT_USERNAME);
            cfg.password = thinConnectConfig.get(Key.HBASE_THIN_CONNECT_PASSWORD);
            if (Strings.isNullOrEmpty(thinConnectStr)) {
                throw DataXException.asDataXException(
                        HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                        "In thin client mode (thinClient=true), hbase.thin.connect.url must not be empty. Please contact your HBase PE to obtain it.");
            }
            if (Strings.isNullOrEmpty(cfg.namespace) || Strings.isNullOrEmpty(cfg.username)
                    || Strings.isNullOrEmpty(cfg.password)) {
                throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                        "In thin client mode (thinClient=true), hbase.thin.connect.namespace|username|password must not be empty. "
                                + "Please contact your HBase PE to obtain them.");
            }
            cfg.connectionString = thinConnectStr;
        } else {
            // Parse the zk servers and the znode
            Pair<String, String> zkCfg;
            try {
                zkCfg = HbaseSQLHelper.getHbaseConfig(hbaseCfg);
            } catch (Throwable t) {
                // Failed to parse the hbase configuration
                throw DataXException.asDataXException(
                        HbaseSQLWriterErrorCode.REQUIRED_VALUE,
                        "Failed to parse hbaseConfig. Please make sure the configured hbaseConfig is valid JSON with correct content.", t);
            }
            String zkQuorum = zkCfg.getFirst();
            String znode = zkCfg.getSecond();
            if (zkQuorum == null || zkQuorum.isEmpty()) {
                throw DataXException.asDataXException(
                        HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                        "hbase.zookeeper.quorum must not be empty. Please contact your HBase PE to obtain it.");
            }
            if (znode == null ||
                    znode.isEmpty()) {
                throw DataXException.asDataXException(
                        HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                        "zookeeper.znode.parent must not be empty. Please contact your HBase PE to obtain it.");
            }

            // Build the connection string used by SQL, in the format jdbc:phoenix:zk_quorum:2181:/znode_parent
            cfg.connectionString = "jdbc:phoenix:" + zkQuorum + ":2181:" + znode;
        }
    }

    private static void parseTableConfig(HbaseSQLWriterConfig cfg, Configuration dataxCfg) {
        // Parse and check the table name
        cfg.tableName = dataxCfg.getString(Key.TABLE);
        if (cfg.tableName == null || cfg.tableName.isEmpty()) {
            throw DataXException.asDataXException(
                    HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                    "The HBase table name must not be empty. Please check and fix your configuration.");
        }
        try {
            TableName tn = TableName.valueOf(cfg.tableName);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                    "The configured tableName (" + cfg.tableName + ") contains illegal characters. Please check your configuration or contact your HBase administrator.");
        }

        // Parse the column configuration
        cfg.columns = dataxCfg.getList(Key.COLUMN, String.class);
        if (cfg.columns == null || cfg.columns.isEmpty()) {
            throw DataXException.asDataXException(
                    HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                    "The HBase column configuration must not be empty. Please add the column names of the target table.");
        }
    }

    @Override
    public String toString() {
        StringBuilder ret = new StringBuilder();
        // Cluster configuration
        ret.append("\n[jdbc]");
        ret.append(connectionString);
        ret.append("\n");

        // Table configuration
        ret.append("[tableName]");
        ret.append(tableName);
        ret.append("\n");
        ret.append("[column]");
        for (String col : columns) {
            ret.append(col);
            ret.append(",");
        }
        ret.setLength(ret.length() - 1);
        ret.append("\n");

        // Other configuration
        ret.append("[nullMode]");
        ret.append(nullMode);
        ret.append("\n");
        ret.append("[batchSize]");
        ret.append(batchSize);
        ret.append("\n");
        ret.append("[truncate]");
        ret.append(truncate);
        ret.append("\n");

        return ret.toString();
    }

    /**
     * Direct instantiation is not allowed; call {@link #parse} to initialize this class.
     */
    private HbaseSQLWriterConfig() {
    }
}

================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLWriterErrorCode.java
================================================
package
        com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import com.alibaba.datax.common.spi.ErrorCode;

public enum HbaseSQLWriterErrorCode implements ErrorCode {
    REQUIRED_VALUE("Hbasewriter-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("Hbasewriter-01", "An illegal parameter value was provided."),
    GET_HBASE_CONNECTION_ERROR("Hbasewriter-02", "Error getting the HBase connection."),
    GET_HBASE_TABLE_ERROR("Hbasewriter-03", "Error getting the HBase table."),
    CLOSE_HBASE_CONNECTION_ERROR("Hbasewriter-04", "Error closing the HBase connection."),
    CLOSE_HBASE_AMIN_ERROR("Hbasewriter-05", "Error closing the HBase admin."),
    CLOSE_HBASE_TABLE_ERROR("Hbasewriter-06", "Error closing the HBase table."),
    PUT_HBASE_ERROR("Hbasewriter-07", "An IO exception occurred while writing to HBase."),
    DELETE_HBASE_ERROR("Hbasewriter-08", "An exception occurred while deleting from the HBase table."),
    TRUNCATE_HBASE_ERROR("Hbasewriter-09", "An exception occurred while truncating the HBase table."),
    ;

    private final String code;
    private final String description;

    private HbaseSQLWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}

================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/HbaseSQLWriterTask.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.google.common.collect.Lists;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.schema.types.PDataType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Arrays;
import java.util.List;

/**
 * @author yanghan.y
 */
public class HbaseSQLWriterTask {
    private final static Logger LOG = LoggerFactory.getLogger(HbaseSQLWriterTask.class);

    private TaskPluginCollector taskPluginCollector;
    private HbaseSQLWriterConfig cfg;
    private Connection connection = null;
    private PreparedStatement ps = null;
    // Number of columns to write to HBase, i.e. the number of columns in the user-configured column parameter; the timestamp is not included
    private int numberOfColumnsToWrite;
    // Number of columns expected in each Record from the source table
    private int numberOfColumnsToRead;
    private boolean needExplicitVersion = false;
    private int[] columnTypes;

    public HbaseSQLWriterTask(Configuration configuration) {
        // Only parse the configuration here without accessing the remote cluster; validity is checked during the writer's init
        cfg = HbaseSQLHelper.parseConfig(configuration);
    }

    public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) {
        this.taskPluginCollector = taskPluginCollector;
        Record record;
        try {
            // Preparation
            prepare();

            List<Record> buffer = Lists.newArrayListWithExpectedSize(cfg.getBatchSize());
            while ((record = lineReceiver.getFromReader()) != null) {
                // Check that the number of columns matches the expectation
                if (record.getColumnNumber() != numberOfColumnsToRead) {
                    throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                            "The number of columns from the data source [" + record.getColumnNumber()
                                    + "] differs from the number of columns in your configuration [" + numberOfColumnsToRead
                                    + "]. Please check your configuration or contact your HBase administrator.");
                }

                buffer.add(record);
                // Flush as soon as the buffer holds batchSize rows, so that no batch exceeds batchSize rows
                if (buffer.size() >= cfg.getBatchSize()) {
                    doBatchUpsert(buffer);
                    buffer.clear();
                }
            } // end while loop

            // Handle the remaining records
            if (!buffer.isEmpty()) {
                doBatchUpsert(buffer);
                buffer.clear();
            }
        } catch (Throwable t) {
            // Make sure every exception is converted to a DataXException
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.PUT_HBASE_ERROR, t);
        } finally {
            close();
        }
    }

    private void prepare() throws SQLException {
        if (connection == null) {
            connection = HbaseSQLHelper.getJdbcConnection(cfg);
            connection.setAutoCommit(false);    // commit in batches
        }

        if (ps == null) {
            // A Task uses a single PreparedStatement for its whole lifetime, so it is created only once, here
            ps = createPreparedStatement();
            columnTypes = getColumnSqlType(cfg.getColumns());
        }
    }

    private void close() {
        if (ps != null) {
            try {
                ps.close();
            } catch (SQLException e) {
                // Should not happen
                LOG.error("Failed closing PreparedStatement", e);
            }
        }
        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException e) {
                // Should not happen
                LOG.error("Failed closing Connection", e);
            }
        }
    }

    /**
     * Commits a batch of records. If the batch fails, retries row by row; rows that still fail are reported as dirty records.
     */
    private void doBatchUpsert(List<Record> records) throws SQLException {
        try {
            // Add all records to the connection's buffer
            for (Record r : records) {
                setupStatement(r);
                ps.executeUpdate();
            }

            // Commit the buffered data to HBase
            connection.commit();
        } catch (SQLException e) {
            LOG.error("Failed batch committing " + records.size() + " records", e);

            // The batch commit failed, so retry row by row to find the failing row(s)
            connection.rollback();
            doSingleUpsert(records);
        } catch (Exception e) {
            throw DataXException.asDataXException(HbaseSQLWriterErrorCode.PUT_HBASE_ERROR, e);
        }
    }

    /**
     * Commits records one row at a time and records failing rows as dirty data. The dirty data collector decides whether the task continues.
     */
    private void doSingleUpsert(List<Record> records) throws SQLException {
        for (Record r : records) {
            try {
                setupStatement(r);
                ps.executeUpdate();
                connection.commit();
            } catch (SQLException e) {
                // Something went wrong: record this row as dirty data
                LOG.error("Failed writing hbase", e);
                this.taskPluginCollector.collectDirtyRecord(r, e);
            }
        }
    }

    /**
     * Generates the SQL template and creates a PreparedStatement from it.
     */
    private PreparedStatement createPreparedStatement() throws SQLException {
        // Build the comma-separated list of column names: col1,col2,col3,...
        StringBuilder columnNamesBuilder = new StringBuilder();
        if (cfg.isThinClient()) {
            for (String col : cfg.getColumns()) {
                // The thin client does not quote column names
                columnNamesBuilder.append(col);
                columnNamesBuilder.append(",");
            }
        } else {
            for (String col : cfg.getColumns()) {
                // Quoted column names are not upper-cased automatically; the case configured by the user is kept
                columnNamesBuilder.append("\"");
                columnNamesBuilder.append(col);
                columnNamesBuilder.append("\"");
                columnNamesBuilder.append(",");
            }
        }
        columnNamesBuilder.setLength(columnNamesBuilder.length() - 1); // remove the trailing comma
        String columnNames = columnNamesBuilder.toString();
        numberOfColumnsToWrite = cfg.getColumns().size();
        numberOfColumnsToRead = numberOfColumnsToWrite; // initially, the number of columns to read equals the number to write

        // Build the UPSERT template
        String tableName = cfg.getTableName();
        StringBuilder upsertBuilder = null;
        if (cfg.isThinClient()) {
            upsertBuilder = new StringBuilder("upsert into " + tableName + " (" + columnNames + " ) values (");
        } else {
            // A quoted table name is not upper-cased automatically; the case configured by the user is kept
            upsertBuilder = new StringBuilder("upsert into \"" + tableName + "\" (" + columnNames + " ) values (");
        }
        for (int i = 0; i < cfg.getColumns().size(); i++) {
            upsertBuilder.append("?,");
        }
        upsertBuilder.setLength(upsertBuilder.length() - 1); // remove the trailing comma
        upsertBuilder.append(")");

        String sql = upsertBuilder.toString();
        PreparedStatement ps = connection.prepareStatement(sql);
        LOG.debug("SQL template generated: " + sql);
        return ps;
    }

    /**
     * Looks up the SQL type of each column by name in the table metadata.
     */
    private int[] getColumnSqlType(List<String> columnNames) throws SQLException {
        int[] types = new int[numberOfColumnsToWrite];
        PTable ptable = HbaseSQLHelper
                .getTableSchema(connection, cfg.getNamespace(), cfg.getTableName(), cfg.isThinClient());

        for (int i = 0; i < columnNames.size(); i++) {
            String name = columnNames.get(i);
            PDataType type = ptable.getColumnForColumnName(name).getDataType();
            types[i] = type.getSqlType();
            LOG.debug("Column name : " + name + ", sql type = " + type.getSqlType() + " " + type.getSqlTypeName());
        }
        return types;
    }

    private void setupStatement(Record record) throws SQLException {
        // The record's column count was already checked against the number of placeholders in ps
        for (int i = 0; i < numberOfColumnsToWrite; i++) {
            Column col = record.getColumn(i);
            int sqlType = columnTypes[i];
            // PreparedStatement indices start at 1, hence i + 1
            setupColumn(i + 1, sqlType, col);
        }
    }

    private void setupColumn(int pos, int sqlType, Column col) throws SQLException {
        if (col.getRawData() != null) {
            switch (sqlType) {
                case Types.CHAR:
                case Types.VARCHAR:
                    ps.setString(pos, col.asString());
                    break;
                case Types.BINARY:
                case Types.VARBINARY:
                    ps.setBytes(pos, col.asBytes());
                    break;
                case Types.BOOLEAN:
                    ps.setBoolean(pos, col.asBoolean());
                    break;
                case Types.TINYINT:
                case Constant.TYPE_UNSIGNED_TINYINT:
                    ps.setByte(pos, col.asLong().byteValue());
                    break;
                case Types.SMALLINT:
                case Constant.TYPE_UNSIGNED_SMALLINT:
                    ps.setShort(pos, col.asLong().shortValue());
                    break;
                case Types.INTEGER:
                case Constant.TYPE_UNSIGNED_INTEGER:
                    ps.setInt(pos, col.asLong().intValue());
                    break;
                case Types.BIGINT:
                case Constant.TYPE_UNSIGNED_LONG:
                    ps.setLong(pos, col.asLong());
                    break;
                case Types.FLOAT:
                    ps.setFloat(pos, col.asDouble().floatValue());
                    break;
                case Types.DOUBLE:
                    ps.setDouble(pos, col.asDouble());
                    break;
                case Types.DECIMAL:
                    ps.setBigDecimal(pos, col.asBigDecimal());
                    break;
                case Types.DATE:
                case Constant.TYPE_UNSIGNED_DATE:
                    ps.setDate(pos, new java.sql.Date(col.asDate().getTime()));
                    break;
                case Types.TIME:
                case Constant.TYPE_UNSIGNED_TIME:
                    ps.setTime(pos, new java.sql.Time(col.asDate().getTime()));
                    break;
                case Types.TIMESTAMP:
                case Constant.TYPE_UNSIGNED_TIMESTAMP:
                    ps.setTimestamp(pos, new java.sql.Timestamp(col.asDate().getTime()));
                    break;
                default:
                    throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                            "Unsupported column type: " + sqlType + ". Please check your configuration or contact your HBase administrator.");
            } // end switch
        } else {
            // No value: handle it according to the nullMode setting
            switch (cfg.getNullMode()) {
                case Skip:
                    // Skip null values: no value is written for this column, it is bound as SQL NULL
                    ps.setNull(pos, sqlType);
                    break;
                case Empty:
                    // Insert an "empty value"; note that the empty value differs between types.
                    // In SQL an empty value is a real value, which is quite different from an empty value written through the native HBase API
                    ps.setObject(pos, getEmptyValue(sqlType));
                    break;
                default:
                    // nullMode was validated when the configuration was initialized, so this cannot happen
                    throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                            "Hbasewriter does not support nullMode: " + cfg.getNullMode()
                                    + ". Supported nullMode values are: " + Arrays.asList(NullModeType.values()));
            }
        }
    }

    /**
     * Returns the "empty value" for a SQL type: 0 for numeric types, false for boolean, the empty
     * string for character types, epoch 0 for date/time types and an empty array for binary types.
     * @param sqlType SQL data type, as defined in {@link Types}
     */
    private Object getEmptyValue(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
                return "";
            case Types.BOOLEAN:
                return false;
            case Types.TINYINT:
            case Constant.TYPE_UNSIGNED_TINYINT:
                return (byte) 0;
            case Types.SMALLINT:
            case Constant.TYPE_UNSIGNED_SMALLINT:
                return (short) 0;
            case Types.INTEGER:
            case Constant.TYPE_UNSIGNED_INTEGER:
                return (int) 0;
            case Types.BIGINT:
            case Constant.TYPE_UNSIGNED_LONG:
                return (long) 0;
            case Types.FLOAT:
                return (float) 0.0;
            case Types.DOUBLE:
                return (double) 0.0;
            case Types.DECIMAL:
                return new BigDecimal(0);
            case Types.DATE:
            case Constant.TYPE_UNSIGNED_DATE:
                return new java.sql.Date(0);
            case Types.TIME:
            case Constant.TYPE_UNSIGNED_TIME:
                return new java.sql.Time(0);
            case Types.TIMESTAMP:
            case Constant.TYPE_UNSIGNED_TIMESTAMP:
                return new java.sql.Timestamp(0);
            case Types.BINARY:
            case Types.VARBINARY:
                return new byte[0];
            default:
                throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                        "Unsupported column type: " + sqlType + ". Please check your configuration or contact your HBase administrator.");
        }
    }
}
================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import org.apache.hadoop.hbase.HConstants;

public final class Key {

    /**
     * [Required] HBase cluster configuration. The minimum configuration needed to connect to an HBase cluster is two entries: zk and znode
     */
    public final static String HBASE_CONFIG = "hbaseConfig";
    public final static String HBASE_ZK_QUORUM = HConstants.ZOOKEEPER_QUORUM;
    public final static String HBASE_ZNODE_PARENT = HConstants.ZOOKEEPER_ZNODE_PARENT;
    public final static String HBASE_THIN_CONNECT_URL = "hbase.thin.connect.url";
    public final static String HBASE_THIN_CONNECT_NAMESPACE = "hbase.thin.connect.namespace";
    public final static String HBASE_THIN_CONNECT_USERNAME = "hbase.thin.connect.username";
    public final static String HBASE_THIN_CONNECT_PASSWORD = "hbase.thin.connect.password";

    /**
     * [Required] Name of the table the writer writes to
     */
    public final static String TABLE = "table";

    /**
     * [Required] Column configuration
     */
    public final static String COLUMN = "column";

    public static final String NAME = "name";

    /**
     * [Optional] Null values are skipped by default
     */
    public static final String NULL_MODE = "nullMode";

    /**
     * [Optional]
     * Whether to truncate the target table when the writer is initialized.
     * If several writers are started globally, make sure all of them have finished prepare before loading data.
     */
    public static final String TRUNCATE = "truncate";

    public static final String THIN_CLIENT = "thinClient";

    /**
     * [Optional] Maximum number of rows per batch write, 100 by default
     */
    public static final String BATCH_SIZE = "batchSize";
}
================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/NullModeType.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import com.alibaba.datax.common.exception.DataXException;

import java.util.Arrays;

public enum NullModeType {
    Skip("skip"),
    Empty("empty")
    ;

    private String mode;

    NullModeType(String mode) {
        this.mode = mode.toLowerCase();
    }

    public String getMode() {
        return mode;
    }

    public static NullModeType getByTypeName(String modeName) {
        for (NullModeType modeType : values()) {
            if (modeType.mode.equalsIgnoreCase(modeName)) {
                return modeType;
            }
        }
        throw DataXException.asDataXException(HbaseSQLWriterErrorCode.ILLEGAL_VALUE,
                "Hbasewriter does not support nullMode: " + modeName + ". Supported nullMode values are: " + Arrays.asList(values()));
    }
}
================================================
FILE: hbase11xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xsqlwriter/ThinClientPTable.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xsqlwriter;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.hbase.index.util.KeyValueBuilder;
import org.apache.phoenix.index.IndexMaintainer;
import org.apache.phoenix.jdbc.PhoenixConnection;
import org.apache.phoenix.schema.AmbiguousColumnException;
import org.apache.phoenix.schema.ColumnFamilyNotFoundException;
import org.apache.phoenix.schema.ColumnNotFoundException;
import org.apache.phoenix.schema.PColumn;
import org.apache.phoenix.schema.PColumnFamily;
import org.apache.phoenix.schema.PIndexState;
import org.apache.phoenix.schema.PName;
import org.apache.phoenix.schema.PRow;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.schema.PTableKey;
import org.apache.phoenix.schema.PTableType;
import org.apache.phoenix.schema.RowKeySchema;
import org.apache.phoenix.schema.SortOrder;
import org.apache.phoenix.schema.types.PDataType;

import java.util.List;
import java.util.Map;

public class ThinClientPTable implements PTable {

    private Map<String, ThinClientPColumn> colMap;

    public void setColTypeMap(Map<String, ThinClientPColumn> colMap) {
        this.colMap = colMap;
    }

    @Override
    public long getTimeStamp() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public long getSequenceNumber() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public long getIndexDisableTimestamp() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getSchemaName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getTableName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getTenantId() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PTableType getType() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getPKName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public List<PColumn> getPKColumns() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public List<PColumn> getColumns() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public List<PColumnFamily> getColumnFamilies() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PColumnFamily getColumnFamily(byte[] bytes) throws ColumnFamilyNotFoundException {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PColumnFamily getColumnFamily(String s) throws ColumnFamilyNotFoundException {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PColumn getColumnForColumnName(String colname) throws ColumnNotFoundException, AmbiguousColumnException {
        if (!colMap.containsKey(colname)) {
            throw new ColumnNotFoundException("Col " + colname + " not found");
        }
        return colMap.get(colname);
    }

    @Override
    public PColumn getColumnForColumnQualifier(byte[] bytes, byte[] bytes1)
            throws ColumnNotFoundException, AmbiguousColumnException {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PColumn getPKColumn(String s) throws ColumnNotFoundException {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PRow newRow(KeyValueBuilder keyValueBuilder, long l, ImmutableBytesWritable immutableBytesWritable,
                       boolean b, byte[]... bytes) {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PRow newRow(KeyValueBuilder keyValueBuilder, ImmutableBytesWritable immutableBytesWritable,
                       boolean b, byte[]... bytes) {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public int newKey(ImmutableBytesWritable immutableBytesWritable, byte[][] bytes) {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public RowKeySchema getRowKeySchema() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public Integer getBucketNum() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public List<PTable> getIndexes() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PIndexState getIndexState() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getParentName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getParentTableName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getParentSchemaName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public List<PName> getPhysicalNames() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getPhysicalName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean isImmutableRows() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean getIndexMaintainers(ImmutableBytesWritable immutableBytesWritable,
                                       PhoenixConnection phoenixConnection) {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public IndexMaintainer getIndexMaintainer(PTable pTable, PhoenixConnection phoenixConnection) {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PName getDefaultFamilyName() {
        return null;
    }

    @Override
    public boolean isWALDisabled() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean isMultiTenant() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean getStoreNulls() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean isTransactional() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public ViewType getViewType() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public String getViewStatement() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public Short getViewIndexId() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public PTableKey getKey() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public IndexType getIndexType() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public int getBaseColumnCount() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean rowKeyOrderOptimizable() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public int getRowTimestampColPos() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public long getUpdateCacheFrequency() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean isNamespaceMapped() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public String getAutoPartitionSeqName() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean isAppendOnlySchema() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public ImmutableStorageScheme getImmutableStorageScheme() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public QualifierEncodingScheme getEncodingScheme() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public EncodedCQCounter getEncodedCQCounter() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public boolean useStatsForParallelization() {
        throw new UnsupportedOperationException("Not implement");
    }

    @Override
    public int getEstimatedSize() {
        throw new UnsupportedOperationException("Not implement");
    }

    public static class ThinClientPColumn implements PColumn {

        private String colName;
        private PDataType pDataType;

        public ThinClientPColumn(String colName, PDataType pDataType) {
            this.colName = colName;
            this.pDataType = pDataType;
        }

        @Override
        public PName getName() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public PName getFamilyName() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public int getPosition() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public Integer getArraySize() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public byte[] getViewConstant() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public boolean isViewReferenced() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public int getEstimatedSize() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public String getExpressionStr() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public boolean isRowTimestamp() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public boolean isDynamic() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public byte[] getColumnQualifierBytes() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public boolean isNullable() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public PDataType getDataType() {
            return pDataType;
        }

        @Override
        public Integer getMaxLength() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public Integer getScale() {
            throw new UnsupportedOperationException("Not implement");
        }

        @Override
        public SortOrder getSortOrder() {
            throw new UnsupportedOperationException("Not implement");
        }
    }
}
================================================
FILE: hbase11xsqlwriter/src/main/resources/plugin.json
================================================
{
  "name": "hbase11xsqlwriter",
  "class": "com.alibaba.datax.plugin.writer.hbase11xsqlwriter.HbaseSQLWriter",
  "description": "useScene: prod. mechanism: use hbase sql UPSERT to put data, index tables will be updated too.",
  "developer": "alibaba"
}
================================================
FILE: hbase11xwriter/doc/.gitkeep
================================================

================================================
FILE: hbase11xwriter/doc/hbase11xwriter.md
================================================
# Hbase094XWriter & Hbase11XWriter Plugin Documentation

___

## 1 Quick Introduction

The HbaseWriter plugin writes data into HBase. Under the hood, HbaseWriter connects to a remote HBase service through the HBase Java client and writes the data with put operations.

### 1.1 Supported Features

1. HbaseWriter currently supports two HBase versions: HBase 0.94.x and HBase 1.1.x.

* If your HBase version is 0.94.x, choose hbase094xwriter as the writer plugin:

```
"writer": {
    "name": "hbase094xwriter"
}
```

* If your HBase version is 1.1.x, choose hbase11xwriter as the writer plugin:

```
"writer": {
    "name": "hbase11xwriter"
}
```

2. HbaseWriter can concatenate several source fields into the rowkey of the HBase table; see the rowkeyColumn setting for details.

3. The timestamp (version) written to HBase can be set in three ways: the current time, a specified source column, or a fixed time.

### 1.2 Limitations

1. Only "wide table" sources are currently supported. "Tall table" sources, where each record read is a four-tuple (rowKey, family:qualifier, timestamp, value), are not supported. The main goal of this release is to replace the hbasewriter of DataX 2; tall-table support may be considered in a later iteration.

2. Clearing the table before writing is currently not supported. If the data needs to be cleared, contact your HBase PE.

## 2 How It Works

In short, HbaseWriter uses HBase Java client APIs such as HTable and Put to write the data read by the upstream Reader into HBase. The main difference between hbase11xwriter and hbase094xwriter lies in the client API calls: HBase 1.1.x deprecated many of the HBase 0.94.x APIs.

## 3 Features

### 3.1 Sample Configuration

* A job that writes a local file into HBase 1.1.x:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "txtfilereader",
                    "parameter": {
                        "path": "/Users/shf/workplace/datax_test/hbase11xwriter/txt/normal.txt",
                        "charset": "UTF-8",
                        "column": [
                            { "index": 0, "type": "String" },
                            { "index": 1, "type": "string" },
                            { "index": 2, "type": "string" },
                            { "index": 3, "type": "string" },
                            { "index": 4, "type": "string" },
                            { "index": 5, "type": "string" },
                            { "index": 6, "type": "string" }
                        ],
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "hbase11xwriter",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.zookeeper.quorum": "***"
                        },
                        "table": "writer",
                        "mode": "normal",
                        "rowkeyColumn": [
                            { "index": 0, "type": "string" },
                            { "index": -1, "type": "string", "value": "_" }
                        ],
                        "column": [
                            { "index": 1, "name": "cf1:q1", "type": "string" },
                            { "index": 2, "name": "cf1:q2", "type": "string" },
                            { "index": 3, "name": "cf1:q3", "type": "string" },
                            { "index": 4, "name": "cf2:q1", "type": "string" },
                            { "index": 5, "name": "cf2:q2", "type": "string" },
                            { "index": 6, "name": "cf2:q3", "type": "string" }
                        ],
                        "versionColumn": {
                            "index": -1,
                            "value": "123456789"
                        },
                        "encoding": "utf-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameter Description

* **hbaseConfig**

    * Description: Configuration needed to connect to the HBase cluster, in JSON format. The mandatory entry is hbase.zookeeper.quorum, the ZooKeeper address of the HBase cluster. Further HBase client settings can be added as well, e.g. scan cache and batch settings to tune the interaction with the servers.

    * Required: yes
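    * Default: none

* **mode**

    * Description: The write mode. Only normal mode is currently supported; a dynamic-column mode may be considered later.

    * Required: yes

    * Default: none

* **table**

    * Description: Name of the HBase table to write to (case-sensitive).

    * Required: yes

    * Default: none

* **encoding**

    * Description: Character encoding, UTF-8 or GBK, used when converting String values to HBase byte[].

    * Required: no

    * Default: UTF-8

* **column**

    * Description: The HBase columns to write. index: index of the corresponding reader column, starting at 0; name: the column in the HBase table, which must have the form family:qualifier; type: type of the data written, used for the conversion to HBase byte[]. Format:

    ```
    "column": [
        {
            "index": 1,
            "name": "cf1:q1",
            "type": "string"
        },
        {
            "index": 2,
            "name": "cf1:q2",
            "type": "string"
        }
    ]
    ```

    * Required: yes

    * Default: none

* **rowkeyColumn**

    * Description: The rowkey columns to write to HBase. index: index of the corresponding reader column, starting at 0, or -1 for a constant; type: type of the data written, used for the conversion to HBase byte[]; value: a constant, typically used as a separator between several fields. hbasewriter concatenates all rowkeyColumn entries in configuration order to build the rowkey written to HBase; the entries cannot all be constants. Format:

    ```
    "rowkeyColumn": [
        {
            "index": 0,
            "type": "string"
        },
        {
            "index": -1,
            "type": "string",
            "value": "_"
        }
    ]
    ```

    * Required: yes

    * Default: none

* **versionColumn**

    * Description: The timestamp written to HBase. Exactly one of three options can be used: the current time, a specified time column, or a fixed time. If versionColumn is not configured, the current time is used. index: index of the corresponding reader column, starting at 0; its value must be convertible to long, and a Date value is parsed with the patterns yyyy-MM-dd HH:mm:ss and yyyy-MM-dd HH:mm:ss SSS. For a fixed time, set index to -1; value: the fixed time, as a long. Format:

    ```
    "versionColumn":{
        "index":1
    }
    ```

    or

    ```
    "versionColumn":{
        "index":-1,
        "value":123456789
    }
    ```

    * Required: no

    * Default: none

* **nullMode**

    * Description: How null values read from the source are handled. Two options are supported: (1) skip: the column is not written to HBase; (2) empty: HConstants.EMPTY_BYTE_ARRAY, i.e. new byte[0], is written.

    * Required: no

    * Default: skip

* **walFlag**

    * Description: When the HBase client submits data (Put/Delete) to a RegionServer, it first writes the WAL (Write Ahead Log, i.e. the HLog, which all Regions on a RegionServer share). Only after the WAL write succeeds is the MemStore written and the client notified that the submission succeeded; if the WAL write fails, the client is notified that the submission failed. Setting walFlag to false skips the WAL, which improves write performance.

    * Required: no

    * Default: false

* **writeBufferSize**

    * Description: Size of the HBase client write buffer, in bytes. It works together with autoflush: with autoflush enabled (true), the HBase client sends an update for every put; with autoflush disabled (false), the client only sends a write request to the HBase server once the client-side write buffer is full.

    * Required: no

    * Default: 8M (8 * 1024 * 1024 bytes)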
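The rowkey rule described under **rowkeyColumn** (concatenate every entry in configuration order, with `index: -1` entries contributing their constant `value`) can be illustrated in plain Java. This is a minimal sketch under stated assumptions, not the hbase11xwriter source: `RowkeySketch`, `RowkeyPart` and `buildRowkey` are hypothetical names, every source value is treated as a `String`, and the per-entry `type` conversion is ignored. The whole rowkey is encoded once with the configured `encoding`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the rowkeyColumn rule; not the hbase11xwriter implementation.
// Every value is treated as a string here, whereas the real writer converts each part by its "type".
final class RowkeySketch {

    // One rowkeyColumn entry: index >= 0 points at a reader column, index == -1 is a constant.
    static final class RowkeyPart {
        final int index;
        final String value; // constant text, only used when index == -1

        RowkeyPart(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    // Concatenates the parts in configuration order and encodes the result with the given charset.
    static byte[] buildRowkey(List<RowkeyPart> parts, List<String> record, Charset encoding) {
        StringBuilder rowkey = new StringBuilder();
        boolean hasSourceColumn = false;
        for (RowkeyPart part : parts) {
            if (part.index == -1) {
                rowkey.append(part.value);             // constant, e.g. a "_" separator
            } else {
                rowkey.append(record.get(part.index)); // value taken from the source record
                hasSourceColumn = true;
            }
        }
        // Mirrors the documented rule that the entries cannot all be constants.
        if (!hasSourceColumn) {
            throw new IllegalArgumentException("rowkeyColumn must reference at least one source column");
        }
        return rowkey.toString().getBytes(encoding);
    }

    public static void main(String[] args) {
        List<RowkeyPart> parts = Arrays.asList(
                new RowkeyPart(0, null),
                new RowkeyPart(-1, "_"),
                new RowkeyPart(1, null));
        List<String> record = Arrays.asList("user42", "2017", "ignored");
        byte[] rowkey = buildRowkey(parts, record, StandardCharsets.UTF_8);
        System.out.println(new String(rowkey, StandardCharsets.UTF_8)); // prints user42_2017
    }
}
```

The sample configuration in 3.1, `[{index 0}, {index -1, value "_"}]`, produces a rowkey made of the first source column followed by `_`.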
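The three **versionColumn** options (no setting means the current time, `index: -1` means a fixed `value`, a non-negative index means a source column) can be summarized as a small resolution function. This is a hedged sketch, not the plugin's code: `VersionSketch`, `resolveVersion` and `parseColumnValue` are hypothetical, the inputs are plain strings rather than DataX `Column` objects, and dates are parsed in the JVM's default time zone with the two patterns listed above. The millisecond pattern is tried first because `DateFormat.parse(String)` does not have to consume the whole input. With the seconds-only pattern tried first, the ` SSS` suffix would be silently ignored.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical sketch of turning a versionColumn setting into an HBase cell timestamp.
final class VersionSketch {

    // configuredIndex == null -> no versionColumn configured: use the current time
    // configuredIndex == -1   -> fixed timestamp taken from "value"
    // configuredIndex >= 0    -> value of that reader column (a long or a date string)
    static long resolveVersion(Integer configuredIndex, String constantValue, String columnValue) {
        if (configuredIndex == null) {
            return System.currentTimeMillis();
        }
        if (configuredIndex == -1) {
            return Long.parseLong(constantValue.trim());
        }
        return parseColumnValue(columnValue);
    }

    static long parseColumnValue(String raw) {
        String s = raw.trim();
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException notALong) {
            // not a plain number: fall through to date parsing
        }
        // Millisecond pattern first, since parse(String) may ignore trailing text.
        String[] patterns = {"yyyy-MM-dd HH:mm:ss SSS", "yyyy-MM-dd HH:mm:ss"};
        for (String pattern : patterns) {
            try {
                return new SimpleDateFormat(pattern).parse(s).getTime();
            } catch (ParseException ignored) {
                // try the next pattern
            }
        }
        throw new IllegalArgumentException("Cannot convert [" + raw + "] to a timestamp");
    }
}
```

For example, `resolveVersion(-1, "123456789", null)` returns `123456789`, matching the fixed-time configuration shown above.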
### 3.3 Supported Column Types

* BOOLEAN
* SHORT
* INT
* LONG
* FLOAT
* DOUBLE
* STRING

Please note:

* `Column types other than those listed above are not supported.`

## 4 Performance Report

Omitted.

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

***
================================================
FILE: hbase11xwriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hbase11xwriter hbase11xwriter 0.0.1-SNAPSHOT jar 1.1.3 2.5.0 1.8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.apache.hadoop hadoop-hdfs ${hadoop.version} org.apache.hbase hbase-client ${hbase.version} jdk.tools jdk.tools org.apache.hbase hbase-common ${hbase.version} com.google.guava guava 12.0.1 commons-codec commons-codec ${commons-codec.version} com.alibaba.hbase alihbase-connector 1.0.4 org.apache.hbase hbase-client junit junit test com.alibaba.datax datax-core ${datax-project-version} test org.mockito mockito-all 1.9.5 test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single
================================================
FILE: hbase11xwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/hbase11xwriter target/ hbase11xwriter-0.0.1-SNAPSHOT.jar plugin/writer/hbase11xwriter false plugin/writer/hbase11xwriter/libs runtime
================================================
FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/ColumnType.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xwriter;

import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang.StringUtils;

import java.util.Arrays;

/**
 * Only used in normal mode; the multi-version mode has no column types
 */
public enum ColumnType {
    STRING("string"),
    BOOLEAN("boolean"),
    SHORT("short"),
    INT("int"),
    LONG("long"),
    FLOAT("float"),
    DOUBLE("double")
    ;

    private String typeName;

    ColumnType(String typeName) {
        this.typeName = typeName;
    }

    public static ColumnType getByTypeName(String typeName) {
        if (StringUtils.isBlank(typeName)) {
            throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE,
                    String.format("Hbasewriter does not support type: %s. Supported types are: %s", typeName, Arrays.asList(values())));
        }
        for (ColumnType columnType : values()) {
            if (StringUtils.equalsIgnoreCase(columnType.typeName, typeName.trim())) {
                return columnType;
            }
        }

        throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE,
                String.format("Hbasewriter does not support type: %s. Supported types are: %s", typeName, Arrays.asList(values())));
    }

    @Override
    public String toString() {
        return this.typeName;
    }
}
================================================
FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Constant.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xwriter;

public final class Constant {
    public static final String DEFAULT_ENCODING = "UTF-8";
    public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss";
    public static final String DEFAULT_NULL_MODE = "skip";
    public static final long DEFAULT_WRITE_BUFFER_SIZE = 8 * 1024 * 1024;
}
================================================
FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Hbase11xHelper.java
================================================
package com.alibaba.datax.plugin.writer.hbase11xwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;

public class Hbase11xHelper {

    private static final Logger LOG = LoggerFactory.getLogger(Hbase11xHelper.class);

    public static org.apache.hadoop.conf.Configuration getHbaseConfiguration(String hbaseConfig) {
        if (StringUtils.isBlank(hbaseConfig)) {
            throw DataXException.asDataXException(Hbase11xWriterErrorCode.REQUIRED_VALUE,
                    "Writing to HBase requires hbaseConfig, which holds the HBase connection settings. Please ask your HBase PE for this information.");
        }
        org.apache.hadoop.conf.Configuration hConfiguration = HBaseConfiguration.create();
        try {
            Map<String, String> hbaseConfigMap = JSON.parseObject(hbaseConfig, new TypeReference<Map<String, String>>() {});
            // hbaseConfig is expressed as user-supplied key-value pairs
            Validate.isTrue(hbaseConfigMap != null, "hbaseConfig must not be an empty Map!");
            for (Map.Entry<String, String> entry : hbaseConfigMap.entrySet()) {
                hConfiguration.set(entry.getKey(), entry.getValue());
            }
        } catch (Exception e) {
            throw DataXException.asDataXException(Hbase11xWriterErrorCode.GET_HBASE_CONNECTION_ERROR, e);
        }
        return hConfiguration;
    }

    public static org.apache.hadoop.hbase.client.Connection getHbaseConnection(String hbaseConfig) {
        org.apache.hadoop.conf.Configuration hConfiguration = Hbase11xHelper.getHbaseConfiguration(hbaseConfig);

        org.apache.hadoop.hbase.client.Connection hConnection = null;
        try {
            hConnection = ConnectionFactory.createConnection(hConfiguration);
        } catch (Exception e) {
            Hbase11xHelper.closeConnection(hConnection);
            throw DataXException.asDataXException(Hbase11xWriterErrorCode.GET_HBASE_CONNECTION_ERROR, e);
        }
        return hConnection;
    }

    public static Table getTable(com.alibaba.datax.common.util.Configuration configuration) {
        String hbaseConfig = configuration.getString(Key.HBASE_CONFIG);
        String userTable = configuration.getString(Key.TABLE);
        long writeBufferSize = configuration.getLong(Key.WRITE_BUFFER_SIZE, Constant.DEFAULT_WRITE_BUFFER_SIZE);
        org.apache.hadoop.hbase.client.Connection hConnection = Hbase11xHelper.getHbaseConnection(hbaseConfig);
        TableName hTableName = TableName.valueOf(userTable);
org.apache.hadoop.hbase.client.Admin admin = null; org.apache.hadoop.hbase.client.Table hTable = null; try { admin = hConnection.getAdmin(); Hbase11xHelper.checkHbaseTable(admin,hTableName); hTable = hConnection.getTable(hTableName); BufferedMutatorParams bufferedMutatorParams = new BufferedMutatorParams(hTableName); bufferedMutatorParams.writeBufferSize(writeBufferSize); } catch (Exception e) { Hbase11xHelper.closeTable(hTable); Hbase11xHelper.closeAdmin(admin); Hbase11xHelper.closeConnection(hConnection); throw DataXException.asDataXException(Hbase11xWriterErrorCode.GET_HBASE_TABLE_ERROR, e); } return hTable; } public static BufferedMutator getBufferedMutator(com.alibaba.datax.common.util.Configuration configuration){ String hbaseConfig = configuration.getString(Key.HBASE_CONFIG); String userTable = configuration.getString(Key.TABLE); long writeBufferSize = configuration.getLong(Key.WRITE_BUFFER_SIZE, Constant.DEFAULT_WRITE_BUFFER_SIZE); org.apache.hadoop.conf.Configuration hConfiguration = Hbase11xHelper.getHbaseConfiguration(hbaseConfig); org.apache.hadoop.hbase.client.Connection hConnection = Hbase11xHelper.getHbaseConnection(hbaseConfig); TableName hTableName = TableName.valueOf(userTable); org.apache.hadoop.hbase.client.Admin admin = null; BufferedMutator bufferedMutator = null; try { admin = hConnection.getAdmin(); Hbase11xHelper.checkHbaseTable(admin,hTableName); //参考HTable getBufferedMutator() bufferedMutator = hConnection.getBufferedMutator( new BufferedMutatorParams(hTableName) .pool(HTable.getDefaultExecutor(hConfiguration)) .writeBufferSize(writeBufferSize)); } catch (Exception e) { Hbase11xHelper.closeBufferedMutator(bufferedMutator); Hbase11xHelper.closeAdmin(admin); Hbase11xHelper.closeConnection(hConnection); throw DataXException.asDataXException(Hbase11xWriterErrorCode.GET_HBASE_BUFFEREDMUTATOR_ERROR, e); } return bufferedMutator; } public static void deleteTable(com.alibaba.datax.common.util.Configuration configuration) { String userTable = 
configuration.getString(Key.TABLE); LOG.info(String.format("由于您配置了deleteType delete,HBasWriter begins to delete table %s .", userTable)); Scan scan = new Scan(); org.apache.hadoop.hbase.client.Table hTable =Hbase11xHelper.getTable(configuration); ResultScanner scanner = null; try { scanner = hTable.getScanner(scan); for (Result rr = scanner.next(); rr != null; rr = scanner.next()) { hTable.delete(new Delete(rr.getRow())); } } catch (Exception e) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.DELETE_HBASE_ERROR, e); }finally { if(scanner != null){ scanner.close(); } Hbase11xHelper.closeTable(hTable); } } public static void truncateTable(com.alibaba.datax.common.util.Configuration configuration) { String hbaseConfig = configuration.getString(Key.HBASE_CONFIG); String userTable = configuration.getString(Key.TABLE); LOG.info(String.format("由于您配置了 truncate 为true,HBasWriter begins to truncate table %s .", userTable)); TableName hTableName = TableName.valueOf(userTable); org.apache.hadoop.hbase.client.Connection hConnection = Hbase11xHelper.getHbaseConnection(hbaseConfig); org.apache.hadoop.hbase.client.Admin admin = null; try{ admin = hConnection.getAdmin(); Hbase11xHelper.checkHbaseTable(admin,hTableName); admin.disableTable(hTableName); admin.truncateTable(hTableName,true); }catch (Exception e) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.TRUNCATE_HBASE_ERROR, e); }finally { Hbase11xHelper.closeAdmin(admin); Hbase11xHelper.closeConnection(hConnection); } } public static void closeConnection(Connection hConnection){ try { if(null != hConnection) hConnection.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.CLOSE_HBASE_CONNECTION_ERROR, e); } } public static void closeAdmin(Admin admin){ try { if(null != admin) admin.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.CLOSE_HBASE_AMIN_ERROR, e); } } public static void 
closeBufferedMutator(BufferedMutator bufferedMutator){ try { if(null != bufferedMutator) bufferedMutator.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.CLOSE_HBASE_BUFFEREDMUTATOR_ERROR, e); } } public static void closeTable(Table table){ try { if(null != table) table.close(); } catch (IOException e) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.CLOSE_HBASE_TABLE_ERROR, e); } } private static void checkHbaseTable(Admin admin, TableName hTableName) throws IOException { if(!admin.tableExists(hTableName)){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "HBase源头表" + hTableName.toString() + "不存在, 请检查您的配置 或者 联系 Hbase 管理员."); } if(!admin.isTableAvailable(hTableName)){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "HBase源头表" +hTableName.toString() + " 不可用, 请检查您的配置 或者 联系 Hbase 管理员."); } if(admin.isTableDisabled(hTableName)){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "HBase源头表" +hTableName.toString() + "is disabled, 请检查您的配置 或者 联系 Hbase 管理员."); } } public static void validateParameter(com.alibaba.datax.common.util.Configuration originalConfig) { originalConfig.getNecessaryValue(Key.HBASE_CONFIG, Hbase11xWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.TABLE, Hbase11xWriterErrorCode.REQUIRED_VALUE); Hbase11xHelper.validateMode(originalConfig); String encoding = originalConfig.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); if (!Charset.isSupported(encoding)) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter 不支持您所配置的编码:[%s]", encoding)); } originalConfig.set(Key.ENCODING, encoding); Boolean walFlag = originalConfig.getBool(Key.WAL_FLAG, false); originalConfig.set(Key.WAL_FLAG, walFlag); long writeBufferSize = originalConfig.getLong(Key.WRITE_BUFFER_SIZE,Constant.DEFAULT_WRITE_BUFFER_SIZE); originalConfig.set(Key.WRITE_BUFFER_SIZE, 
writeBufferSize); } private static void validateMode(com.alibaba.datax.common.util.Configuration originalConfig){ String mode = originalConfig.getNecessaryValue(Key.MODE,Hbase11xWriterErrorCode.REQUIRED_VALUE); ModeType modeType = ModeType.getByTypeName(mode); switch (modeType) { case Normal: { validateRowkeyColumn(originalConfig); validateColumn(originalConfig); validateVersionColumn(originalConfig); break; } default: throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbase11xWriter does not support mode: %s", mode)); } } private static void validateColumn(com.alibaba.datax.common.util.Configuration originalConfig){ List<Configuration> columns = originalConfig.getListConfiguration(Key.COLUMN); if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException(Hbase11xWriterErrorCode.REQUIRED_VALUE, "column is required, in the form: column:[{\"index\": 0,\"name\": \"cf0:column0\",\"type\": \"string\"},{\"index\": 1,\"name\": \"cf1:column1\",\"type\": \"long\"}]"); } for (Configuration aColumn : columns) { Integer index = aColumn.getInt(Key.INDEX); String type = aColumn.getNecessaryValue(Key.TYPE,Hbase11xWriterErrorCode.REQUIRED_VALUE); String name = aColumn.getNecessaryValue(Key.NAME,Hbase11xWriterErrorCode.REQUIRED_VALUE); ColumnType.getByTypeName(type); if(name.split(":").length != 2){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, String.format("The name [%s] in your column configuration is malformed. name must be in the form columnFamily:qualifier, e.g. {\"index\": 1,\"name\": \"cf1:q1\",\"type\": \"long\"}", name)); } if(index == null || index < 0){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "Your column configuration is invalid: index is required and must be non-negative. Please check and fix it."); } } } private static void validateRowkeyColumn(com.alibaba.datax.common.util.Configuration originalConfig){ List<Configuration> rowkeyColumn = originalConfig.getListConfiguration(Key.ROWKEY_COLUMN); if (rowkeyColumn == null || rowkeyColumn.isEmpty()) { throw
DataXException.asDataXException(Hbase11xWriterErrorCode.REQUIRED_VALUE, "rowkeyColumn is required, in the form: rowkeyColumn:[{\"index\": 0,\"type\": \"string\"},{\"index\": -1,\"type\": \"string\",\"value\": \"_\"}]"); } int rowkeyColumnSize = rowkeyColumn.size(); //entries look like {"index":0,"type":"string"} or {"index":-1,"type":"string","value":"_"} for (Configuration aRowkeyColumn : rowkeyColumn) { Integer index = aRowkeyColumn.getInt(Key.INDEX); String type = aRowkeyColumn.getNecessaryValue(Key.TYPE,Hbase11xWriterErrorCode.REQUIRED_VALUE); ColumnType.getByTypeName(type); if(index == null ){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.REQUIRED_VALUE, "index is required in every rowkeyColumn entry"); } //rowkeyColumn must not consist only of a -1 entry, i.e. a constant rowkey separator if(rowkeyColumnSize ==1 && index == -1){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "rowkeyColumn must not consist only of constant columns; specify at least one rowkey column"); } if(index == -1){ aRowkeyColumn.getNecessaryValue(Key.VALUE,Hbase11xWriterErrorCode.REQUIRED_VALUE); } } } private static void validateVersionColumn(com.alibaba.datax.common.util.Configuration originalConfig){ Configuration versionColumn = originalConfig.getConfiguration(Key.VERSION_COLUMN); //null means the current time is used; a specified column requires index if(versionColumn != null){ Integer index = versionColumn.getInt(Key.INDEX); if(index == null ){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.REQUIRED_VALUE, "index is required in versionColumn"); } if(index == -1){ //a fixed time requires index=-1 and value versionColumn.getNecessaryValue(Key.VALUE,Hbase11xWriterErrorCode.REQUIRED_VALUE); }else if(index < 0){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "The index in your versionColumn configuration is invalid; it must be -1 or a non-negative number"); } } } } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Hbase11xWriter.java ================================================ package com.alibaba.datax.plugin.writer.hbase11xwriter; import com.alibaba.datax.common.exception.DataXException; import
com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import org.apache.commons.lang3.StringUtils; import java.util.ArrayList; import java.util.List; /** * Hbase11xWriter * Created by shf on 16/3/17. */ public class Hbase11xWriter extends Writer { public static class Job extends Writer.Job { private Configuration originConfig = null; @Override public void init() { this.originConfig = this.getPluginJobConf(); Hbase11xHelper.validateParameter(this.originConfig); } @Override public void prepare(){ Boolean truncate = originConfig.getBool(Key.TRUNCATE,false); if(truncate){ Hbase11xHelper.truncateTable(this.originConfig); } } @Override public List<Configuration> split(int mandatoryNumber) { List<Configuration> splitResultConfigs = new ArrayList<Configuration>(); for (int j = 0; j < mandatoryNumber; j++) { splitResultConfigs.add(originConfig.clone()); } return splitResultConfigs; } @Override public void destroy() { } } public static class Task extends Writer.Task { private Configuration taskConfig; private HbaseAbstractTask hbaseTaskProxy; @Override public void init() { this.taskConfig = super.getPluginJobConf(); String mode = this.taskConfig.getString(Key.MODE); ModeType modeType = ModeType.getByTypeName(mode); switch (modeType) { case Normal: this.hbaseTaskProxy = new NormalTask(this.taskConfig); break; default: throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "Hbasewriter does not support mode: " + modeType); } } @Override public void startWrite(RecordReceiver lineReceiver) { this.hbaseTaskProxy.startWriter(lineReceiver,super.getTaskPluginCollector()); } @Override public void destroy() { if (this.hbaseTaskProxy != null) { this.hbaseTaskProxy.close(); } } } } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Hbase11xWriterErrorCode.java ================================================ package
com.alibaba.datax.plugin.writer.hbase11xwriter; import com.alibaba.datax.common.spi.ErrorCode; /** * Hbase11xWriterErrorCode * Created by shf on 16/3/8. */ public enum Hbase11xWriterErrorCode implements ErrorCode { REQUIRED_VALUE("Hbasewriter-00", "A required parameter value is missing."), ILLEGAL_VALUE("Hbasewriter-01", "The parameter value you provided is invalid."), GET_HBASE_CONNECTION_ERROR("Hbasewriter-02", "Error while getting the HBase connection."), GET_HBASE_TABLE_ERROR("Hbasewriter-03", "Error while getting the HBase table."), CLOSE_HBASE_CONNECTION_ERROR("Hbasewriter-04", "Error while closing the HBase connection."), CLOSE_HBASE_AMIN_ERROR("Hbasewriter-05", "Error while closing the HBase admin."), CLOSE_HBASE_TABLE_ERROR("Hbasewriter-06", "Error while closing the HBase table."), PUT_HBASE_ERROR("Hbasewriter-07", "IO exception while writing to HBase."), DELETE_HBASE_ERROR("Hbasewriter-08", "Exception while deleting from the HBase table."), TRUNCATE_HBASE_ERROR("Hbasewriter-09", "Exception while truncating the HBase table."), CONSTRUCT_ROWKEY_ERROR("Hbasewriter-10", "Exception while building the rowkey."), CONSTRUCT_VERSION_ERROR("Hbasewriter-11", "Exception while building the version."), GET_HBASE_BUFFEREDMUTATOR_ERROR("Hbasewriter-12", "Error while getting the HBase BufferedMutator."), CLOSE_HBASE_BUFFEREDMUTATOR_ERROR("Hbasewriter-13", "Error while closing the HBase BufferedMutator."), ; private final String code; private final String description; private Hbase11xWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/HbaseAbstractTask.java ================================================ package com.alibaba.datax.plugin.writer.hbase11xwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import
com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import org.apache.hadoop.hbase.HConstants; import org.apache.hadoop.hbase.client.BufferedMutator; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.nio.charset.Charset; import java.util.List; public abstract class HbaseAbstractTask { private final static Logger LOG = LoggerFactory.getLogger(HbaseAbstractTask.class); public NullModeType nullMode = null; public List columns; public List rowkeyColumn; public Configuration versionColumn; //public Table htable; public String encoding; public Boolean walFlag; public BufferedMutator bufferedMutator; public HbaseAbstractTask(com.alibaba.datax.common.util.Configuration configuration) { //this.htable = Hbase11xHelper.getTable(configuration); this.bufferedMutator = Hbase11xHelper.getBufferedMutator(configuration); this.columns = configuration.getListConfiguration(Key.COLUMN); this.rowkeyColumn = configuration.getListConfiguration(Key.ROWKEY_COLUMN); this.versionColumn = configuration.getConfiguration(Key.VERSION_COLUMN); this.encoding = configuration.getString(Key.ENCODING,Constant.DEFAULT_ENCODING); this.nullMode = NullModeType.getByTypeName(configuration.getString(Key.NULL_MODE,Constant.DEFAULT_NULL_MODE)); this.walFlag = configuration.getBool(Key.WAL_FLAG, false); } public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector){ Record record; try { while ((record = lineReceiver.getFromReader()) != null) { Put put; try { put = convertRecordToPut(record); } catch (Exception e) { taskPluginCollector.collectDirtyRecord(record, e); continue; } try { //this.htable.put(put); this.bufferedMutator.mutate(put); } catch (IllegalArgumentException e) { if(e.getMessage().equals("No columns to insert") && 
nullMode.equals(NullModeType.Skip)){ LOG.info(String.format("record is empty; nullMode is configured as [skip], so this record will be ignored. record[%s]", record.toString())); continue; }else { taskPluginCollector.collectDirtyRecord(record, e); continue; } } } }catch (IOException e){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.PUT_HBASE_ERROR,e); }finally { //Hbase11xHelper.closeTable(this.htable); Hbase11xHelper.closeBufferedMutator(this.bufferedMutator); } } public abstract Put convertRecordToPut(Record record); public void close() { //Hbase11xHelper.closeTable(this); Hbase11xHelper.closeBufferedMutator(this.bufferedMutator); } public byte[] getColumnByte(ColumnType columnType, Column column){ byte[] bytes; if(column.getRawData() != null){ switch (columnType) { case INT: bytes = Bytes.toBytes(column.asLong().intValue()); break; case LONG: bytes = Bytes.toBytes(column.asLong()); break; case DOUBLE: bytes = Bytes.toBytes(column.asDouble()); break; case FLOAT: bytes = Bytes.toBytes(column.asDouble().floatValue()); break; case SHORT: bytes = Bytes.toBytes(column.asLong().shortValue()); break; case BOOLEAN: bytes = Bytes.toBytes(column.asBoolean()); break; case STRING: bytes = this.getValueByte(columnType,column.asString()); break; default: throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "HbaseWriter does not support the configured column type: " + columnType); } }else{ switch (nullMode){ case Skip: bytes = null; break; case Empty: bytes = HConstants.EMPTY_BYTE_ARRAY; break; default: throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "HbaseWriter does not support the configured nullMode; only skip or empty is supported"); } } return bytes; } public byte[] getValueByte(ColumnType columnType, String value){ byte[] bytes; if(value != null){ switch (columnType) { case INT: bytes = Bytes.toBytes(Integer.parseInt(value)); break; case LONG: bytes = Bytes.toBytes(Long.parseLong(value)); break; case DOUBLE: bytes = Bytes.toBytes(Double.parseDouble(value)); break; case FLOAT: bytes =
Bytes.toBytes(Float.parseFloat(value)); break; case SHORT: bytes = Bytes.toBytes(Short.parseShort(value)); break; case BOOLEAN: bytes = Bytes.toBytes(Boolean.parseBoolean(value)); break; case STRING: bytes = value.getBytes(Charset.forName(encoding)); break; default: throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "HbaseWriter does not support the configured column type: " + columnType); } }else{ bytes = HConstants.EMPTY_BYTE_ARRAY; } return bytes; } } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.hbase11xwriter; public final class Key { public final static String HBASE_CONFIG = "hbaseConfig"; public final static String TABLE = "table"; /** * mode can be one of three values: normal, multiVersionFixedColumn, or multiVersionDynamicColumn. There is no default. *

* normal is used together with column (a Map structure) *

* multiVersion */ public final static String MODE = "mode"; public final static String ROWKEY_COLUMN = "rowkeyColumn"; public final static String VERSION_COLUMN = "versionColumn"; /** * Defaults to utf8 */ public final static String ENCODING = "encoding"; public final static String COLUMN = "column"; public static final String INDEX = "index"; public static final String NAME = "name"; public static final String TYPE = "type"; public static final String VALUE = "value"; public static final String FORMAT = "format"; /** * Defaults to EMPTY_BYTES */ public static final String NULL_MODE = "nullMode"; public static final String TRUNCATE = "truncate"; public static final String AUTO_FLUSH = "autoFlush"; public static final String WAL_FLAG = "walFlag"; public static final String WRITE_BUFFER_SIZE = "writeBufferSize"; } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/ModeType.java ================================================ package com.alibaba.datax.plugin.writer.hbase11xwriter; import com.alibaba.datax.common.exception.DataXException; import java.util.Arrays; public enum ModeType { Normal("normal"), MultiVersion("multiVersion") ; private String mode; ModeType(String mode) { this.mode = mode.toLowerCase(); } public String getMode() { return mode; } public static ModeType getByTypeName(String modeName) { for (ModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter does not support mode: %s; supported modes are: %s", modeName, Arrays.asList(values()))); } } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/MultiVersionTask.java ================================================ package com.alibaba.datax.plugin.writer.hbase11xwriter; import com.alibaba.datax.common.element.Record; import
com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import org.apache.hadoop.hbase.client.Put; public class MultiVersionTask extends HbaseAbstractTask { public MultiVersionTask(Configuration configuration) { super(configuration); } @Override public Put convertRecordToPut(Record record) { if (record.getColumnNumber() != 4 ) { // in multiVersion mode the source must supply a 4-tuple (rowkey,column,timestamp,value); the destination needs to be told [] throw DataXException .asDataXException( Hbase11xWriterErrorCode.ILLEGAL_VALUE, String.format( "HbaseWriter multiVersion mode: invalid column configuration. The source should supply a 4-tuple, but the number of fields read from the source is %s. Please check and fix your configuration.", record.getColumnNumber())); } Put put = null; //rowkey // ColumnType rowkeyType = ColumnType.getByTypeName(String.valueOf(columnList.get(0).get(Key.TYPE))); // if(record.getColumn(0).getRawData() == null){ // throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, "HbaseWriter rowkey must not be empty; please choose an appropriate rowkey column"); // } // //timestamp // if(record.getColumn(2).getRawData()!= null){ // put = new Put(getColumnByte(rowkeyType,record.getColumn(0)),record.getColumn(2).asLong()); // }else{ // put = new Put(getColumnByte(rowkeyType,record.getColumn(0))); // } // //column family,qualifier // Map userColumn = columnList.get(1); // ColumnType columnType = ColumnType.getByTypeName(userColumn.get(Key.TYPE)); // String columnName = userColumn.get(Key.NAME); // String promptInfo = "In Hbasewriter, column entries must be in the form columnFamily:qualifier. Invalid column: " + columnName; // String[] cfAndQualifier = columnName.split(":"); // Validate.isTrue(cfAndQualifier != null && cfAndQualifier.length == 2 // && StringUtils.isNotBlank(cfAndQualifier[0]) // && StringUtils.isNotBlank(cfAndQualifier[1]), promptInfo); // // if(!columnName.equals(record.getColumn(1).asString())){ // throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, // String.format("The source and destination column names in your configuration differ: source [%s], destination [%s]. Please check and fix your configuration.",record.getColumn(1).asString(),columnName)); // // } // //value // Column column = record.getColumn(3); // put.addColumn(Bytes.toBytes( // cfAndQualifier[0]), // Bytes.toBytes(cfAndQualifier[1]), // getColumnByte(columnType,column) // ); return put; } } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/NormalTask.java ================================================ package com.alibaba.datax.plugin.writer.hbase11xwriter; import com.alibaba.datax.common.element.DoubleColumn; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.commons.lang3.time.DateUtils; import org.apache.commons.net.ntp.TimeStamp; import org.apache.hadoop.hbase.client.Durability; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Timestamp; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; import java.util.Map; public class NormalTask extends HbaseAbstractTask { private static final Logger LOG = LoggerFactory.getLogger(NormalTask.class); public NormalTask(Configuration configuration) { super(configuration); } @Override public Put convertRecordToPut(Record
record){ byte[] rowkey = getRowkey(record); Put put = null; if(this.versionColumn == null){ put = new Put(rowkey); if(!super.walFlag){ //equivalent to put.setWriteToWAL(super.walFlag) in HBase 0.94 put.setDurability(Durability.SKIP_WAL); } }else { long timestamp = getVersion(record); put = new Put(rowkey,timestamp); } for (Configuration aColumn : columns) { Integer index = aColumn.getInt(Key.INDEX); String type = aColumn.getString(Key.TYPE); ColumnType columnType = ColumnType.getByTypeName(type); String name = aColumn.getString(Key.NAME); String promptInfo = "In Hbasewriter, column entries must be in the form columnFamily:qualifier. Invalid column: " + name; String[] cfAndQualifier = name.split(":"); Validate.isTrue(cfAndQualifier != null && cfAndQualifier.length == 2 && StringUtils.isNotBlank(cfAndQualifier[0]) && StringUtils.isNotBlank(cfAndQualifier[1]), promptInfo); if(index >= record.getColumnNumber()){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, String.format("The index in your column configuration is out of range: based on the reader configuration, index must be less than %s, but you configured %s. Please check and fix it.",record.getColumnNumber(),index)); } byte[] columnBytes = getColumnByte(columnType,record.getColumn(index)); //skip this column when columnBytes is null if(null != columnBytes){ put.addColumn(Bytes.toBytes( cfAndQualifier[0]), Bytes.toBytes(cfAndQualifier[1]), columnBytes); }else{ continue; } } return put; } public byte[] getRowkey(Record record){ byte[] rowkeyBuffer = {}; for (Configuration aRowkeyColumn : rowkeyColumn) { Integer index = aRowkeyColumn.getInt(Key.INDEX); String type = aRowkeyColumn.getString(Key.TYPE); ColumnType columnType = ColumnType.getByTypeName(type); if(index == -1){ String value = aRowkeyColumn.getString(Key.VALUE); rowkeyBuffer = Bytes.add(rowkeyBuffer,getValueByte(columnType,value)); }else{ if(index >= record.getColumnNumber()){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.CONSTRUCT_ROWKEY_ERROR, String.format("The index in your rowkeyColumn configuration is out of range: based on the reader configuration, index must be less than %s, but you configured %s. Please check and fix it.",record.getColumnNumber(),index)); } byte[] value =
getColumnByte(columnType,record.getColumn(index)); rowkeyBuffer = Bytes.add(rowkeyBuffer, value); } } return rowkeyBuffer; } public long getVersion(Record record){ int index = versionColumn.getInt(Key.INDEX); long timestamp; if(index == -1){ //use the configured fixed time as the version timestamp = versionColumn.getLong(Key.VALUE); if(timestamp < 0){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.CONSTRUCT_VERSION_ERROR, "The specified version is invalid!"); } }else{ //use a column as the version: Long/Double columns are read directly via asLong(); other types are parsed with yyyy-MM-dd HH:mm:ss SSS, then yyyy-MM-dd HH:mm:ss if(index >= record.getColumnNumber()){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.CONSTRUCT_VERSION_ERROR, String.format("The index in your versionColumn configuration is out of range: based on the reader configuration, index must be less than %s, but you configured %s. Please check and fix it.",record.getColumnNumber(),index)); } if(record.getColumn(index).getRawData() == null){ throw DataXException.asDataXException(Hbase11xWriterErrorCode.CONSTRUCT_VERSION_ERROR, "The specified version is empty!"); } SimpleDateFormat df_senconds = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); SimpleDateFormat df_ms = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS"); if(record.getColumn(index) instanceof LongColumn || record.getColumn(index) instanceof DoubleColumn){ timestamp = record.getColumn(index).asLong(); }else { Date date; try{ date = df_ms.parse(record.getColumn(index).asString()); }catch (ParseException e){ try { date = df_senconds.parse(record.getColumn(index).asString()); } catch (ParseException e1) { LOG.info(String.format("Column [%s] is used as the HBase write version, but parsing it as a Date with both yyyy-MM-dd HH:mm:ss and yyyy-MM-dd HH:mm:ss SSS failed. Please check and fix it",index)); throw DataXException.asDataXException(Hbase11xWriterErrorCode.CONSTRUCT_VERSION_ERROR, e1); } } timestamp = date.getTime(); } } return timestamp; } } ================================================ FILE: hbase11xwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase11xwriter/NullModeType.java ================================================ package com.alibaba.datax.plugin.writer.hbase11xwriter; import
com.alibaba.datax.common.exception.DataXException; import java.util.Arrays; public enum NullModeType { Skip("skip"), Empty("empty") ; private String mode; NullModeType(String mode) { this.mode = mode.toLowerCase(); } public String getMode() { return mode; } public static NullModeType getByTypeName(String modeName) { for (NullModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException(Hbase11xWriterErrorCode.ILLEGAL_VALUE, String.format("Hbasewriter does not support nullMode: %s; supported nullMode values are: %s", modeName, Arrays.asList(values()))); } } ================================================ FILE: hbase11xwriter/src/main/resources/plugin.json ================================================ { "name": "hbase11xwriter", "class": "com.alibaba.datax.plugin.writer.hbase11xwriter.Hbase11xWriter", "description": "use put: prod. mechanism: use hbase java api put data.", "developer": "alibaba" } ================================================ FILE: hbase11xwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "hbase11xwriter", "parameter": { "hbaseConfig": { "hbase.rootdir": "", "hbase.cluster.distributed": "", "hbase.zookeeper.quorum": "" }, "table": "", "mode": "", "rowkeyColumn": [ ], "column": [ ], "versionColumn":{ "index": "", "value":"" }, "encoding": "" } } ================================================ FILE: hbase20xsqlreader/doc/hbase20xsqlreader.md ================================================

# hbase20xsqlreader Plugin Documentation

___

## 1 Quick Introduction

The hbase20xsqlreader plugin reads data from Phoenix (HBase SQL). It targets HBase 2.x and Phoenix 5.x.

## 2 How It Works

In short, hbase20xsqlreader connects to the Phoenix QueryServer through the Phoenix thin client, generates a SELECT statement from the user's configuration, and sends it to the QueryServer to read HBase data. It then assembles the returned results into an abstract dataset using DataX's own data types and passes that dataset to the downstream Writer.

## 3 Features

### 3.1 Sample Configuration

* A job that extracts data from Phoenix and prints it locally:

```
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hbase20xsqlreader",  //use the hbase20xsqlreader plugin
                    "parameter": {
                        "queryServerAddress": "http://127.0.0.1:8765",  //Phoenix QueryServer address
                        "serialization": "PROTOBUF",  //QueryServer serialization format
                        "table": "TEST",    //table to read
                        "column": ["ID", "NAME"],   //columns to read
                        "splitKey": "ID"    //split column; must be a primary-key column of the table
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}
```

### 3.2 Parameters

* **queryServerAddress**
    * Description: hbase20xsqlreader connects to the Phoenix QueryServer through the Phoenix thin client, so the address of that QueryServer must be set here. Users of the enhanced edition or Lindorm who need to pass user and password can append them as optional properties after queryServerAddress, for example: http://127.0.0.1:8765;user=root;password=root
    * Required: yes
    * Default: none
* **serialization**
    * Description: the serialization protocol used by the QueryServer
    * Required: no
    * Default: PROTOBUF
* **table**
    * Description: name of the table to read
    * Required: yes
    * Default: none
* **schema**
    * Description: the schema that contains the table
    * Required: no
    * Default: none
* **column**
    * Description: the column names to read from the Phoenix table, given as a JSON array; an empty value reads all columns.
    * Required: no
    * Default: all columns
* **splitKey**
    * Description: the column used to split the table so that it can be read in parallel. Two splitting methods are available: 1. divide the range between the column's minimum and maximum values evenly by the configured number of channels (only integer and string split columns are supported); 2. split according to the configured splitPoints.
    * Required: yes
    * Default: none
* **splitPoints**
    * Description: splitting by the minimum and maximum of the split column cannot guarantee that data hotspots are avoided, so you can instead specify split points based on the characteristics of your data. It is recommended to derive the split points from the Regions' start and end keys so that each query maps to a single Region.
    * Required: no
    * Default: none
* **where**
    * Description: an additional filter condition for the table query; every split carries this condition.
    * Required: no
    * Default: none
* **querySql**
    * Description: one or more query statements; all of them must return the same number and types of columns. You can write single-table queries or multi-table join queries by hand. When this parameter is set, queryServerAddress is still required, but all other parameters are ignored and may be omitted.
    * Required: no
    * Default: none
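The first splitting method of splitKey (dividing [min, max] by the number of channels) and the way each range is combined with `where` can be pictured with a small Java sketch. It assumes an integer split column and splits the range into one contiguous, non-overlapping range per channel, then appends each range to a SELECT built from the configured columns, loosely mirroring how the reader emits one query per split. `SplitKeySketch`, `splitRanges`, and `buildSplitSql` are hypothetical names; the plugin itself delegates range splitting to DataX's `RdbmsRangeSplitWrap`, whose exact boundaries may differ, and the splitPoints method is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not the plugin's code: evenly split the [min, max]
 * range of an integer split column into per-channel ranges and build one
 * SELECT per range, combined with an optional user "where" filter.
 */
public class SplitKeySketch {

    /** Returns range predicates that together cover [min, max] exactly once. */
    static List<String> splitRanges(String splitKey, long min, long max, int channels) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        List<String> ranges = new ArrayList<>();
        // Number of distinct integer values; assumes max - min + 1 does not overflow.
        long span = max - min + 1;
        // Never create more ranges than there are values, and always at least one.
        int n = (int) Math.max(1L, Math.min((long) channels, span));
        long step = span / n;
        long start = min;
        for (int i = 0; i < n; i++) {
            // The last range absorbs the remainder so that max is always included.
            long end = (i == n - 1) ? max : start + step - 1;
            ranges.add(String.format("(%s >= %d AND %s <= %d)", splitKey, start, splitKey, end));
            start = end + 1;
        }
        return ranges;
    }

    /** Builds a per-split query: quoted columns, table, optional where, then the range. */
    static String buildSplitSql(List<String> columns, String table, String where, String range) {
        StringBuilder cols = new StringBuilder();
        for (String c : columns) {
            if (cols.length() > 0) {
                cols.append(',');
            }
            cols.append('"').append(c).append('"');
        }
        String sql = "SELECT " + cols + " FROM " + table;
        boolean hasWhere = where != null && !where.trim().isEmpty();
        return hasWhere
                ? sql + " WHERE (" + where + ") AND " + range
                : sql + " WHERE " + range;
    }

    public static void main(String[] args) {
        for (String range : splitRanges("ID", 1, 10, 3)) {
            System.out.println(buildSplitSql(Arrays.asList("ID", "NAME"), "\"TEST\"", null, range));
        }
    }
}
```

Running `main` prints three queries whose ranges are `ID >= 1 AND ID <= 3`, `ID >= 4 AND ID <= 6`, and `ID >= 7 AND ID <= 10`. Because the ranges are derived only from the minimum and maximum, a skewed key distribution still yields uneven splits, which is the hotspot problem that splitPoints is meant to address.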
### 3.3 Type Conversion

hbase20xsqlreader currently supports most Phoenix types, but a few types are not supported, so please check your column types.

The following table lists how hbase20xsqlreader maps Phoenix types to DataX internal types:

| DataX internal type | Phoenix data type |
| -------- | ----- |
| String | CHAR, VARCHAR |
| Bytes | BINARY, VARBINARY |
| Bool | BOOLEAN |
| Long | INTEGER, TINYINT, SMALLINT, BIGINT |
| Double | FLOAT, DECIMAL, DOUBLE |
| Date | DATE, TIME, TIMESTAMP |

## 4 Performance Report

Omitted.

## 5 Constraints and Limitations

* Only a single split column is supported, and it must be a primary-key column of the table.
* If splitPoint is not set, automatic splitting is used, in which case only integer and string split columns are supported.
* Table, schema, and column names are case-sensitive; they must match the actual case used in the Phoenix table.
* Data can only be read through the Phoenix QueryServer, so the QueryServer service must be running for this plugin to work.

## 6 FAQ

***

================================================ FILE: hbase20xsqlreader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hbase20xsqlreader 0.0.1-SNAPSHOT jar 5.2.5-HBase-2.x com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.aliyun.phoenix ali-phoenix-shaded-thin-client ${phoenix.version} junit junit test org.mockito mockito-core 2.0.44-beta test com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-service-face test com.alibaba.datax plugin-rdbms-util 0.0.1-SNAPSHOT compile src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: hbase20xsqlreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/hbase20xsqlreader target/ hbase20xsqlreader-0.0.1-SNAPSHOT.jar plugin/reader/hbase20xsqlreader false plugin/reader/hbase20xsqlreader/libs runtime ================================================ FILE: hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.hbase20xsqlreader; public class Constant { public
static final String PK_TYPE = "pkType"; public static final Object PK_TYPE_STRING = "pkTypeString"; public static final Object PK_TYPE_LONG = "pkTypeLong"; public static final String DEFAULT_SERIALIZATION = "PROTOBUF"; public static final String CONNECT_STRING_TEMPLATE = "jdbc:phoenix:thin:url=%s;serialization=%s"; public static final String CONNECT_DRIVER_STRING = "org.apache.phoenix.queryserver.client.Driver"; public static final String SELECT_COLUMNS_TEMPLATE = "SELECT COLUMN_NAME, COLUMN_FAMILY FROM SYSTEM.CATALOG WHERE TABLE_NAME='%s' AND COLUMN_NAME IS NOT NULL"; public static String QUERY_SQL_TEMPLATE_WITHOUT_WHERE = "select %s from %s "; public static String QUERY_SQL_TEMPLATE = "select %s from %s where (%s)"; public static String QUERY_MIN_MAX_TEMPLATE = "SELECT MIN(%s),MAX(%s) FROM %s"; public static String QUERY_COLUMN_TYPE_TEMPLATE = "SELECT %s FROM %s LIMIT 1"; public static String QUERY_SQL_PER_SPLIT = "querySqlPerSplit"; } ================================================ FILE: hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20SQLReaderHelper.java ================================================ package com.alibaba.datax.plugin.reader.hbase20xsqlreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.RdbmsRangeSplitWrap; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.ImmutablePair; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.math.BigInteger; import java.sql.*; import java.util.ArrayList; import java.util.List; public class HBase20SQLReaderHelper { private static final Logger LOG = LoggerFactory.getLogger(HBase20SQLReaderHelper.class); private Configuration configuration; private Connection connection; private List querySql; private String 
fullTableName; private List<String> columnNames; private String splitKey; private List splitPoints; public HBase20SQLReaderHelper (Configuration configuration) { this.configuration = configuration; } /** * Validate the configuration parameters */ public void validateParameter() { // queryServerAddress is required String queryServerAddress = configuration.getNecessaryValue(Key.QUERYSERVER_ADDRESS, HBase20xSQLReaderErrorCode.REQUIRED_VALUE); String serialization = configuration.getString(Key.SERIALIZATION_NAME, Constant.DEFAULT_SERIALIZATION); connection = getConnection(queryServerAddress, serialization); //if querySql is configured, table may be empty; otherwise table is required querySql = configuration.getList(Key.QUERY_SQL, String.class); if (querySql == null || querySql.isEmpty()) { LOG.info("Split according to splitKey or split points."); String schema = configuration.getString(Key.SCHEMA, null); String tableName = configuration.getNecessaryValue(Key.TABLE, HBase20xSQLReaderErrorCode.REQUIRED_VALUE); if (schema != null && !schema.isEmpty()) { fullTableName = "\"" + schema + "\".\"" + tableName + "\""; } else { fullTableName = "\"" + tableName + "\""; } // if no columns are configured, all columns are read by default columnNames = configuration.getList(Key.COLUMN, String.class); splitKey = configuration.getString(Key.SPLIT_KEY, null); splitPoints = configuration.getList(Key.SPLIT_POINT); checkTable(schema, tableName); dealWhere(); } else { // querySql is configured: no splitting, just read with the given SQL LOG.info("Split according to query sql."); } } public Connection getConnection(String queryServerAddress, String serialization) { String url = String.format(Constant.CONNECT_STRING_TEMPLATE, queryServerAddress, serialization); LOG.debug("Connecting to QueryServer [" + url + "] ..."); Connection conn; try { Class.forName(Constant.CONNECT_DRIVER_STRING); conn = DriverManager.getConnection(url); conn.setAutoCommit(false); } catch (Throwable e) { throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.GET_QUERYSERVER_CONNECTION_ERROR, "Cannot connect to QueryServer: the configuration is incorrect or the service is not running. Please check the configuration and service status, or contact your HBase administrator.", e); }
LOG.debug("Connected to QueryServer successfully."); return conn; } /** * Check that the table, the columns, and the split column exist */ public void checkTable(String schema, String tableName) { Statement statement = null; ResultSet resultSet = null; try { statement = connection.createStatement(); String selectSql = String.format(Constant.SELECT_COLUMNS_TEMPLATE, tableName); // filter by schema, handling both the null and non-null cases if (schema == null || schema.isEmpty()) { selectSql = selectSql + " AND TABLE_SCHEM IS NULL"; } else { selectSql = selectSql + " AND TABLE_SCHEM = '" + schema + "'"; } resultSet = statement.executeQuery(selectSql); List<String> primaryColumnNames = new ArrayList<String>(); List<String> allColumnName = new ArrayList<String>(); while (resultSet.next()) { String columnName = resultSet.getString(1); allColumnName.add(columnName); // a null column family means this is a primary-key column if (resultSet.getString(2) == null) { primaryColumnNames.add(columnName); } } if (columnNames != null && !columnNames.isEmpty()) { for (String columnName : columnNames) { if (!allColumnName.contains(columnName)) { // a configured column does not exist in the table metadata throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_VALUE, "The configured column " + columnName + " does not exist in the metadata of table " + tableName + ". Please check your configuration or contact your HBase administrator."); } } } else { columnNames = allColumnName; configuration.set(Key.COLUMN, allColumnName); } if (splitKey != null) { // the split column must be a primary-key column, otherwise read performance suffers badly if (!primaryColumnNames.contains(splitKey)) { throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_VALUE, "The configured split column " + splitKey + " is not a primary-key column of table " + tableName + ". Please check your configuration or contact your HBase administrator."); } } } catch (SQLException e) { throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.GET_PHOENIX_TABLE_ERROR, "Failed to get information for table " + tableName + ". Please check your cluster and table status or contact your HBase administrator.", e); } finally { closeJdbc(null, statement, resultSet); } } public void closeJdbc(Connection connection, Statement statement, ResultSet resultSet) { try { if (resultSet != null) { resultSet.close(); } if (statement != null) { statement.close(); } if (connection != null) { connection.close(); } } catch (SQLException e) {
            LOG.warn("Failed to close the JDBC resources. {}", HBase20xSQLReaderErrorCode.CLOSE_PHOENIX_CONNECTION_ERROR, e);
        }
    }

    public void dealWhere() {
        String where = configuration.getString(Key.WHERE, null);
        if (StringUtils.isNotBlank(where)) {
            String whereImprove = where.trim();
            // Strip a trailing ASCII or full-width semicolon
            if (whereImprove.endsWith(";") || whereImprove.endsWith("；")) {
                whereImprove = whereImprove.substring(0, whereImprove.length() - 1);
            }
            configuration.set(Key.WHERE, whereImprove);
        }
    }

    /**
     * Splits the table into one configuration per read task.
     */
    public List<Configuration> doSplit(int adviceNumber) {
        List<Configuration> pluginParams = new ArrayList<Configuration>();
        List<String> rangeList;
        String where = configuration.getString(Key.WHERE);
        boolean hasWhere = StringUtils.isNotBlank(where);
        if (querySql == null || querySql.isEmpty()) {
            // Without splitPoints, split automatically on the min/max of splitKey. This does not guarantee
            // evenly sized splits and only supports integer and string columns.
            if (splitPoints == null || splitPoints.isEmpty()) {
                LOG.info("Split according min and max value of splitColumn...");
                Pair<Object, Object> minMaxPK = getPkRange(configuration);
                if (null == minMaxPK) {
                    throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_SPLIT_PK,
                            "Failed to split the table by the split key. DataX supports only a single split key "
                                    + "of integer or string type. Please try another split key or contact your HBase administrator.");
                }
                if (null == minMaxPK.getLeft() || null == minMaxPK.getRight()) {
                    // The min or max value of the split key is null: read everything in a single split
                    pluginParams.add(configuration);
                    return pluginParams;
                }
                boolean isStringType = Constant.PK_TYPE_STRING.equals(configuration.getString(Constant.PK_TYPE));
                boolean isLongType = Constant.PK_TYPE_LONG.equals(configuration.getString(Constant.PK_TYPE));

                if (isStringType) {
                    rangeList = RdbmsRangeSplitWrap.splitAndWrap(
                            String.valueOf(minMaxPK.getLeft()),
                            String.valueOf(minMaxPK.getRight()), adviceNumber,
                            splitKey, "'", null);
                } else if (isLongType) {
                    rangeList = RdbmsRangeSplitWrap.splitAndWrap(
                            new BigInteger(minMaxPK.getLeft().toString()),
                            new BigInteger(minMaxPK.getRight().toString()),
                            adviceNumber, splitKey);
                } else {
                    throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_SPLIT_PK,
                            "DataX does not support the type of the configured split key (splitPk). DataX supports only "
                                    + "a single split key of integer or string type. "
                                    + "Please try another split key or contact your HBase administrator.");
                }
            } else {
                LOG.info("Split according splitPoints...");
                // Split on the configured splitPoints
                rangeList = buildSplitRange();
            }

            String tempQuerySql;
            if (null != rangeList && !rangeList.isEmpty()) {
                for (String range : rangeList) {
                    Configuration tempConfig = configuration.clone();

                    tempQuerySql = buildQuerySql(columnNames, fullTableName, where)
                            + (hasWhere ? " and " : " where ") + range;
                    LOG.info("Query SQL: " + tempQuerySql);
                    tempConfig.set(Constant.QUERY_SQL_PER_SPLIT, tempQuerySql);
                    pluginParams.add(tempConfig);
                }
            } else {
                Configuration tempConfig = configuration.clone();
                tempQuerySql = buildQuerySql(columnNames, fullTableName, where)
                        + (hasWhere ? " and " : " where ")
                        + String.format(" %s IS NOT NULL", splitKey);
                LOG.info("Query SQL: " + tempQuerySql);
                tempConfig.set(Constant.QUERY_SQL_PER_SPLIT, tempQuerySql);
                pluginParams.add(tempConfig);
            }
        } else {
            // querySql is configured: each statement becomes one split without further splitting
            for (String sql : querySql) {
                Configuration tempConfig = configuration.clone();
                tempConfig.set(Constant.QUERY_SQL_PER_SPLIT, sql);
                pluginParams.add(tempConfig);
            }
        }
        return pluginParams;
    }

    public static String buildQuerySql(List<String> columnNames, String table, String where) {
        String querySql;
        StringBuilder columnBuilder = new StringBuilder();
        for (String columnName : columnNames) {
            columnBuilder.append("\"").append(columnName).append("\",");
        }
        // Remove the trailing comma
        columnBuilder.setLength(columnBuilder.length() - 1);
        if (StringUtils.isBlank(where)) {
            querySql = String.format(Constant.QUERY_SQL_TEMPLATE_WITHOUT_WHERE, columnBuilder.toString(), table);
        } else {
            querySql = String.format(Constant.QUERY_SQL_TEMPLATE, columnBuilder.toString(), table, where);
        }
        return querySql;
    }

    private List<String> buildSplitRange() {
        String getSplitKeyTypeSQL = String.format(Constant.QUERY_COLUMN_TYPE_TEMPLATE, splitKey, fullTableName);
        Statement statement = null;
        ResultSet resultSet = null;
        List<String> splitConditions = new ArrayList<String>();
        try {
            statement = connection.createStatement();
            resultSet = statement.executeQuery(getSplitKeyTypeSQL);
            ResultSetMetaData rsMetaData = resultSet.getMetaData();
            int type = rsMetaData.getColumnType(1);
            String symbol = "%s";
            switch (type) {
                case Types.CHAR:
                case Types.VARCHAR:
                    symbol = "'%s'";
                    break;
                case Types.DATE:
                    symbol = "TO_DATE('%s')";
                    break;
                case Types.TIME:
                    symbol = "TO_TIME('%s')";
                    break;
                case Types.TIMESTAMP:
                    symbol = "TO_TIMESTAMP('%s')";
                    break;
                case Types.BINARY:
                case Types.VARBINARY:
                case Types.ARRAY:
                    throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_SPLIT_PK,
                            "The split column type is " + rsMetaData.getColumnTypeName(1)
                                    + ", which is not supported as a split column yet.");
                default:
                    // Numeric and other types are formatted as-is
                    break;
            }
            String splitCondition = null;
            // n split points yield n + 1 ranges: (-inf, p0], (p0, p1], ..., (p[n-1], +inf)
            for (int i = 0; i <= splitPoints.size(); i++) {
                if (i == 0) {
                    splitCondition = splitKey + " <= " + String.format(symbol, splitPoints.get(i));
                } else if (i == splitPoints.size()) {
                    splitCondition = splitKey + " > " + String.format(symbol, splitPoints.get(i - 1));
                } else {
                    splitCondition = splitKey + " > " + String.format(symbol, splitPoints.get(i - 1))
                            + " AND " + splitKey + " <= " + String.format(symbol, splitPoints.get(i));
                }
                splitConditions.add(splitCondition);
            }

            return splitConditions;
        } catch (SQLException e) {
            throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.GET_TABLE_COLUMNTYPE_ERROR,
                    "Failed to get the type of the split column. Please check the service, the table and the split column, "
                            + "or contact your HBase administrator.", e);
        } finally {
            closeJdbc(null, statement, resultSet);
        }
    }

    private Pair<Object, Object> getPkRange(Configuration configuration) {
        String pkRangeSQL = String.format(Constant.QUERY_MIN_MAX_TEMPLATE, splitKey, splitKey, fullTableName);
        String where = configuration.getString(Key.WHERE);
        if (StringUtils.isNotBlank(where)) {
            pkRangeSQL = String.format("%s WHERE (%s AND %s IS NOT NULL)", pkRangeSQL, where, splitKey);
        }
        Statement statement = null;
        ResultSet resultSet = null;
        Pair<Object, Object> minMaxPK = null;
        try {
            statement = connection.createStatement();
            resultSet = statement.executeQuery(pkRangeSQL);
            ResultSetMetaData rsMetaData = resultSet.getMetaData();

            if (isPKTypeValid(rsMetaData)) {
                if (isStringType(rsMetaData.getColumnType(1))) {
                    if (configuration != null) {
                        configuration.set(Constant.PK_TYPE, Constant.PK_TYPE_STRING);
                    }
                    if (resultSet.next()) {
                        minMaxPK = new ImmutablePair<Object, Object>(
                                resultSet.getString(1), resultSet.getString(2));
                    }
                } else if (isLongType(rsMetaData.getColumnType(1))) {
                    if (configuration != null) {
                        configuration.set(Constant.PK_TYPE, Constant.PK_TYPE_LONG);
                    }
                    if (resultSet.next()) {
                        minMaxPK = new ImmutablePair<Object, Object>(
                                resultSet.getLong(1), resultSet.getLong(2));
                    }
                } else {
                    throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_SPLIT_PK,
                            "The configured split key (splitPk) is invalid because DataX does not support its type. "
                                    + "DataX supports only a single split key of integer or string type. "
                                    + "Please try another split key or contact your HBase administrator.");
                }
            } else {
                throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_SPLIT_PK,
                        "The configured split key (splitPk) is invalid because DataX does not support its type. "
                                + "DataX supports only a single split key of integer or string type. "
                                + "Please try another split key or contact your HBase administrator.");
            }
        } catch (SQLException e) {
            throw DataXException.asDataXException(HBase20xSQLReaderErrorCode.ILLEGAL_SPLIT_PK, e);
        } finally {
            closeJdbc(null, statement, resultSet);
        }
        return minMaxPK;
    }

    private static boolean isPKTypeValid(ResultSetMetaData rsMetaData) {
        boolean ret = false;
        try {
            int minType = rsMetaData.getColumnType(1);
            int maxType = rsMetaData.getColumnType(2);

            boolean isNumberType = isLongType(minType);

            boolean isStringType = isStringType(minType);

            if (minType == maxType && (isNumberType || isStringType)) {
                ret = true;
            }
        } catch (Exception e) {
            throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK,
                    "DataX failed to get the type of the split key (splitPk). This error is usually caused by an "
                            + "underlying system exception. Please contact askdatax on Wangwang or your DBA.");
        }
        return ret;
    }

    private static boolean isLongType(int type) {
        boolean isValidLongType = type == Types.BIGINT || type == Types.INTEGER
                || type == Types.SMALLINT || type == Types.TINYINT;
        return isValidLongType;
    }

    private static boolean isStringType(int type) {
        return type == Types.CHAR || type == Types.NCHAR
                || type == Types.VARCHAR || type == Types.LONGVARCHAR
                || type == Types.NVARCHAR;
    }
}

================================================
FILE: hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20xSQLReader.java
================================================
package com.alibaba.datax.plugin.reader.hbase20xsqlreader;

import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;

import java.util.List;

public class HBase20xSQLReader extends Reader {

    public static class Job extends Reader.Job {
        private Configuration originalConfig;
        private HBase20SQLReaderHelper readerHelper;

        @Override
        public void init() {
            this.originalConfig = this.getPluginJobConf();
            this.readerHelper = new HBase20SQLReaderHelper(this.originalConfig);
            readerHelper.validateParameter();
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            return readerHelper.doSplit(adviceNumber);
        }

        @Override
        public void destroy() {
            // do nothing
        }
    }

    public static class Task extends Reader.Task {
        private Configuration readerConfig;
        private HBase20xSQLReaderTask hbase20xSQLReaderTask;

        @Override
        public void init() {
            this.readerConfig = super.getPluginJobConf();
            hbase20xSQLReaderTask = new HBase20xSQLReaderTask(readerConfig, super.getTaskGroupId(), super.getTaskId());
        }

        @Override
        public void startRead(RecordSender recordSender) {
            hbase20xSQLReaderTask.readRecord(recordSender);
        }

        @Override
        public void destroy() {
            // do nothing
        }
    }
}

================================================
FILE: hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20xSQLReaderErrorCode.java
================================================
package com.alibaba.datax.plugin.reader.hbase20xsqlreader;

import com.alibaba.datax.common.spi.ErrorCode;

public enum HBase20xSQLReaderErrorCode implements ErrorCode {
    REQUIRED_VALUE("Hbasewriter-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("Hbasewriter-01", "An illegal parameter value was supplied."),
    GET_QUERYSERVER_CONNECTION_ERROR("Hbasewriter-02", "Error while getting a QueryServer connection."),
    GET_PHOENIX_TABLE_ERROR("Hbasewriter-03", "Error while getting the Phoenix table."),
    GET_TABLE_COLUMNTYPE_ERROR("Hbasewriter-05", "Error while getting the table column types."),
    CLOSE_PHOENIX_CONNECTION_ERROR("Hbasewriter-06", "Error while closing the JDBC connection."),
    ILLEGAL_SPLIT_PK("Hbasewriter-07", "Illegal splitKey configuration."),
    PHOENIX_COLUMN_TYPE_CONVERT_ERROR("Hbasewriter-08", "Phoenix column type conversion error."),
    QUERY_DATA_ERROR("Hbasewriter-09", "Error while querying data from Phoenix."),
    ;

    private final String code;
    private final String description;

    private HBase20xSQLReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}

================================================
FILE: hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/HBase20xSQLReaderTask.java
================================================
package com.alibaba.datax.plugin.reader.hbase20xsqlreader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.statistics.PerfRecord;
import com.alibaba.datax.common.util.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.math.BigDecimal;
import java.sql.*;

public class HBase20xSQLReaderTask {
    private static final Logger LOG = LoggerFactory.getLogger(HBase20xSQLReaderTask.class);

    private Configuration readerConfig;
    private int taskGroupId = -1;
    private int taskId = -1;

    public HBase20xSQLReaderTask(Configuration config, int taskGroupId, int taskId) {
        this.readerConfig = config;
        this.taskGroupId = taskGroupId;
        this.taskId = taskId;
    }

    public void readRecord(RecordSender recordSender) {
        String querySql = readerConfig.getString(Constant.QUERY_SQL_PER_SPLIT);
        LOG.info("Begin to read record by Sql: [{}].", querySql);
        HBase20SQLReaderHelper helper = new HBase20SQLReaderHelper(readerConfig);
        Connection conn = helper.getConnection(readerConfig.getString(Key.QUERYSERVER_ADDRESS),
                readerConfig.getString(Key.SERIALIZATION_NAME, Constant.DEFAULT_SERIALIZATION));
        Statement statement = null;
        ResultSet resultSet = null;
        try {
            long rsNextUsedTime = 0;
            long lastTime = System.nanoTime();
            statement = conn.createStatement();
            // Measure the SQL query time
            PerfRecord queryPerfRecord = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.SQL_QUERY);
            queryPerfRecord.start();

            resultSet = statement.executeQuery(querySql);
            queryPerfRecord.end();
            ResultSetMetaData meta = resultSet.getMetaData();
            int columnNum = meta.getColumnCount();
            // Measure the accumulated ResultSet.next() time
            PerfRecord allResultPerfRecord = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.RESULT_NEXT_ALL);
            allResultPerfRecord.start();

            while (resultSet.next()) {
                Record record = recordSender.createRecord();
                rsNextUsedTime += (System.nanoTime() - lastTime);
                for (int i = 1; i <= columnNum; i++) {
                    Column column = this.convertPhoenixValueToDataxColumn(meta.getColumnType(i), resultSet.getObject(i));
                    record.addColumn(column);
                }
                lastTime = System.nanoTime();
                recordSender.sendToWriter(record);
            }
            allResultPerfRecord.end(rsNextUsedTime);
            LOG.info("Finished read record by Sql: [{}].", querySql);
        } catch (SQLException e) {
            throw DataXException.asDataXException(
                    HBase20xSQLReaderErrorCode.QUERY_DATA_ERROR,
                    "Failed to query data from Phoenix. Please check the service status or contact your HBase administrator!", e);
        } finally {
            helper.closeJdbc(conn, statement, resultSet);
        }
    }

    private Column convertPhoenixValueToDataxColumn(int sqlType, Object value) {
        Column column;
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
                column = new StringColumn((String) value);
                break;
            case Types.BINARY:
            case Types.VARBINARY:
                column = new BytesColumn((byte[]) value);
                break;
            case Types.BOOLEAN:
                column = new BoolColumn((Boolean) value);
                break;
            case Types.INTEGER:
                column = new LongColumn((Integer) value);
                break;
            case Types.TINYINT:
                Byte aByte = (Byte) value;
                column = new LongColumn(null == aByte ? null : aByte.longValue());
                break;
            case Types.SMALLINT:
                Short aShort = (Short) value;
                column = new LongColumn(null == aShort ? null : aShort.longValue());
                break;
            case Types.BIGINT:
                column = new LongColumn((Long) value);
                break;
            case Types.FLOAT:
                column = new DoubleColumn(null == value ? null : (Float.valueOf(value.toString())));
                break;
            case Types.DECIMAL:
                column = new DoubleColumn((BigDecimal) value);
                break;
            case Types.DOUBLE:
                column = new DoubleColumn((Double) value);
                break;
            case Types.DATE:
                column = new DateColumn((Date) value);
                break;
            case Types.TIME:
                column = new DateColumn((Time) value);
                break;
            case Types.TIMESTAMP:
                column = new DateColumn((Timestamp) value);
                break;
            default:
                throw DataXException.asDataXException(
                        HBase20xSQLReaderErrorCode.PHOENIX_COLUMN_TYPE_CONVERT_ERROR,
                        "Encountered an unrecognized Phoenix type, sqlType: " + sqlType);
        }
        return column;
    }
}

================================================
FILE: hbase20xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/Key.java
================================================
package com.alibaba.datax.plugin.reader.hbase20xsqlreader;

public class Key {

    /**
     * [Required unless querySql is set] Name of the table to read
     */
    public final static String TABLE = "table";
    /**
     * [Optional] Columns to read; all columns are read when omitted
     */
    public final static String COLUMN = "column";
    /**
     * [Required] Phoenix QueryServer address
     */
    public final static String QUERYSERVER_ADDRESS = "queryServerAddress";
    /**
     * [Optional] Serialization format, PROTOBUF by default
     */
    public static final String SERIALIZATION_NAME = "serialization";
    /**
     * [Optional] Schema of the Phoenix table, empty by default
     */
    public static final String SCHEMA = "schema";
    /**
     * [Optional] Column used to split the read
     */
    public static final String SPLIT_KEY = "splitKey";
    /**
     * [Optional] Split points used to split the read on splitKey
     */
    public static final String SPLIT_POINT = "splitPoint";
    /**
     * [Optional] Filter condition applied when reading
     */
    public static final String WHERE = "where";
    /**
     * [Optional] Query statements; when set, each statement is read as one split
     */
    public static final String QUERY_SQL = "querySql";
}

================================================
FILE: hbase20xsqlreader/src/main/resources/plugin.json
================================================
{
    "name": "hbase20xsqlreader",
    "class": "com.alibaba.datax.plugin.reader.hbase20xsqlreader.HBase20xSQLReader",
    "description": "useScene: prod. mechanism: read data from phoenix through queryserver.",
    "developer": "alibaba"
}

================================================
FILE: hbase20xsqlreader/src/main/resources/plugin_job_template.json
================================================
{
    "name": "hbase20xsqlreader",
    "parameter": {
        "queryServerAddress": "",
        "serialization": "PROTOBUF",
        "schema": "",
        "table": "TABLE1",
        "column": ["ID", "NAME"],
        "splitKey": "rowkey",
        "splitPoint": [],
        "where": ""
    }
}

================================================
FILE: hbase20xsqlwriter/doc/hbase20xsqlwriter.md
================================================
# HBase20xsqlwriter Plugin Documentation

## 1. Quick Introduction

HBase20xsqlwriter bulk-imports data into SQL tables (Phoenix) on HBase. Because Phoenix encodes the rowkey, writing directly through the HBase API requires converting the data by hand, which is tedious and error-prone. This plugin writes data into Phoenix tables directly through SQL.

Under the hood, it uses the Phoenix QueryServer thin-client driver to execute UPSERT statements against Phoenix.

### 1.1 Supported Features

* Importing into tables with indexes; all index tables are updated synchronously

### 1.2 Limitations

* Requires Phoenix 5.x and HBase 2.x
* Data can only be imported through Phoenix QueryServer, so the QueryServer service must be running to use this plugin
* Clearing existing table data is not supported
* Only tables created through Phoenix are supported; native HBase tables are not
* Importing data with timestamps is not supported

## 2. How It Works

The plugin connects to the Phoenix QueryServer through the Phoenix thin client and executes UPSERT statements to write data to the table in batches. Because it goes through this higher-level interface, index tables are updated synchronously.

## 3. Configuration

### 3.1 Sample Configuration

```json
{
    "job": {
        "entry": {
            "jvm": "-Xms2048m -Xmx2048m"
        },
        "content": [
            {
                "reader": {
                    "name": "txtfilereader",
                    "parameter": {
                        "path": "/Users/shf/workplace/datax_test/hbase20xsqlwriter/txt/normal.txt",
                        "charset": "UTF-8",
                        "column": [
                            { "index": 0, "type": "string" },
                            { "index": 1, "type": "string" },
                            { "index": 2, "type": "string" },
                            { "index": 3, "type": "string" }
                        ],
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "hbase20xsqlwriter",
                    "parameter": {
                        "batchSize": "100",
                        "column": [
                            "UID",
                            "TS",
                            "EVENTID",
                            "CONTENT"
                        ],
                        "queryServerAddress": "http://127.0.0.1:8765",
                        "nullMode": "skip",
                        "table": "target HBase table name (case-sensitive)"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
```

### 3.2 Parameters

* **name**
    * Description: plugin name; must be `hbase20xsqlwriter`
    * Required: yes
    * Default: none

* **schema**
    * Description: schema that the table belongs to
    * Required: no
    * Default: none

* **table**
    * Description: name of the table to import into. Case-sensitive; Phoenix table names are usually **uppercase**
    * Required: yes
    * Default: none

* **column**
    * Description: column names. Case-sensitive; Phoenix column names are usually **uppercase**.
    * The column order matters: it must match the order of the columns output by the reader one-to-one.
    * Data types do not need to be specified; the column metadata is fetched from Phoenix automatically.
    * Required: yes
    * Default: none

* **queryServerAddress**
    * Description: Phoenix QueryServer address, in the format `http://${hostName}:${port}`, e.g. http://172.16.34.58:8765. Users of the enhanced edition/Lindorm who need to pass `user` and `password` can append them as optional properties after the address, e.g. `http://127.0.0.1:8765;user=root;password=root`
    * Required: yes
    * Default: none

* **serialization**
    * Description: serialization protocol used by QueryServer
    * Required: no
    * Default: PROTOBUF

* **batchSize**
    * Description: maximum number of rows per batch write
    * Required: no
    * Default: 256

* **nullMode**
    * Description: how to handle a null column value read from the source. Two modes are currently supported:
        * skip: skip the column, i.e. do not insert it (if the row already has a value in this column, that value is deleted)
        * empty: insert an empty value; for numeric types this is 0, for varchar it is the empty string
    * Required: no
    * Default: skip

## 4. Performance Report

None

## 5. Constraints

The order of the columns defined in the writer must match the reader's column order. The reader's column order defines how columns are arranged in each output row, while the writer's column order defines the order in which the writer expects the columns in the data it receives. For example:

reader column order: c1, c2, c3, c4

writer column order: x1, x2, x3, x4

Then the reader's column c1 is assigned to the writer's column x1. If the writer's column order is x1, x2, x4, x3, then c3 is assigned to x4 and c4 is assigned to x3.

## 6. FAQ

1. How many concurrent channels are appropriate? Does adding concurrency help when the import is slow?

   The import process runs with a 2 GB JVM heap by default, and concurrency (the number of channels) is implemented with threads. Too many threads do not necessarily speed up the import and can even hurt performance through overly frequent GC. A concurrency (channel count) of 5-10 is generally recommended.

2. What is a good batchSize?
   The default is 256, but the best batchSize depends on the row size. A single write operation typically carries about 2MB-4MB of data; divide that by the row size to get the batchSize. For example, with rows of about 1 KB, 2 MB per operation corresponds to a batchSize of roughly 2000.

================================================
FILE: hbase20xsqlwriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hbase20xsqlwriter 0.0.1-SNAPSHOT jar 5.2.5-HBase-2.x 1.8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.aliyun.phoenix ali-phoenix-shaded-thin-client ${phoenix.version} junit junit test com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-service-face test org.mockito mockito-all 1.9.5 test src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: hbase20xsqlwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/hbase20xsqlwriter target/ hbase20xsqlwriter-0.0.1-SNAPSHOT.jar plugin/writer/hbase20xsqlwriter false plugin/writer/hbase20xsqlwriter/libs runtime

================================================
FILE: hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/Constant.java
================================================
package com.alibaba.datax.plugin.writer.hbase20xsqlwriter;

public final class Constant {
    public static final String DEFAULT_NULL_MODE = "skip";
    public static final String DEFAULT_SERIALIZATION = "PROTOBUF";
    public static final int DEFAULT_BATCH_ROW_COUNT = 256;   // write 256 rows per batch by default

    // Non-standard JDBC type codes that Phoenix reports for its UNSIGNED_* types
    public static final int TYPE_UNSIGNED_TINYINT = 11;
    public static final int TYPE_UNSIGNED_SMALLINT = 13;
    public static final int TYPE_UNSIGNED_INTEGER = 9;
    public static final int TYPE_UNSIGNED_LONG = 10;
    public static final int TYPE_UNSIGNED_FLOAT = 14;
    public static final int TYPE_UNSIGNED_DOUBLE = 15;
    public static final int TYPE_UNSIGNED_DATE = 19;
    public static final int TYPE_UNSIGNED_TIME = 18;
    public static final int TYPE_UNSIGNED_TIMESTAMP = 20;
}

================================================
FILE: hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLHelper.java
================================================
package com.alibaba.datax.plugin.writer.hbase20xsqlwriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class HBase20xSQLHelper {
    private static final Logger LOG = LoggerFactory.getLogger(HBase20xSQLHelper.class);

    /**
     * Phoenix thin-client connection string prefix
     */
    public static final String CONNECT_STRING_PREFIX = "jdbc:phoenix:thin:";
    /**
     * Phoenix thin-client driver class
     */
    public static final String CONNECT_DRIVER_STRING = "org.apache.phoenix.queryserver.client.Driver";
    /**
     * Looks up the columns of the configured table in the system catalog
     */
    public static final String SELECT_CATALOG_TABLE_STRING =
            "SELECT COLUMN_NAME FROM SYSTEM.CATALOG WHERE TABLE_NAME='%s' AND COLUMN_NAME IS NOT NULL";

    /**
     * Validates the configuration parameters.
     */
    public static void validateParameter(Configuration originalConfig) {
        // The table name and QueryServer address are mandatory; an exception is thrown otherwise
        String tableName = originalConfig.getNecessaryValue(Key.TABLE, HBase20xSQLWriterErrorCode.REQUIRED_VALUE);
        String queryServerAddress = originalConfig.getNecessaryValue(Key.QUERYSERVER_ADDRESS,
                HBase20xSQLWriterErrorCode.REQUIRED_VALUE);

        // Serialization format: optional, PROTOBUF by default
        String serialization = originalConfig.getString(Key.SERIALIZATION_NAME, Constant.DEFAULT_SERIALIZATION);

        String connStr = getConnectionUrl(queryServerAddress, serialization);
        // Verify that a JDBC connection can be established
        Connection conn = getThinClientConnection(connStr);

        List<String> columnNames = originalConfig.getList(Key.COLUMN, String.class);
        if (columnNames == null || columnNames.isEmpty()) {
            throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.ILLEGAL_VALUE,
                    "The HBase column configuration must not be empty. Please add the column names of the target table.");
        }
        String schema = originalConfig.getString(Key.SCHEMA);
        // Check that the table and the configured columns exist
        checkTable(conn, schema, tableName, columnNames);
    }

    /**
     * Gets a lightweight JDBC connection; it must be closed explicitly after use.
     */
    public static Connection getThinClientConnection(String connStr) {
        LOG.debug("Connecting to QueryServer [" + connStr + "] ...");
        Connection conn;
        try {
            Class.forName(CONNECT_DRIVER_STRING);
            conn = DriverManager.getConnection(connStr);
            conn.setAutoCommit(false);
        } catch (Throwable e) {
            throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.GET_QUERYSERVER_CONNECTION_ERROR,
                    "Unable to connect to QueryServer: the configuration is incorrect or the service is not running. "
                            + "Please check the configuration and service status, or contact your HBase administrator.", e);
        }
        LOG.debug("Connected to QueryServer successfully.");
        return conn;
    }

    public static Connection getJdbcConnection(Configuration conf) {
        String queryServerAddress = conf.getNecessaryValue(Key.QUERYSERVER_ADDRESS,
                HBase20xSQLWriterErrorCode.REQUIRED_VALUE);
        // Serialization format: optional, PROTOBUF by default
        String serialization = conf.getString(Key.SERIALIZATION_NAME, Constant.DEFAULT_SERIALIZATION);
        String connStr = getConnectionUrl(queryServerAddress, serialization);
        return getThinClientConnection(connStr);
    }

    public static String getConnectionUrl(String queryServerAddress, String serialization) {
        String urlFmt = CONNECT_STRING_PREFIX + "url=%s;serialization=%s";
        return String.format(urlFmt, queryServerAddress, serialization);
    }

    public static void checkTable(Connection conn, String schema, String tableName, List<String> columnNames)
            throws DataXException {
        String selectSystemTable = getSelectSystemSQL(schema, tableName);
        Statement st = null;
        ResultSet rs = null;
        try {
            st = conn.createStatement();
            rs = st.executeQuery(selectSystemTable);
            List<String> allColumns = new ArrayList<String>();
            if (rs.next()) {
                allColumns.add(rs.getString(1));
            } else {
                LOG.error("Table {} does not exist. Please check that the table name is correct and that the table "
                        + "has been created. {}", tableName, HBase20xSQLWriterErrorCode.GET_HBASE_TABLE_ERROR);
                throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.GET_HBASE_TABLE_ERROR,
                        "Table " + tableName + " does not exist. Please check that the table name is correct "
                                + "and that the table has been created.");
            }
            while (rs.next()) {
                allColumns.add(rs.getString(1));
            }
            for (String columnName : columnNames) {
                if (!allColumns.contains(columnName)) {
                    // A configured column does not exist in the table metadata
                    throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.ILLEGAL_VALUE,
                            "The configured column " + columnName + " does not exist in the metadata of target table "
                                    + tableName + ". Please check your configuration or contact your HBase administrator.");
                }
            }
        } catch (SQLException t) {
            throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.GET_HBASE_TABLE_ERROR,
                    "Failed to get the metadata of table " + tableName
                            + ". Please check the cluster and table status, or contact your HBase administrator.", t);
        } finally {
            closeJdbc(conn, st, rs);
        }
    }

    private static String getSelectSystemSQL(String schema, String tableName) {
        String sql = String.format(SELECT_CATALOG_TABLE_STRING, tableName);
        if (schema != null && !schema.isEmpty()) {
            sql = sql + " AND TABLE_SCHEM = '" + schema + "'";
        }
        return sql;
    }

    public static void closeJdbc(Connection connection, Statement statement, ResultSet resultSet) {
        try {
            if (resultSet != null) {
                resultSet.close();
            }
            if (statement != null) {
                statement.close();
            }
            if (connection != null) {
                connection.close();
            }
        } catch (SQLException e) {
            LOG.warn("Failed to close the JDBC resources. {}", HBase20xSQLWriterErrorCode.CLOSE_HBASE_CONNECTION_ERROR, e);
        }
    }
}

================================================
FILE: hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLWriter.java
================================================
package com.alibaba.datax.plugin.writer.hbase20xsqlwriter;

import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;

import java.util.ArrayList;
import java.util.List;

public class HBase20xSQLWriter extends Writer {

    public static class Job extends Writer.Job {

        private Configuration config = null;

        @Override
        public void init() {
            this.config = this.getPluginJobConf();
            HBase20xSQLHelper.validateParameter(this.config);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            List<Configuration> splitResultConfigs = new ArrayList<Configuration>();
            for (int j = 0; j < mandatoryNumber; j++) {
                splitResultConfigs.add(config.clone());
            }
            return splitResultConfigs;
        }

        @Override
        public void destroy() {
            // do nothing
        }
    }

    public static class Task extends Writer.Task {
        private Configuration taskConfig;
        private HBase20xSQLWriterTask writerTask;

        @Override
        public void init() {
            this.taskConfig = super.getPluginJobConf();
            this.writerTask = new HBase20xSQLWriterTask(this.taskConfig);
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            this.writerTask.startWriter(lineReceiver, super.getTaskPluginCollector());
        }

        @Override
        public void destroy() {
            // Nothing to close here: the writer task closes its JDBC resources in startWriter
        }
    }
}

================================================
FILE: hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.hbase20xsqlwriter;

import com.alibaba.datax.common.spi.ErrorCode;

public enum HBase20xSQLWriterErrorCode implements ErrorCode {
    REQUIRED_VALUE("Hbasewriter-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("Hbasewriter-01", "An illegal parameter value was supplied."),
    GET_QUERYSERVER_CONNECTION_ERROR("Hbasewriter-02", "Error while getting a QueryServer connection."),
    GET_HBASE_TABLE_ERROR("Hbasewriter-03", "Error while getting the HBase table."),
    CLOSE_HBASE_CONNECTION_ERROR("Hbasewriter-04", "Error while closing the HBase connection."),
    GET_TABLE_COLUMNTYPE_ERROR("Hbasewriter-05", "Error while getting the table column types."),
    PUT_HBASE_ERROR("Hbasewriter-07", "An IO exception occurred while writing to HBase."),
    ;

    private final String code;
    private final String description;

    private HBase20xSQLWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code, this.description);
    }
}

================================================
FILE: hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/HBase20xSQLWriterTask.java
================================================
package com.alibaba.datax.plugin.writer.hbase20xsqlwriter;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.math.BigDecimal;
import java.sql.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HBase20xSQLWriterTask {
    private final static Logger LOG = LoggerFactory.getLogger(HBase20xSQLWriterTask.class);

    private Configuration configuration;
    private TaskPluginCollector taskPluginCollector;

    private Connection connection = null;
    private PreparedStatement pstmt = null;

    // Number of columns to write to HBase, i.e. the number of entries in the user's column setting (timestamp excluded)
    private int numberOfColumnsToWrite;
    // Number of columns expected in each Record from the source
    private int numberOfColumnsToRead;
    private int[] columnTypes;
    private List<String> columns;
    private String fullTableName;

    private NullModeType nullModeType;
    private int batchSize;

    public HBase20xSQLWriterTask(Configuration configuration) {
        // Only store the configuration here without contacting the remote cluster;
        // the configuration is validated in the writer's Job.init()
        this.configuration = configuration;
    }

    public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) {
        this.taskPluginCollector = taskPluginCollector;

        try {
            // Preparation
            initialize();

            // Write the data
            writeData(lineReceiver);
        } catch (Throwable e) {
            throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.PUT_HBASE_ERROR, e);
        } finally {
            // Close the JDBC connection
            HBase20xSQLHelper.closeJdbc(connection, pstmt, null);
        }
    }

    /**
     * Initializes the JDBC objects and the column types.
     * @throws SQLException if a JDBC call fails
     */
    private void initialize() throws SQLException {
        if (connection == null) {
            connection = HBase20xSQLHelper.getJdbcConnection(configuration);
            connection.setAutoCommit(false);
        }
        nullModeType = NullModeType.getByTypeName(configuration.getString(Key.NULLMODE, Constant.DEFAULT_NULL_MODE));
batchSize = configuration.getInt(Key.BATCHSIZE, Constant.DEFAULT_BATCH_ROW_COUNT); String schema = configuration.getString(Key.SCHEMA); String tableName = configuration.getNecessaryValue(Key.TABLE, HBase20xSQLWriterErrorCode.REQUIRED_VALUE); fullTableName = "\"" + tableName + "\""; if (schema != null && !schema.isEmpty()) { fullTableName = "\"" + schema + "\".\"" + tableName + "\""; } columns = configuration.getList(Key.COLUMN, String.class); if (pstmt == null) { // 一个Task的生命周期中只使用一个PreparedStatement对象 pstmt = createPreparedStatement(); columnTypes = getColumnSqlType(); } } /** * 生成sql模板,并根据模板创建PreparedStatement */ private PreparedStatement createPreparedStatement() throws SQLException { // 生成列名集合,列之间用逗号分隔: col1,col2,col3,... StringBuilder columnNamesBuilder = new StringBuilder(); for (String col : columns) { // 列名使用双引号,则不自动转换为全大写,而是保留用户配置的大小写 columnNamesBuilder.append("\""); columnNamesBuilder.append(col); columnNamesBuilder.append("\""); columnNamesBuilder.append(","); } // 移除末尾多余的逗号 columnNamesBuilder.setLength(columnNamesBuilder.length() - 1); String columnNames = columnNamesBuilder.toString(); numberOfColumnsToWrite = columns.size(); numberOfColumnsToRead = numberOfColumnsToWrite; // 开始的时候,要读的列数娱要写的列数相等 // 生成UPSERT模板 StringBuilder upsertBuilder = new StringBuilder("upsert into " + fullTableName + " (" + columnNames + " ) values ("); for (int i = 0; i < numberOfColumnsToWrite; i++) { upsertBuilder.append("?,"); } upsertBuilder.setLength(upsertBuilder.length() - 1); // 移除末尾多余的逗号 upsertBuilder.append(")"); String sql = upsertBuilder.toString(); PreparedStatement ps = connection.prepareStatement(sql); LOG.debug("SQL template generated: " + sql); return ps; } /** * 根据列名来从数据库元数据中获取这一列对应的SQL类型 */ private int[] getColumnSqlType() throws SQLException { int[] types = new int[numberOfColumnsToWrite]; StringBuilder columnNamesBuilder = new StringBuilder(); for (String columnName : columns) { columnNamesBuilder.append("\"").append(columnName).append("\","); } 
        columnNamesBuilder.setLength(columnNamesBuilder.length() - 1);
        // Query a single row to obtain the table metadata
        String selectSql = "SELECT " + columnNamesBuilder + " FROM " + fullTableName + " LIMIT 1";

        Statement statement = null;
        try {
            statement = connection.createStatement();
            ResultSetMetaData meta = statement.executeQuery(selectSql).getMetaData();

            for (int i = 0; i < columns.size(); i++) {
                String name = columns.get(i);
                types[i] = meta.getColumnType(i + 1);
                LOG.debug("Column name : " + name + ", sql type = " + types[i] + " " + meta.getColumnTypeName(i + 1));
            }
        } catch (SQLException e) {
            throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.GET_TABLE_COLUMNTYPE_ERROR,
                    "Failed to get the column types of table " + fullTableName
                            + ". Please check the configuration and service status, or contact your HBase administrator.", e);
        } finally {
            HBase20xSQLHelper.closeJdbc(null, statement, null);
        }

        return types;
    }

    /**
     * Takes each record from the receiver and writes it to Phoenix.
     */
    private void writeData(RecordReceiver lineReceiver) throws SQLException {
        List<Record> buffer = new ArrayList<Record>(batchSize);
        Record record = null;
        while ((record = lineReceiver.getFromReader()) != null) {
            // Check that the number of columns matches the configuration
            if (record.getColumnNumber() != numberOfColumnsToRead) {
                throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.ILLEGAL_VALUE,
                        "The number of columns provided by the source [" + record.getColumnNumber()
                                + "] differs from the number of columns in your configuration [" + numberOfColumnsToRead
                                + "]. Please check your configuration or contact your HBase administrator.");
            }

            buffer.add(record);
            // Flush as soon as the buffer holds batchSize records, so no batch exceeds batchSize
            if (buffer.size() >= batchSize) {
                doBatchUpsert(buffer);
                buffer.clear();
            }
        }

        // Flush the remaining records
        if (!buffer.isEmpty()) {
            doBatchUpsert(buffer);
            buffer.clear();
        }
    }

    /**
     * Commits a group of records as one batch. If the batch fails, the records are retried
     * one row at a time; rows that still fail are collected as dirty records.
     */
    private void doBatchUpsert(List<Record> records) throws SQLException {
        try {
            // Add every record to the statement batch
            for (Record r : records) {
                setupStatement(r);
                pstmt.addBatch();
            }

            pstmt.executeBatch();
            // Commit the batched data to Phoenix
            connection.commit();
            pstmt.clearParameters();
            pstmt.clearBatch();

        } catch (SQLException e) {
            LOG.error("Failed batch committing " + records.size() + " records", e);

            // The batch failed: roll back and retry row by row to find the failing row(s)
            connection.rollback();
            HBase20xSQLHelper.closeJdbc(null, pstmt, null);
            connection.setAutoCommit(true);
            pstmt = createPreparedStatement();
            doSingleUpsert(records);
        } catch (Exception e) {
            throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.PUT_HBASE_ERROR, e);
        }
    }

    /**
     * Commits rows one at a time and records each failing row as dirty data.
     * The dirty-data collector decides whether the task continues.
     */
    private void doSingleUpsert(List<Record> records) throws SQLException {
        int rowNumber = 0;
        for (Record r : records) {
            try {
                rowNumber++;
                setupStatement(r);
                pstmt.executeUpdate();
            } catch (SQLException e) {
                // The row failed: record it as dirty data
                LOG.error("Failed writing to phoenix, rowNumber: " + rowNumber);
                this.taskPluginCollector.collectDirtyRecord(r, e);
            }
        }
    }

    private void setupStatement(Record record) throws SQLException {
        for (int i = 0; i < numberOfColumnsToWrite; i++) {
            Column col = record.getColumn(i);
            int sqlType = columnTypes[i];
            // PreparedStatement indexes start at 1, hence i + 1
            setupColumn(i + 1, sqlType, col);
        }
    }

    private void setupColumn(int pos, int sqlType, Column col) throws SQLException {
        if (col.getRawData() != null) {
            switch (sqlType) {
                case Types.CHAR:
                case Types.VARCHAR:
                    pstmt.setString(pos, col.asString());
                    break;
                case Types.BINARY:
                case Types.VARBINARY:
                    pstmt.setBytes(pos, col.asBytes());
                    break;
                case Types.BOOLEAN:
                    pstmt.setBoolean(pos, col.asBoolean());
                    break;
                case Types.TINYINT:
                case Constant.TYPE_UNSIGNED_TINYINT:
                    pstmt.setByte(pos, col.asLong().byteValue());
                    break;
                case Types.SMALLINT:
                case Constant.TYPE_UNSIGNED_SMALLINT:
                    pstmt.setShort(pos, col.asLong().shortValue());
                    break;
                case Types.INTEGER:
                case Constant.TYPE_UNSIGNED_INTEGER:
                    pstmt.setInt(pos, col.asLong().intValue());
                    break;
                case Types.BIGINT:
                case Constant.TYPE_UNSIGNED_LONG:
                    pstmt.setLong(pos, col.asLong());
                    break;
                case Types.FLOAT:
                    pstmt.setFloat(pos, col.asDouble().floatValue());
                    break;
                case Types.DOUBLE:
                    pstmt.setDouble(pos, col.asDouble());
                    break;
                case Types.DECIMAL:
                    pstmt.setBigDecimal(pos, col.asBigDecimal());
                    break;
                case Types.DATE:
                case Constant.TYPE_UNSIGNED_DATE:
                    pstmt.setDate(pos, new Date(col.asDate().getTime()));
                    break;
                case Types.TIME:
                case Constant.TYPE_UNSIGNED_TIME:
                    pstmt.setTime(pos, new Time(col.asDate().getTime()));
                    break;
                case Types.TIMESTAMP:
                case Constant.TYPE_UNSIGNED_TIMESTAMP:
                    pstmt.setTimestamp(pos, new Timestamp(col.asDate().getTime()));
                    break;
                default:
                    throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.ILLEGAL_VALUE,
                            "Unsupported column type: " + sqlType + ". Please check your configuration or contact your HBase administrator.");
            }
        } else {
            // No value: handle it according to the nullMode setting
            switch (nullModeType) {
                case Skip:
                    // Skip null values: do not insert this column (SQL NULL is bound)
                    pstmt.setNull(pos, sqlType);
                    break;

                case Empty:
                    // Insert an "empty value"; note that the empty value differs by type.
                    // Also note that in SQL an empty value is itself a value, which is completely
                    // different from an empty value when using the native HBase API directly.
                    pstmt.setObject(pos, getEmptyValue(sqlType));
                    break;

                default:
                    // nullMode is validated when the configuration is initialized, so this branch is never reached
                    throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.ILLEGAL_VALUE,
                            "Hbasewriter does not support nullMode type: " + nullModeType
                                    + ". Supported nullMode types: " + Arrays.asList(NullModeType.values()));
            }
        }
    }

    /**
     * Returns the "empty value" for a SQL type: 0 for numeric types, false for booleans,
     * an empty string for VARCHAR, the epoch (time 0) for date/time types, and an empty
     * byte array for binary types.
     * @param sqlType SQL data type, as defined in {@link Types}
     */
    private Object getEmptyValue(int sqlType) {
        switch (sqlType) {
            case Types.VARCHAR:
                return "";
            case Types.BOOLEAN:
                return false;
            case Types.TINYINT:
            case Constant.TYPE_UNSIGNED_TINYINT:
                return (byte) 0;
            case Types.SMALLINT:
            case Constant.TYPE_UNSIGNED_SMALLINT:
                return (short) 0;
            case Types.INTEGER:
            case Constant.TYPE_UNSIGNED_INTEGER:
                return (int) 0;
            case Types.BIGINT:
            case Constant.TYPE_UNSIGNED_LONG:
                return (long) 0;
            case Types.FLOAT:
                return (float) 0.0;
            case Types.DOUBLE:
                return (double) 0.0;
            case Types.DECIMAL:
                return new BigDecimal(0);
            case Types.DATE:
            case Constant.TYPE_UNSIGNED_DATE:
                return new Date(0);
            case Types.TIME:
            case Constant.TYPE_UNSIGNED_TIME:
                return new Time(0);
            case Types.TIMESTAMP:
            case Constant.TYPE_UNSIGNED_TIMESTAMP:
                return new Timestamp(0);
            case Types.BINARY:
            case Types.VARBINARY:
                return new byte[0];
            default:
                throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.ILLEGAL_VALUE,
                        "Unsupported column type: " + sqlType + ". Please check your configuration or contact your HBase administrator.");
        }
    }
}
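The table-name quoting in `initialize()` and the template construction in `createPreparedStatement()` can be read as a single pure function. The sketch below is a hypothetical helper, not part of DataX: the class name `UpsertSqlSketch` and its methods are illustrative, and it assumes identifiers never contain double quotes (the writer does not escape them either). It produces the same statement shape as the writer, which additionally emits a space before the closing parenthesis of the column list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper, not part of DataX: mirrors how HBase20xSQLWriterTask
 * quotes identifiers and builds its Phoenix UPSERT template.
 */
public final class UpsertSqlSketch {

    private UpsertSqlSketch() {
    }

    /** Wraps an identifier in double quotes so Phoenix keeps its case; no escaping is done. */
    public static String quote(String identifier) {
        return "\"" + identifier + "\"";
    }

    /** Returns "schema"."table", or just "table" when the schema is null or empty. */
    public static String fullTableName(String schema, String table) {
        if (schema == null || schema.isEmpty()) {
            return quote(table);
        }
        return quote(schema) + "." + quote(table);
    }

    /** Builds the upsert statement with one ? placeholder per column. */
    public static String buildUpsertSql(String schema, String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        StringJoiner names = new StringJoiner(",");
        StringJoiner placeholders = new StringJoiner(",");
        for (String column : columns) {
            names.add(quote(column));
            placeholders.add("?");
        }
        return "upsert into " + fullTableName(schema, table)
                + " (" + names + ") values (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // prints: upsert into "ns"."orders" ("id","Amount") values (?,?)
        System.out.println(buildUpsertSql("ns", "orders", Arrays.asList("id", "Amount")));
    }
}
```

Quoting every identifier is what preserves the configured case: Phoenix upper-cases unquoted identifiers, so an unquoted `Amount` would be resolved as `AMOUNT`.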
================================================
FILE: hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.hbase20xsqlwriter;

public class Key {

    /**
     * [Required] Name of the table the writer writes to
     */
    public final static String TABLE = "table";
    /**
     * [Required] Columns the writer writes
     */
    public final static String COLUMN = "column";
    /**
     * [Required] Address of the Phoenix QueryServer
     */
    public final static String QUERYSERVER_ADDRESS = "queryServerAddress";
    /**
     * [Optional] Serialization format; PROTOBUF by default
     */
    public static final String SERIALIZATION_NAME = "serialization";
    /**
     * [Optional] Maximum number of rows per batch write; 100 by default
     */
    public static final String BATCHSIZE = "batchSize";
    /**
     * [Optional] How null values are handled; they are skipped by default
     */
    public static final String NULLMODE = "nullMode";
    /**
     * [Optional] Schema of the Phoenix table; empty by default
     */
    public static final String SCHEMA = "schema";
}

================================================
FILE: hbase20xsqlwriter/src/main/java/com/alibaba/datax/plugin/writer/hbase20xsqlwriter/NullModeType.java
================================================
package com.alibaba.datax.plugin.writer.hbase20xsqlwriter;

import com.alibaba.datax.common.exception.DataXException;

import java.util.Arrays;

public enum NullModeType {
    Skip("skip"),
    Empty("empty")
    ;

    private String mode;

    NullModeType(String mode) {
        this.mode = mode.toLowerCase();
    }

    public String getMode() {
        return mode;
    }

    public static NullModeType getByTypeName(String modeName) {
        for (NullModeType modeType : values()) {
            if (modeType.mode.equalsIgnoreCase(modeName)) {
                return modeType;
            }
        }
        throw DataXException.asDataXException(HBase20xSQLWriterErrorCode.ILLEGAL_VALUE,
                "Hbasewriter does not support nullMode type: " + modeName
                        + ". Supported nullMode types: " + Arrays.asList(values()));
    }
}

================================================
FILE: hbase20xsqlwriter/src/main/resources/plugin.json
================================================
{
  "name": "hbase20xsqlwriter",
  "class": "com.alibaba.datax.plugin.writer.hbase20xsqlwriter.HBase20xSQLWriter",
  "description": "useScene: prod. mechanism: use hbase sql UPSERT to put data, index tables will be updated too.",
  "developer": "alibaba"
}

================================================
FILE: hbase20xsqlwriter/src/main/resources/plugin_job_template.json
================================================
{
  "name": "hbase20xsqlwriter",
  "parameter": {
    "queryServerAddress": "",
    "table": "",
    "serialization": "PROTOBUF",
    "column": [],
    "batchSize": "100",
    "nullMode": "skip",
    "schema": ""
  }
}

================================================
FILE: hdfsreader/doc/hdfsreader.md
================================================
# DataX HdfsReader Plugin Documentation

------------

## 1 Quick Introduction

HdfsReader provides the ability to read data stored in a distributed file system. Under the hood, HdfsReader fetches file data from the distributed file system, converts it into the DataX transport protocol, and passes it to the Writer.

**HdfsReader currently supports textfile (text), orcfile (orc), rcfile (rc), sequence file (seq), and plain logical two-dimensional table (csv) files, and the file contents must be a logical two-dimensional table.**

**HdfsReader requires JDK 1.7 or later.**

## 2 Features and Limitations

HdfsReader reads file data from the Hadoop Distributed File System (HDFS) and converts it to the DataX protocol. textfile is the default storage format for Hive tables; its data is not compressed, and a textfile is essentially data stored as text in HDFS. From DataX's perspective, HdfsReader handles textfile much like TxtFileReader and shares a lot with it. orcfile, short for Optimized Row Columnar file, is an optimized version of RCFile; according to the official documentation, it provides an efficient way to store Hive data. HdfsReader uses Hive's OrcSerde class to read and parse orcfile data. HdfsReader currently supports the following:

1. textfile, orcfile, rcfile, sequence file, and csv files, provided the file contents are a logical two-dimensional table.
2. Reading multiple data types (represented as String), column pruning, and constant columns.
3. Recursive reading and wildcards ("*" and "?").
4. orcfile compression; SNAPPY and ZLIB are currently supported.
5. Concurrent reading of multiple files.
6. sequence file compression; lzo is currently supported.
7. For csv, the supported compression formats are gzip, bz2, zip, lzo, lzo_deflate, and snappy.
8. The plugin currently uses Hive 1.1.1 and Hadoop 2.7.1 (Apache, chosen for JDK 1.7 compatibility). It has been tested and works in Hadoop 2.5.0, Hadoop 2.6.0, and Hive 1.2.0 environments; other versions still need testing.
9. Kerberos authentication. (Note: if you need Kerberos authentication, your cluster's Hadoop version must match the Hadoop version used by hdfsreader; if your cluster's version is higher, Kerberos authentication is not guaranteed to work.)

What we cannot do yet:

1. Concurrent multi-threaded reading of a single file, which requires an algorithm for splitting a single file internally. This is planned for phase two.
2. HDFS HA is not supported yet.

## 3 Feature Description

### 3.1 Sample Configuration

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "path": "/user/hive/warehouse/mytable01/*",
                        "defaultFS": "hdfs://xxx:port",
                        "column": [
                            { "index": 0, "type": "long" },
                            { "index": 1, "type": "boolean" },
                            { "type": "string", "value": "hello" },
                            { "index": 2, "type": "double" }
                        ],
                        "fileType": "orc",
                        "encoding": "UTF-8",
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters (configuration values must not have leading or trailing spaces)

* **path**
  * Description: The path of the files to read. To read multiple files, use the wildcard "*"; multiple paths can also be specified.
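
    When a single HDFS file is specified, HdfsReader can currently extract data with only one thread. Phase two may add multi-threaded concurrent reading of a single uncompressed file.

    When multiple HDFS files are specified, HdfsReader extracts data with multiple threads. The number of concurrent threads is set by the number of channels.

    When a wildcard is specified, HdfsReader tries to enumerate the matching files. For example, `/*` reads all files under the `/` directory, and `/bazhen/*` reads all files under the `bazhen` directory. HdfsReader currently supports only "*" and "?" as file wildcards.

    **Note in particular that DataX treats all files synchronized by one job as a single data table. You must ensure that all files fit the same schema and that DataX has permission to read them.**
  * Required: Yes
  * Default: none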
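* **defaultFS**
  * Description: Address of the NameNode of the Hadoop HDFS file system.

    **HdfsReader now supports Kerberos authentication. If authentication is required, configure the Kerberos parameters described below.**
  * Required: Yes
  * Default: none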
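* **fileType**
  * Description: The file type. Currently only "text", "orc", "rc", "seq", and "csv" can be configured.

    * text: textfile format
    * orc: orcfile format
    * rc: rcfile format
    * seq: sequence file format
    * csv: plain HDFS file (logical two-dimensional table)

    **Note in particular that although HdfsReader can detect whether a file is an orcfile, a textfile, or another type, this setting is still required. HdfsReader reads only files of the configured type, and a file under the path whose type does not match fileType makes the job fail with an error, so make sure every file under the path is of the configured type.**

    **Also note that because textfile and orcfile are two completely different file formats, HdfsReader parses them differently. As a result, Hive's complex types (such as map, array, struct, and union) are formatted slightly differently when converted to the DataX String type. Taking the map type as an example:**

    * An orcfile map value, parsed by hdfsreader and converted to the DataX string type, looks like "{job=80, team=60, person=70}"
    * A textfile map value, parsed by hdfsreader and converted to the DataX string type, looks like "job:80,team:60,person:70"

    As these results show, the data itself is unchanged but its representation differs slightly. Therefore, if the fields to synchronize under the configured path are complex types in Hive, we recommend using a single file format.

    **To get a uniform format for parsed complex types, we recommend converting textfile tables to orcfile tables in the Hive client.**
  * Required: Yes
  * Default: none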
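* **column**
  * Description: The list of fields to read. type specifies the type of the source data, index specifies which column of the text the field comes from (starting from 0), and value marks the field as a constant: the column is not read from the source file but generated from the value.

    By default, you can read all data as String with the following configuration:

    ```json
    "column": ["*"]
    ```

    You can also specify the column information as follows:

    ```json
    {
      "type": "long",
      "index": 0    // read a long field from the first column of the file text
    },
    {
      "type": "string",
      "value": "alibaba"  // HdfsReader generates the string "alibaba" as this field
    }
    ```

    When you specify column information, type is required, and you must set one of index or value.
  * Required: Yes
  * Default: read all columns as string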
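* **fieldDelimiter**
  * Description: The field delimiter used when reading.

    **Also note that HdfsReader needs a field delimiter when reading textfile data; if none is specified, it defaults to ','. When reading orcfile data, you do not need to specify a field delimiter.**
  * Required: No
  * Default: ,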
* **encoding**
  * Description: The encoding of the files to read.
  * Required: No
  * Default: utf-8
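* **nullFormat**
  * Description: A text file cannot represent null (a null pointer) with a standard string, so DataX provides nullFormat to define which string represents null.

    For example, if you configure `nullFormat: "\\N"`, DataX treats the source value `\N` as a null field.
  * Required: No
  * Default: none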
* **haveKerberos**
  * Description: Whether Kerberos authentication is enabled. Defaults to false.

    If you set it to true, kerberosKeytabFilePath and kerberosPrincipal are required.
  * Required: No
  * Default: false
* **kerberosKeytabFilePath**
  * Description: Absolute path of the keytab file used for Kerberos authentication.
  * Required: Yes if haveKerberos is true
  * Default: none
* **kerberosPrincipal**
  * Description: Kerberos principal name, for example xxxx/hadoopclient@xxx.xxx
  * Required: Yes if haveKerberos is true
  * Default: none
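* **compress**
  * Description: The compression format of the files when fileType is csv. Currently only gzip, bz2, zip, lzo, lzo_deflate, hadoop-snappy, and framing-snappy are supported. **Note that lzo comes in two compression formats, lzo and lzo_deflate, so take care to configure the right one. Also, because snappy has no unified stream format, DataX currently supports only the two most common ones: hadoop-snappy (the snappy stream format used by Hadoop) and framing-snappy (the snappy stream format recommended by Google).** Not needed for the orc file type.
  * Required: No
  * Default: none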
* **hadoopConfig**
  * Description: hadoopConfig can hold advanced Hadoop-related parameters, such as the HA configuration. Use the same nameservice ID (testDfs here) in every key.

    ```json
    "hadoopConfig": {
      "dfs.nameservices": "testDfs",
      "dfs.ha.namenodes.testDfs": "namenode1,namenode2",
      "dfs.namenode.rpc-address.testDfs.namenode1": "",
      "dfs.namenode.rpc-address.testDfs.namenode2": "",
      "dfs.client.failover.proxy.provider.testDfs": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    }
    ```
  * Required: No
  * Default: none
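* **csvReaderConfig**
  * Description: Settings for reading CSV files, as a map. CSV files are read with CsvReader, which has many options; any option you do not configure uses its default value.
  * Required: No
  * Default: none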
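
  Common configuration:

  ```json
  "csvReaderConfig": {
    "safetySwitch": false,
    "skipEmptyRecords": false,
    "useTextQualifier": false
  }
  ```

  All options and their default values are listed below. In the csvReaderConfig map, **use exactly these field names**:

  ```
  boolean caseSensitive = true;
  char textQualifier = 34;          // '"'
  boolean trimWhitespace = true;
  boolean useTextQualifier = true;  // whether to use the CSV text qualifier (escape character)
  char delimiter = 44;              // field delimiter, ','
  char recordDelimiter = 0;
  char comment = 35;                // '#'
  boolean useComments = false;
  int escapeMode = 1;
  boolean safetySwitch = true;      // whether a single column is limited to 100,000 characters
  boolean skipEmptyRecords = true;  // whether to skip empty lines
  boolean captureRawRecord = true;
  ```

### 3.3 Type Conversion

The metadata of textfile and orcfile tables is maintained by Hive and stored in Hive's own metastore database (such as MySQL), and HdfsReader currently cannot access or query the Hive metastore. You must therefore specify the data types for conversion. If column is configured as "*", all columns are converted to string by default. HdfsReader recommends the following type mapping:

| DataX internal type | Hive data type |
| -------- | ----- |
| Long | TINYINT, SMALLINT, INT, BIGINT |
| Double | FLOAT, DOUBLE |
| String | STRING, CHAR, VARCHAR, STRUCT, MAP, ARRAY, UNION, BINARY |
| Boolean | BOOLEAN |
| Date | DATE, TIMESTAMP |

Where:

* Long means an integer in string form in the HDFS file text, for example "123456789".
* Double means a double in string form in the HDFS file text, for example "3.1415".
* Boolean means a boolean in string form in the HDFS file text, for example "true" or "false" (case-insensitive).
* Date means a date in string form in the HDFS file text, for example "2014-12-31".

Special note:

* Hive's TIMESTAMP type has nanosecond precision, so TIMESTAMP values stored in textfile and orcfile look like "2015-08-21 22:40:47.397898389". Converting them to the DataX Date type loses the nanosecond part; to keep it, convert them to the DataX String type instead.

### 3.4 Reading by Partition

When Hive creates a table, it can specify partitions. For example, after creating partition(day="20150820", hour="09"), the table's directory in HDFS gains the subdirectories day=20150820 and hour=09 (Hive names partition directories in key=value form), with day=20150820 being the parent of hour=09. Because each partition maps to a directory, reading all data of a table in a given partition only requires setting path in the JSON accordingly.

For example, to read all data of table mytable01 in the partition day=20150820, configure:

```json
"path": "/user/hive/warehouse/mytable01/day=20150820/*"
```

## 4 Performance Report

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

1.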
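   If you get the exception `java.io.IOException: Maximum column length of 100,000 exceeded in column...`, a column in the source data is longer than 100,000 characters. Add the following configuration to the reader in the JSON:

   ```json
   "csvReaderConfig": {
     "safetySwitch": false,
     "skipEmptyRecords": false,
     "useTextQualifier": false
   }
   ```

   With safetySwitch = false, a single column is no longer limited to 100,000 characters.

================================================
FILE: hdfsreader/pom.xml
================================================
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>datax-all</artifactId>
        <groupId>com.alibaba.datax</groupId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>hdfsreader</artifactId>
    <groupId>com.alibaba.datax</groupId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <hive.version>3.1.3</hive.version>
        <hadoop.version>2.7.1</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.17.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.17.1</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-yarn-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>com.aliyun.oss</groupId>
            <artifactId>hadoop-aliyun</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-serde</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-service</artifactId>
            <version>${hive.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>jdk.tools</groupId>
                    <artifactId>jdk.tools</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-common</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive.hcatalog</groupId>
            <artifactId>hive-hcatalog-core</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-unstructured-storage-util</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-column</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-avro</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-common</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-format</artifactId>
            <version>2.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-jackson</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-encoding</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-hadoop</artifactId>
            <version>1.12.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

================================================
FILE: hdfsreader/src/main/assembly/package.xml
================================================
<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/reader/hdfsreader</outputDirectory>
        </fileSet>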
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>hdfsreader-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/reader/hdfsreader</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>src/main/libs</directory>
            <includes>
                <include>*.*</include>
            </includes>
            <outputDirectory>plugin/reader/ossreader/libs</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>src/main/libs</directory>
            <includes>
                <include>*.*</include>
            </includes>
            <outputDirectory>plugin/reader/hivereader/libs</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/reader/hdfsreader/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>

================================================
FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Constant.java
================================================
package com.alibaba.datax.plugin.reader.hdfsreader;

/**
 * Created by mingya.wmy on 2015/8/14.
 */
public class Constant {
    public static final String SOURCE_FILES = "sourceFiles";
    public static final String TEXT = "TEXT";
    public static final String ORC = "ORC";
    public static final String CSV = "CSV";
    public static final String SEQ = "SEQ";
    public static final String RC = "RC";
    public static final String PARQUET = "PARQUET";
}

================================================
FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/DFSUtil.java
================================================
package com.alibaba.datax.plugin.reader.hdfsreader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.unstructuredstorage.reader.ColumnEntry;
import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode;
import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONArray;
import com.alibaba.fastjson2.JSONObject;
import org.apache.commons.lang3.BooleanUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.exception.ExceptionUtils;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.common.type.HiveDecimal;
import org.apache.hadoop.hive.ql.io.RCFile;
import org.apache.hadoop.hive.ql.io.RCFileRecordReader;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;
import org.apache.parquet.schema.PrimitiveType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/**
 * Created by mingya.wmy on 2015/8/12.
 */
public class DFSUtil {
    private static final Logger LOG = LoggerFactory.getLogger(HdfsReader.Job.class);

    private org.apache.hadoop.conf.Configuration hadoopConf = null;
    private String specifiedFileType = null;
    private Boolean haveKerberos = false;
    private String kerberosKeytabFilePath;
    private String kerberosPrincipal;

    private static final int DIRECTORY_SIZE_GUESS = 16 * 1024;

    public static final String HDFS_DEFAULTFS_KEY = "fs.defaultFS";
    public static final String HADOOP_SECURITY_AUTHENTICATION_KEY = "hadoop.security.authentication";

    private Boolean skipEmptyOrcFile = false;

    private Integer orcFileEmptySize = null;

    public DFSUtil(Configuration taskConfig) {
        hadoopConf = new org.apache.hadoop.conf.Configuration();
        // io.file.buffer.size is a performance-related parameter that can be set through hadoopConfig
        // http://blog.csdn.net/yangjl38/article/details/7583374
        Configuration hadoopSiteParams = taskConfig.getConfiguration(Key.HADOOP_CONFIG);
        JSONObject hadoopSiteParamsAsJsonObject = JSON.parseObject(taskConfig.getString(Key.HADOOP_CONFIG));
        if (null != hadoopSiteParams) {
            Set<String> paramKeys = hadoopSiteParams.getKeys();
            for (String each : paramKeys) {
                hadoopConf.set(each, hadoopSiteParamsAsJsonObject.getString(each));
            }
        }
        hadoopConf.set(HDFS_DEFAULTFS_KEY, taskConfig.getString(Key.DEFAULT_FS));

        // Whether Kerberos authentication is enabled
        this.haveKerberos = taskConfig.getBool(Key.HAVE_KERBEROS, false);
        if (haveKerberos) {
            this.kerberosKeytabFilePath = taskConfig.getString(Key.KERBEROS_KEYTAB_FILE_PATH);
            this.kerberosPrincipal = taskConfig.getString(Key.KERBEROS_PRINCIPAL);
            this.hadoopConf.set(HADOOP_SECURITY_AUTHENTICATION_KEY, "kerberos");
        }
        this.kerberosAuthentication(this.kerberosPrincipal, this.kerberosKeytabFilePath);
        this.skipEmptyOrcFile = taskConfig.getBool(Key.SKIP_EMPTY_ORCFILE, false);

        LOG.info(String.format("hadoopConfig details:%s", JSON.toJSONString(this.hadoopConf)));
    }

    private void kerberosAuthentication(String kerberosPrincipal, String kerberosKeytabFilePath) {
        if (haveKerberos && StringUtils.isNotBlank(this.kerberosPrincipal) &&
                StringUtils.isNotBlank(this.kerberosKeytabFilePath)) {
            UserGroupInformation.setConfiguration(this.hadoopConf);
            try {
                UserGroupInformation.loginUserFromKeytab(kerberosPrincipal, kerberosKeytabFilePath);
            } catch (Exception e) {
                String message = String.format("Kerberos authentication failed. Please make sure kerberosKeytabFilePath [%s] and kerberosPrincipal [%s] are correct.",
                        kerberosKeytabFilePath, kerberosPrincipal);
                throw DataXException.asDataXException(HdfsReaderErrorCode.KERBEROS_LOGIN_ERROR, message, e);
            }
        }
    }

    /**
     * Returns the absolute paths of all matching files under the given paths.
     *
     * @param srcPaths          list of paths
     * @param specifiedFileType file type specified by the user
     */
    public HashSet<String> getAllFiles(List<String> srcPaths, String specifiedFileType, Boolean skipEmptyOrcFile, Integer orcFileEmptySize) {
        this.specifiedFileType = specifiedFileType;
        this.skipEmptyOrcFile = skipEmptyOrcFile;
        this.orcFileEmptySize = orcFileEmptySize;

        if (!srcPaths.isEmpty()) {
            for (String eachPath : srcPaths) {
                LOG.info(String.format("get HDFS all files in path = [%s]", eachPath));
                getHDFSAllFiles(eachPath);
            }
        }
        return sourceHDFSAllFilesList;
    }

    private HashSet<String> sourceHDFSAllFilesList = new HashSet<String>();

    public HashSet<String> getHDFSAllFiles(String hdfsPath) {

        try {
            FileSystem hdfs = FileSystem.get(hadoopConf);
            // Check whether hdfsPath contains wildcard characters
            if (hdfsPath.contains("*") || hdfsPath.contains("?")) {
                Path path = new Path(hdfsPath);
                FileStatus[] stats = hdfs.globStatus(path);
                for (FileStatus f : stats) {
                    if (f.isFile()) {
                        long fileLength = f.getLen();
                        if (fileLength == 0) {
                            // Report the skipped file itself, not the pattern it matched
                            String message = String.format("File [%s] has length 0 and will be skipped.", f.getPath().toString());
                            LOG.warn(message);
                        } else if (BooleanUtils.isTrue(this.skipEmptyOrcFile) && this.orcFileEmptySize != null && fileLength <= this.orcFileEmptySize) {
                            String message = String.format("The orc file [%s] is empty, file size: %s, DataX will skip it !", f.getPath().toString(), fileLength);
                            LOG.warn(message);
                        } else {
                            addSourceFileByType(f.getPath().toString());
                        }
                    } else if (f.isDirectory()) {
                        getHDFSAllFilesNORegex(f.getPath().toString(), hdfs);
                    }
                }
            } else {
                getHDFSAllFilesNORegex(hdfsPath, hdfs);
            }
            return sourceHDFSAllFilesList;

        } catch (IOException e) {
            String message = String.format("Unable to read the files under path [%s]. Please check that fs.defaultFS and path are configured correctly, "
                    + "that you have read/write permission, and that the network is connected.", hdfsPath);
            LOG.error(message);
            throw DataXException.asDataXException(HdfsReaderErrorCode.PATH_CONFIG_ERROR, message, e);
        }
    }

    private HashSet<String> getHDFSAllFilesNORegex(String path, FileSystem hdfs) throws IOException {

        // Root directory of the files to read
        Path listFiles = new Path(path);

        // If the network is disconnected, this method retries 45 times,
        // with a 20-second interval between retries.
        // List the direct children of the root directory
        FileStatus[] stats = hdfs.listStatus(listFiles);

        for (FileStatus f : stats) {
            // Recurse into directories
            if (f.isDirectory()) {
                LOG.info(String.format("[%s] is a directory; recursively collecting the files under it", f.getPath().toString()));
                getHDFSAllFilesNORegex(f.getPath().toString(), hdfs);
            } else if (f.isFile()) {
                long fileLength = f.getLen();
                if (fileLength == 0) {
                    String message = String.format("The file [%s] is empty, DataX will skip it !", f.getPath().toString());
                    LOG.warn(message);
                    continue;
                } else if (BooleanUtils.isTrue(this.skipEmptyOrcFile) && this.orcFileEmptySize != null && fileLength <= this.orcFileEmptySize) {
                    String message = String.format("The orc file [%s] is empty, file size: %s, DataX will skip it !", f.getPath().toString(), fileLength);
                    LOG.warn(message);
                    continue;
                }
                addSourceFileByType(f.getPath().toString());
            } else {
                String message = String.format("Path [%s] is neither a directory nor a file; the plugin ignores it.",
                        f.getPath().toString());
                LOG.info(message);
            }
        }
        return sourceHDFSAllFilesList;
    }

    // Add the file to sourceHDFSAllFilesList if its type matches the user-specified fileType
    private void addSourceFileByType(String filePath) {
        // Check whether the file's type matches the configured fileType
        boolean isMatchedFileType = checkHdfsFileType(filePath, this.specifiedFileType);

        if (isMatchedFileType) {
            LOG.info(String.format("[%s] is a [%s] file; adding it to the source files list", filePath, this.specifiedFileType));
            sourceHDFSAllFilesList.add(filePath);
        } else {
            String message = String.format("The type of file [%s] does not match the configured fileType. "
                            + "Please make sure all files under the configured directory are of type [%s]",
                    filePath,
                    this.specifiedFileType);
            LOG.error(message);
            throw DataXException.asDataXException(
                    HdfsReaderErrorCode.FILE_TYPE_UNSUPPORT, message);
        }
    }

    public InputStream getInputStream(String filepath) {
        InputStream inputStream;
        Path path = new Path(filepath);
        try {
            FileSystem fs = FileSystem.get(hadoopConf);
            // If the network is disconnected, this method retries 45 times,
            // with a 20-second interval between retries
            inputStream = fs.open(path);
            return inputStream;
        } catch (IOException e) {
            String message = String.format("Error reading file [%s]. Please make sure file [%s] exists and the configured user has permission to read it.",
                    filepath, filepath);
            throw DataXException.asDataXException(HdfsReaderErrorCode.READ_FILE_ERROR, message, e);
        }
    }

    public void sequenceFileStartRead(String sourceSequenceFilePath, Configuration readerSliceConfig,
                                      RecordSender recordSender, TaskPluginCollector taskPluginCollector) {
        LOG.info(String.format("Start Read sequence file [%s].", sourceSequenceFilePath));

        Path seqFilePath = new Path(sourceSequenceFilePath);
        SequenceFile.Reader reader = null;
        try {
            // Create a SequenceFile.Reader
            reader = new SequenceFile.Reader(this.hadoopConf,
                    SequenceFile.Reader.file(seqFilePath));
            // Create the key and value holders
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), this.hadoopConf);
            Text value = new Text();
            while (reader.next(key, value)) {
                if (StringUtils.isNotBlank(value.toString())) {
                    UnstructuredStorageReaderUtil.transportOneRecord(recordSender,
                            readerSliceConfig, taskPluginCollector, value.toString());
                }
            }
        } catch (Exception e) {
            String message = String.format("SequenceFile.Reader failed to read file [%s]", sourceSequenceFilePath);
            LOG.error(message);
            throw DataXException.asDataXException(HdfsReaderErrorCode.READ_SEQUENCEFILE_ERROR, message, e);
        } finally {
            IOUtils.closeStream(reader);
            LOG.info("Finally, Close stream SequenceFile.Reader.");
        }
    }

    public void rcFileStartRead(String sourceRcFilePath, Configuration readerSliceConfig,
                                RecordSender recordSender, TaskPluginCollector taskPluginCollector) {
        LOG.info(String.format("Start Read rcfile [%s].", sourceRcFilePath));
        List<ColumnEntry> column = UnstructuredStorageReaderUtil
                .getListColumnEntry(readerSliceConfig, com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN);
        // warn: no default value '\N'
        String nullFormat = readerSliceConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.NULL_FORMAT);

        Path rcFilePath = new Path(sourceRcFilePath);
        FileSystem fs = null;
        RCFileRecordReader recordReader = null;
        try {
            fs = FileSystem.get(rcFilePath.toUri(), hadoopConf);
            long fileLen = fs.getFileStatus(rcFilePath).getLen();
            FileSplit split = new FileSplit(rcFilePath, 0, fileLen, (String[]) null);
            recordReader = new RCFileRecordReader(hadoopConf, split);
            LongWritable key = new LongWritable();
            BytesRefArrayWritable value = new BytesRefArrayWritable();
            Text txt = new Text();
            while (recordReader.next(key, value)) {
                String[] sourceLine = new String[value.size()];
                txt.clear();
                for (int i = 0; i < value.size(); i++) {
                    BytesRefWritable v = value.get(i);
                    txt.set(v.getData(), v.getStart(), v.getLength());
                    sourceLine[i] = txt.toString();
                }
                UnstructuredStorageReaderUtil.transportOneRecord(recordSender,
                        column, sourceLine, nullFormat, taskPluginCollector);
            }

        } catch (IOException e) {
            String message = String.format("Error reading file [%s]", sourceRcFilePath);
            LOG.error(message);
            throw DataXException.asDataXException(HdfsReaderErrorCode.READ_RCFILE_ERROR, message, e);
        } finally {
            try {
                if (recordReader != null) {
                    recordReader.close();
                    LOG.info("Finally, Close RCFileRecordReader.");
                }
            } catch (IOException e) {
                LOG.warn(String.format("finally: failed to close RCFileRecordReader, %s", e.getMessage()));
            }
        }
    }

    public void orcFileStartRead(String sourceOrcFilePath, Configuration readerSliceConfig,
                                 RecordSender recordSender, TaskPluginCollector taskPluginCollector) {
        LOG.info(String.format("Start Read orcfile [%s].", sourceOrcFilePath));
        List<ColumnEntry> column = UnstructuredStorageReaderUtil
                .getListColumnEntry(readerSliceConfig, com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN);
        String nullFormat = readerSliceConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.NULL_FORMAT);
        StringBuilder allColumns = new StringBuilder();
        StringBuilder allColumnTypes = new StringBuilder();
        boolean isReadAllColumns = false;
        int columnIndexMax = -1;
        // Decide whether to read all columns
        if (null == column || column.size() == 0) {
            int allColumnsCount = getAllColumnsCount(sourceOrcFilePath);
            columnIndexMax = allColumnsCount - 1;
            isReadAllColumns = true;
        } else {
            columnIndexMax = getMaxIndex(column);
        }
        for (int i = 0; i <= columnIndexMax; i++) {
            allColumns.append("col");
            allColumnTypes.append("string");
            if (i != columnIndexMax) {
                allColumns.append(",");
                allColumnTypes.append(":");
            }
        }
        if (columnIndexMax >= 0) {
            JobConf conf = new JobConf(hadoopConf);
            Path orcFilePath = new Path(sourceOrcFilePath);
            Properties p = new Properties();
            p.setProperty("columns", allColumns.toString());
            p.setProperty("columns.types", allColumnTypes.toString());
            try {
                OrcSerde serde = new OrcSerde();
                serde.initialize(conf, p);
                StructObjectInspector inspector = (StructObjectInspector) serde.getObjectInspector();
                InputFormat<?, ?> in = new OrcInputFormat();
                FileInputFormat.setInputPaths(conf, orcFilePath.toString());

                // If the network is disconnected, this retries 45 times, with a 20-second interval between retries.
                // Each file is treated as one split.
                // TODO: multiple threads
                // OrcInputFormat.getSplits ignores the numSplits parameter; the number of splits equals the number of blocks
                InputSplit[] splits;
                try {
                    splits = in.getSplits(conf, 1);
                } catch (Exception splitException) {
                    if (Boolean.TRUE.equals(this.skipEmptyOrcFile)) {
                        boolean isOrcFileEmptyException = checkIsOrcEmptyFileExecption(splitException);
                        if (isOrcFileEmptyException) {
                            LOG.info("skipEmptyOrcFile: true, \"{}\" is an empty orc file, skip it!", sourceOrcFilePath);
                            return;
                        }
                    }
                    throw splitException;
                }
                for (InputSplit split : splits) {
                    {
                        RecordReader reader = in.getRecordReader(split, conf, Reporter.NULL);
                        Object key = reader.createKey();
                        Object value = reader.createValue();
                        // Column information
List<? extends StructField> fields = inspector.getAllStructFieldRefs(); // partition columns depend only on the file path, so compute them once per split rather than once per record List<ColumnEntry> hivePartitionColumnEntrys = UnstructuredStorageReaderUtil.getListColumnEntry(readerSliceConfig, com.alibaba.datax.plugin.unstructuredstorage.reader.Key.HIVE_PARTION_COLUMN); ArrayList<Column> hivePartitionColumns = UnstructuredStorageReaderUtil.getHivePartitionColumns(sourceOrcFilePath, hivePartitionColumnEntrys); List<Object> recordFields; try { while (reader.next(key, value)) { recordFields = new ArrayList<Object>(); for (int i = 0; i <= columnIndexMax; i++) { Object field = inspector.getStructFieldData(value, fields.get(i)); recordFields.add(field); } transportOneRecord(column, recordFields, recordSender, taskPluginCollector, isReadAllColumns, nullFormat, hivePartitionColumns); } } finally { reader.close(); } } } } catch (Exception e) { String message = String.format("An exception occurred while reading data from orcfile path [%s]. Please contact the system administrator.", sourceOrcFilePath); LOG.error(message); throw DataXException.asDataXException(HdfsReaderErrorCode.READ_FILE_ERROR, message, e); } } else { String message = String.format("Please make sure the column configuration is correct! columnIndexMax is less than 0, column: %s", JSON.toJSONString(column)); throw DataXException.asDataXException(HdfsReaderErrorCode.BAD_CONFIG_VALUE, message); } } private boolean checkIsOrcEmptyFileExecption(Exception e) { if (e == null) { return false; } String fullStackTrace = ExceptionUtils.getStackTrace(e); return fullStackTrace.contains("org.apache.orc.impl.ReaderImpl.getRawDataSizeOfColumn") && fullStackTrace.contains("Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1"); } private Record transportOneRecord(List<ColumnEntry> columnConfigs, List<Object> recordFields, RecordSender recordSender, TaskPluginCollector taskPluginCollector, boolean isReadAllColumns, String nullFormat, ArrayList<Column> hiveParitionColumns) { Record record = recordSender.createRecord(); Column columnGenerated; try { if (isReadAllColumns) { // read all columns; every generated column is a StringColumn for (Object recordField : recordFields) { String columnValue = null; if 
(recordField != null) { columnValue = recordField.toString(); } columnGenerated = new StringColumn(columnValue); record.addColumn(columnGenerated); } } else { for (ColumnEntry columnConfig : columnConfigs) { String columnType = columnConfig.getType(); Integer columnIndex = columnConfig.getIndex(); String columnConst = columnConfig.getValue(); String columnValue = null; if (null != columnIndex) { if (null != recordFields.get(columnIndex)) columnValue = recordFields.get(columnIndex).toString(); } else { columnValue = columnConst; } Type type = Type.valueOf(columnType.toUpperCase()); // it's all ok if nullFormat is null if (StringUtils.equals(columnValue, nullFormat)) { columnValue = null; } switch (type) { case STRING: columnGenerated = new StringColumn(columnValue); break; case LONG: try { columnGenerated = new LongColumn(columnValue); } catch (Exception e) { throw new IllegalArgumentException(String.format( "Type conversion error: cannot convert [%s] to [%s]", columnValue, "LONG")); } break; case DOUBLE: try { columnGenerated = new DoubleColumn(columnValue); } catch (Exception e) { throw new IllegalArgumentException(String.format( "Type conversion error: cannot convert [%s] to [%s]", columnValue, "DOUBLE")); } break; case BOOLEAN: try { columnGenerated = new BoolColumn(columnValue); } catch (Exception e) { throw new IllegalArgumentException(String.format( "Type conversion error: cannot convert [%s] to [%s]", columnValue, "BOOLEAN")); } break; case DATE: try { if (columnValue == null) { columnGenerated = new DateColumn((Date) null); } else { String formatString = columnConfig.getFormat(); if (StringUtils.isNotBlank(formatString)) { // parse with the user-configured date format SimpleDateFormat format = new SimpleDateFormat( formatString); columnGenerated = new DateColumn( format.parse(columnValue)); } else { // let the framework attempt the conversion columnGenerated = new DateColumn( new StringColumn(columnValue) .asDate()); } } } catch (Exception e) { throw new IllegalArgumentException(String.format( "Type conversion error: cannot convert [%s] to [%s]", columnValue, "DATE")); } break; default: String errorMessage = String.format( 
"The column type you configured is not supported yet: [%s]", columnType); LOG.error(errorMessage); throw DataXException .asDataXException( UnstructuredStorageReaderErrorCode.NOT_SUPPORT_TYPE, errorMessage); } record.addColumn(columnGenerated); } } recordSender.sendToWriter(record); } catch (IllegalArgumentException iae) { taskPluginCollector .collectDirtyRecord(record, iae.getMessage()); } catch (IndexOutOfBoundsException ioe) { taskPluginCollector .collectDirtyRecord(record, ioe.getMessage()); } catch (Exception e) { if (e instanceof DataXException) { throw (DataXException) e; } // every conversion failure, including number and date format errors, is treated as dirty data taskPluginCollector.collectDirtyRecord(record, e.getMessage()); } return record; } private int getAllColumnsCount(String filePath) { Path path = new Path(filePath); try { Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(hadoopConf)); return reader.getTypes().get(0).getSubtypesCount(); } catch (IOException e) { String message = "Failed to read the column count of the orcfile. Please contact the system administrator."; throw DataXException.asDataXException(HdfsReaderErrorCode.READ_FILE_ERROR, message, e); } } private int getMaxIndex(List<ColumnEntry> columnConfigs) { int maxIndex = -1; for (ColumnEntry columnConfig : columnConfigs) { Integer columnIndex = columnConfig.getIndex(); if (columnIndex != null && columnIndex < 0) { String message = String.format("The index configured in column must not be less than 0; please correct it. column configuration: %s", JSON.toJSONString(columnConfigs)); LOG.error(message); throw DataXException.asDataXException(HdfsReaderErrorCode.CONFIG_INVALID_EXCEPTION, message); } else if (columnIndex != null && columnIndex > maxIndex) { maxIndex = columnIndex; } } return maxIndex; } private enum Type { STRING, LONG, BOOLEAN, DOUBLE, DATE, } public boolean checkHdfsFileType(String filepath, String specifiedFileType) { Path file = new Path(filepath); try { // FileSystem.get returns a cached, shared instance, so only the input stream is closed here FileSystem fs = FileSystem.get(hadoopConf); try (FSDataInputStream in = fs.open(file)) { if (StringUtils.equalsIgnoreCase(specifiedFileType, Constant.CSV) || StringUtils.equalsIgnoreCase(specifiedFileType, Constant.TEXT)) { boolean isORC = isORCFile(file, fs, in); // check whether it is an ORC file if (isORC) { return false; } boolean isRC = isRCFile(filepath, in); // check whether it is an RC file if (isRC) { return false; } boolean isSEQ = isSequenceFile(filepath, in); // check whether it is a Sequence file if (isSEQ) { return false; } // not ORC, RC or SEQ, so treat it as TEXT or CSV return true; } else if (StringUtils.equalsIgnoreCase(specifiedFileType, Constant.ORC)) { return isORCFile(file, fs, in); } else if (StringUtils.equalsIgnoreCase(specifiedFileType, Constant.RC)) { return isRCFile(filepath, in); } else if (StringUtils.equalsIgnoreCase(specifiedFileType, Constant.SEQ)) { return isSequenceFile(filepath, in); } else if (StringUtils.equalsIgnoreCase(specifiedFileType, Constant.PARQUET)) { return true; } } } catch (Exception e) { String message = String.format("Failed to check the type of file [%s]. Supported formats are ORC, SEQUENCE, RCFile, TEXT, CSV and PARQUET; " + "please check that the file type and the file itself are correct.", filepath); LOG.error(message); throw DataXException.asDataXException(HdfsReaderErrorCode.READ_FILE_ERROR, message, e); } return false; } // check whether the file is an ORC file private boolean isORCFile(Path file, FileSystem fs, FSDataInputStream in) { try { // figure out the size of the file using the option or filesystem long size = fs.getFileStatus(file).getLen(); // read the last bytes into a buffer to get the PostScript int readSize = (int) Math.min(size, DIRECTORY_SIZE_GUESS); in.seek(size - readSize); ByteBuffer buffer = ByteBuffer.allocate(readSize); in.readFully(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining()); // read the PostScript // get the length of the PostScript int psLen = buffer.get(readSize - 1) & 0xff; int len = OrcFile.MAGIC.length(); if (psLen < len + 1) { return false; } int offset = buffer.arrayOffset() + buffer.position() + buffer.limit() - 1 - len; byte[] array = buffer.array(); // now look for the magic string at the end of the postscript. if (Text.decode(array, offset, len).equals(OrcFile.MAGIC)) { return true; } else { // If it isn't there, this may be the 0.11.0 version of ORC. 
// Read the first 3 bytes of the file to check for the header in.seek(0); byte[] header = new byte[len]; in.readFully(header, 0, len); // if it isn't there, this isn't an ORC file if (Text.decode(header, 0, len).equals(OrcFile.MAGIC)) { return true; } } } catch (IOException e) { LOG.info(String.format("Checking file type: [%s] is not an ORC file.", file.toString())); } return false; } // check whether the file is an RC file private boolean isRCFile(String filepath, FSDataInputStream in) { // The first version of RCFile used the sequence file header. final byte[] ORIGINAL_MAGIC = new byte[]{(byte) 'S', (byte) 'E', (byte) 'Q'}; // The 'magic' bytes at the beginning of the RCFile final byte[] RC_MAGIC = new byte[]{(byte) 'R', (byte) 'C', (byte) 'F'}; // the version that was included with the original magic, which is mapped // into ORIGINAL_VERSION final byte ORIGINAL_MAGIC_VERSION_WITH_METADATA = 6; // All of the versions should be placed in this list. final int ORIGINAL_VERSION = 0; // version with SEQ final int NEW_MAGIC_VERSION = 1; // version with RCF final int CURRENT_VERSION = NEW_MAGIC_VERSION; byte version; byte[] magic = new byte[RC_MAGIC.length]; try { in.seek(0); in.readFully(magic); if (Arrays.equals(magic, ORIGINAL_MAGIC)) { byte vers = in.readByte(); if (vers != ORIGINAL_MAGIC_VERSION_WITH_METADATA) { return false; } version = ORIGINAL_VERSION; } else { if (!Arrays.equals(magic, RC_MAGIC)) { return false; } // Set 'version' version = in.readByte(); if (version > CURRENT_VERSION) { return false; } } if (version == ORIGINAL_VERSION) { try { Class<?> keyCls = hadoopConf.getClassByName(Text.readString(in)); Class<?> valCls = hadoopConf.getClassByName(Text.readString(in)); if (!keyCls.equals(RCFile.KeyBuffer.class) || !valCls.equals(RCFile.ValueBuffer.class)) { return false; } } catch (ClassNotFoundException e) { return false; } } boolean decompress = in.readBoolean(); // is compressed? if (version == ORIGINAL_VERSION) { // is block-compressed? it should always be false. 
boolean blkCompressed = in.readBoolean(); if (blkCompressed) { return false; } } return true; } catch (IOException e) { LOG.info(String.format("Checking file type: [%s] is not an RC file.", filepath)); } return false; } // check whether the file is a Sequence file private boolean isSequenceFile(String filepath, FSDataInputStream in) { byte[] SEQ_MAGIC = new byte[]{(byte) 'S', (byte) 'E', (byte) 'Q'}; byte[] magic = new byte[SEQ_MAGIC.length]; try { in.seek(0); in.readFully(magic); return Arrays.equals(magic, SEQ_MAGIC); } catch (IOException e) { LOG.info(String.format("Checking file type: [%s] is not a Sequence file.", filepath)); } return false; } public void parquetFileStartRead(String sourceParquetFilePath, Configuration readerSliceConfig, RecordSender recordSender, TaskPluginCollector taskPluginCollector) { String schemaString = readerSliceConfig.getString(Key.PARQUET_SCHEMA); if (StringUtils.isNotBlank(schemaString)) { LOG.info("Using the configured parquet schema: {}", schemaString); } else { schemaString = getParquetSchema(sourceParquetFilePath, hadoopConf); LOG.info("Parquet schema parsed from: {} , schema is {}", sourceParquetFilePath, schemaString); if (StringUtils.isBlank(schemaString)) { throw DataXException.asDataXException("ParquetSchema is required, please check your config"); } } MessageType parquetSchema = null; List<org.apache.parquet.schema.Type> parquetTypes = null; Map<String, ParquetMeta> parquetMetaMap = null; int fieldCount = 0; try { parquetSchema = MessageTypeParser.parseMessageType(schemaString); fieldCount = parquetSchema.getFieldCount(); parquetTypes = parquetSchema.getFields(); parquetMetaMap = ParquetMessageHelper.parseParquetTypes(parquetTypes); } catch (Exception e) { String message = String.format("Error parsing the schema string [%s] into a MessageType", schemaString); LOG.error(message); throw DataXException.asDataXException(HdfsReaderErrorCode.PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR, message, e); } List<ColumnEntry> column = UnstructuredStorageReaderUtil.getListColumnEntry(readerSliceConfig, 
com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN); String nullFormat = readerSliceConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.NULL_FORMAT); boolean isUtcTimestamp = readerSliceConfig.getBool(Key.PARQUET_UTC_TIMESTAMP, false); boolean isReadAllColumns = column == null || column.isEmpty(); LOG.info("ReadingAllColumns: " + isReadAllColumns); /** * Support for Hive tables that have had columns added in the middle. * * The switch defaults to false. Turn it on when a Hive table has had columns added in the middle; columns are then matched by name. * Why it is not enabled by default: * 1. Existing HDFS jobs fetch fields by index only and have no name configured. * 2. Adding columns in the middle is relatively rare. * 3. Existing jobs may already have misaligned columns, which must not be changed arbitrarily. */ boolean supportAddMiddleColumn = readerSliceConfig.getBool(Key.SUPPORT_ADD_MIDDLE_COLUMN, false); boolean printNullValueException = readerSliceConfig.getBool("printNullValueException", false); List<Integer> ignoreIndex = readerSliceConfig.getList("ignoreIndex", new ArrayList<Integer>(), Integer.class); JobConf conf = new JobConf(hadoopConf); ParquetReader<Group> reader = null; try { Path parquetFilePath = new Path(sourceParquetFilePath); GroupReadSupport readSupport = new GroupReadSupport(); readSupport.init(conf, null, parquetSchema); // Building the ParquetReader calls getFileSystem; on an HA cluster this loads the failover proxy class from hadoopConfig, so the builder must be given the conf ParquetReader.Builder<Group> parquetReaderBuilder = ParquetReader.builder(readSupport, parquetFilePath); parquetReaderBuilder.withConf(hadoopConf); reader = parquetReaderBuilder.build(); Group g = null; // parse partition information from the file path List<ColumnEntry> hivePartitionColumnEntrys = UnstructuredStorageReaderUtil.getListColumnEntry(readerSliceConfig, com.alibaba.datax.plugin.unstructuredstorage.reader.Key.HIVE_PARTION_COLUMN); ArrayList<Column> hivePartitionColumns = UnstructuredStorageReaderUtil.getHivePartitionColumns(sourceParquetFilePath, hivePartitionColumnEntrys); List<String> schemaFieldList = null; Map<Integer, String> colNameIndexMap = null; Map<Integer, Integer> indexMap = null; if (supportAddMiddleColumn) { boolean nonName = column.stream().anyMatch(columnEntry -> StringUtils.isEmpty(columnEntry.getName())); if (nonName) { throw new 
DataXException("Some column items have no name configured; please correct the configuration"); } List<org.apache.parquet.schema.Type> parquetFileFields = getParquetFileFields(parquetFilePath, hadoopConf); schemaFieldList = parquetFileFields.stream().map(org.apache.parquet.schema.Type::getName).collect(Collectors.toList()); colNameIndexMap = new ConcurrentHashMap<>(); Map<Integer, String> finalColNameIndexMap = colNameIndexMap; column.forEach(columnEntry -> finalColNameIndexMap.put(columnEntry.getIndex(), columnEntry.getName())); // drop configured columns whose name does not exist in the parquet file Iterator<Map.Entry<Integer, String>> iterator = finalColNameIndexMap.entrySet().iterator(); while (iterator.hasNext()) { Map.Entry<Integer, String> next = iterator.next(); if (!schemaFieldList.contains(next.getValue())) { iterator.remove(); } } LOG.info("SupportAddMiddleColumn is true, fields from the parquet file are {}, " + "colNameIndexMap is {}", JSON.toJSONString(schemaFieldList), JSON.toJSONString(colNameIndexMap)); fieldCount = column.size(); indexMap = new HashMap<>(); for (int j = 0; j < fieldCount; j++) { if (colNameIndexMap.containsKey(j)) { int index = findIndex(schemaFieldList, findEleInMap(colNameIndexMap, j)); indexMap.put(j, index); } } } while ((g = reader.read()) != null) { List<Object> formattedRecord = new ArrayList<Object>(fieldCount); try { for (int j = 0; j < fieldCount; j++) { Object data = null; try { if (null != ignoreIndex && !ignoreIndex.isEmpty() && ignoreIndex.contains(j)) { data = null; } else { if (supportAddMiddleColumn) { if (!colNameIndexMap.containsKey(j)) { formattedRecord.add(null); continue; } else { data = DFSUtil.this.readFields(g, parquetTypes.get(indexMap.get(j)), indexMap.get(j), parquetMetaMap, isUtcTimestamp); } } else { data = DFSUtil.this.readFields(g, parquetTypes.get(j), j, parquetMetaMap, isUtcTimestamp); } } } catch (RuntimeException e) { if (printNullValueException) { LOG.warn(e.getMessage()); } } formattedRecord.add(data); } transportOneRecord(column, formattedRecord, recordSender, taskPluginCollector, isReadAllColumns, nullFormat, hivePartitionColumns); } catch (Exception e) { throw 
DataXException.asDataXException(HdfsReaderErrorCode.READ_PARQUET_ERROR, e); } } } catch (Exception e) { throw DataXException.asDataXException(HdfsReaderErrorCode.READ_PARQUET_ERROR, e); } finally { org.apache.commons.io.IOUtils.closeQuietly(reader); } } private String findEleInMap(Map<Integer, String> map, Integer key) { return map.get(key); } private int findIndex(List<String> schemaFieldList, String colName) { for (int i = 0; i < schemaFieldList.size(); i++) { if (schemaFieldList.get(i).equals(colName)) { return i; } } return -1; } private List<org.apache.parquet.schema.Type> getParquetFileFields(Path filePath, org.apache.hadoop.conf.Configuration configuration) { try (org.apache.parquet.hadoop.ParquetFileReader reader = org.apache.parquet.hadoop.ParquetFileReader.open(HadoopInputFile.fromPath(filePath, configuration))) { org.apache.parquet.schema.MessageType schema = reader.getFooter().getFileMetaData().getSchema(); return schema.getFields(); } catch (IOException e) { LOG.error("Fetch parquet field error", e); throw new DataXException(String.format("Fetch parquet field error, msg is %s", e.getMessage())); } } private String getParquetSchema(String sourceParquetFilePath, org.apache.hadoop.conf.Configuration hadoopConf) { GroupReadSupport readSupport = new GroupReadSupport(); ParquetReader.Builder<Group> parquetReaderBuilder = ParquetReader.builder(readSupport, new Path(sourceParquetFilePath)); ParquetReader<Group> reader = null; try { parquetReaderBuilder.withConf(hadoopConf); reader = parquetReaderBuilder.build(); Group g = null; if ((g = reader.read()) != null) { return g.getType().toString(); } } catch (Throwable e) { LOG.error("Inner error, getParquetSchema failed, message is {}", e.getMessage()); } finally { org.apache.commons.io.IOUtils.closeQuietly(reader); } return null; } /** * Parquet-related constants and helpers */ private static final int JULIAN_EPOCH_OFFSET_DAYS 
= 2440588; private static final long MILLIS_IN_DAY = TimeUnit.DAYS.toMillis(1); private static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1); private long julianDayToMillis(int julianDay) { return (julianDay - JULIAN_EPOCH_OFFSET_DAYS) * MILLIS_IN_DAY; } private org.apache.parquet.schema.OriginalType getOriginalType(org.apache.parquet.schema.Type type, Map<String, ParquetMeta> parquetMetaMap) { ParquetMeta meta = parquetMetaMap.get(type.getName()); return meta.getOriginalType(); } private org.apache.parquet.schema.PrimitiveType asPrimitiveType(org.apache.parquet.schema.Type type, Map<String, ParquetMeta> parquetMetaMap) { ParquetMeta meta = parquetMetaMap.get(type.getName()); return meta.getPrimitiveType(); } private Object readFields(Group g, org.apache.parquet.schema.Type type, int index, Map<String, ParquetMeta> parquetMetaMap, boolean isUtcTimestamp) { if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.MAP) { Group groupData = g.getGroup(index, 0); List<org.apache.parquet.schema.Type> parquetTypes = groupData.getType().getFields(); JSONObject data = new JSONObject(); for (int i = 0; i < parquetTypes.size(); i++) { int j = groupData.getFieldRepetitionCount(i); // number of key/value pairs in the map for (int k = 0; k < j; k++) { Group groupDataK = groupData.getGroup(0, k); List<org.apache.parquet.schema.Type> parquetTypesK = groupDataK.getType().getFields(); if (2 != parquetTypesK.size()) { // warn: entries are not key/value pairs throw new RuntimeException(String.format("bad parquet map type: %s", groupData.getValueToString(index, 0))); } Object subDataKey = this.readFields(groupDataK, parquetTypesK.get(0), 0, parquetMetaMap, isUtcTimestamp); Object subDataValue = this.readFields(groupDataK, parquetTypesK.get(1), 1, parquetMetaMap, isUtcTimestamp); if (StringUtils.equalsIgnoreCase("key", parquetTypesK.get(0).getName())) { data.put(subDataKey.toString(), subDataValue); } else { data.put(subDataValue.toString(), subDataKey); } } } return data; } else if (this.getOriginalType(type, parquetMetaMap) == 
org.apache.parquet.schema.OriginalType.MAP_KEY_VALUE) { Group groupData = g.getGroup(index, 0); List<org.apache.parquet.schema.Type> parquetTypes = groupData.getType().getFields(); JSONObject data = new JSONObject(); for (int i = 0; i < parquetTypes.size(); i++) { int j = groupData.getFieldRepetitionCount(i); // number of key/value pairs in the map for (int k = 0; k < j; k++) { Group groupDataK = groupData.getGroup(0, k); List<org.apache.parquet.schema.Type> parquetTypesK = groupDataK.getType().getFields(); if (2 != parquetTypesK.size()) { // warn: entries are not key/value pairs throw new RuntimeException(String.format("bad parquet map type: %s", groupData.getValueToString(index, 0))); } Object subDataKey = this.readFields(groupDataK, parquetTypesK.get(0), 0, parquetMetaMap, isUtcTimestamp); Object subDataValue = this.readFields(groupDataK, parquetTypesK.get(1), 1, parquetMetaMap, isUtcTimestamp); if (StringUtils.equalsIgnoreCase("key", parquetTypesK.get(0).getName())) { data.put(subDataKey.toString(), subDataValue); } else { data.put(subDataValue.toString(), subDataKey); } } } return data; } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.LIST) { Group groupData = g.getGroup(index, 0); List<org.apache.parquet.schema.Type> parquetTypes = groupData.getType().getFields(); JSONArray data = new JSONArray(); for (int i = 0; i < parquetTypes.size(); i++) { Object subData = this.readFields(groupData, parquetTypes.get(i), i, parquetMetaMap, isUtcTimestamp); data.add(subData); } return data; } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.DECIMAL) { Binary binaryDate = g.getBinary(index, 0); if (null == binaryDate) { return null; } else { org.apache.hadoop.hive.serde2.io.HiveDecimalWritable decimalWritable = new org.apache.hadoop.hive.serde2.io.HiveDecimalWritable(binaryDate.getBytes(), this.asPrimitiveType(type, parquetMetaMap).getDecimalMetadata().getScale()); // g.getType().getFields().get(1).asPrimitiveType().getDecimalMetadata().getScale() HiveDecimal hiveDecimal = 
decimalWritable.getHiveDecimal(); if (null == hiveDecimal) { return null; } else { return hiveDecimal.bigDecimalValue(); } // return decimalWritable.doubleValue(); } } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.DATE) { return java.sql.Date.valueOf(LocalDate.ofEpochDay(g.getInteger(index, 0))); } else if (this.getOriginalType(type, parquetMetaMap) == org.apache.parquet.schema.OriginalType.UTF8) { return g.getValueToString(index, 0); } else { if (type.isPrimitive()) { PrimitiveType.PrimitiveTypeName primitiveTypeName = this.asPrimitiveType(type, parquetMetaMap).getPrimitiveTypeName(); if (PrimitiveType.PrimitiveTypeName.BINARY == primitiveTypeName) { return g.getValueToString(index, 0); } else if (PrimitiveType.PrimitiveTypeName.BOOLEAN == primitiveTypeName) { return g.getValueToString(index, 0); } else if (PrimitiveType.PrimitiveTypeName.DOUBLE == primitiveTypeName) { return g.getValueToString(index, 0); } else if (PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY == primitiveTypeName) { return g.getValueToString(index, 0); } else if (PrimitiveType.PrimitiveTypeName.FLOAT == primitiveTypeName) { return g.getValueToString(index, 0); } else if (PrimitiveType.PrimitiveTypeName.INT32 == primitiveTypeName) { return g.getValueToString(index, 0); } else if (PrimitiveType.PrimitiveTypeName.INT64 == primitiveTypeName) { return g.getValueToString(index, 0); } else if (PrimitiveType.PrimitiveTypeName.INT96 == primitiveTypeName) { Binary dataInt96 = g.getInt96(index, 0); if (null == dataInt96) { return null; } else { ByteBuffer buf = dataInt96.toByteBuffer(); buf.order(ByteOrder.LITTLE_ENDIAN); long timeOfDayNanos = buf.getLong(); int julianDay = buf.getInt(); if (isUtcTimestamp) { // UTC LocalDate localDate = LocalDate.ofEpochDay(julianDay - JULIAN_EPOCH_OFFSET_DAYS); LocalTime localTime = LocalTime.ofNanoOfDay(timeOfDayNanos); return Timestamp.valueOf(LocalDateTime.of(localDate, localTime)); } else { // local time long 
mills = julianDayToMillis(julianDay) + (timeOfDayNanos / NANOS_PER_MILLISECOND); Timestamp timestamp = new Timestamp(mills); timestamp.setNanos((int) (timeOfDayNanos % TimeUnit.SECONDS.toNanos(1))); return timestamp; } } } else { return g.getValueToString(index, 0); } } else { return g.getValueToString(index, 0); } } } } ================================================ FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsFileType.java ================================================ package com.alibaba.datax.plugin.reader.hdfsreader; /** * Created by mingya.wmy on 2015/8/22. * */ public enum HdfsFileType { ORC, SEQ, RC, CSV, TEXT, } ================================================ FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsPathFilter.java ================================================ package com.alibaba.datax.plugin.reader.hdfsreader; import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.PathFilter; /** * Created by wmy on 16/11/29. */ public class HdfsPathFilter implements PathFilter { private String regex = null; public HdfsPathFilter(String regex) { this.regex = regex; } @Override public boolean accept(Path path) { return regex != null ? 
path.getName().matches(regex) : true; } } ================================================ FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReader.java ================================================ package com.alibaba.datax.plugin.reader.hdfsreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; import org.apache.commons.io.Charsets; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.InputStream; import java.nio.charset.UnsupportedCharsetException; import java.util.ArrayList; import java.util.HashSet; import java.util.List; public class HdfsReader extends Reader { /** * Methods in Job run only once; methods in Task are executed in parallel by multiple Task threads that the framework starts. *

* The overall Reader execution flow is: *

     * Job class:  init-->prepare-->split
     *
     * Task class: init-->prepare-->startRead-->post-->destroy
     * Task class: init-->prepare-->startRead-->post-->destroy
     *
     * Job class:  post-->destroy
     * 
*/ public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private Configuration readerOriginConfig = null; private String encoding = null; private HashSet<String> sourceFiles; private String specifiedFileType = null; private DFSUtil dfsUtil = null; private List<String> path = null; private boolean skipEmptyOrcFile = false; private Integer orcFileEmptySize = null; @Override public void init() { LOG.info("init() begin..."); this.readerOriginConfig = super.getPluginJobConf(); this.validate(); dfsUtil = new DFSUtil(this.readerOriginConfig); LOG.info("init() ok and end..."); } public void validate(){ this.readerOriginConfig.getNecessaryValue(Key.DEFAULT_FS, HdfsReaderErrorCode.DEFAULT_FS_NOT_FIND_ERROR); // path check String pathInString = this.readerOriginConfig.getNecessaryValue(Key.PATH, HdfsReaderErrorCode.REQUIRED_VALUE); if (!pathInString.startsWith("[") && !pathInString.endsWith("]")) { path = new ArrayList<String>(); path.add(pathInString); } else { path = this.readerOriginConfig.getList(Key.PATH, String.class); if (null == path || path.size() == 0) { throw DataXException.asDataXException(HdfsReaderErrorCode.REQUIRED_VALUE, "You must specify the source directory or file to read"); } for (String eachPath : path) { if(!eachPath.startsWith("/")){ String message = String.format("Please check the path parameter [%s]: it must be an absolute path", eachPath); LOG.error(message); throw DataXException.asDataXException(HdfsReaderErrorCode.ILLEGAL_VALUE, message); } } } specifiedFileType = this.readerOriginConfig.getNecessaryValue(Key.FILETYPE, HdfsReaderErrorCode.REQUIRED_VALUE); if( !specifiedFileType.equalsIgnoreCase(Constant.ORC) && !specifiedFileType.equalsIgnoreCase(Constant.TEXT) && !specifiedFileType.equalsIgnoreCase(Constant.CSV) && !specifiedFileType.equalsIgnoreCase(Constant.SEQ) && !specifiedFileType.equalsIgnoreCase(Constant.RC) && !specifiedFileType.equalsIgnoreCase(Constant.PARQUET)){ String message = "HdfsReader currently supports six file formats: ORC, TEXT, CSV, SEQUENCE, RC and PARQUET. " + "Please set fileType to ORC, TEXT, CSV, SEQUENCE, RC or PARQUET"; throw DataXException.asDataXException(HdfsReaderErrorCode.FILE_TYPE_ERROR, message); } encoding = this.readerOriginConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.ENCODING, "UTF-8"); try { Charsets.toCharset(encoding); } catch (UnsupportedCharsetException uce) { throw DataXException.asDataXException( HdfsReaderErrorCode.ILLEGAL_VALUE, String.format("Unsupported encoding: [%s]", encoding), uce); } catch (Exception e) { throw DataXException.asDataXException( HdfsReaderErrorCode.ILLEGAL_VALUE, String.format("Configuration error: %s", e.getMessage()), e); } //check Kerberos boolean haveKerberos = this.readerOriginConfig.getBool(Key.HAVE_KERBEROS, false); if(haveKerberos) { this.readerOriginConfig.getNecessaryValue(Key.KERBEROS_KEYTAB_FILE_PATH, HdfsReaderErrorCode.REQUIRED_VALUE); this.readerOriginConfig.getNecessaryValue(Key.KERBEROS_PRINCIPAL, HdfsReaderErrorCode.REQUIRED_VALUE); } // validate the Columns validateColumns(); if(this.specifiedFileType.equalsIgnoreCase(Constant.CSV)){ // validate compress UnstructuredStorageReaderUtil.validateCompress(this.readerOriginConfig); UnstructuredStorageReaderUtil.validateCsvReaderConfig(this.readerOriginConfig); } if (this.specifiedFileType.equalsIgnoreCase(Constant.ORC)) { skipEmptyOrcFile = this.readerOriginConfig.getBool(Key.SKIP_EMPTY_ORCFILE, false); orcFileEmptySize = this.readerOriginConfig.getInt(Key.ORCFILE_EMPTYSIZE); // The required check on orcFileEmptySize was removed: configuring skipEmptyOrcFile alone is enough. The orcFileEmptySize parameter is kept for compatibility with existing jobs (for the customer 中华保险) //if (skipEmptyOrcFile && orcFileEmptySize == null) { // throw new IllegalArgumentException("When \"skipEmptyOrcFile\" is configured, " // + "parameter \"orcFileEmptySize\" cannot be null."); //} } LOG.info("skipEmptyOrcFile: {}, orcFileEmptySize: {}", skipEmptyOrcFile, orcFileEmptySize); } private void validateColumns(){ // if column is ["*"], replace it with an empty list (read all columns) List<Configuration> column = this.readerOriginConfig .getListConfiguration(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN); if (null != 
column && 1 == column.size() && ("\"*\"".equals(column.get(0).toString()) || "'*'" .equals(column.get(0).toString()))) { readerOriginConfig .set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN, new ArrayList<String>()); } else { // column: 1. index type 2. value type 3. when type is Date, it may have a format List<Configuration> columns = this.readerOriginConfig .getListConfiguration(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN); if (null == columns || columns.size() == 0) { throw DataXException.asDataXException( HdfsReaderErrorCode.CONFIG_INVALID_EXCEPTION, "You must specify columns"); } for (Configuration eachColumnConf : columns) { eachColumnConf.getNecessaryValue(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.TYPE, HdfsReaderErrorCode.REQUIRED_VALUE); Integer columnIndex = eachColumnConf.getInt(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.INDEX); String columnValue = eachColumnConf.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.VALUE); if (null == columnIndex && null == columnValue) { throw DataXException.asDataXException( HdfsReaderErrorCode.NO_INDEX_VALUE, "Since you configured type, you must also configure either index or value"); } if (null != columnIndex && null != columnValue) { throw DataXException.asDataXException( HdfsReaderErrorCode.MIXED_INDEX_VALUE, "You configured both index and value; each column may use only one of them"); } } } } @Override public void prepare() { LOG.info("prepare(), start to getAllFiles..."); this.sourceFiles = dfsUtil.getAllFiles(path, specifiedFileType, skipEmptyOrcFile, orcFileEmptySize); LOG.info(String.format("Number of files to read: [%s], list: [%s]", this.sourceFiles.size(), StringUtils.join(this.sourceFiles, ","))); } @Override public List<Configuration> split(int adviceNumber) { LOG.info("split() begin..."); List<Configuration> readerSplitConfigs = new ArrayList<Configuration>(); // warn: each slice reads exactly one file // int splitNumber = adviceNumber; int splitNumber = this.sourceFiles.size(); if (0 == splitNumber) { throw 
DataXException.asDataXException(HdfsReaderErrorCode.EMPTY_DIR_EXCEPTION, String.format("未能找到待读取的文件,请确认您的配置项path: %s", this.readerOriginConfig.getString(Key.PATH))); } List> splitedSourceFiles = this.splitSourceFiles(new ArrayList(this.sourceFiles), splitNumber); for (List files : splitedSourceFiles) { Configuration splitedConfig = this.readerOriginConfig.clone(); splitedConfig.set(Constant.SOURCE_FILES, files); readerSplitConfigs.add(splitedConfig); } return readerSplitConfigs; } private List> splitSourceFiles(final List sourceList, int adviceNumber) { List> splitedList = new ArrayList>(); int averageLength = sourceList.size() / adviceNumber; averageLength = averageLength == 0 ? 1 : averageLength; for (int begin = 0, end = 0; begin < sourceList.size(); begin = end) { end = begin + averageLength; if (end > sourceList.size()) { end = sourceList.size(); } splitedList.add(sourceList.subList(begin, end)); } return splitedList; } @Override public void post() { } @Override public void destroy() { } } public static class Task extends Reader.Task { private static Logger LOG = LoggerFactory.getLogger(Reader.Task.class); private Configuration taskConfig; private List sourceFiles; private String specifiedFileType; private String encoding; private DFSUtil dfsUtil = null; private int bufferSize; @Override public void init() { this.taskConfig = super.getPluginJobConf(); this.sourceFiles = this.taskConfig.getList(Constant.SOURCE_FILES, String.class); this.specifiedFileType = this.taskConfig.getNecessaryValue(Key.FILETYPE, HdfsReaderErrorCode.REQUIRED_VALUE); this.encoding = this.taskConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.ENCODING, "UTF-8"); this.dfsUtil = new DFSUtil(this.taskConfig); this.bufferSize = this.taskConfig.getInt(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.BUFFER_SIZE, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_BUFFER_SIZE); } @Override public void prepare() { } @Override public void 
startRead(RecordSender recordSender) { LOG.info("read start"); for (String sourceFile : this.sourceFiles) { LOG.info(String.format("reading file : [%s]", sourceFile)); if(specifiedFileType.equalsIgnoreCase(Constant.TEXT) || specifiedFileType.equalsIgnoreCase(Constant.CSV)) { InputStream inputStream = dfsUtil.getInputStream(sourceFile); UnstructuredStorageReaderUtil.readFromStream(inputStream, sourceFile, this.taskConfig, recordSender, this.getTaskPluginCollector()); }else if(specifiedFileType.equalsIgnoreCase(Constant.ORC)){ dfsUtil.orcFileStartRead(sourceFile, this.taskConfig, recordSender, this.getTaskPluginCollector()); }else if(specifiedFileType.equalsIgnoreCase(Constant.SEQ)){ dfsUtil.sequenceFileStartRead(sourceFile, this.taskConfig, recordSender, this.getTaskPluginCollector()); }else if(specifiedFileType.equalsIgnoreCase(Constant.RC)){ dfsUtil.rcFileStartRead(sourceFile, this.taskConfig, recordSender, this.getTaskPluginCollector()); } else if (specifiedFileType.equalsIgnoreCase(Constant.PARQUET)) { dfsUtil.parquetFileStartRead(sourceFile, this.taskConfig, recordSender, this.getTaskPluginCollector()); } else { String message = "HdfsReader currently supports six file formats: ORC, TEXT, CSV, SEQUENCE, RC and PARQUET; " + "please set fileType to ORC, TEXT, CSV, SEQUENCE, RC or PARQUET"; throw DataXException.asDataXException(HdfsReaderErrorCode.FILE_TYPE_UNSUPPORT, message); } if(recordSender != null){ recordSender.flush(); } } LOG.info("end read source files..."); } @Override public void post() { } @Override public void destroy() { } } } ================================================ FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/HdfsReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.hdfsreader; import com.alibaba.datax.common.spi.ErrorCode; public enum HdfsReaderErrorCode implements ErrorCode { BAD_CONFIG_VALUE("HdfsReader-00", "The configured value is invalid."), PATH_NOT_FIND_ERROR("HdfsReader-01", "path is not configured"), 
DEFAULT_FS_NOT_FIND_ERROR("HdfsReader-02", "defaultFS is not configured"), ILLEGAL_VALUE("HdfsReader-03", "Invalid value"), CONFIG_INVALID_EXCEPTION("HdfsReader-04", "Invalid parameter configuration"), REQUIRED_VALUE("HdfsReader-05", "A required parameter value is missing."), NO_INDEX_VALUE("HdfsReader-06","No index" ), MIXED_INDEX_VALUE("HdfsReader-07","index and value are mixed" ), EMPTY_DIR_EXCEPTION("HdfsReader-08", "The directory you are trying to read is empty."), PATH_CONFIG_ERROR("HdfsReader-09", "The configured path format is invalid"), READ_FILE_ERROR("HdfsReader-10", "Error reading file"), MALFORMED_ORC_ERROR("HdfsReader-10", "Malformed ORCFILE"), FILE_TYPE_ERROR("HdfsReader-11", "Invalid file type configuration"), FILE_TYPE_UNSUPPORT("HdfsReader-12", "File type not currently supported"), KERBEROS_LOGIN_ERROR("HdfsReader-13", "KERBEROS authentication failed"), READ_SEQUENCEFILE_ERROR("HdfsReader-14", "Error reading SequenceFile"), READ_RCFILE_ERROR("HdfsReader-15", "Error reading RCFile"), INIT_RCFILE_SERDE_ERROR("HdfsReader-16", "Deserialize RCFile, initialization failed!"), PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR("HdfsReader-17", "Error parsing ParquetSchema"), INVALID_PARQUET_SCHEMA("HdfsReader-18", "ParquetSchema is invalid"), READ_PARQUET_ERROR("HdfsReader-19", "Error reading Parquet file"), CONNECT_HDFS_IO_ERROR("HdfsReader-20", "I/O exception in establishing connection with HDFS"); private final String code; private final String description; private HdfsReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. 
", this.code, this.description); } } ================================================ FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/Key.java ================================================ package com.alibaba.datax.plugin.reader.hdfsreader; public final class Key { /** * 此处声明插件用到的需要插件使用者提供的配置项 */ public final static String PATH = "path"; public final static String DEFAULT_FS = "defaultFS"; public final static String HIVE_VERSION = "hiveVersion"; public static final String FILETYPE = "fileType"; public static final String HADOOP_CONFIG = "hadoopConfig"; public static final String HAVE_KERBEROS = "haveKerberos"; public static final String KERBEROS_KEYTAB_FILE_PATH = "kerberosKeytabFilePath"; public static final String KERBEROS_CONF_FILE_PATH = "kerberosConfFilePath"; public static final String KERBEROS_PRINCIPAL = "kerberosPrincipal"; public static final String PATH_FILTER = "pathFilter"; public static final String PARQUET_SCHEMA = "parquetSchema"; /** * hive 3.x 或 cdh高版本,使用UTC时区存储时间戳,如果发现时区偏移,该配置项要配置成 true */ public static final String PARQUET_UTC_TIMESTAMP = "parquetUtcTimestamp"; public static final String SUCCESS_ON_NO_FILE = "successOnNoFile"; public static final String PROTECTION = "protection"; /** * 用于显示地指定hdfs客户端的用户名 */ public static final String HDFS_USERNAME = "hdfsUsername"; /** * ORC FILE空文件大小 */ public static final String ORCFILE_EMPTYSIZE = "orcFileEmptySize"; /** * 是否跳过空的OrcFile */ public static final String SKIP_EMPTY_ORCFILE = "skipEmptyOrcFile"; /** * 是否跳过 orc meta 信息 */ public static final String SKIP_ORC_META = "skipOrcMetaInfo"; /** * 过滤_或者.开头的文件 */ public static final String REGEX_PATTERN = "^.*[/][^._].*"; public static final String FILTER_TAG_FILE = "filterTagFile"; // high level params refs https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/4.x/4.4.0/oss/configuration/jindosdk_configuration_list.md // public static final String FS_OSS_DOWNLOAD_QUEUE_SIZE = "ossDownloadQueueSize"; // public 
static final String FS_OSS_DOWNLOAD_THREAD_CONCURRENCY = "ossDownloadThreadConcurrency"; public static final String FS_OSS_READ_READAHEAD_BUFFER_COUNT = "ossDownloadBufferCount"; public static final String FILE_SYSTEM_TYPE = "fileSystemType"; public static final String CDH_3_X_HIVE_VERSION = "3.1.3-cdh"; public static final String SUPPORT_ADD_MIDDLE_COLUMN = "supportAddMiddleColumn"; } ================================================ FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMessageHelper.java ================================================ package com.alibaba.datax.plugin.reader.hdfsreader; import org.apache.parquet.schema.OriginalType; import org.apache.parquet.schema.PrimitiveType; import java.util.HashMap; import java.util.List; import java.util.Map; /** * @author jitongchen * @date 2023/9/7 10:20 AM */ public class ParquetMessageHelper { public static Map parseParquetTypes(List parquetTypes) { int fieldCount = parquetTypes.size(); Map parquetMetaMap = new HashMap(); for (int i = 0; i < fieldCount; i++) { org.apache.parquet.schema.Type type = parquetTypes.get(i); String name = type.getName(); ParquetMeta parquetMeta = new ParquetMeta(); parquetMeta.setName(name); OriginalType originalType = type.getOriginalType(); parquetMeta.setOriginalType(originalType); if (type.isPrimitive()) { PrimitiveType primitiveType = type.asPrimitiveType(); parquetMeta.setPrimitiveType(primitiveType); } parquetMetaMap.put(name, parquetMeta); } return parquetMetaMap; } } ================================================ FILE: hdfsreader/src/main/java/com/alibaba/datax/plugin/reader/hdfsreader/ParquetMeta.java ================================================ package com.alibaba.datax.plugin.reader.hdfsreader; import org.apache.parquet.schema.OriginalType; import org.apache.parquet.schema.PrimitiveType; /** * @author jitongchen * @date 2023/9/7 10:20 AM */ public class ParquetMeta { private String name; private OriginalType originalType; private 
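PrimitiveType primitiveType; public String getName() { return name; } public void setName(String name) { this.name = name; } public OriginalType getOriginalType() { return originalType; } public void setOriginalType(OriginalType originalType) { this.originalType = originalType; } public PrimitiveType getPrimitiveType() { return primitiveType; } public void setPrimitiveType(PrimitiveType primitiveType) { this.primitiveType = primitiveType; } } ================================================ FILE: hdfsreader/src/main/resources/plugin.json ================================================ { "name": "hdfsreader", "class": "com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader", "description": "useScene: test. mechanism: use datax framework to transport data from hdfs. warn: The more you know about the data, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: hdfsreader/src/main/resources/plugin_job_template.json ================================================ { "name": "hdfsreader", "parameter": { "path": "", "defaultFS": "", "column": [], "fileType": "orc", "encoding": "UTF-8", "fieldDelimiter": "," } } ================================================ FILE: hdfswriter/doc/hdfswriter.md ================================================
# DataX HdfsWriter Plugin Documentation


------------

## 1 Quick Introduction

HdfsWriter writes TEXTFile and ORCFile files to a specified path on HDFS; the file contents can be associated with a Hive table.


## 2 Features and Limitations

* (1) HdfsWriter currently supports only two file formats, textfile and orcfile, and the file contents must form a logical two-dimensional table;
* (2) Because HDFS is a file system with no notion of a schema, writing only a subset of the columns is not supported;
* (3) Only the following Hive data types are currently supported:
	* Numeric: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE
	* String: STRING, VARCHAR, CHAR
	* Boolean: BOOLEAN
	* Date/time: DATE, TIMESTAMP

	**Not yet supported: decimal, binary, arrays, maps, structs, and union types**;
* (4) For Hive partitioned tables, only a single partition can be written at a time;
* (5) For textfile, the user must ensure that the delimiter of the files written to HDFS **matches the delimiter used when the table was created in Hive**, so that the written data maps onto the Hive table's columns;
* (6) HdfsWriter works as follows: first, based on the user-specified path, it creates a temporary directory that does not yet exist on HDFS, named `path_` followed by a random suffix; it then writes the data into this temporary directory; once everything has been written, it moves the files from the temporary directory into the user-specified directory (file names are kept unique when the files are created); finally, it deletes the temporary directory. If the connection to HDFS is lost partway through (for example, because of a network outage), the user must manually delete the files already written and the temporary directory.
* (7) The plugin currently uses Hive 1.1.1 and Hadoop 2.7.1 (Apache, chosen for JDK 1.7 compatibility); writing has been verified in test environments with Hadoop 2.5.0, Hadoop 2.6.0 and Hive 1.2.0; other versions still need further testing;
* (8) HdfsWriter currently supports Kerberos authentication. (Note: if Kerberos authentication is required, the Hadoop version of the user's cluster must match the plugin's Hadoop version; if the cluster runs a newer Hadoop version than the plugin, Kerberos authentication is not guaranteed to work.)

## 3 Usage

### 3.1 Sample Configuration

```json
{
    "setting": {},
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "txtfilereader",
                    "parameter": {
                        "path": ["/Users/shf/workplace/txtWorkplace/job/dataorcfull.txt"],
                        "encoding": "UTF-8",
                        "column": [
                            { "index": 0, "type": "long" },
                            { "index": 1, "type": "long" },
                            { "index": 2, "type": "long" },
                            { "index": 3, "type": "long" },
                            { "index": 4, "type": "DOUBLE" },
                            { "index": 5, "type": "DOUBLE" },
                            { "index": 6, "type": "STRING" },
                            { "index": 7, "type": "STRING" },
                            { "index": 8, "type": "STRING" },
                            { "index": 9, "type": "BOOLEAN" },
                            { "index": 10, "type": "date" },
                            { "index": 11, "type": "date" }
                        ],
                        "fieldDelimiter": "\t"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://xxx:port",
                        "fileType": "orc",
                        "path": "/user/hive/warehouse/writerorc.db/orcfull",
                        "fileName": "xxxx",
                        "column": [
                            { "name": "col1", "type": "TINYINT" },
                            { "name": "col2", "type": "SMALLINT" },
                            { "name": "col3", "type": "INT" },
                            { "name": "col4", "type": "BIGINT" },
                            { "name": "col5", "type": "FLOAT" },
                            { "name": "col6", "type": "DOUBLE" },
                            { "name": "col7", "type": "STRING" },
                            { "name": "col8", "type": "VARCHAR" },
                            { "name": "col9", "type": "CHAR" },
                            { "name": "col10", "type": "BOOLEAN" },
                            { "name": "col11", "type": "date" },
                            { "name": "col12", "type": "TIMESTAMP" }
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "\t",
                        "compress": "NONE"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **defaultFS**

	* Description: Address of the NameNode of the Hadoop HDFS file system, in the format hdfs://ip:port, e.g. hdfs://127.0.0.1:9000

	* Required: yes

	* Default: none

* **fileType**

	* Description: File type; currently only "text" or "orc" can be configured:

		* text: the textfile format
		* orc: the orcfile format

	* Required: yes

	* Default: none

* **path**

	* Description: Path on Hadoop HDFS to write to. HdfsWriter writes multiple files under this directory according to the concurrency setting. To associate the data with a Hive table, set this to the Hive table's storage path on HDFS. For example, if the Hive warehouse directory is /user/hive/warehouse/ and database test contains table hello, the corresponding path is /user/hive/warehouse/test.db/hello

	* Required: yes

	* Default: none

* **fileName**

	* Description: Base name of the files HdfsWriter writes. At runtime a random suffix is appended to this name to form the actual file name written by each thread.

	* Required: yes

	* Default: none

* **column**

	* Description: Fields to write; writing only a subset of the columns is not supported. To associate the data with a Hive table, specify every field name and type of the table: name gives the field name and type gives the field type.

		The column information can be configured as follows:

		```json
		"column":
		[
			{
				"name": "userName",
				"type": "string"
			},
			{
				"name": "age",
				"type": "long"
			}
		]
		```

	* Required: yes

	* Default: none

* **writeMode**

	* Description: How HdfsWriter handles existing data before writing:

		* append: nothing is done before writing; DataX hdfswriter writes directly using fileName and ensures the file names do not conflict.
		* nonConflict: if the directory already contains files prefixed with fileName, an error is reported.
		* truncate: if the directory already contains files prefixed with fileName, they are deleted before writing.

	* Required: yes

	* Default: none

* **fieldDelimiter**

	* Description: Field delimiter used when writing. **It must match the field delimiter of the Hive table; otherwise the data cannot be queried from the Hive table.**

	* Required: yes

	* Default: none

* **compress**

	* Description: Compression type of the HDFS files; leaving it unset means no compression. text files support gzip and bzip2; orc files support NONE and SNAPPY (SNAPPY requires SnappyCodec to be installed).

	* Required: no

	* Default: no compression

* **hadoopConfig**

	* Description: Advanced Hadoop-related parameters, such as the HA configuration, can be set in hadoopConfig. The nameservice ID used in each dfs.namenode.rpc-address key must match the value of dfs.nameservices:

		```json
		"hadoopConfig":{
			"dfs.nameservices": "testDfs",
			"dfs.ha.namenodes.testDfs": "namenode1,namenode2",
			"dfs.namenode.rpc-address.testDfs.namenode1": "",
			"dfs.namenode.rpc-address.testDfs.namenode2": "",
			"dfs.client.failover.proxy.provider.testDfs": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
		}
		```

	* Required: no

	* Default: none

* **encoding**

	* Description: Encoding used when writing files.

	* Required: no

	* Default: utf-8; **change with caution**

* **haveKerberos**

	* Description: Whether Kerberos authentication is enabled; defaults to false. If set to true, kerberosKeytabFilePath and kerberosPrincipal become required.

	* Required: no

	* Default: false

* **kerberosKeytabFilePath**

	* Description: Absolute path of the keytab file used for Kerberos authentication

	* Required: required when haveKerberos is true

	* Default: none

* **kerberosPrincipal**

	* Description: Kerberos principal name, e.g. xxxx/hadoopclient@xxx.xxx

	* Required: required when haveKerberos is true

	* Default: none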
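The three writeMode options differ only in what happens to existing files that already start with fileName, and append relies on the random suffix described under fileName to avoid clashes. The sketch below illustrates that decision under stated assumptions: it works on a flat list of bare file names in the target directory, and the class, enum, and method names (`WriteModeSketch`, `planPreWrite`, `uniqueName`) as well as the `__` + UUID suffix scheme are hypothetical illustrations, not HdfsWriter's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of the writeMode semantics; not HdfsWriter's real code.
class WriteModeSketch {

    enum WriteMode { APPEND, NONCONFLICT, TRUNCATE }

    // Maps the configured string ("append", "nonConflict", "truncate") to the enum.
    static WriteMode parse(String writeMode) {
        return WriteMode.valueOf(writeMode.toUpperCase(Locale.ROOT));
    }

    // Returns the existing files to delete before writing;
    // throws if nonConflict finds files that start with fileName.
    static List<String> planPreWrite(WriteMode mode, List<String> existingFiles, String fileName) {
        List<String> clashing = new ArrayList<>();
        for (String f : existingFiles) {
            if (f.startsWith(fileName)) {
                clashing.add(f);
            }
        }
        switch (mode) {
            case APPEND:
                // append: existing files are left alone; uniqueness comes from the random suffix
                return new ArrayList<>();
            case NONCONFLICT:
                if (!clashing.isEmpty()) {
                    throw new IllegalStateException(
                            "nonConflict: files with prefix [" + fileName + "] already exist: " + clashing);
                }
                return new ArrayList<>();
            case TRUNCATE:
                // truncate: every file with the fileName prefix is deleted first
                return clashing;
            default:
                throw new IllegalArgumentException("unsupported writeMode: " + mode);
        }
    }

    // Appends a random suffix to fileName, retrying until the name is unused (assumed scheme).
    static String uniqueName(String fileName, Set<String> existingFiles) {
        String candidate;
        do {
            candidate = fileName + "__" + UUID.randomUUID().toString().replace("-", "");
        } while (existingFiles.contains(candidate));
        return candidate;
    }
}
```

In a real job the directory listing returns full HDFS paths, so a prefix check like this one would have to compare only the last path component rather than the whole path.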
### 3.3 Type Conversion

HdfsWriter supports the Hive types listed below; please check that your column types are among them.

The following table lists how HdfsWriter maps DataX internal types to Hive data types:

| DataX internal type | Hive data type |
| -------- | ----- |
| Long | TINYINT, SMALLINT, INT, BIGINT |
| Double | FLOAT, DOUBLE |
| String | STRING, VARCHAR, CHAR |
| Boolean | BOOLEAN |
| Date | DATE, TIMESTAMP |

## 4 Configuration Steps

* Step 1: create the database and table in Hive.

The HDFS location where Hive stores its databases is set in conf/hive-site.xml under the Hive installation directory; the default is /user/hive/warehouse:

```xml
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
```

For Hive database and table DDL syntax, see the [Hive Language Manual](https://cwiki.apache.org/confluence/display/Hive/LanguageManual). Examples:

(1) Create a table stored as textfile:

```sql
create database IF NOT EXISTS hdfswriter;
use hdfswriter;
create table text_table(
    col1 TINYINT,
    col2 SMALLINT,
    col3 INT,
    col4 BIGINT,
    col5 FLOAT,
    col6 DOUBLE,
    col7 STRING,
    col8 VARCHAR(10),
    col9 CHAR(10),
    col10 BOOLEAN,
    col11 date,
    col12 TIMESTAMP
)
row format delimited
fields terminated by "\t"
STORED AS TEXTFILE;
```

text_table is stored on HDFS at: /user/hive/warehouse/hdfswriter.db/text_table/

(2) Create a table stored as orcfile:

```sql
create database IF NOT EXISTS hdfswriter;
use hdfswriter;
create table orc_table(
    col1 TINYINT,
    col2 SMALLINT,
    col3 INT,
    col4 BIGINT,
    col5 FLOAT,
    col6 DOUBLE,
    col7 STRING,
    col8 VARCHAR(10),
    col9 CHAR(10),
    col10 BOOLEAN,
    col11 date,
    col12 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS ORC;
```

orc_table is stored on HDFS at: /user/hive/warehouse/hdfswriter.db/orc_table/

* Step 2: configure the HdfsWriter job based on the settings from Step 1.

## 5 Constraints

Omitted.

## 6 FAQ

Omitted.

================================================ FILE: hdfswriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hdfswriter hdfswriter HdfsWriter provides the ability to write to HDFS. jar 3.1.3 2.7.1 com.twitter parquet-hadoop-bundle 1.6.0 org.apache.logging.log4j log4j-api 2.17.1 org.apache.logging.log4j log4j-core 2.17.1 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.aliyun.oss hadoop-aliyun 2.7.2 org.slf4j slf4j-api ch.qos.logback 
logback-classic org.apache.hadoop hadoop-hdfs ${hadoop.version} org.apache.hadoop hadoop-common ${hadoop.version} org.apache.hadoop hadoop-yarn-common ${hadoop.version} org.apache.hadoop hadoop-mapreduce-client-core ${hadoop.version} org.apache.hive hive-exec ${hive.version} org.apache.hive hive-serde ${hive.version} org.apache.hive hive-service ${hive.version} org.apache.hive hive-common ${hive.version} org.apache.hive.hcatalog hive-hcatalog-core ${hive.version} com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} junit junit test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: hdfswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/hdfswriter target/ hdfswriter-0.0.1-SNAPSHOT.jar plugin/writer/hdfswriter false plugin/writer/hdfswriter/libs runtime ================================================ FILE: hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/Constant.java ================================================ package com.alibaba.datax.plugin.writer.hdfswriter; public class Constant { public static final String DEFAULT_ENCODING = "UTF-8"; public static final String DEFAULT_NULL_FORMAT = "\\N"; } ================================================ FILE: hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsHelper.java ================================================ package com.alibaba.datax.plugin.writer.hdfswriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; 
import com.alibaba.datax.plugin.unstructuredstorage.util.ColumnTypeUtil; import com.alibaba.datax.plugin.unstructuredstorage.util.HdfsUtil; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONObject; import com.google.common.collect.Lists; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.commons.lang3.tuple.MutablePair; import org.apache.hadoop.fs.*; import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat; import org.apache.hadoop.hive.ql.io.orc.OrcSerde; import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; import org.apache.hadoop.io.NullWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.io.compress.CompressionCodec; import org.apache.hadoop.mapred.*; import org.apache.hadoop.security.UserGroupInformation; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import parquet.hadoop.metadata.CompressionCodecName; import parquet.schema.*; import java.io.IOException; import java.sql.Timestamp; import java.text.SimpleDateFormat; import java.util.*; public class HdfsHelper { public static final Logger LOG = LoggerFactory.getLogger(HdfsWriter.Job.class); public FileSystem fileSystem = null; public JobConf conf = null; public org.apache.hadoop.conf.Configuration hadoopConf = null; public static final String HADOOP_SECURITY_AUTHENTICATION_KEY = "hadoop.security.authentication"; public static final String HDFS_DEFAULTFS_KEY = "fs.defaultFS"; // Kerberos private Boolean haveKerberos = false; private String kerberosKeytabFilePath; private String kerberosPrincipal; public void getFileSystem(String defaultFS, Configuration taskConfig){ hadoopConf = new org.apache.hadoop.conf.Configuration(); Configuration hadoopSiteParams = taskConfig.getConfiguration(Key.HADOOP_CONFIG); JSONObject hadoopSiteParamsAsJsonObject = 
JSON.parseObject(taskConfig.getString(Key.HADOOP_CONFIG)); if (null != hadoopSiteParams) { Set paramKeys = hadoopSiteParams.getKeys(); for (String each : paramKeys) { hadoopConf.set(each, hadoopSiteParamsAsJsonObject.getString(each)); } } hadoopConf.set(HDFS_DEFAULTFS_KEY, defaultFS); //是否有Kerberos认证 this.haveKerberos = taskConfig.getBool(Key.HAVE_KERBEROS, false); if(haveKerberos){ this.kerberosKeytabFilePath = taskConfig.getString(Key.KERBEROS_KEYTAB_FILE_PATH); this.kerberosPrincipal = taskConfig.getString(Key.KERBEROS_PRINCIPAL); hadoopConf.set(HADOOP_SECURITY_AUTHENTICATION_KEY, "kerberos"); } this.kerberosAuthentication(this.kerberosPrincipal, this.kerberosKeytabFilePath); conf = new JobConf(hadoopConf); try { fileSystem = FileSystem.get(conf); } catch (IOException e) { String message = String.format("获取FileSystem时发生网络IO异常,请检查您的网络是否正常!HDFS地址:[%s]", "message:defaultFS =" + defaultFS); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.CONNECT_HDFS_IO_ERROR, e); }catch (Exception e) { String message = String.format("获取FileSystem失败,请检查HDFS地址是否正确: [%s]", "message:defaultFS =" + defaultFS); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.CONNECT_HDFS_IO_ERROR, e); } if(null == fileSystem || null == conf){ String message = String.format("获取FileSystem失败,请检查HDFS地址是否正确: [%s]", "message:defaultFS =" + defaultFS); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.CONNECT_HDFS_IO_ERROR, message); } } private void kerberosAuthentication(String kerberosPrincipal, String kerberosKeytabFilePath){ if(haveKerberos && StringUtils.isNotBlank(this.kerberosPrincipal) && StringUtils.isNotBlank(this.kerberosKeytabFilePath)){ UserGroupInformation.setConfiguration(this.hadoopConf); try { UserGroupInformation.loginUserFromKeytab(kerberosPrincipal, kerberosKeytabFilePath); } catch (Exception e) { String message = String.format("kerberos认证失败,请确定kerberosKeytabFilePath[%s]和kerberosPrincipal[%s]填写正确", 
kerberosKeytabFilePath, kerberosPrincipal); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.KERBEROS_LOGIN_ERROR, e); } } } /** *获取指定目录先的文件列表 * @param dir * @return * 拿到的是文件全路径, * eg:hdfs://10.101.204.12:9000/user/hive/warehouse/writer.db/text/test.textfile */ public String[] hdfsDirList(String dir){ Path path = new Path(dir); String[] files = null; try { FileStatus[] status = fileSystem.listStatus(path); files = new String[status.length]; for(int i=0;i tmpFiles, HashSet endFiles){ Path tmpFilesParent = null; if(tmpFiles.size() != endFiles.size()){ String message = String.format("临时目录下文件名个数与目标文件名个数不一致!"); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.HDFS_RENAME_FILE_ERROR, message); }else{ try{ for (Iterator it1=tmpFiles.iterator(),it2=endFiles.iterator();it1.hasNext()&&it2.hasNext();){ String srcFile = it1.next().toString(); String dstFile = it2.next().toString(); Path srcFilePah = new Path(srcFile); Path dstFilePah = new Path(dstFile); if(tmpFilesParent == null){ tmpFilesParent = srcFilePah.getParent(); } LOG.info(String.format("start rename file [%s] to file [%s].", srcFile,dstFile)); boolean renameTag = false; long fileLen = fileSystem.getFileStatus(srcFilePah).getLen(); if(fileLen>0){ renameTag = fileSystem.rename(srcFilePah,dstFilePah); if(!renameTag){ String message = String.format("重命名文件[%s]失败,请检查您的网络是否正常!", srcFile); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.HDFS_RENAME_FILE_ERROR, message); } LOG.info(String.format("finish rename file [%s] to file [%s].", srcFile,dstFile)); }else{ LOG.info(String.format("文件[%s]内容为空,请检查写入是否正常!", srcFile)); } } }catch (Exception e) { String message = String.format("重命名文件时发生异常,请检查您的网络是否正常!"); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.CONNECT_HDFS_IO_ERROR, e); }finally { deleteDir(tmpFilesParent); } } } //关闭FileSystem public void closeFileSystem(){ try { fileSystem.close(); } catch 
(IOException e) { String message = "An IO exception occurred while closing the FileSystem; please check that your network is working!"; LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.CONNECT_HDFS_IO_ERROR, e); } } // output stream for textfile-format files public FSDataOutputStream getOutputStream(String path){ Path storePath = new Path(path); FSDataOutputStream fSDataOutputStream = null; try { fSDataOutputStream = fileSystem.create(storePath); } catch (IOException e) { String message = String.format("Create an FSDataOutputStream at the indicated Path[%s] failed: [%s]", path, e.getMessage()); LOG.error(message); throw DataXException.asDataXException(HdfsWriterErrorCode.Write_FILE_IO_ERROR, e); } return fSDataOutputStream; } /** * Writes a textfile-format file * @param lineReceiver * @param config * @param fileName * @param taskPluginCollector */ public void textFileStartWrite(RecordReceiver lineReceiver, Configuration config, String fileName, TaskPluginCollector taskPluginCollector){ char fieldDelimiter = config.getChar(Key.FIELD_DELIMITER); List columns = config.getListConfiguration(Key.COLUMN); String compress = config.getString(Key.COMPRESS,null); SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMddHHmm"); String attempt = "attempt_"+dateFormat.format(new Date())+"_0001_m_000000_0"; Path outputPath = new Path(fileName); // TODO: the TASK_ATTEMPT_ID value still needs to be confirmed conf.set(JobContext.TASK_ATTEMPT_ID, attempt); FileOutputFormat outFormat = new TextOutputFormat(); outFormat.setOutputPath(conf, outputPath); outFormat.setWorkOutputPath(conf, outputPath); if(null != compress) { Class codecClass = getCompressCodec(compress); if (null != codecClass) { outFormat.setOutputCompressorClass(conf, codecClass); } } try { RecordWriter writer = outFormat.getRecordWriter(fileSystem, conf, outputPath.toString(), Reporter.NULL); Record record = null; while ((record = lineReceiver.getFromReader()) != null) { MutablePair transportResult = transportOneRecord(record, fieldDelimiter, columns, taskPluginCollector); if (!transportResult.getRight()) { 
writer.write(NullWritable.get(),transportResult.getLeft()); } } writer.close(Reporter.NULL); } catch (Exception e) { String message = String.format("写文件文件[%s]时发生IO异常,请检查您的网络是否正常!", fileName); LOG.error(message); Path path = new Path(fileName); deleteDir(path.getParent()); throw DataXException.asDataXException(HdfsWriterErrorCode.Write_FILE_IO_ERROR, e); } } public static MutablePair transportOneRecord( Record record, char fieldDelimiter, List columnsConfiguration, TaskPluginCollector taskPluginCollector) { MutablePair, Boolean> transportResultList = transportOneRecord(record,columnsConfiguration,taskPluginCollector); //保存<转换后的数据,是否是脏数据> MutablePair transportResult = new MutablePair(); transportResult.setRight(false); if(null != transportResultList){ Text recordResult = new Text(StringUtils.join(transportResultList.getLeft(), fieldDelimiter)); transportResult.setRight(transportResultList.getRight()); transportResult.setLeft(recordResult); } return transportResult; } public Class getCompressCodec(String compress){ Class codecClass = null; if(null == compress){ codecClass = null; }else if("GZIP".equalsIgnoreCase(compress)){ codecClass = org.apache.hadoop.io.compress.GzipCodec.class; }else if ("BZIP2".equalsIgnoreCase(compress)) { codecClass = org.apache.hadoop.io.compress.BZip2Codec.class; }else if("SNAPPY".equalsIgnoreCase(compress)){ //todo 等需求明确后支持 需要用户安装SnappyCodec codecClass = org.apache.hadoop.io.compress.SnappyCodec.class; // org.apache.hadoop.hive.ql.io.orc.ZlibCodec.class not public //codecClass = org.apache.hadoop.hive.ql.io.orc.ZlibCodec.class; }else { throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, String.format("目前不支持您配置的 compress 模式 : [%s]", compress)); } return codecClass; } /** * 写orcfile类型文件 * @param lineReceiver * @param config * @param fileName * @param taskPluginCollector */ public void orcFileStartWrite(RecordReceiver lineReceiver, Configuration config, String fileName, TaskPluginCollector taskPluginCollector){ List 
columns = config.getListConfiguration(Key.COLUMN); String compress = config.getString(Key.COMPRESS, null); List columnNames = getColumnNames(columns); List columnTypeInspectors = getColumnTypeInspectors(columns); StructObjectInspector inspector = (StructObjectInspector)ObjectInspectorFactory .getStandardStructObjectInspector(columnNames, columnTypeInspectors); OrcSerde orcSerde = new OrcSerde(); FileOutputFormat outFormat = new OrcOutputFormat(); if(!"NONE".equalsIgnoreCase(compress) && null != compress ) { Class codecClass = getCompressCodec(compress); if (null != codecClass) { outFormat.setOutputCompressorClass(conf, codecClass); } } try { RecordWriter writer = outFormat.getRecordWriter(fileSystem, conf, fileName, Reporter.NULL); Record record = null; while ((record = lineReceiver.getFromReader()) != null) { MutablePair, Boolean> transportResult = transportOneRecord(record,columns,taskPluginCollector); if (!transportResult.getRight()) { writer.write(NullWritable.get(), orcSerde.serialize(transportResult.getLeft(), inspector)); } } writer.close(Reporter.NULL); } catch (Exception e) { String message = String.format("写文件文件[%s]时发生IO异常,请检查您的网络是否正常!", fileName); LOG.error(message); Path path = new Path(fileName); deleteDir(path.getParent()); throw DataXException.asDataXException(HdfsWriterErrorCode.Write_FILE_IO_ERROR, e); } } public List getColumnNames(List columns){ List columnNames = Lists.newArrayList(); for (Configuration eachColumnConf : columns) { columnNames.add(eachColumnConf.getString(Key.NAME)); } return columnNames; } /** * 根据writer配置的字段类型,构建inspector * @param columns * @return */ public List getColumnTypeInspectors(List columns){ List columnTypeInspectors = Lists.newArrayList(); for (Configuration eachColumnConf : columns) { SupportHiveDataType columnType = SupportHiveDataType.valueOf(eachColumnConf.getString(Key.TYPE).toUpperCase()); ObjectInspector objectInspector = null; switch (columnType) { case TINYINT: objectInspector = 
ObjectInspectorFactory.getReflectionObjectInspector(Byte.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case SMALLINT: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Short.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case INT: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Integer.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case BIGINT: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Long.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case FLOAT: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Float.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case DOUBLE: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Double.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case TIMESTAMP: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(org.apache.hadoop.hive.common.type.Timestamp.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case DATE: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(java.sql.Date.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case STRING: case VARCHAR: case CHAR: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(String.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; case BOOLEAN: objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Boolean.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); break; default: throw DataXException .asDataXException( HdfsWriterErrorCode.ILLEGAL_VALUE, String.format( "The column configuration in your job file is invalid: DataX does not support writing this field type. Field name: [%s], field type: [%s]. 
Please change the field's type in the table or exclude the field from synchronization.", eachColumnConf.getString(Key.NAME), eachColumnConf.getString(Key.TYPE))); } columnTypeInspectors.add(objectInspector); } return columnTypeInspectors; } public OrcSerde getOrcSerde(Configuration config){ String fieldDelimiter = config.getString(Key.FIELD_DELIMITER); String compress = config.getString(Key.COMPRESS); String encoding = config.getString(Key.ENCODING); OrcSerde orcSerde = new OrcSerde(); Properties properties = new Properties(); properties.setProperty("orc.bloom.filter.columns", fieldDelimiter); properties.setProperty("orc.compress", compress); properties.setProperty("orc.encoding.strategy", encoding); orcSerde.initialize(conf, properties); return orcSerde; } public static MutablePair, Boolean> transportOneRecord( Record record,List columnsConfiguration, TaskPluginCollector taskPluginCollector){ MutablePair, Boolean> transportResult = new MutablePair, Boolean>(); transportResult.setRight(false); List recordList = Lists.newArrayList(); int recordLength = record.getColumnNumber(); if (0 != recordLength) { Column column; for (int i = 0; i < recordLength; i++) { column = record.getColumn(i); // TODO: extract as a method if (null != column.getRawData()) { String rowData = column.getRawData().toString(); SupportHiveDataType columnType = SupportHiveDataType.valueOf( columnsConfiguration.get(i).getString(Key.TYPE).toUpperCase()); // convert the value according to the type configured on the writer side try { switch (columnType) { case TINYINT: recordList.add(Byte.valueOf(rowData)); break; case SMALLINT: recordList.add(Short.valueOf(rowData)); break; case INT: recordList.add(Integer.valueOf(rowData)); break; case BIGINT: recordList.add(column.asLong()); break; case FLOAT: recordList.add(Float.valueOf(rowData)); break; case DOUBLE: recordList.add(column.asDouble()); break; case STRING: case VARCHAR: case CHAR: recordList.add(column.asString()); break; case BOOLEAN: recordList.add(column.asBoolean()); break; case DATE: recordList.add(new java.sql.Date(column.asDate().getTime())); break; case TIMESTAMP: 
                                Date date = column.asDate();
                                if (date == null) {
                                    recordList.add(null);
                                } else {
                                    Timestamp ts = new Timestamp(date.getTime());
                                    recordList.add(org.apache.hadoop.hive.common.type.Timestamp.ofEpochMilli(ts.getTime(), ts.getNanos()));
                                }
                                break;
                            default:
                                // Both arguments are Strings, so the type placeholder must be %s, not %d
                                throw DataXException
                                        .asDataXException(
                                                HdfsWriterErrorCode.ILLEGAL_VALUE,
                                                String.format(
                                                        "The column configuration in your job file is invalid: DataX does not support writing this column type. Column name: [%s], column type: [%s]. Please change the column's type in the table or do not synchronize this column.",
                                                        columnsConfiguration.get(i).getString(Key.NAME),
                                                        columnsConfiguration.get(i).getString(Key.TYPE)));
                        }
                    } catch (Exception e) {
                        // warn: treat this record as dirty data
                        String message = String.format(
                                "Column type conversion error: the target column type is [%s], but the actual value is [%s].",
                                columnsConfiguration.get(i).getString(Key.TYPE),
                                column.getRawData().toString());
                        taskPluginCollector.collectDirtyRecord(record, message);
                        transportResult.setRight(true);
                        break;
                    }
                } else {
                    // warn: it's all ok if nullFormat is null
                    recordList.add(null);
                }
            }
        }
        transportResult.setLeft(recordList);
        return transportResult;
    }

    public static String generateParquetSchemaFromColumnAndType(List<Configuration> columns) {
        Map<String, ColumnTypeUtil.DecimalInfo> decimalColInfo = new HashMap<>(16);
        ColumnTypeUtil.DecimalInfo PARQUET_DEFAULT_DECIMAL_INFO = new ColumnTypeUtil.DecimalInfo(10, 2);
        Types.MessageTypeBuilder typeBuilder = Types.buildMessage();
        for (Configuration column : columns) {
            String name = column.getString("name");
            String colType = column.getString("type");
            Validate.notNull(name, "column.name can't be null");
            Validate.notNull(colType, "column.type can't be null");
            switch (colType.toLowerCase()) {
                case "tinyint":
                case "smallint":
                case "int":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT32).named(name);
                    break;
                case "bigint":
                case "long":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT64).named(name);
                    break;
                case "float":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.FLOAT).named(name);
                    break;
                case "double":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.DOUBLE).named(name);
                    break;
                case "binary":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BINARY).named(name);
                    break;
                case "char":
                case "varchar":
                case "string":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named(name);
                    break;
                case "boolean":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BOOLEAN).named(name);
                    break;
                case "timestamp":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT96).named(name);
                    break;
                case "date":
                    typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT32).as(OriginalType.DATE).named(name);
                    break;
                default:
                    if (ColumnTypeUtil.isDecimalType(colType)) {
                        ColumnTypeUtil.DecimalInfo decimalInfo = ColumnTypeUtil.getDecimalInfo(colType, PARQUET_DEFAULT_DECIMAL_INFO);
                        typeBuilder.optional(PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY)
                                .as(OriginalType.DECIMAL)
                                .precision(decimalInfo.getPrecision())
                                .scale(decimalInfo.getScale())
                                .length(HdfsUtil.computeMinBytesForPrecision(decimalInfo.getPrecision()))
                                .named(name);

                        decimalColInfo.put(name, decimalInfo);
                    } else {
                        typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BINARY).named(name);
                    }
                    break;
            }
        }
        return typeBuilder.named("m").toString();
    }

    public void parquetFileStartWrite(RecordReceiver lineReceiver, Configuration config, String fileName,
                                      TaskPluginCollector taskPluginCollector, Configuration taskConfig) {
        MessageType messageType = null;
        ParquetFileProccessor proccessor = null;
        Path outputPath = new Path(fileName);
        String schema = config.getString(Key.PARQUET_SCHEMA, null);
        if (schema == null) {
            List<Configuration> columns = config.getListConfiguration(Key.COLUMN);
            if (columns == null || columns.isEmpty()) {
                throw DataXException.asDataXException("parquetSchema or column can't be blank!");
            }

            schema = HdfsHelper.generateParquetSchemaFromColumnAndType(columns);
        }

        try {
            messageType = MessageTypeParser.parseMessageType(schema);
        } catch (Exception e) {
            String message = String.format("Error parsing the Schema string [%s] into MessageType", schema);
            LOG.error(message);
            throw
                    DataXException.asDataXException(HdfsWriterErrorCode.PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR, e);
        }

        // determine the compression codec
        String compress = config.getString(Key.COMPRESS, null);
        // be compatible with the old NONE
        if ("NONE".equalsIgnoreCase(compress)) {
            compress = "UNCOMPRESSED";
        }
        CompressionCodecName compressionCodecName = CompressionCodecName.fromConf(compress);
        LOG.info("The compression codec used for parquet writing is: {}", compressionCodecName);
        try {
            proccessor = new ParquetFileProccessor(outputPath, messageType, compressionCodecName, false, taskConfig, taskPluginCollector, hadoopConf);
        } catch (Exception e) {
            String message = String.format("Initializing ParquetFileProccessor based on schema [%s] failed.", schema);
            LOG.error(message);
            throw DataXException.asDataXException(HdfsWriterErrorCode.INIT_PROCCESSOR_FAILURE, e);
        }
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMddHHmm");
        String attempt = "attempt_" + dateFormat.format(new Date()) + "_0001_m_000000_0";
        conf.set(JobContext.TASK_ATTEMPT_ID, attempt);
        FileOutputFormat outFormat = new TextOutputFormat();
        outFormat.setOutputPath(conf, outputPath);
        outFormat.setWorkOutputPath(conf, outputPath);
        try {
            Record record = null;
            while ((record = lineReceiver.getFromReader()) != null) {
                proccessor.write(record);
            }
        } catch (Exception e) {
            String message = String.format("An exception occurred while writing the file [%s]", fileName);
            LOG.error(message);
            Path path = new Path(fileName);
            deleteDir(path.getParent());
            throw DataXException.asDataXException(HdfsWriterErrorCode.Write_FILE_IO_ERROR, e);
        } finally {
            if (proccessor != null) {
                try {
                    proccessor.close();
                } catch (IOException e) {
                    LOG.error(e.getMessage(), e);
                }
            }
        }
    }
}

================================================
FILE: hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriter.java
================================================
package com.alibaba.datax.plugin.writer.hdfswriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.unstructuredstorage.writer.Constant;
import com.google.common.collect.Sets;
import org.apache.commons.io.Charsets;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.Validate;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import parquet.schema.MessageTypeParser;

import java.util.*;

public class HdfsWriter extends Writer {
    public static class Job extends Writer.Job {
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);

        private Configuration writerSliceConfig = null;

        private String defaultFS;
        private String path;
        private String fileType;
        private String fileName;
        private List<Configuration> columns;
        private String writeMode;
        private String fieldDelimiter;
        private String compress;
        private String encoding;
        private HashSet<String> tmpFiles = new HashSet<String>(); // full paths of the temporary files
        private HashSet<String> endFiles = new HashSet<String>(); // full paths of the final files

        private HdfsHelper hdfsHelper = null;

        @Override
        public void init() {
            this.writerSliceConfig = this.getPluginJobConf();
            this.validateParameter();

            // Create the HDFS helper and open the file system
            hdfsHelper = new HdfsHelper();

            hdfsHelper.getFileSystem(defaultFS, this.writerSliceConfig);
        }

        private void validateParameter() {
            this.defaultFS = this.writerSliceConfig.getNecessaryValue(Key.DEFAULT_FS, HdfsWriterErrorCode.REQUIRED_VALUE);
            // fileType check
            this.fileType = this.writerSliceConfig.getNecessaryValue(Key.FILE_TYPE, HdfsWriterErrorCode.REQUIRED_VALUE);
            if (!fileType.equalsIgnoreCase("ORC") && !fileType.equalsIgnoreCase("TEXT") && !fileType.equalsIgnoreCase("PARQUET")) {
                String message = "HdfsWriter currently supports only ORC, TEXT, and PARQUET files; please set fileType to ORC, TEXT, or PARQUET.";
                throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, message);
            }
            // path
            this.path = this.writerSliceConfig.getNecessaryValue(Key.PATH, HdfsWriterErrorCode.REQUIRED_VALUE);
            if (!path.startsWith("/")) {
                String message = String.format("Please check the path parameter [%s]: it must be an absolute path.", path);
                LOG.error(message);
                throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, message);
            } else if (path.contains("*") || path.contains("?")) {
                String message = String.format("Please check the path parameter [%s]: it must not contain special characters such as * or ?.", path);
                LOG.error(message);
                throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE, message);
            }
            // fileName
            this.fileName = this.writerSliceConfig.getNecessaryValue(Key.FILE_NAME, HdfsWriterErrorCode.REQUIRED_VALUE);
            // columns check
            this.columns = this.writerSliceConfig.getListConfiguration(Key.COLUMN);
            if (null == columns || columns.size() == 0) {
                throw DataXException.asDataXException(HdfsWriterErrorCode.REQUIRED_VALUE, "You must specify columns.");
            } else {
                for (Configuration eachColumnConf : columns) {
                    eachColumnConf.getNecessaryValue(Key.NAME, HdfsWriterErrorCode.COLUMN_REQUIRED_VALUE);
                    eachColumnConf.getNecessaryValue(Key.TYPE, HdfsWriterErrorCode.COLUMN_REQUIRED_VALUE);
                }
            }
            // writeMode check
            this.writeMode = this.writerSliceConfig.getNecessaryValue(Key.WRITE_MODE, HdfsWriterErrorCode.REQUIRED_VALUE);
            writeMode = writeMode.toLowerCase().trim();
            Set<String> supportedWriteModes = Sets.newHashSet("append", "nonconflict", "truncate");
            if (!supportedWriteModes.contains(writeMode)) {
                throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                        String.format("Only the append, nonConflict, and truncate write modes are supported; the configured writeMode [%s] is not supported.",
                                writeMode));
            }
            this.writerSliceConfig.set(Key.WRITE_MODE, writeMode);
            // fieldDelimiter check
            this.fieldDelimiter = this.writerSliceConfig.getString(Key.FIELD_DELIMITER, null);
            if (null == fieldDelimiter) {
                throw DataXException.asDataXException(HdfsWriterErrorCode.REQUIRED_VALUE,
                        String.format("Your configuration is invalid: [%s] is a required parameter.", Key.FIELD_DELIMITER));
            } else if (1 != fieldDelimiter.length()) {
                // warn: if have, length must be one
                throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                        String.format("Only single-character field delimiters are supported; the configured delimiter is [%s].", fieldDelimiter));
            }
            // compress check
            this.compress = this.writerSliceConfig.getString(Key.COMPRESS, null);
            if (fileType.equalsIgnoreCase("TEXT")) {
                Set<String> textSupportedCompress = Sets.newHashSet("GZIP", "BZIP2");
                // The user may have configured compress as an empty string (""); normalize it to null
                if (StringUtils.isBlank(compress)) {
                    this.writerSliceConfig.set(Key.COMPRESS, null);
                } else {
                    compress = compress.toUpperCase().trim();
                    if (!textSupportedCompress.contains(compress)) {
                        throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                                String.format("TEXT files currently support only GZIP and BZIP2 compression; the configured compress mode [%s] is not supported.", compress));
                    }
                }
            } else if (fileType.equalsIgnoreCase("ORC")) {
                Set<String> orcSupportedCompress = Sets.newHashSet("NONE", "SNAPPY");
                if (null == compress) {
                    this.writerSliceConfig.set(Key.COMPRESS, "NONE");
                } else {
                    compress = compress.toUpperCase().trim();
                    if (!orcSupportedCompress.contains(compress)) {
                        throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                                String.format("ORC files currently support only NONE or SNAPPY compression; the configured compress mode [%s] is not supported.", compress));
                    }
                }

            }
            // Kerberos check
            Boolean haveKerberos = this.writerSliceConfig.getBool(Key.HAVE_KERBEROS, false);
            if (haveKerberos) {
                this.writerSliceConfig.getNecessaryValue(Key.KERBEROS_KEYTAB_FILE_PATH, HdfsWriterErrorCode.REQUIRED_VALUE);
                this.writerSliceConfig.getNecessaryValue(Key.KERBEROS_PRINCIPAL, HdfsWriterErrorCode.REQUIRED_VALUE);
            }
            // encoding check
            this.encoding = this.writerSliceConfig.getString(Key.ENCODING, Constant.DEFAULT_ENCODING);
            try {
                encoding = encoding.trim();
                this.writerSliceConfig.set(Key.ENCODING, encoding);
                Charsets.toCharset(encoding);
            } catch (Exception e) {
                throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                        String.format("Unsupported encoding: [%s]", encoding), e);
            }
        }

        @Override
        public void prepare() {
            // If the path already exists, check that it is a directory
            if (hdfsHelper.isPathexists(path)) {
                if (!hdfsHelper.isPathDir(path)) {
                    throw
                        DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                                String.format("The configured path [%s] is not a valid directory; check for a file with the same name or an invalid directory name.",
                                        path));
                }
                // Handle the existing files in the directory according to writeMode
                Path[] existFilePaths = hdfsHelper.hdfsDirList(path, fileName);
                boolean isExistFile = false;
                if (existFilePaths.length > 0) {
                    isExistFile = true;
                }
                /**
                 if ("truncate".equals(writeMode) && isExistFile ) {
                 LOG.info(String.format("writeMode is truncate; cleaning up the entries under [%s] that start with [%s]", path, fileName));
                 hdfsHelper.deleteFiles(existFilePaths);
                 } else
                 */
                if ("append".equalsIgnoreCase(writeMode)) {
                    LOG.info(String.format("writeMode is append: no cleanup is done before writing; files will be written to directory [%s] with file name prefix [%s]",
                            path, fileName));
                } else if ("nonconflict".equalsIgnoreCase(writeMode) && isExistFile) {
                    LOG.info(String.format("writeMode is nonConflict; checking the contents of [%s]", path));
                    List<String> allFiles = new ArrayList<String>();
                    for (Path eachFile : existFilePaths) {
                        allFiles.add(eachFile.toString());
                    }
                    LOG.error(String.format("Conflicting files: [%s]", StringUtils.join(allFiles, ",")));
                    throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                            String.format("writeMode is nonConflict, but the configured path [%s] is not empty: it contains other files or directories.", path));
                } else if ("truncate".equalsIgnoreCase(writeMode) && isExistFile) {
                    LOG.info(String.format("writeMode is truncate; the contents of [%s] will be overwritten", path));
                    hdfsHelper.deleteFiles(existFilePaths);
                }
            } else {
                throw DataXException.asDataXException(HdfsWriterErrorCode.ILLEGAL_VALUE,
                        String.format("The configured path [%s] does not exist; please create the corresponding database and table in Hive first.", path));
            }
        }

        @Override
        public void post() {
            hdfsHelper.renameFile(tmpFiles, endFiles);
        }

        @Override
        public void destroy() {
            hdfsHelper.closeFileSystem();
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            LOG.info("begin do split...");
            List<Configuration> writerSplitConfigs = new ArrayList<Configuration>();
            String filePrefix = fileName;

            Set<String> allFiles = new HashSet<String>();

            // Collect all files that already exist under the path
            if (hdfsHelper.isPathexists(path)) {
                allFiles.addAll(Arrays.asList(hdfsHelper.hdfsDirList(path)));
            }

            String fileSuffix;
            // Temporary output directory
            String storePath = buildTmpFilePath(this.path);
            // Final output directory
            String endStorePath = buildFilePath();
            this.path = endStorePath;

            // PARQUET files get a ".<compress>.parquet" extension. It is computed once so that a suffix
            // regenerated after a file name collision keeps the extension as well.
            String fileExtension = "";
            if (fileType.equalsIgnoreCase("PARQUET")) {
                if (StringUtils.isNotBlank(this.compress)) {
                    fileExtension += "." + this.compress.toLowerCase();
                }
                fileExtension += ".parquet";
            }

            for (int i = 0; i < mandatoryNumber; i++) {
                // handle same file name

                Configuration splitedTaskConfig = this.writerSliceConfig.clone();
                String fullFileName = null;
                String endFullFileName = null;

                fileSuffix = UUID.randomUUID().toString().replace('-', '_') + fileExtension;

                fullFileName = String.format("%s%s%s__%s", defaultFS, storePath, filePrefix, fileSuffix);
                endFullFileName = String.format("%s%s%s__%s", defaultFS, endStorePath, filePrefix, fileSuffix);

                while (allFiles.contains(endFullFileName)) {
                    fileSuffix = UUID.randomUUID().toString().replace('-', '_') + fileExtension;
                    fullFileName = String.format("%s%s%s__%s", defaultFS, storePath, filePrefix, fileSuffix);
                    endFullFileName = String.format("%s%s%s__%s", defaultFS, endStorePath, filePrefix, fileSuffix);
                }
                allFiles.add(endFullFileName);

                // Record the full temporary and final file paths
                if ("GZIP".equalsIgnoreCase(this.compress)) {
                    this.tmpFiles.add(fullFileName + ".gz");
                    this.endFiles.add(endFullFileName + ".gz");
                } else if ("BZIP2".equalsIgnoreCase(compress)) {
                    this.tmpFiles.add(fullFileName + ".bz2");
                    this.endFiles.add(endFullFileName + ".bz2");
                } else {
                    this.tmpFiles.add(fullFileName);
                    this.endFiles.add(endFullFileName);
                }

                splitedTaskConfig
                        .set(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME,
                                fullFileName);

                LOG.info(String.format("splited write file name:[%s]",
                        fullFileName));

                writerSplitConfigs.add(splitedTaskConfig);
            }
            LOG.info("end do split.");
            return writerSplitConfigs;
        }

        private String buildFilePath() {
            boolean isEndWithSeparator = false;
            switch (IOUtils.DIR_SEPARATOR) {
                case IOUtils.DIR_SEPARATOR_UNIX:
                    isEndWithSeparator = this.path.endsWith(String
                            .valueOf(IOUtils.DIR_SEPARATOR));
                    break;
                case IOUtils.DIR_SEPARATOR_WINDOWS:
                    isEndWithSeparator = this.path.endsWith(String
                            .valueOf(IOUtils.DIR_SEPARATOR_WINDOWS));
                    break;
                default:
                    break;
            }
            if (!isEndWithSeparator) {
                this.path = this.path + IOUtils.DIR_SEPARATOR;
            }
            return this.path;
        }

        /**
         * Builds the path of a temporary directory that does not exist yet, next to userPath.
         * @param userPath the configured output path
         * @return the temporary directory path, ending with a separator
         */
        private String buildTmpFilePath(String userPath) {
            String tmpFilePath;
            boolean isEndWithSeparator = false;
            switch (IOUtils.DIR_SEPARATOR) {
                case IOUtils.DIR_SEPARATOR_UNIX:
                    isEndWithSeparator = userPath.endsWith(String
                            .valueOf(IOUtils.DIR_SEPARATOR));
                    break;
                case IOUtils.DIR_SEPARATOR_WINDOWS:
                    isEndWithSeparator = userPath.endsWith(String
                            .valueOf(IOUtils.DIR_SEPARATOR_WINDOWS));
                    break;
                default:
                    break;
            }
            String tmpSuffix;
            tmpSuffix = UUID.randomUUID().toString().replace('-', '_');
            if (!isEndWithSeparator) {
                tmpFilePath = String.format("%s__%s%s", userPath, tmpSuffix, IOUtils.DIR_SEPARATOR);
            } else if ("/".equals(userPath)) {
                tmpFilePath = String.format("%s__%s%s", userPath, tmpSuffix, IOUtils.DIR_SEPARATOR);
            } else {
                tmpFilePath = String.format("%s__%s%s", userPath.substring(0, userPath.length() - 1), tmpSuffix, IOUtils.DIR_SEPARATOR);
            }
            while (hdfsHelper.isPathexists(tmpFilePath)) {
                tmpSuffix = UUID.randomUUID().toString().replace('-', '_');
                if (!isEndWithSeparator) {
                    tmpFilePath = String.format("%s__%s%s", userPath, tmpSuffix, IOUtils.DIR_SEPARATOR);
                } else if ("/".equals(userPath)) {
                    tmpFilePath = String.format("%s__%s%s", userPath, tmpSuffix, IOUtils.DIR_SEPARATOR);
                } else {
                    tmpFilePath = String.format("%s__%s%s", userPath.substring(0, userPath.length() - 1), tmpSuffix, IOUtils.DIR_SEPARATOR);
                }
            }
            return tmpFilePath;
        }

        public void unitizeParquetConfig(Configuration writerSliceConfig) {
            String parquetSchema = writerSliceConfig.getString(Key.PARQUET_SCHEMA);
            if (StringUtils.isNotBlank(parquetSchema)) {
                LOG.info("parquetSchema is configured; using parquetSchema:\n{}", parquetSchema);
                return;
            }

            List<Configuration> columns = writerSliceConfig.getListConfiguration(Key.COLUMN);
            if (columns == null || columns.isEmpty()) {
                throw DataXException.asDataXException("parquetSchema or column can't be blank!");
            }

            parquetSchema = generateParquetSchemaFromColumn(columns);
            // Keep the legacy schema generation for backward compatibility; if the schema it produces
            // fails to parse, fall back to the new generation logic
            try {
                MessageTypeParser.parseMessageType(parquetSchema);
            } catch (Throwable e) {
                LOG.warn("The generated parquetSchema {} is illegal, try to generate parquetSchema in another way", parquetSchema);
                parquetSchema = HdfsHelper.generateParquetSchemaFromColumnAndType(columns);
                LOG.info("The last generated parquet schema is {}", parquetSchema);
            }
            writerSliceConfig.set(Key.PARQUET_SCHEMA, parquetSchema);
            LOG.info("dataxParquetMode use default fields.");
            writerSliceConfig.set(Key.DATAX_PARQUET_MODE, "fields");
        }

        private String generateParquetSchemaFromColumn(List<Configuration> columns) {
            StringBuffer parquetSchemaStringBuffer = new StringBuffer();
            parquetSchemaStringBuffer.append("message m {");
            for (Configuration column : columns) {
                String name = column.getString("name");
                Validate.notNull(name, "column.name can't be null");

                String type = column.getString("type");
                Validate.notNull(type, "column.type can't be null");

                String parquetColumn = String.format("optional %s %s;", type, name);
                parquetSchemaStringBuffer.append(parquetColumn);
            }
            parquetSchemaStringBuffer.append("}");
            String parquetSchema = parquetSchemaStringBuffer.toString();
            LOG.info("generate parquetSchema:\n{}", parquetSchema);
            return parquetSchema;
        }

    }

    public static class Task extends Writer.Task {
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        private Configuration writerSliceConfig;

        private String defaultFS;
        private String fileType;
        private String fileName;

        private HdfsHelper hdfsHelper = null;

        @Override
        public void init() {
            this.writerSliceConfig = this.getPluginJobConf();

            this.defaultFS = this.writerSliceConfig.getString(Key.DEFAULT_FS);
            this.fileType =
                    this.writerSliceConfig.getString(Key.FILE_TYPE);
            // What we get here is already an absolute path, e.g. hdfs://10.101.204.12:9000/user/hive/warehouse/writer.db/text/test.textfile
            this.fileName = this.writerSliceConfig.getString(Key.FILE_NAME);

            hdfsHelper = new HdfsHelper();
            hdfsHelper.getFileSystem(defaultFS, writerSliceConfig);
        }

        @Override
        public void prepare() {

        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            LOG.info("begin do write...");
            LOG.info(String.format("write to file : [%s]", this.fileName));
            if (fileType.equalsIgnoreCase("TEXT")) {
                // Write a TEXT file
                hdfsHelper.textFileStartWrite(lineReceiver, this.writerSliceConfig, this.fileName,
                        this.getTaskPluginCollector());
            } else if (fileType.equalsIgnoreCase("ORC")) {
                // Write an ORC file
                hdfsHelper.orcFileStartWrite(lineReceiver, this.writerSliceConfig, this.fileName,
                        this.getTaskPluginCollector());
            } else if (fileType.equalsIgnoreCase("PARQUET")) {
                // Write a PARQUET file
                hdfsHelper.parquetFileStartWrite(lineReceiver, this.writerSliceConfig, this.fileName,
                        this.getTaskPluginCollector(), this.writerSliceConfig);
            }

            LOG.info("end do write");
        }

        @Override
        public void post() {

        }

        @Override
        public void destroy() {

        }
    }
}

================================================
FILE: hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/HdfsWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.hdfswriter;

import com.alibaba.datax.common.spi.ErrorCode;

/**
 * Created by shf on 15/10/8.
 */
public enum HdfsWriterErrorCode implements ErrorCode {

    CONFIG_INVALID_EXCEPTION("HdfsWriter-00", "Your parameter configuration is invalid."),
    REQUIRED_VALUE("HdfsWriter-01", "A required parameter value is missing."),
    ILLEGAL_VALUE("HdfsWriter-02", "A parameter value is invalid."),
    WRITER_FILE_WITH_CHARSET_ERROR("HdfsWriter-03", "Failed to write with the configured encoding."),
    Write_FILE_IO_ERROR("HdfsWriter-04", "An IO exception occurred while writing the configured file."),
    WRITER_RUNTIME_EXCEPTION("HdfsWriter-05", "A runtime exception occurred; please contact us."),
    CONNECT_HDFS_IO_ERROR("HdfsWriter-06", "An IO exception occurred while connecting to HDFS."),
    COLUMN_REQUIRED_VALUE("HdfsWriter-07", "A required value is missing from the column configuration."),
    HDFS_RENAME_FILE_ERROR("HdfsWriter-08", "Failed to move the file to the configured path."),
    KERBEROS_LOGIN_ERROR("HdfsWriter-09", "Kerberos authentication failed"),
    PARSE_MESSAGE_TYPE_FROM_SCHEMA_ERROR("HdfsWriter-10", "Parse parquet schema error"),
    INIT_PROCCESSOR_FAILURE("HdfsWriter-11", "Init processor failed");

    private final String code;
    private final String description;

    private HdfsWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code,
                this.description);
    }

}

================================================
FILE: hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.hdfswriter;

/**
 * Created by shf on 15/10/8.
 */
public class Key {
    // required
    public static final String PATH = "path";
    // required
    public final static String DEFAULT_FS = "defaultFS";
    // required
    public final static String FILE_TYPE = "fileType";
    // required
    public static final String FILE_NAME = "fileName";
    // required, for column
    public static final String COLUMN = "column";
    public static final String NAME = "name";
    public static final String TYPE = "type";
    public static final String DATE_FORMAT = "dateFormat";
    // required
    public static final String WRITE_MODE = "writeMode";
    // required
    public static final String FIELD_DELIMITER = "fieldDelimiter";
    // optional, default UTF-8
    public static final String ENCODING = "encoding";
    // optional, default no compression
    public static final String COMPRESS = "compress";
    // optional, not default \N
    public static final String NULL_FORMAT = "nullFormat";
    // Kerberos
    public static final String HAVE_KERBEROS = "haveKerberos";
    public static final String KERBEROS_KEYTAB_FILE_PATH = "kerberosKeytabFilePath";
    public static final String KERBEROS_PRINCIPAL = "kerberosPrincipal";
    // hadoop config
    public static final String HADOOP_CONFIG = "hadoopConfig";

    // useOldRawDataTransf
    public final static String PARQUET_FILE_USE_RAW_DATA_TRANSF = "useRawDataTransf";

    public final static String DATAX_PARQUET_MODE = "dataxParquetMode";

    // HDFS username, default: admin
    public final static String HDFS_USERNAME = "hdfsUsername";

    public static final String PROTECTION = "protection";

    public static final String PARQUET_SCHEMA = "parquetSchema";
    public static final String PARQUET_MERGE_RESULT = "parquetMergeResult";

    /**
     * Hive 3.x and newer CDH versions store timestamps in the UTC time zone; if you observe a
     * time zone offset in written timestamps, set this option to true.
     */
    public static final String PARQUET_UTC_TIMESTAMP = "parquetUtcTimestamp";

    // Kerberos
    public static final String KERBEROS_CONF_FILE_PATH = "kerberosConfFilePath";

    // PanguFS
    public final static String PANGU_FS_CONFIG = "panguFSConfig";
    public final static String PANGU_FS_CONFIG_NUWA_CLUSTER = "nuwaCluster";
    public final static String PANGU_FS_CONFIG_NUWA_SERVERS = "nuwaServers";
    public final static String PANGU_FS_CONFIG_NUWA_PROXIES = "nuwaProxies";
    public final static String PANGU_FS_CONFIG_CAPABILITY = "capability";


    public static final String FS_OSS_UPLOAD_THREAD_CONCURRENCY = "ossUploadConcurrency";
    // public static final String FS_OSS_UPLOAD_QUEUE_SIZE = "ossUploadQueueSize";
    // public static final String FS_OSS_UPLOAD_MAX_PENDING_TASKS_PER_STREAM = "ossUploadMaxPendingTasksPerStream";

    public static final String FS_OSS_BLOCKLET_SIZE_MB = "ossBlockSize";

    public static final String FILE_SYSTEM_TYPE = "fileSystemType";
    public static final String ENABLE_COLUMN_EXCHANGE = "enableColumnExchange";
    public static final String SUPPORT_HIVE_DATETIME = "supportHiveDateTime";
}

================================================
FILE: hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileProccessor.java
================================================
package com.alibaba.datax.plugin.writer.hdfswriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import org.apache.hadoop.fs.Path;
import parquet.hadoop.ParquetWriter;
import parquet.hadoop.metadata.CompressionCodecName;
import parquet.schema.MessageType;

import java.io.IOException;

/**
 * @author jitongchen
 * @date 2023/9/7 9:41 AM
 */
public class ParquetFileProccessor extends ParquetWriter<Record> {

    public ParquetFileProccessor(Path file, MessageType schema, boolean enableDictionary, Configuration taskConfig,
                                 TaskPluginCollector taskPluginCollector, org.apache.hadoop.conf.Configuration configuration) throws IOException {
        this(file, schema, CompressionCodecName.UNCOMPRESSED, enableDictionary, taskConfig, taskPluginCollector, configuration);
    }

    public ParquetFileProccessor(Path file, MessageType schema, CompressionCodecName codecName, boolean enableDictionary,
                                 Configuration taskConfig, TaskPluginCollector
taskPluginCollector) throws IOException { super(file, new ParquetFileSupport(schema, taskConfig, taskPluginCollector), codecName, DEFAULT_BLOCK_SIZE, DEFAULT_PAGE_SIZE, DEFAULT_PAGE_SIZE, enableDictionary, false, DEFAULT_WRITER_VERSION); } public ParquetFileProccessor(Path file, MessageType schema, CompressionCodecName codecName, boolean enableDictionary, Configuration taskConfig, TaskPluginCollector taskPluginCollector, org.apache.hadoop.conf.Configuration configuration) throws IOException { super(file, new ParquetFileSupport(schema, taskConfig, taskPluginCollector), codecName, DEFAULT_BLOCK_SIZE, DEFAULT_PAGE_SIZE, DEFAULT_PAGE_SIZE, enableDictionary, false, DEFAULT_WRITER_VERSION, configuration); } } ================================================ FILE: hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/ParquetFileSupport.java ================================================ package com.alibaba.datax.plugin.writer.hdfswriter; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.LimitLogger; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import org.apache.commons.lang3.StringUtils; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import parquet.column.ColumnDescriptor; import parquet.hadoop.api.WriteSupport; import parquet.io.api.Binary; import parquet.io.api.RecordConsumer; import parquet.schema.*; import java.math.BigDecimal; import java.math.RoundingMode; import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.sql.Timestamp; import java.text.SimpleDateFormat; import java.time.LocalDateTime; import java.time.OffsetDateTime; import java.time.ZoneOffset; import java.time.temporal.ChronoField; import java.util.Arrays; import java.util.Date; import 
java.util.HashMap; import java.util.List; import java.util.concurrent.TimeUnit; /** * @author jitongchen * @date 2023/9/7 9:41 AM */ public class ParquetFileSupport extends WriteSupport { public static final Logger LOGGER = LoggerFactory.getLogger(ParquetFileSupport.class); private MessageType schema; private List columns; private RecordConsumer recordConsumer; private boolean useRawDataTransf = true; private boolean printStackTrace = true; // 不通类型的nullFormat private String nullFormat; private String dateFormat; private boolean isUtcTimestamp; private SimpleDateFormat dateParse; private Binary binaryForNull; private TaskPluginCollector taskPluginCollector; private String dataxParquetMode; public ParquetFileSupport(MessageType schema, com.alibaba.datax.common.util.Configuration taskConfig, TaskPluginCollector taskPluginCollector) { this.schema = schema; this.columns = schema.getColumns(); this.useRawDataTransf = taskConfig.getBool(Key.PARQUET_FILE_USE_RAW_DATA_TRANSF, true); // 不通类型的nullFormat this.nullFormat = taskConfig.getString(Key.NULL_FORMAT, Constant.DEFAULT_NULL_FORMAT); this.binaryForNull = Binary.fromString(this.nullFormat); this.dateFormat = taskConfig.getString(Key.DATE_FORMAT, null); if (StringUtils.isNotBlank(this.dateFormat)) { this.dateParse = new SimpleDateFormat(dateFormat); } this.isUtcTimestamp = taskConfig.getBool(Key.PARQUET_UTC_TIMESTAMP, false); this.taskPluginCollector = taskPluginCollector; if (taskConfig.getKeys().contains("dataxParquetMode")) { this.dataxParquetMode = taskConfig.getString("dataxParquetMode"); } else { // 默认值是columns this.dataxParquetMode = "columns"; } } @Override public WriteContext init(Configuration configuration) { return new WriteContext(schema, new HashMap()); } @Override public void prepareForWrite(RecordConsumer recordConsumer) { this.recordConsumer = recordConsumer; } @Override public void write(Record values) { if (dataxParquetMode.equalsIgnoreCase("fields")) { writeBaseOnFields(values); return; } // NOTE: 
The implementation below is actually incorrect, but judging from the code comments some users already rely on it, // so the logic below is left unchanged for now. // This is the default code path. if (values != null && columns != null && values.getColumnNumber() == columns.size()) { recordConsumer.startMessage(); for (int i = 0; i < columns.size(); i++) { Column value = values.getColumn(i); ColumnDescriptor columnDescriptor = columns.get(i); Type type = this.schema.getFields().get(i); if (value != null) { try { if (this.useRawDataTransf) { if (value.getRawData() == null) { continue; } recordConsumer.startField(columnDescriptor.getPath()[0], i); // Converting via Column->RawData is actually the wrong type-conversion strategy: it serializes DataX's internal data representation. // However, Parquet users already depend on it, so for now it can only be switched off through a configuration option. String rawData = value.getRawData().toString(); switch (columnDescriptor.getType()) { case BOOLEAN: recordConsumer.addBoolean(Boolean.parseBoolean(rawData)); break; case FLOAT: recordConsumer.addFloat(Float.parseFloat(rawData)); break; case DOUBLE: recordConsumer.addDouble(Double.parseDouble(rawData)); break; case INT32: OriginalType originalType = type.getOriginalType(); if (originalType != null && StringUtils.equalsIgnoreCase("DATE", originalType.name())) { int realVal = (int) (new java.sql.Date(Long.parseLong(rawData)).toLocalDate().toEpochDay()); recordConsumer.addInteger(realVal); } else { recordConsumer.addInteger(Integer.parseInt(rawData)); } break; case INT64: recordConsumer.addLong(Long.valueOf(rawData)); break; case INT96: recordConsumer.addBinary(timestampColToBinary(value)); break; case BINARY: recordConsumer.addBinary(Binary.fromString(rawData)); break; case FIXED_LEN_BYTE_ARRAY: PrimitiveType primitiveType = type.asPrimitiveType(); if (primitiveType.getDecimalMetadata() != null) { // decimal recordConsumer.addBinary(decimalToBinary(value, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); break; } /* fall through */ default: recordConsumer.addBinary(Binary.fromString(rawData)); break; } recordConsumer.endField(columnDescriptor.getPath()[0], i); } else { boolean isNull = null == value.getRawData(); if 
(!isNull) { recordConsumer.startField(columnDescriptor.getPath()[0], i); // no skip: empty fields are illegal, the field should be omitted completely instead switch (columnDescriptor.getType()) { case BOOLEAN: recordConsumer.addBoolean(value.asBoolean()); break; case FLOAT: recordConsumer.addFloat(value.asDouble().floatValue()); break; case DOUBLE: recordConsumer.addDouble(value.asDouble()); break; case INT32: OriginalType originalType = type.getOriginalType(); if (originalType != null && StringUtils.equalsIgnoreCase("DATE", originalType.name())) { int realVal = (int) (new java.sql.Date(value.asLong()).toLocalDate().toEpochDay()); recordConsumer.addInteger(realVal); } else { recordConsumer.addInteger(value.asLong().intValue()); } break; case INT64: recordConsumer.addLong(value.asLong()); break; case INT96: recordConsumer.addBinary(timestampColToBinary(value)); break; case BINARY: String valueAsString2Write = null; if (Column.Type.DATE == value.getType() && null != this.dateParse) { valueAsString2Write = dateParse.format(value.asDate()); } else { valueAsString2Write = value.asString(); } recordConsumer.addBinary(Binary.fromString(valueAsString2Write)); break; case FIXED_LEN_BYTE_ARRAY: PrimitiveType primitiveType = type.asPrimitiveType(); if (primitiveType.getDecimalMetadata() != null) { // decimal recordConsumer.addBinary(decimalToBinary(value, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); break; } /* fall through */ default: recordConsumer.addBinary(Binary.fromString(value.asString())); break; } recordConsumer.endField(columnDescriptor.getPath()[0], i); } } } catch (Exception e) { if (printStackTrace) { printStackTrace = false; LOGGER.warn("write to parquet error: {}", e.getMessage(), e); } // dirty data if (null != this.taskPluginCollector) { // taskPluginCollector is null during the merge performed in job post this.taskPluginCollector.collectDirtyRecord(values, e, e.getMessage()); } } } else { 
recordConsumer.addBinary(this.binaryForNull); } } recordConsumer.endMessage(); } } private Binary decimalToBinary(Column value, int precision, int scale) { BigDecimal bigDecimal = value.asBigDecimal(); bigDecimal = bigDecimal.setScale(scale, RoundingMode.HALF_UP); byte[] decimalBytes = bigDecimal.unscaledValue().toByteArray(); int precToBytes = ParquetHiveSerDe.PRECISION_TO_BYTE_COUNT[precision - 1]; if (precToBytes == decimalBytes.length) { // No padding needed. return Binary.fromByteArray(decimalBytes); } byte[] tgt = new byte[precToBytes]; // padding -1 for negative number if (bigDecimal.compareTo(new BigDecimal("0")) < 0) { Arrays.fill(tgt, 0, precToBytes - decimalBytes.length, (byte) -1); } System.arraycopy(decimalBytes, 0, tgt, precToBytes - decimalBytes.length, decimalBytes.length); return Binary.fromByteArray(tgt); } private static final int JULIAN_EPOCH_OFFSET_DAYS = 2_440_588; private static final long MILLIS_IN_DAY = TimeUnit.DAYS.toMillis(1); private static final long MILLS_PER_SECOND = TimeUnit.SECONDS.toMillis(1); private static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1); private static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1); private static final ZoneOffset defaultOffset = OffsetDateTime.now().getOffset(); /** * int 96 is timestamp in parquet * * @param valueColumn * @return */ private Binary timestampColToBinary(Column valueColumn) { if (valueColumn.getRawData() == null) { return Binary.EMPTY; } long mills; long nanos = 0; if (valueColumn instanceof DateColumn) { DateColumn dateColumn = (DateColumn) valueColumn; mills = dateColumn.asLong(); nanos = dateColumn.getNanos(); } else { mills = valueColumn.asLong(); } int julianDay; long nanosOfDay; if (isUtcTimestamp) { // utc ignore current timezone (task should set timezone same as hive/hdfs) long seconds = mills >= 0 ? 
mills / MILLS_PER_SECOND : (mills / MILLS_PER_SECOND - 1); LocalDateTime localDateTime = LocalDateTime.ofEpochSecond(seconds, (int) nanos, defaultOffset); julianDay = (int) (localDateTime.getLong(ChronoField.EPOCH_DAY) + JULIAN_EPOCH_OFFSET_DAYS); nanosOfDay = localDateTime.getLong(ChronoField.NANO_OF_DAY); } else { // local date julianDay = (int) ((mills / MILLIS_IN_DAY) + JULIAN_EPOCH_OFFSET_DAYS); if (mills >= 0) { nanosOfDay = ((mills % MILLIS_IN_DAY) / MILLS_PER_SECOND) * NANOS_PER_SECOND + nanos; } else { julianDay--; nanosOfDay = (((mills % MILLIS_IN_DAY) / MILLS_PER_SECOND) - 1) * NANOS_PER_SECOND + nanos; nanosOfDay += NANOS_PER_DAY; } } ByteBuffer buf = ByteBuffer.allocate(12); buf.order(ByteOrder.LITTLE_ENDIAN); buf.putLong(nanosOfDay); buf.putInt(julianDay); buf.flip(); return Binary.fromByteBuffer(buf); } private void writeBaseOnFields(Record values) { //LOGGER.info("Writing parquet data using fields mode(The correct mode.)"); List<Type> types = this.schema.getFields(); if (values != null && types != null && values.getColumnNumber() == types.size()) { recordConsumer.startMessage(); writeFields(types, values); recordConsumer.endMessage(); } } private void writeFields(List<Type> types, Record values) { for (int i = 0; i < types.size(); i++) { Type type = types.get(i); Column value = values.getColumn(i); if (value != null) { try { if (type.isPrimitive()) { writePrimitiveType(type, value, i); } else { writeGroupType(type, (JSON) JSON.parse(value.asString()), i); } } catch (Exception e) { if (printStackTrace) { printStackTrace = false; LOGGER.warn("write to parquet error: {}", e.getMessage(), e); } // dirty data if (null != this.taskPluginCollector) { // taskPluginCollector is null during the merge performed in job post this.taskPluginCollector.collectDirtyRecord(values, e, e.getMessage()); } } } } } private void writeFields(List<Type> types, JSONObject values) { for (int i = 0; i < types.size(); i++) { Type type = types.get(i); Object value = values.get(type.getName()); if (value != null) { try { 
if (type.isPrimitive()) { writePrimitiveType(type, value, i); } else { writeGroupType(type, (JSON) value, i); } } catch (Exception e) { if (printStackTrace) { printStackTrace = false; LOGGER.warn("write to parquet error: {}", e.getMessage(), e); } } } // A null or missing value is omitted: Parquet cannot accept a value written outside startField/endField, and an absent optional field must simply be skipped. } } private void writeGroupType(Type type, JSON value, int index) { GroupType groupType = type.asGroupType(); OriginalType originalType = groupType.getOriginalType(); if (originalType != null) { switch (originalType) { case MAP: writeMap(groupType, value, index); break; case LIST: writeList(groupType, value, index); break; default: break; } } else { // struct writeStruct(groupType, value, index); } } private void writeMap(GroupType groupType, JSON value, int index) { if (value == null) { return; } JSONObject json = (JSONObject) value; if (json.isEmpty()) { return; } recordConsumer.startField(groupType.getName(), index); recordConsumer.startGroup(); // map // key_value start recordConsumer.startField("key_value", 0); recordConsumer.startGroup(); List<Type> keyValueFields = groupType.getFields().get(0).asGroupType().getFields(); Type keyType = keyValueFields.get(0); Type valueType = keyValueFields.get(1); for (String key : json.keySet()) { // key writePrimitiveType(keyType, key, 0); // value if (valueType.isPrimitive()) { writePrimitiveType(valueType, json.get(key), 1); } else { writeGroupType(valueType, (JSON) json.get(key), 1); } } recordConsumer.endGroup(); recordConsumer.endField("key_value", 0); // key_value end recordConsumer.endGroup(); recordConsumer.endField(groupType.getName(), index); } private void writeList(GroupType groupType, JSON value, int index) { if (value == null) { return; } JSONArray json = (JSONArray) value; if (json.isEmpty()) { return; } recordConsumer.startField(groupType.getName(), index); // list recordConsumer.startGroup(); // list start recordConsumer.startField("list", 0); recordConsumer.startGroup(); Type elementType = 
groupType.getFields().get(0).asGroupType().getFields().get(0); if (elementType.isPrimitive()) { for (Object elementValue : json) { writePrimitiveType(elementType, elementValue, 0); } } else { for (Object elementValue : json) { writeGroupType(elementType, (JSON) elementValue, 0); } } recordConsumer.endGroup(); recordConsumer.endField("list", 0); // list end recordConsumer.endGroup(); recordConsumer.endField(groupType.getName(), index); } private void writeStruct(GroupType groupType, JSON value, int index) { if (value == null) { return; } JSONObject json = (JSONObject) value; if (json.isEmpty()) { return; } recordConsumer.startField(groupType.getName(), index); // struct start recordConsumer.startGroup(); writeFields(groupType.getFields(), json); recordConsumer.endGroup(); // struct end recordConsumer.endField(groupType.getName(), index); } private void writePrimitiveType(Type type, Object value, int index) { if (value == null) { return; } recordConsumer.startField(type.getName(), index); PrimitiveType primitiveType = type.asPrimitiveType(); switch (primitiveType.getPrimitiveTypeName()) { case BOOLEAN: recordConsumer.addBoolean((Boolean) value); break; case FLOAT: if (value instanceof Float) { recordConsumer.addFloat(((Float) value).floatValue()); } else if (value instanceof Double) { recordConsumer.addFloat(((Double) value).floatValue()); } else if (value instanceof Long) { recordConsumer.addFloat(((Long) value).floatValue()); } else if (value instanceof Integer) { recordConsumer.addFloat(((Integer) value).floatValue()); } break; case DOUBLE: if (value instanceof Float) { recordConsumer.addDouble(((Float) value).doubleValue()); } else if (value instanceof Double) { recordConsumer.addDouble(((Double) value).doubleValue()); } else if (value instanceof Long) { recordConsumer.addDouble(((Long) value).doubleValue()); } else if (value instanceof Integer) { recordConsumer.addDouble(((Integer) value).doubleValue()); } break; case INT32: if (value instanceof Integer) { 
recordConsumer.addInteger((Integer) value); } else if (value instanceof Long) { recordConsumer.addInteger(((Long) value).intValue()); } else { // Earlier code had a bug that silently dropped this column instead of throwing; log occurrences for now and decide on a fix once we see whether any jobs hit this path. LimitLogger.limit("dirtyDataHiveWriterParquet", TimeUnit.MINUTES.toMillis(1), () -> LOGGER.warn("dirtyDataHiveWriterParquet {}", String.format("Invalid value: %s(clazz: %s) for field: %s", value, value.getClass(), type.getName()))); } break; case INT64: if (value instanceof Integer) { recordConsumer.addLong(((Integer) value).longValue()); } else if (value instanceof Long) { recordConsumer.addLong((Long) value); } else { // Earlier code had a bug that silently dropped this column instead of throwing; log occurrences for now and decide on a fix once we see whether any jobs hit this path. LimitLogger.limit("dirtyDataHiveWriterParquet", TimeUnit.MINUTES.toMillis(1), () -> LOGGER.warn("dirtyDataHiveWriterParquet {}", String.format("Invalid value: %s(clazz: %s) for field: %s", value, value.getClass(), type.getName()))); } break; case INT96: if (value instanceof Integer) { recordConsumer.addBinary(timestampColToBinary(new LongColumn((Integer) value))); } else if (value instanceof Long) { recordConsumer.addBinary(timestampColToBinary(new LongColumn((Long) value))); } else if (value instanceof Timestamp) { recordConsumer.addBinary(timestampColToBinary(new DateColumn((Timestamp) value))); } else if (value instanceof Date) { recordConsumer.addBinary(timestampColToBinary(new DateColumn((Date) value))); } else { recordConsumer.addBinary(timestampColToBinary(new StringColumn(value.toString()))); } break; case FIXED_LEN_BYTE_ARRAY: if (primitiveType.getDecimalMetadata() != null) { // decimal Column column; if (value instanceof Integer) { column = new LongColumn((Integer) value); } else if (value instanceof Long) { column = new LongColumn((Long) value); } else if (value instanceof Double) { column = new DoubleColumn((Double) value); } else if (value instanceof BigDecimal) { column = new DoubleColumn((BigDecimal) value); } else { column = new StringColumn(value.toString()); } 
recordConsumer.addBinary(decimalToBinary(column, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); break; } /* fall through */ case BINARY: default: recordConsumer.addBinary(Binary.fromString((String) value)); break; } recordConsumer.endField(type.getName(), index); } private void writePrimitiveType(Type type, Column value, int index) { if (value == null || value.getRawData() == null) { return; } recordConsumer.startField(type.getName(), index); PrimitiveType primitiveType = type.asPrimitiveType(); switch (primitiveType.getPrimitiveTypeName()) { case BOOLEAN: recordConsumer.addBoolean(value.asBoolean()); break; case FLOAT: recordConsumer.addFloat(value.asDouble().floatValue()); break; case DOUBLE: recordConsumer.addDouble(value.asDouble()); break; case INT32: OriginalType originalType = type.getOriginalType(); if (OriginalType.DATE.equals(originalType)) { int realVal = (int) (new java.sql.Date(value.asLong()).toLocalDate().toEpochDay()); recordConsumer.addInteger(realVal); } else { recordConsumer.addInteger(value.asLong().intValue()); } break; case INT64: recordConsumer.addLong(value.asLong()); break; case INT96: recordConsumer.addBinary(timestampColToBinary(value)); break; case BINARY: String valueAsString2Write = null; if (Column.Type.DATE == value.getType() && null != this.dateParse) { valueAsString2Write = dateParse.format(value.asDate()); } else { valueAsString2Write = value.asString(); } recordConsumer.addBinary(Binary.fromString(valueAsString2Write)); break; case FIXED_LEN_BYTE_ARRAY: if (primitiveType.getDecimalMetadata() != null) { // decimal recordConsumer.addBinary(decimalToBinary(value, primitiveType.getDecimalMetadata().getPrecision(), primitiveType.getDecimalMetadata().getScale())); break; } /* fall through */ default: recordConsumer.addBinary(Binary.fromString(value.asString())); break; } recordConsumer.endField(type.getName(), index); } } ================================================ FILE: 
hdfswriter/src/main/java/com/alibaba/datax/plugin/writer/hdfswriter/SupportHiveDataType.java ================================================ package com.alibaba.datax.plugin.writer.hdfswriter; public enum SupportHiveDataType { TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, TIMESTAMP, DATE, STRING, VARCHAR, CHAR, BOOLEAN } ================================================ FILE: hdfswriter/src/main/resources/plugin.json ================================================ { "name": "hdfswriter", "class": "com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriter", "description": "useScene: prod. mechanism: via FileSystem connect HDFS write data concurrent.", "developer": "alibaba" } ================================================ FILE: hdfswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "hdfswriter", "parameter": { "defaultFS": "", "fileType": "", "path": "", "fileName": "", "column": [], "writeMode": "", "fieldDelimiter": "", "compress":"" } } ================================================ FILE: hologresjdbcwriter/doc/hologresjdbcwriter.md ================================================

# DataX HologresJdbcWriter

---

## 1 Quick Introduction

The HologresJdbcWriter plugin writes data into a destination table in Hologres. Under the hood, HologresJdbcWriter connects to the remote Hologres database over JDBC and executes the corresponding `insert into ... on conflict` SQL statements to write the data into Hologres, committing it in batches internally.
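As a rough illustration of the batched `insert into ... on conflict` write described above, the sketch below splits rows into batches and builds one multi-row, PostgreSQL-style upsert per batch for a hypothetical table `test(id, name)` with primary key `id`. The class and method names are hypothetical, and the statement text is an assumption: the plugin delegates SQL generation to holo-client, whose actual statements may differ (for example, the conflict clause also depends on `writeMode`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class BatchUpsertSketch {

    /**
     * Builds one multi-row PostgreSQL-style upsert with JDBC placeholders.
     * Hypothetical helper: it only illustrates the statement shape.
     */
    static String buildUpsertSql(String table, List<String> columns, String primaryKey, int rowCount) {
        // One "(?,?,...)" group per row, one "?" per column.
        String row = "(" + String.join(",", Collections.nCopies(columns.size(), "?")) + ")";
        String values = String.join(",", Collections.nCopies(rowCount, row));
        // On a key conflict, overwrite every non-key column with the incoming value.
        String updates = columns.stream()
                .filter(c -> !c.equals(primaryKey))
                .map(c -> c + "=EXCLUDED." + c)
                .collect(Collectors.joining(","));
        // If only the key is written there is nothing to update, so skip the conflicting row.
        String onConflict = updates.isEmpty() ? "DO NOTHING" : "DO UPDATE SET " + updates;
        return "INSERT INTO " + table + " (" + String.join(",", columns) + ") VALUES " + values
                + " ON CONFLICT (" + primaryKey + ") " + onConflict;
    }

    /** Splits rows into consecutive batches of at most batchSize rows. */
    static <T> List<List<T>> toBatches(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            batches.add(rows.subList(start, Math.min(start + batchSize, rows.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        List<String> columns = Arrays.asList("id", "name");
        // With batchSize = 2, five rows become batches of 2, 2 and 1 rows.
        for (List<Integer> batch : toBatches(ids, 2)) {
            System.out.println(buildUpsertSql("test", columns, "id", batch.size()));
        }
    }
}
```

Batching trades memory for fewer round trips: each statement carries `batchSize` rows of parameters, which is why a very large batch size can exhaust the DataX process's heap.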
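* HologresJdbcWriter supports synchronizing only a single table.

## 2 How It Works

HologresJdbcWriter receives the protocol data produced by the Reader through the DataX framework and, based on your configuration, generates the corresponding SQL insert statement:

* `insert into... on conflict `

## 3 Features

### 3.1 Sample Configuration

* The following job writes data generated in memory by streamreader into Hologres through HologresJdbcWriter. The reader must produce exactly as many columns as are listed in the writer's `column` parameter; otherwise the task fails with a column-count error.

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "value": 19880808, "type": "long" },
              { "value": "DataX", "type": "string" }
            ],
            "sliceRecordCount": 1000
          }
        },
        "writer": {
          "name": "hologresjdbcwriter",
          "parameter": {
            "username": "xx",
            "password": "xx",
            "column": ["id", "name"],
            "preSql": ["delete from test"],
            "connection": [
              {
                "jdbcUrl": "jdbc:postgresql://127.0.0.1:3002/datax",
                "table": ["test"]
              }
            ],
            "writeMode": "REPLACE",
            "client": {
              "writeThreadSize": 3
            }
          }
        }
      }
    ]
  }
}
```

### 3.2 Parameters

* **jdbcUrl**
  * Description: JDBC connection information for the destination database. jdbcUrl must be placed inside the `connection` configuration block.
    Notes: 1. Only one value can be configured per database. 2. jdbcUrl follows the official PostgreSQL conventions and may carry additional connection parameters. See the official PostgreSQL documentation or ask your DBA for details.
  * Required: yes
  * Default: none
* **username**
  * Description: user name for the destination database
  * Required: yes
  * Default: none
* **password**
  * Description: password for the destination database
  * Required: yes
  * Default: none
* **table**
  * Description: name of the destination table. Only one table can be written.
    Note: table and jdbcUrl must be placed inside the `connection` configuration block.
  * Required: yes
  * Default: none
* **column**
  * Description: the destination-table columns to write, e.g. "column": ["id","name","age"]. To write all columns in table order, use \*, e.g. "column": ["\*"].
    Notes: 1. We strongly advise against using \*, because your job may run incorrectly or fail if the number or types of the destination table's columns change. 2. column cannot contain constant values.
  * Required: yes
  * Default: none
* **preSql**
  * Description: standard SQL statements executed before data is written to the destination table. If a statement needs to refer to the table name, write it as `@table`; the variable is replaced with the actual table name when the statement runs.
  * Required: no
  * Default: none
* **postSql**
  * Description: standard SQL statements executed after data is written to the destination table (works the same way as preSql).
  * Required: no
  * Default: none
* **batchSize**
  * Description: number of records committed per batch. A larger value can greatly reduce the number of network round trips between DataX and Hologres and improve overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
  * Required: no
  * Default: 512
* **writeMode**
  * Description: when writing to a Hologres table that has a primary key, controls what happens on a primary-key conflict. REPLACE: all columns of the existing Hologres row are overwritten (columns not configured in the writer are set to null). UPDATE: only the columns configured in the writer are overwritten. IGNORE: the new data is discarded and the existing row is kept.
  * Required: no
  * Default: REPLACE
* **client.writeThreadSize**
  * Description: size of the connection pool used to write to Hologres; multiple connections write data in parallel.
  * Required: no
  * Default: 1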
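To make the three `writeMode` conflict policies concrete, the sketch below simulates them on an in-memory table keyed by primary key: a row is first inserted with all three columns, then written again with the same key while the writer is configured with only `id` and `name`. This models the documented semantics only; it is not holo-client code, the class and method names are hypothetical, and it assumes the table's columns have no default values (so anything not written becomes null).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WriteModeSketch {

    /** Mirrors the writeMode values accepted by the writer. */
    enum WriteMode { IGNORE, UPDATE, REPLACE }

    /**
     * Applies one incoming row to an in-memory table keyed by primary key.
     * tableColumns lists every column of the destination table; incoming holds
     * only the columns configured in the writer and must contain the key.
     * Assumption: columns have no defaults, so unwritten values become null.
     */
    static void apply(Map<Object, Map<String, Object>> table, List<String> tableColumns,
                      String primaryKey, Map<String, Object> incoming, WriteMode mode) {
        Object key = incoming.get(primaryKey);
        Map<String, Object> existing = table.get(key);
        if (existing == null || mode == WriteMode.REPLACE) {
            // No conflict, or REPLACE: rebuild the whole row; unconfigured columns become null.
            Map<String, Object> row = new LinkedHashMap<>();
            for (String column : tableColumns) {
                row.put(column, incoming.get(column));
            }
            table.put(key, row);
        } else if (mode == WriteMode.UPDATE) {
            // UPDATE: overwrite only the configured columns and keep the rest.
            existing.putAll(incoming);
        }
        // IGNORE with a conflict: keep the existing row and drop the incoming one.
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name", "age");
        for (WriteMode mode : WriteMode.values()) {
            Map<Object, Map<String, Object>> table = new HashMap<>();
            Map<String, Object> first = new HashMap<>();
            first.put("id", 1);
            first.put("name", "old");
            first.put("age", 30);
            apply(table, columns, "id", first, mode);
            // Second write with the same key; the writer is configured with id and name only.
            Map<String, Object> second = new HashMap<>();
            second.put("id", 1);
            second.put("name", "new");
            apply(table, columns, "id", second, mode);
            System.out.println(mode + " -> " + table.get(1));
        }
    }
}
```

Running `main` prints `IGNORE -> {id=1, name=old, age=30}`, `UPDATE -> {id=1, name=new, age=30}` and `REPLACE -> {id=1, name=new, age=null}`, which shows why REPLACE can silently null out columns that the writer's `column` list omits.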
### 3.3 Type Conversion

HologresJdbcWriter currently supports most Hologres types, but some types are not supported, so please check the types you use.

The following table lists HologresJdbcWriter's type conversions for Hologres:

| DataX internal type | Hologres data type |
| -------- | ----- |
| Long | bigint, integer, smallint |
| Double | double precision, money, numeric, real |
| String | varchar, char, text, bit |
| Date | date, time, timestamp |
| Boolean | bool |
| Bytes | bytea |

================================================ FILE: hologresjdbcwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 hologresjdbcwriter hologresjdbcwriter jar writer data into hologres using jdbc 1.8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.alibaba.hologres holo-client 2.1.0 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: hologresjdbcwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/hologresjdbcwriter target/ hologresjdbcwriter-0.0.1-SNAPSHOT.jar plugin/writer/hologresjdbcwriter false plugin/writer/hologresjdbcwriter/libs runtime ================================================ FILE: hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/BaseWriter.java ================================================ package com.alibaba.datax.plugin.writer.hologresjdbcwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.DateColumn; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import 
com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.writer.hologresjdbcwriter.util.ConfLoader; import com.alibaba.datax.plugin.writer.hologresjdbcwriter.util.OriginalConfPretreatmentUtil; import com.alibaba.datax.plugin.writer.hologresjdbcwriter.util.WriterUtil; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import com.alibaba.hologres.client.HoloClient; import com.alibaba.hologres.client.HoloConfig; import com.alibaba.hologres.client.Put; import com.alibaba.hologres.client.exception.HoloClientWithDetailsException; import com.alibaba.hologres.client.model.TableSchema; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; import java.sql.Time; import java.sql.Timestamp; import java.sql.Types; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set; public class BaseWriter { protected static final Set<String> ignoreConfList; static { ignoreConfList = new HashSet<>(); ignoreConfList.add("jdbcUrl"); ignoreConfList.add("username"); ignoreConfList.add("password"); ignoreConfList.add("writeMode"); } enum WriteMode { IGNORE, UPDATE, REPLACE } private static WriteMode getWriteMode(String text) { text = text.toUpperCase(); switch (text) { case "IGNORE": return WriteMode.IGNORE; case "UPDATE": return WriteMode.UPDATE; case "REPLACE": return WriteMode.REPLACE; default: throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, "writeMode only supports IGNORE, UPDATE and REPLACE; unrecognized value: " + text); } } public static class Job { private DataBaseType dataBaseType; private static final Logger LOG = 
LoggerFactory .getLogger(BaseWriter.Job.class); public Job(DataBaseType dataBaseType) { this.dataBaseType = dataBaseType; OriginalConfPretreatmentUtil.DATABASE_TYPE = this.dataBaseType; } public void init(Configuration originalConfig) { OriginalConfPretreatmentUtil.doPretreatment(originalConfig, this.dataBaseType); checkConf(originalConfig); LOG.debug("After job init(), originalConfig now is:[\n{}\n]", originalConfig.toJSON()); } private void checkConf(Configuration originalConfig) { getWriteMode(originalConfig.getString(Key.WRITE_MODE, "REPLACE")); List<String> userConfiguredColumns = originalConfig.getList(Key.COLUMN, String.class); List<JSONObject> conns = originalConfig.getList(Constant.CONN_MARK, JSONObject.class); if (conns.size() > 1) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, "Only single-table synchronization is supported"); } int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); if (tableNumber > 1) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, "Only single-table synchronization is supported"); } JSONObject connConf = conns.get(0); String jdbcUrl = connConf.getString(Key.JDBC_URL); String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); String table = connConf.getJSONArray(Key.TABLE).getString(0); Map<String, Object> clientConf = originalConfig.getMap("client"); HoloConfig config = new HoloConfig(); config.setJdbcUrl(jdbcUrl); config.setUsername(username); config.setPassword(password); if (clientConf != null) { try { config = ConfLoader.load(clientConf, config, ignoreConfList); } catch (Exception e) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, "Failed to parse the client configuration.", e); } } try (HoloClient client = new HoloClient(config)) { TableSchema schema = client.getTableSchema(table); LOG.info("table {} column info:", schema.getTableNameObj().getFullName()); for (com.alibaba.hologres.client.model.Column column : schema.getColumnSchema()) { LOG.info("name:{},type:{},typeName:{},nullable:{},defaultValue:{}", column.getName(), column.getType(), 
column.getTypeName(), column.getAllowNull(), column.getDefaultValue()); } for (String userColumn : userConfiguredColumns) { if (schema.getColumnIndex(userColumn) == null) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, "Configured column " + userColumn + " does not exist"); } } } catch (DataXException e) { // Rethrow configuration errors (e.g. an unknown column) unchanged instead of reporting them as schema-fetch failures throw e; } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.CONN_DB_ERROR, "Failed to get the table schema", e); } } // Normally pre-SQL execution is deferred to the task (the single-table case is an exception) public void prepare(Configuration originalConfig) { try { String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); List<Object> conns = originalConfig.getList(Constant.CONN_MARK, Object.class); Configuration connConf = Configuration.from(conns.get(0) .toString()); String jdbcUrl = connConf.getString(Key.JDBC_URL); originalConfig.set(Key.JDBC_URL, jdbcUrl); String table = connConf.getList(Key.TABLE, String.class).get(0); originalConfig.set(Key.TABLE, table); List<String> preSqls = originalConfig.getList(Key.PRE_SQL, String.class); List<String> renderedPreSqls = WriterUtil.renderPreOrPostSqls( preSqls, table); originalConfig.remove(Constant.CONN_MARK); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { // preSql is configured, so remove it from the configuration here originalConfig.remove(Key.PRE_SQL); String tempJdbcUrl = jdbcUrl.replace("jdbc:postgresql://", "jdbc:hologres://"); try (Connection conn = DriverManager.getConnection( tempJdbcUrl, username, password)) { LOG.info("Begin to execute preSqls:[{}]. 
context info:{}.", StringUtils.join(renderedPreSqls, ";"), tempJdbcUrl); WriterUtil.executeSqls(conn, renderedPreSqls, tempJdbcUrl, dataBaseType); } } LOG.debug("After job prepare(), originalConfig now is:[\n{}\n]", originalConfig.toJSON()); } catch (SQLException e) { throw DataXException.asDataXException(DBUtilErrorCode.SQL_EXECUTE_FAIL, e); } } public List<Configuration> split(Configuration originalConfig, int mandatoryNumber) { return WriterUtil.doSplit(originalConfig, mandatoryNumber); } // Normally post-SQL execution is deferred to the task (the single-table case is an exception) public void post(Configuration originalConfig) { try { String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); String jdbcUrl = originalConfig.getString(Key.JDBC_URL); String table = originalConfig.getString(Key.TABLE); List<String> postSqls = originalConfig.getList(Key.POST_SQL, String.class); List<String> renderedPostSqls = WriterUtil.renderPreOrPostSqls( postSqls, table); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { // postSql is configured, so remove it from the configuration here originalConfig.remove(Key.POST_SQL); String tempJdbcUrl = jdbcUrl.replace("jdbc:postgresql://", "jdbc:hologres://"); try (Connection conn = DriverManager.getConnection( tempJdbcUrl, username, password)) { LOG.info( "Begin to execute postSqls:[{}]. 
context info:{}.", StringUtils.join(renderedPostSqls, ";"), tempJdbcUrl); WriterUtil.executeSqls(conn, renderedPostSqls, tempJdbcUrl, dataBaseType); } } } catch (SQLException e) { throw DataXException.asDataXException(DBUtilErrorCode.SQL_EXECUTE_FAIL, e); } } public void destroy(Configuration originalConfig) { } } public static class Task { protected static final Logger LOG = LoggerFactory .getLogger(BaseWriter.Task.class); protected DataBaseType dataBaseType; protected String username; protected String password; protected String jdbcUrl; protected String table; protected List<String> columns; protected int batchSize; protected int batchByteSize; protected int columnNumber = 0; protected TaskPluginCollector taskPluginCollector; // Common context attached to log messages, e.g. the database connection involved and the table being operated on protected static String BASIC_MESSAGE; protected WriteMode writeMode; protected String arrayDelimiter; protected boolean emptyAsNull; protected HoloConfig config; public Task(DataBaseType dataBaseType) { this.dataBaseType = dataBaseType; } public void init(Configuration writerSliceConfig) { this.username = writerSliceConfig.getString(Key.USERNAME); this.password = writerSliceConfig.getString(Key.PASSWORD); this.jdbcUrl = writerSliceConfig.getString(Key.JDBC_URL); this.table = writerSliceConfig.getString(Key.TABLE); this.columns = writerSliceConfig.getList(Key.COLUMN, String.class); this.columnNumber = this.columns.size(); this.arrayDelimiter = writerSliceConfig.getString(Key.Array_Delimiter); this.batchSize = writerSliceConfig.getInt(Key.BATCH_SIZE, Constant.DEFAULT_BATCH_SIZE); this.batchByteSize = writerSliceConfig.getInt(Key.BATCH_BYTE_SIZE, Constant.DEFAULT_BATCH_BYTE_SIZE); writeMode = getWriteMode(writerSliceConfig.getString(Key.WRITE_MODE, "REPLACE")); emptyAsNull = writerSliceConfig.getBool(Key.EMPTY_AS_NULL, true); Map<String, Object> clientConf = writerSliceConfig.getMap("client"); config = new HoloConfig(); config.setJdbcUrl(this.jdbcUrl); config.setUsername(username); config.setPassword(password); 
config.setWriteMode(writeMode == WriteMode.IGNORE ? com.alibaba.hologres.client.model.WriteMode.INSERT_OR_IGNORE : (writeMode == WriteMode.UPDATE ? com.alibaba.hologres.client.model.WriteMode.INSERT_OR_UPDATE : com.alibaba.hologres.client.model.WriteMode.INSERT_OR_REPLACE)); config.setWriteBatchSize(this.batchSize); config.setWriteBatchTotalByteSize(this.batchByteSize); config.setMetaCacheTTL(3600000L); config.setEnableDefaultForNotNullColumn(false); config.setRetryCount(5); config.setAppName("datax"); if (clientConf != null) { try { config = ConfLoader.load(clientConf, config, ignoreConfList); } catch (Exception e) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, "Failed to parse the client configuration.", e); } } BASIC_MESSAGE = String.format("jdbcUrl:[%s], table:[%s]", this.jdbcUrl, this.table); } public void prepare(Configuration writerSliceConfig) { } public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) { this.taskPluginCollector = taskPluginCollector; try (HoloClient client = new HoloClient(config)) { Record record; TableSchema schema = RetryUtil.executeWithRetry(() -> client.getTableSchema(this.table), 3, 5000L, true); while ((record = recordReceiver.getFromReader()) != null) { if (record.getColumnNumber() != this.columnNumber) { // The number of columns read from the source differs from the number of destination columns to write; fail immediately throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "Column configuration error: the source reads %s columns, but %s columns are configured to be written to the destination table. Please check and correct your configuration.", record.getColumnNumber(), this.columnNumber)); } Put put = convertToPut(record, schema); if (null != put) { try { client.put(put); } catch (HoloClientWithDetailsException detail) { handleDirtyData(detail); } } } try { client.flush(); } catch (HoloClientWithDetailsException detail) { handleDirtyData(detail); } } catch (Exception e) { throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } } private void handleDirtyData(HoloClientWithDetailsException detail) { for (int i = 0; i < detail.size(); ++i) { com.alibaba.hologres.client.model.Record failRecord = detail.getFailRecord(i); if (failRecord.getAttachmentList() != null) { for (Object obj : failRecord.getAttachmentList()) { taskPluginCollector.collectDirtyRecord((Record) obj, detail.getException(i)); } } } } public void startWrite(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) { startWriteWithConnection(recordReceiver, taskPluginCollector); } public void post(Configuration writerSliceConfig) { } public void destroy(Configuration writerSliceConfig) { } // Uses the instance fields columnNumber and columns directly protected Put convertToPut(Record record, TableSchema schema) { try { Put put = new Put(schema); put.getRecord().addAttachment(record); for (int i = 0; i < this.columnNumber; i++) { fillColumn(put, schema, schema.getColumnIndex(this.columns.get(i)), record.getColumn(i)); } return put; } catch (Exception e) { taskPluginCollector.collectDirtyRecord(record, e); return null; } } protected void fillColumn(Put data, TableSchema schema, int index, Column column) throws SQLException { com.alibaba.hologres.client.model.Column holoColumn = schema.getColumn(index); switch (holoColumn.getType()) { case Types.CHAR: case Types.NCHAR: case Types.CLOB: case Types.NCLOB: case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: String value = column.asString(); if (emptyAsNull && value != null && value.length() == 0) { data.setObject(index, null); 
                    } else {
                        data.setObject(index, value);
                    }
                    break;

                case Types.SMALLINT:
                    if (column.getByteSize() > 0) {
                        data.setObject(index, column.asBigInteger().shortValue());
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.INTEGER:
                    if (column.getByteSize() > 0) {
                        data.setObject(index, column.asBigInteger().intValue());
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.BIGINT:
                    if (column.getByteSize() > 0) {
                        data.setObject(index, column.asBigInteger().longValue());
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.NUMERIC:
                case Types.DECIMAL:
                    if (column.getByteSize() > 0) {
                        data.setObject(index, column.asBigDecimal());
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.FLOAT:
                case Types.REAL:
                    if (column.getByteSize() > 0) {
                        data.setObject(index, column.asBigDecimal().floatValue());
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.DOUBLE:
                    if (column.getByteSize() > 0) {
                        data.setObject(index, column.asDouble());
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.TIME:
                    if (column.getByteSize() > 0) {
                        if (column instanceof LongColumn || column instanceof DateColumn) {
                            data.setObject(index, new Time(column.asLong()));
                        } else {
                            data.setObject(index, column.asString());
                        }
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.DATE:
                    if (column.getByteSize() > 0) {
                        if (column instanceof LongColumn || column instanceof DateColumn) {
                            data.setObject(index, column.asLong());
                        } else {
                            data.setObject(index, column.asString());
                        }
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.TIMESTAMP:
                    if (column.getByteSize() > 0) {
                        if (column instanceof LongColumn || column instanceof DateColumn) {
                            data.setObject(index, new Timestamp(column.asLong()));
                        } else {
                            data.setObject(index, column.asString());
                        }
                    } else if (emptyAsNull) {
                        data.setObject(index, null);
                    }
                    break;

                case Types.BINARY:
                case Types.VARBINARY:
                case Types.BLOB:
                case Types.LONGVARBINARY:
                    String byteValue = column.asString();
                    if (null != byteValue) {
                        data.setObject(index, column.asBytes());
                    }
                    break;

                case Types.BOOLEAN:
                case Types.BIT:
                    if (column.getByteSize() == 0) {
                        break;
                    }
                    try {
                        Boolean boolValue = column.asBoolean();
                        data.setObject(index, boolValue);
                    } catch (Exception e) {
                        // Fall back to treating any value other than "0" as true
                        data.setObject(index, !"0".equals(column.asString()));
                    }
                    break;

                case Types.ARRAY:
                    String arrayString = column.asString();
                    Object arrayObject = null;
                    if (null == arrayString || (emptyAsNull && "".equals(arrayString))) {
                        data.setObject(index, null);
                        break;
                    } else if (arrayDelimiter != null && arrayDelimiter.length() > 0) {
                        // Note: String.split interprets arrayDelimiter as a regular expression
                        arrayObject = arrayString.split(this.arrayDelimiter);
                    } else {
                        arrayObject = JSONArray.parseArray(arrayString);
                    }
                    data.setObject(index, arrayObject);
                    break;

                default:
                    throw DataXException
                            .asDataXException(
                                    DBUtilErrorCode.UNSUPPORTED_TYPE,
                                    String.format(
                                            "Invalid column configuration: DataX does not support writing this column type to the database. Column name: [%s], column type: [%d], column type name: [%s]. Please change the type of this column in the table or exclude it from the sync.",
                                            holoColumn.getName(),
                                            holoColumn.getType(),
                                            holoColumn.getTypeName()));
            }
        }
    }
}

================================================
FILE: hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/Constant.java
================================================
package com.alibaba.datax.plugin.writer.hologresjdbcwriter;

/**
 * Declarations of the constants used as markers (MARK) when the plugin parses the user configuration.
 */
public final class Constant {
    public static final int DEFAULT_BATCH_SIZE = 512;

    public static final int DEFAULT_BATCH_BYTE_SIZE = 50 * 1024 * 1024;

    public static String CONN_MARK = "connection";

    public static String TABLE_NUMBER_MARK = "tableNumber";
}

================================================
FILE: hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/HologresJdbcWriter.java
================================================
package com.alibaba.datax.plugin.writer.hologresjdbcwriter;

import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;

import java.util.List;

public class HologresJdbcWriter extends Writer {
    private static final DataBaseType DATABASE_TYPE = DataBaseType.PostgreSQL;

    public static class Job extends Writer.Job {
        private Configuration originalConfig = null;
        private BaseWriter.Job baseWriterMaster;

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            this.baseWriterMaster = new BaseWriter.Job(DATABASE_TYPE);
            this.baseWriterMaster.init(this.originalConfig);
        }

        @Override
        public void prepare() {
            this.baseWriterMaster.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.baseWriterMaster.split(this.originalConfig, mandatoryNumber);
        }

        @Override
        public void post() {
            this.baseWriterMaster.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.baseWriterMaster.destroy(this.originalConfig);
        }
    }

    public static class Task extends Writer.Task {
        private Configuration writerSliceConfig;
        private BaseWriter.Task baseWriterSlave;

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();
            this.baseWriterSlave = new BaseWriter.Task(DATABASE_TYPE);
            this.baseWriterSlave.init(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
            this.baseWriterSlave.prepare(this.writerSliceConfig);
        }

        @Override
        public void startWrite(RecordReceiver recordReceiver) {
            this.baseWriterSlave.startWrite(recordReceiver, super.getTaskPluginCollector());
        }

        @Override
        public void post() {
            this.baseWriterSlave.post(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.baseWriterSlave.destroy(this.writerSliceConfig);
        }
    }
}

================================================
FILE: hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.hologresjdbcwriter;

public final class Key {
    public final static String JDBC_URL = "jdbcUrl";

    public final static String USERNAME = "username";

    public final static String PASSWORD = "password";

    public final static String TABLE = "table";

    public final static String COLUMN = "column";

    public final static String Array_Delimiter = "arrayDelimiter";

    public final static String WRITE_MODE = "writeMode";

    public final static String PRE_SQL = "preSql";

    public final static String POST_SQL = "postSql";

    // Default: 512 (Constant.DEFAULT_BATCH_SIZE)
    public final static String BATCH_SIZE = "batchSize";

    // Default: 50 MB (Constant.DEFAULT_BATCH_BYTE_SIZE)
    public final static String BATCH_BYTE_SIZE = "batchByteSize";

    public final static String EMPTY_AS_NULL = "emptyAsNull";
}

================================================
FILE: hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/util/ConfLoader.java
================================================
package com.alibaba.datax.plugin.writer.hologresjdbcwriter.util;

import com.alibaba.hologres.client.model.WriteMode;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.lang.reflect.Field;
import java.util.Map;
import java.util.Set;

public class ConfLoader {
    public static Logger LOG = LoggerFactory.getLogger(ConfLoader.class);

    static public <T> T load(Map<String, Object> props, T config, Set<String> ignoreList) throws Exception {
        Field[] fields = config.getClass().getDeclaredFields();
        for (Map.Entry<String, Object> entry : props.entrySet()) {
            String key = entry.getKey();
            String value = entry.getValue().toString();
            if (ignoreList.contains(key)) {
                LOG.info("Config Skip {}", key);
                continue;
            }
            boolean match = false;
            for (Field field : fields) {
                if (field.getName().equals(key)) {
                    match = true;
                    field.setAccessible(true);
                    Class<?> type = field.getType();
                    if (type.equals(String.class)) {
                        field.set(config, value);
                    } else if (type.equals(int.class)) {
                        field.set(config, Integer.parseInt(value));
                    } else if (type.equals(long.class)) {
                        field.set(config, Long.parseLong(value));
                    } else if (type.equals(boolean.class)) {
                        field.set(config, Boolean.parseBoolean(value));
                    } else if (WriteMode.class.equals(type)) {
                        field.set(config, WriteMode.valueOf(value));
                    } else {
                        throw new Exception("invalid type " + type + " for param " + key);
                    }
                    if ("password".equals(key)) {
                        // Mask the password before logging it
                        StringBuilder sb = new StringBuilder();
                        for (int i = 0; i < value.length(); ++i) {
                            sb.append("*");
                        }
                        LOG.info("Config {}={}", key, sb.toString());
                    } else {
                        LOG.info("Config {}={}", key, value);
                    }
                }
            }
            if (!match) {
                throw new Exception("param " + key + " not found in HoloConfig");
            }
        }
        return config;
    }
}

================================================
FILE: hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/util/OriginalConfPretreatmentUtil.java
================================================
package com.alibaba.datax.plugin.writer.hologresjdbcwriter.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.util.TableExpandUtil;
import com.alibaba.datax.plugin.writer.hologresjdbcwriter.Constant;
import com.alibaba.datax.plugin.writer.hologresjdbcwriter.Key;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

public final class OriginalConfPretreatmentUtil {
    private static final Logger LOG = LoggerFactory
            .getLogger(OriginalConfPretreatmentUtil.class);

    public static DataBaseType DATABASE_TYPE;

    public static void doPretreatment(Configuration originalConfig, DataBaseType dataBaseType) {
        // Check username/password (required)
        originalConfig.getNecessaryValue(Key.USERNAME, DBUtilErrorCode.REQUIRED_VALUE);
        originalConfig.getNecessaryValue(Key.PASSWORD, DBUtilErrorCode.REQUIRED_VALUE);

        doCheckBatchSize(originalConfig);
        simplifyConf(originalConfig);
    }

    public static void doCheckBatchSize(Configuration originalConfig) {
        // Check batchSize (optional; the default is used if it is not set)
        int batchSize = originalConfig.getInt(Key.BATCH_SIZE, Constant.DEFAULT_BATCH_SIZE);
        if (batchSize < 1) {
            throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, String.format(
                    "Invalid batchSize. The batchSize you configured for writing to the target table, %s, must not be less than 1. The recommended range is [256-1024] (a multiple of 128); the larger the value, the higher the risk of running out of memory. Please check your configuration and correct it.",
                    batchSize));
        }

        originalConfig.set(Key.BATCH_SIZE, batchSize);
    }

    public static void simplifyConf(Configuration originalConfig) {
        List<Object> connections = originalConfig.getList(Constant.CONN_MARK,
                Object.class);

        int tableNum = 0;

        for (int i = 0, len = connections.size(); i < len; i++) {
            Configuration connConf = Configuration.from(connections.get(i).toString());

            String jdbcUrl = connConf.getString(Key.JDBC_URL);
            if (StringUtils.isBlank(jdbcUrl)) {
                throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE,
                        "You have not configured the jdbcUrl of the target database.");
            }

            List<String> tables = connConf.getList(Key.TABLE, String.class);

            if (null == tables || tables.isEmpty()) {
                throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE,
                        "You have not configured the name of the target table, so DataX cannot find it. Please check your configuration and correct it.");
            }

            // Expand the table entries configured on each connection
            List<String> expandedTables = TableExpandUtil
                    .expandTableConf(DATABASE_TYPE, tables);

            if (null == expandedTables || expandedTables.isEmpty()) {
                throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,
                        "The configured target table name is invalid, so DataX cannot find the table. Please check your configuration and correct it.");
            }

            tableNum += expandedTables.size();

            originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK,
                    i, Key.TABLE), expandedTables);
        }

        originalConfig.set(Constant.TABLE_NUMBER_MARK, tableNum);
    }
}

================================================
FILE: hologresjdbcwriter/src/main/java/com/alibaba/datax/plugin/writer/hologresjdbcwriter/util/WriterUtil.java
================================================
package com.alibaba.datax.plugin.writer.hologresjdbcwriter.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.util.RdbmsException;
import com.alibaba.datax.plugin.rdbms.writer.Constant;
import com.alibaba.datax.plugin.rdbms.writer.Key;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class WriterUtil {
    private static final Logger LOG = LoggerFactory.getLogger(WriterUtil.class);

    // TODO: report split errors
    public static List<Configuration> doSplit(Configuration simplifiedConf,
                                              int adviceNumber) {

        List<Configuration> splitResultConfigs = new ArrayList<Configuration>();

        int tableNumber = simplifiedConf.getInt(Constant.TABLE_NUMBER_MARK);

        // Single-table case
        if (tableNumber == 1) {
            // The master's prepare step has already extracted table and jdbcUrl, so this case is simple
            for (int j = 0; j < adviceNumber; j++) {
                splitResultConfigs.add(simplifiedConf.clone());
            }

            return splitResultConfigs;
        }

        if (tableNumber != adviceNumber) {
            throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,
                    String.format("Invalid configuration: the number of target tables to write is %s, but the number of splits suggested by the system is %s. Please check your configuration and correct it.",
                            tableNumber, adviceNumber));
        }

        String jdbcUrl;
        List<String> preSqls = simplifiedConf.getList(Key.PRE_SQL, String.class);
        List<String> postSqls = simplifiedConf.getList(Key.POST_SQL, String.class);

        List<Object> conns = simplifiedConf.getList(Constant.CONN_MARK,
                Object.class);

        for (Object conn : conns) {
            Configuration sliceConfig = simplifiedConf.clone();

            Configuration connConf = Configuration.from(conn.toString());
            jdbcUrl = connConf.getString(Key.JDBC_URL);
            sliceConfig.set(Key.JDBC_URL, jdbcUrl);

            sliceConfig.remove(Constant.CONN_MARK);

            List<String> tables = connConf.getList(Key.TABLE, String.class);

            for (String table : tables) {
                Configuration tempSlice = sliceConfig.clone();
                tempSlice.set(Key.TABLE, table);
                tempSlice.set(Key.PRE_SQL, renderPreOrPostSqls(preSqls, table));
                tempSlice.set(Key.POST_SQL, renderPreOrPostSqls(postSqls, table));

                splitResultConfigs.add(tempSlice);
            }
        }

        return splitResultConfigs;
    }

    public static List<String> renderPreOrPostSqls(List<String> preOrPostSqls, String tableName) {
        if (null == preOrPostSqls) {
            return Collections.emptyList();
        }

        List<String> renderedSqls = new ArrayList<String>();
        for (String sql : preOrPostSqls) {
            // Blank preSql/postSql entries are not added to the execution queue
            if (StringUtils.isNotBlank(sql)) {
                renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName));
            }
        }

        return renderedSqls;
    }

    public static void executeSqls(Connection conn, List<String> sqls, String basicMessage, DataBaseType dataBaseType) {
        Statement stmt = null;
        String currentSql = null;
        try {
            stmt = conn.createStatement();
            for (String sql : sqls) {
                currentSql = sql;
                DBUtil.executeSqlWithoutResultSet(stmt, sql);
            }
        } catch (Exception e) {
            throw RdbmsException.asQueryException(dataBaseType, e, currentSql, null, null);
        } finally {
            DBUtil.closeDBResources(null, stmt, null);
        }
    }
}

================================================
FILE: hologresjdbcwriter/src/main/resources/plugin.json
================================================
{
    "name": "hologresjdbcwriter",
    "class": "com.alibaba.datax.plugin.writer.hologresjdbcwriter.HologresJdbcWriter",
"description": "", "developer": "alibaba" } ================================================ FILE: hologresjdbcwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "hologresjdbcwriter", "parameter": { "url": "", "username": "", "password": "", "database": "", "table": "", "partition": "" } } ================================================ FILE: introduction.md ================================================ # 阿里云开源离线同步工具DataX3.0介绍 ## 一. DataX3.0概览 ​ DataX 是一个异构数据源离线同步工具,致力于实现包括关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步功能。 ![datax_why_new](https://cloud.githubusercontent.com/assets/1067175/17879841/93b7fc1c-6927-11e6-8cda-7cf8420fc65f.png) - #### 设计理念 为了解决异构数据源同步问题,DataX将复杂的网状的同步链路变成了星型数据链路,DataX作为中间传输载体负责连接各种数据源。当需要接入一个新的数据源的时候,只需要将此数据源对接到DataX,便能跟已有的数据源做到无缝数据同步。 - #### 当前使用现状 DataX在阿里巴巴集团内被广泛使用,承担了所有大数据的离线同步业务,并已持续稳定运行了6年之久。目前每天完成同步8w多道作业,每日传输数据量超过300TB。 此前已经开源DataX1.0版本,此次介绍为阿里云开源全新版本DataX3.0,有了更多更强大的功能和更好的使用体验。Github主页地址:https://github.com/alibaba/DataX ## 二、DataX3.0框架设计 ![datax_framework_new](https://cloud.githubusercontent.com/assets/1067175/17879884/ec7e36f4-6927-11e6-8f5f-ffc43d6a468b.png) DataX本身作为离线数据同步框架,采用Framework + plugin架构构建。将数据源读取和写入抽象成为Reader/Writer插件,纳入到整个同步框架中。 - Reader:Reader为数据采集模块,负责采集数据源的数据,将数据发送给Framework。 - Writer: Writer为数据写入模块,负责不断向Framework取数据,并将数据写入到目的端。 - Framework:Framework用于连接reader和writer,作为两者的数据传输通道,并处理缓冲,流控,并发,数据转换等核心技术问题。 ## 三. 
DataX3.0插件体系 ​ 经过几年积累,DataX目前已经有了比较全面的插件体系,主流的RDBMS数据库、NOSQL、大数据计算系统都已经接入。DataX目前支持数据如下: | 类型 | 数据源 | Reader(读) | Writer(写) |文档| | ------------ | ---------- | :-------: | :-------: |:-------: | | RDBMS 关系型数据库 | MySQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md)| |             | Oracle     |     √     |     √     |[读](https://github.com/alibaba/DataX/blob/master/oraclereader/doc/oraclereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/oraclewriter/doc/oraclewriter.md)| |             | OceanBase  |     √     |     √     |[读](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase) 、[写](https://open.oceanbase.com/docs/community/oceanbase-database/V3.1.0/use-datax-to-full-migration-data-to-oceanbase)| | | SQLServer | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/sqlserverreader/doc/sqlserverreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/sqlserverwriter/doc/sqlserverwriter.md)| | | PostgreSQL | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/postgresqlreader/doc/postgresqlreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/postgresqlwriter/doc/postgresqlwriter.md)| | | DRDS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/drdsreader/doc/drdsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/drdswriter/doc/drdswriter.md)| | | 达梦 | √ | √ |[读]() 、[写]()| | | 通用RDBMS(支持所有关系型数据库) | √ | √ |[读]() 、[写]()| | 阿里云数仓数据存储 | ODPS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/odpsreader/doc/odpsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/odpsswriter/doc/odpswriter.md)| | | ADS | | √ |[写](https://github.com/alibaba/DataX/blob/master/adswriter/doc/adswriter.md)| | | OSS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/ossreader/doc/ossreader.md) 
、[写](https://github.com/alibaba/DataX/blob/master/osswriter/doc/osswriter.md)| | | OCS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/ocsreader/doc/ocsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/ocswriter/doc/ocswriter.md)| | NoSQL数据存储 | OTS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md)| | | Hbase0.94 | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase094xreader/doc/hbase094xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase094xwriter/doc/hbase094xwriter.md)| | | Hbase1.1 | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hbase11xreader/doc/hbase11xreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hbase11xwriter/doc/hbase11xwriter.md)| | | MongoDB | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)| | | Hive | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)| | 无结构化数据存储 | TxtFile | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md) 、[写](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)| | | FTP | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md)| | | HDFS | √ | √ |[读](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md) 、[写](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)| | | Elasticsearch | | √ |[写](https://github.com/alibaba/DataX/blob/master/elasticsearchwriter/doc/elasticsearchwriter.md)| DataX 
Framework提供了简单的接口与插件交互,提供简单的插件接入机制,只需要任意加上一种插件,就能无缝对接其他数据源。详情请看:[DataX数据源指南](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels) ## 四、DataX3.0核心架构 DataX 3.0 开源版本支持单机多线程模式完成同步作业运行,本小节按一个DataX作业生命周期的时序图,从整体架构设计非常简要说明DataX各个模块相互关系。 ![datax_arch](https://cloud.githubusercontent.com/assets/1067175/17850849/aa6c95a8-6891-11e6-94b7-39f0ab5af3b4.png) #### 核心模块介绍: 1. DataX完成单个数据同步的作业,我们称之为Job,DataX接受到一个Job之后,将启动一个进程来完成整个作业同步过程。DataX Job模块是单个作业的中枢管理节点,承担了数据清理、子任务切分(将单一作业计算转化为多个子Task)、TaskGroup管理等功能。 2. DataXJob启动后,会根据不同的源端切分策略,将Job切分成多个小的Task(子任务),以便于并发执行。Task便是DataX作业的最小单元,每一个Task都会负责一部分数据的同步工作。 3. 切分多个Task之后,DataX Job会调用Scheduler模块,根据配置的并发数据量,将拆分成的Task重新组合,组装成TaskGroup(任务组)。每一个TaskGroup负责以一定的并发运行完毕分配好的所有Task,默认单个任务组的并发数量为5。 4. 每一个Task都由TaskGroup负责启动,Task启动后,会固定启动Reader—>Channel—>Writer的线程来完成任务同步工作。 5. DataX作业运行起来之后, Job监控并等待多个TaskGroup模块任务完成,等待所有TaskGroup任务完成后Job成功退出。否则,异常退出,进程退出值非0 #### DataX调度流程: 举例来说,用户提交了一个DataX作业,并且配置了20个并发,目的是将一个100张分表的mysql数据同步到odps里面。 DataX的调度决策思路是: 1. DataXJob根据分库分表切分成了100个Task。 2. 根据20个并发,DataX计算共需要分配4个TaskGroup。 3. 4个TaskGroup平分切分好的100个Task,每一个TaskGroup负责以5个并发共计运行25个Task。 ## 五、DataX 3.0六大核心优势 - #### 可靠的数据质量监控 - 完美解决数据传输个别类型失真问题 DataX旧版对于部分数据类型(比如时间戳)传输一直存在毫秒阶段等数据失真情况,新版本DataX3.0已经做到支持所有的强数据类型,每一种插件都有自己的数据类型转换策略,让数据可以完整无损的传输到目的端。 - 提供作业全链路的流量、数据量运行时监控 DataX3.0运行过程中可以将作业本身状态、数据流量、数据速度、执行进度等信息进行全面的展示,让用户可以实时了解作业状态。并可在作业执行过程中智能判断源端和目的端的速度对比情况,给予用户更多性能排查信息。 - 提供脏数据探测 在大量数据的传输过程中,必定会由于各种原因导致很多数据传输报错(比如类型转换错误),这种数据DataX认为就是脏数据。DataX目前可以实现脏数据精确过滤、识别、采集、展示,为用户提供多种的脏数据处理模式,让用户准确把控数据质量大关! 
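- #### Rich data transformation

    DataX is an ETL tool for big data. Besides moving data snapshots, it offers rich transformation features that can mask, complete and filter data in flight. It also supports user-defined Groovy functions for custom transformations. For details, see the DataX 3 transformer documentation.

- #### Precise speed control

    Worried that a sync will overload your online storage? DataX 3.0 offers three flow-control modes: channel (concurrency), record rate and byte rate. You can tune the job speed freely and reach the best throughput your database can tolerate. Here `byte` limits bytes per second and `record` limits records per second.

    ```json
    "speed": {
       "channel": 5,
       "byte": 1048576,
       "record": 10000
    }
    ```

- #### Strong synchronization performance

    Every DataX 3.0 reader plugin has one or more split strategies that divide a job into multiple Tasks to run in parallel. With the single-machine multi-threaded execution model, DataX throughput scales linearly with concurrency. If both the source and the destination are fast enough, a single job can saturate the network card. The DataX team has also heavily optimized the performance of every integrated plugin and run complete performance tests. For performance details of each data source, see the [DataX data source guide](https://github.com/alibaba/DataX/wiki/DataX-all-data-channels).

- #### Robust fault tolerance

    DataX jobs are very vulnerable to external factors. Network glitches, unstable data sources and similar problems can easily make a job fail halfway through. Stability is therefore a basic requirement for DataX, and the design of DataX 3.0 focuses on the stability of both the framework and the plugins. DataX 3.0 supports local and global retries at several levels (thread, process and job) to keep user jobs running reliably. Process-level retry is not yet available.

    - Retries within a thread

        The DataX team has thoroughly reviewed all core plugins. Each type of network interaction has its own retry strategy.

    - Thread-level retries

        DataX supports Task failover: if a Task fails midway, the framework can reschedule the whole Task.

- #### Minimal, easy-to-use experience

    - Easy to use

        DataX is ready to use after download and runs on Linux and Windows. A data transfer takes only a few steps. See: [Quick Start](https://github.com/alibaba/DataX/wiki/Quick-Start)

    - Detailed

        DataX prints a wealth of information in its run logs, including transfer speed, Reader and Writer performance, process CPU, JVM and GC status.

        - Transfer speed and progress are printed during the transfer

            ![datax_run_speed](https://cloud.githubusercontent.com/assets/1067175/17850877/d1612c0a-6891-11e6-9970-d6693c15ef24.png)

        - Process CPU, JVM and related information are printed during the transfer

            ![datax_run_cpu](https://cloud.githubusercontent.com/assets/1067175/17850903/ee63c2fe-6891-11e6-9056-97d7e3d13d8d.png)

        - An overall run summary is printed after the job finishes

            ![datax_end_info](https://cloud.githubusercontent.com/assets/1067175/17850930/0484d3ac-6892-11e6-9c1d-b102ad210a32.png)

================================================
FILE: kingbaseesreader/doc/kingbaseesreader.md
================================================
# KingbaseesReader Plugin Documentation

___

## 1 Quick Introduction

The KingbaseesReader plugin reads data from KingbaseES. Under the hood, KingbaseesReader connects to a remote KingbaseES database over JDBC and executes SQL statements to SELECT data out of the database.

## 2 How It Works

KingbaseesReader connects to a remote KingbaseES database through a JDBC connector. It generates a SELECT statement from the user's configuration and sends it to the database. It then converts the returned result set into an abstract dataset of DataX's own data types and passes it to the downstream Writer.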
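The sketch below shows one way to build a SELECT from user configuration, as described above and in the next paragraph. It is an illustration only, not DataX's implementation, which lives in the plugin-rdbms-util module. The class `QuerySqlSketch`, the method `buildQuerySql` and the exact `select ... from ... where (...)` layout are hypothetical. Identifiers are concatenated without quoting or validation.

```java
import java.util.Arrays;
import java.util.List;

public class QuerySqlSketch {

    /**
     * Hypothetical helper returning the SQL a reader slice would run.
     * A non-blank querySql is used verbatim, and table/column/where are ignored.
     * Otherwise a SELECT is assembled from the column list, the table and an optional where.
     */
    public static String buildQuerySql(String querySql, List<String> columns, String table, String where) {
        if (querySql != null && !querySql.trim().isEmpty()) {
            return querySql;
        }
        if (columns == null || columns.isEmpty()) {
            // Mirrors the documented rule that column must not be empty
            throw new IllegalArgumentException("column must not be empty");
        }
        StringBuilder sql = new StringBuilder("select ")
                .append(String.join(",", columns))
                .append(" from ")
                .append(table);
        if (where != null && !where.trim().isEmpty()) {
            // Parentheses keep the user's condition intact if more conditions are ANDed on later
            sql.append(" where (").append(where).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // prints: select id,name from table where (id > 10)
        System.out.println(buildQuerySql(null, Arrays.asList("id", "name"), "table", "id > 10"));
    }
}
```

The where condition is wrapped in parentheses as a precaution. If another filter, such as a split range, were appended with `and`, an `or` inside the user's condition would not change the meaning of the query.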
For Table, Column and Where settings, KingbaseesReader concatenates them into a SQL statement and sends it to KingbaseES. A querySql setting is sent to KingbaseES as-is.

## 3 Features

### 3.1 Sample Configuration

* A job that extracts data from a KingbaseES database and syncs it to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                //transfer speed in byte/s; DataX tries to reach this rate without exceeding it.
                "byte": 1048576
            },
            //error limits
            "errorLimit": {
                //maximum number of error records; the job fails when this is exceeded.
                "record": 0,
                //maximum ratio of error records: 1.0 means 100%, 0.02 means 2%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "kingbaseesreader",
                    "parameter": {
                        // database username
                        "username": "xx",
                        // database password
                        "password": "xx",
                        "column": [
                            "id","name"
                        ],
                        //split key
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:kingbase8://host:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    //writer type
                    "name": "streamwriter",
                    //whether to print the content
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that syncs the result of a custom SQL query to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                "byte": 1048576
            }
        },
        "content": [
            {
                "reader": {
                    "name": "kingbaseesreader",
                    "parameter": {
                        "username": "xx",
                        "password": "xx",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:kingbase8://host:port/database", "jdbc:kingbase8://host:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the source database, given as a JSON array. Several connection addresses can be listed for one database. A JSON array is used because Alibaba Group supports probing multiple IPs internally. If several are configured, KingbaseesReader checks each IP for connectivity in turn until it finds a usable one. If all connections fail, KingbaseesReader reports an error. Note that jdbcUrl must be placed inside the connection block. Outside Alibaba Group, a JSON array with a single JDBC URL is enough.

      jdbcUrl follows the official KingbaseES format and may include additional connection parameters. For details, see the [official KingbaseES documentation](https://help.kingbase.com.cn/doc-view-5683.html).
    * Required: yes
    * Default: none
* **username**
    * Description: username for the data source
    * Required: yes
    * Default: none
* **password**
    * Description: password for the specified username
    * Required: yes
    * Default: none
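* **table**
    * Description: the tables to synchronize, given as a JSON array so that several tables can be extracted at once. When several tables are configured, the user must make sure they all have the same schema. KingbaseesReader does not check whether they form one logical table. Note that table must be placed inside the connection block.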
    * Required: yes
    * Default: none
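* **column**
    * Description: the columns of the configured table to synchronize, given as a JSON array. Use \* to select all columns, for example ['\*'].

      Column pruning is supported: you can export only a subset of the columns.

      Column reordering is supported: columns do not have to follow the order of the table schema.

      Constants are supported, written in KingbaseES syntax: ["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"]
      Here id is an ordinary column name, 'hello'::varchar is a string constant, true is a boolean, 2.5 is a floating-point number and power(2,3) is a function call.

      **column must explicitly list the columns to synchronize and must not be empty!**
    * Required: yes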
    * Default: none
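* **splitPk**
    * Description: if splitPk is specified, KingbaseesReader shards the data by that column. DataX then launches concurrent tasks to synchronize the shards, which can greatly improve sync throughput.

      Using the table's primary key as splitPk is recommended. Primary keys are usually evenly distributed, so the resulting shards are less likely to have hot spots.

      splitPk currently supports integer columns only. `Floating-point, string, date and other types are not supported`. If you specify an unsupported type, KingbaseesReader reports an error.

      If splitPk is empty, the table is treated as unsplittable and is extracted through a single channel.
    * Required: no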
    * Default: empty
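* **where**
    * Description: filter condition. KingbaseesReader builds a SQL statement from the column, table and where settings and extracts data with it. A common case is syncing only the current day's data, which you can do with a where condition such as gmt_create > $bizdate. Note: the where condition cannot be `limit 10`, because limit is not a valid SQL WHERE clause.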
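      The where condition is an effective way to run incremental business syncs. If where is not set or is empty, the whole table is synchronized.
    * Required: no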
    * Default: none
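* **querySql**
    * Description: in some business scenarios, where cannot express the filter you need. In that case you can use this option to define a custom query. Once it is set, DataX ignores the table and column options and selects data with this statement directly. For example, to sync the result of a multi-table join, use `select a,b from table_a join table_b on table_a.id = table_b.id`.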
      `When querySql is set, KingbaseesReader ignores the table, column and where settings.`
    * Required: no
    * Default: none
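* **fetchSize**
    * Description: the number of rows the plugin fetches from the database server in each batch. This value determines how many network round trips DataX makes to the server, and tuning it can significantly improve extraction performance.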
      `Note: a value that is too large (>2048) may cause the DataX process to run out of memory (OOM).`
    * Required: no
    * Default: 1000 (Constant.DEFAULT_FETCH_SIZE)
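The sketch below shows how an integer splitPk could drive concurrent extraction. It divides the closed range [min, max] of the split column into roughly equal contiguous ranges and turns each range into a where fragment. It is not DataX's actual split algorithm. The class `SplitPkSketch` and the method `splitRanges` are hypothetical. min and max are taken as given, although a real reader would first query them. Rows whose split column is NULL are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPkSketch {

    /**
     * Hypothetical helper: splits [min, max] into at most `slices` contiguous ranges
     * and returns one where fragment per range. Every range except the last is
     * half-open [lo, hi). The last range is closed, so max itself is included.
     */
    public static List<String> splitRanges(String pk, long min, long max, int slices) {
        if (slices < 1 || min > max) {
            throw new IllegalArgumentException("need slices >= 1 and min <= max");
        }
        List<String> fragments = new ArrayList<>();
        // Number of distinct key values; assumes max - min + 1 does not overflow a long
        long span = max - min + 1;
        int n = (int) Math.min(slices, span);   // never produce empty slices
        long step = span / n;
        long remainder = span % n;              // the first `remainder` slices get one extra value
        long lo = min;
        for (int i = 0; i < n; i++) {
            long hi = lo + step + (i < remainder ? 1 : 0);   // exclusive upper bound
            if (i == n - 1) {
                fragments.add(pk + " >= " + lo + " and " + pk + " <= " + max);
            } else {
                fragments.add(pk + " >= " + lo + " and " + pk + " < " + hi);
            }
            lo = hi;
        }
        return fragments;
    }

    public static void main(String[] args) {
        // prints: id >= 1 and id < 5 / id >= 5 and id < 8 / id >= 8 and id <= 10
        for (String fragment : splitRanges("id", 1, 10, 3)) {
            System.out.println(fragment);
        }
    }
}
```

Each fragment would be ANDed with the user's where condition, and each slice would then run as its own Task. Equal-width ranges only contain similar row counts when key values are evenly distributed. This is why the documentation recommends the primary key.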
### 3.3 类型转换 目前KingbaseesReader支持大部分KingbaseES类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 下面列出KingbaseesReader针对KingbaseES类型转换列表: | DataX 内部类型| KingbaseES 数据类型 | | -------- | ----- | | Long |bigint, bigserial, integer, smallint, serial | | Double |double precision, money, numeric, real | | String |varchar, char, text, bit, inet| | Date |date, time, timestamp | | Boolean |bool| | Bytes |bytea| 请注意: * `除上述罗列字段类型外,其他类型均不支持; money,inet,bit需用户使用a_inet::varchar类似的语法转换`。 ================================================ FILE: kingbaseesreader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 kingbaseesreader kingbaseesreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.kingbase8 kingbase8 8.2.0 system ${basedir}/src/main/libs/kingbase8-8.2.0.jar maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: kingbaseesreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/kingbaseesreader target/ kingbaseesreader-0.0.1-SNAPSHOT.jar plugin/reader/kingbaseesreader src/main/libs *.* plugin/reader/kingbaseesreader/libs false plugin/reader/kingbaseesreader/libs runtime ================================================ FILE: kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.kingbaseesreader; public class Constant { public static final int DEFAULT_FETCH_SIZE = 1000; } ================================================ FILE: 
kingbaseesreader/src/main/java/com/alibaba/datax/plugin/reader/kingbaseesreader/KingbaseesReader.java ================================================ package com.alibaba.datax.plugin.reader.kingbaseesreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import java.util.List; public class KingbaseesReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES; public static class Job extends Reader.Job { private Configuration originalConfig; private CommonRdbmsReader.Job commonRdbmsReaderMaster; @Override public void init() { this.originalConfig = super.getPluginJobConf(); int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, Constant.DEFAULT_FETCH_SIZE); if (fetchSize < 1) { throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, String.format("您配置的fetchSize有误,根据DataX的设计,fetchSize : [%d] 设置值不能小于 1.", fetchSize)); } this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE); this.commonRdbmsReaderMaster.init(this.originalConfig); } @Override public List split(int adviceNumber) { return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber); } @Override public void post() { this.commonRdbmsReaderMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderMaster.destroy(this.originalConfig); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderSlave; @Override public void init() { 
this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderSlave.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderSlave.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig); } } } ================================================ FILE: kingbaseesreader/src/main/resources/plugin.json ================================================ { "name": "kingbaseesreader", "class": "com.alibaba.datax.plugin.reader.kingbaseesreader.KingbaseesReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the fewer problems you encounter.", "developer": "alibaba" } ================================================ FILE: kingbaseesreader/src/main/resources/plugin_job_template.json ================================================ { "name": "kingbaseesreader", "parameter": { "username": "", "password": "", "connection": [ { "table": [], "jdbcUrl": [] } ] } } ================================================ FILE: kingbaseeswriter/doc/kingbaseeswriter.md ================================================
# DataX KingbaseesWriter

---

## 1 Quick Introduction

The KingbaseesWriter plugin writes data into a target table on the KingbaseES primary database. Under the hood, KingbaseesWriter connects to the remote KingbaseES database over JDBC, executes the corresponding `insert into ...` SQL statements to write the data into KingbaseES, and commits in batches internally.

KingbaseesWriter is aimed at ETL developers who use it to load data from a data warehouse into KingbaseES. It can also serve DBAs and other users as a data migration tool.

## 2 How It Works

KingbaseesWriter receives the protocol data generated by the Reader through the DataX framework and, based on your configuration, generates the corresponding SQL insert statements:

* `insert into...` (rows that conflict with a primary key or unique index are not written)
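As a rough illustration of the statement shape, the sketch below builds one parameterized `insert into` statement whose placeholders carry explicit casts, reusing the same mapping as the `calcValueHolder` override in `KingbaseesWriter.Task` (`serial` becomes `?::int`, `bit` becomes `?::bit varying`, anything else becomes `?::<type>`). `InsertSqlSketch` and `buildInsertSql` are hypothetical names, and the sketch assumes the column type names are already known (for example from table metadata); the statement DataX actually sends is assembled by its rdbms writer utilities and may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InsertSqlSketch {
    // Same mapping as the calcValueHolder override in KingbaseesWriter.Task:
    // every placeholder carries an explicit cast to the target column type.
    static String valueHolder(String columnType) {
        if ("serial".equalsIgnoreCase(columnType)) {
            return "?::int";
        } else if ("bit".equalsIgnoreCase(columnType)) {
            return "?::bit varying";
        }
        return "?::" + columnType;
    }

    // Hypothetical helper: one parameterized INSERT for the configured columns,
    // assuming the column type names are already known.
    static String buildInsertSql(String table, List<String> columns, List<String> columnTypes) {
        if (columns.size() != columnTypes.size()) {
            throw new IllegalArgumentException("columns and columnTypes must have the same size");
        }
        List<String> holders = new ArrayList<>();
        for (String type : columnTypes) {
            holders.add(valueHolder(type));
        }
        return "insert into " + table + " (" + String.join(",", columns) + ") values ("
                + String.join(",", holders) + ")";
    }

    public static void main(String[] args) {
        String sql = buildInsertSql("test", Arrays.asList("id", "name"), Arrays.asList("serial", "varchar"));
        System.out.println(sql); // insert into test (id,name) values (?::int,?::varchar)
    }
}
```

In PostgreSQL-style SQL, `?::type` casts each bound parameter to the named type, so every value is coerced to the column's declared type on the server side.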

Notes:
1. The database containing the target table must be the primary database; otherwise data cannot be written. The job needs at least the `insert into...` privilege; whether other privileges are required depends on the statements you configure in `preSql` and `postSql`.
2. Unlike MysqlWriter, KingbaseesWriter does not support the `writeMode` parameter.

## 3 Features

### 3.1 Sample Configuration

* This sample generates data in memory and imports it through KingbaseesWriter.

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "value": 19880808,
                                "type": "long"
                            },
                            {
                                "value": "DataX",
                                "type": "string"
                            }
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "kingbaseeswriter",
                    "parameter": {
                        "username": "xx",
                        "password": "xx",
                        "column": [
                            "id",
                            "name"
                        ],
                        "preSql": [
                            "delete from test"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:kingbase8://127.0.0.1:3002/datax",
                                "table": [
                                    "test"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the target database; `jdbcUrl` must be placed inside the `connection` block. Notes: 1. Only one value can be configured per database (`jdbcUrl` is a single string, not a list). 2. `jdbcUrl` follows the official KingbaseES format and may include additional connection parameters; see the official KingbaseES documentation or ask your DBA for details.
    * Required: yes
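    * Default: none

* **username**
    * Description: username for the target database
    * Required: yes
    * Default: none

* **password**
    * Description: password for the target database
    * Required: yes
    * Default: none

* **table**
    * Description: name of the target table. One or more tables can be written; when multiple tables are configured, they must all have the same structure. Note: `table` and `jdbcUrl` must be placed inside the `connection` block.
    * Required: yes
    * Default: none

* **column**
    * Description: columns of the target table to write, separated by commas, e.g. `"column": ["id","name","age"]`. To write all columns in order, use `*`, e.g. `"column": ["*"]`. Notes: 1. We strongly advise against `*`, because the job may run incorrectly or fail when the number or types of the target table's columns change. 2. `column` cannot contain constant values.
    * Required: yes
    * Default: none

* **preSql**
    * Description: standard SQL statements executed before data is written to the target table. If a statement needs to refer to the table name, write it as `@table`; when the statement runs, the variable is replaced with the actual table name. For example, if your job writes to 100 sharded tables with identical structure on the target (named `datax_00`, `datax_01`, ... `datax_98`, `datax_99`) and you want to delete their existing rows before importing, configure `"preSql":["delete from @table"]`. Before writing to each table, the corresponding `delete from <table name>` is executed first.
    * Required: no
    * Default: none

* **postSql**
    * Description: standard SQL statements executed after data is written to the target table (works the same way as `preSql`).
    * Required: no
    * Default: none

* **batchSize**
    * Description: number of records committed in one batch. A larger value can greatly reduce the number of network round trips between DataX and KingbaseES and improve overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
    * Required: no
    * Default: 1024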
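The interplay of `preSql`/`postSql`, the `@table` placeholder, and `batchSize` can be modeled with a small standalone sketch. It is not DataX's implementation: `WriterLifecycleSketch`, `renderSqls`, and `batchSizes` are hypothetical names that only model (1) the literal `@table` substitution applied to each target table's statements and (2) how many commit round trips a given `batchSize` implies, assuming one round trip per full batch plus one for any remainder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WriterLifecycleSketch {
    static final String TABLE_PLACEHOLDER = "@table";

    // Replaces every "@table" in each statement with the concrete table name.
    static List<String> renderSqls(List<String> sqls, String tableName) {
        List<String> rendered = new ArrayList<>();
        for (String sql : sqls) {
            rendered.add(sql.replace(TABLE_PLACEHOLDER, tableName));
        }
        return rendered;
    }

    // Sizes of the commit batches when totalRecords rows are written with batchSize.
    static List<Integer> batchSizes(int totalRecords, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<Integer> sizes = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < totalRecords; i++) {
            buffered++;                 // one record added to the in-memory buffer
            if (buffered == batchSize) {
                sizes.add(buffered);    // buffer full: one batch round trip
                buffered = 0;
            }
        }
        if (buffered > 0) {
            sizes.add(buffered);        // flush the remainder at the end of input
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> preSql = Arrays.asList("delete from @table");
        for (String table : Arrays.asList("datax_00", "datax_01")) {
            System.out.println("pre:   " + renderSqls(preSql, table));
            System.out.println("write: " + batchSizes(2500, 1024)); // [1024, 1024, 452]
        }
    }
}
```

The buffered records are what occupy memory between flushes, which is why a very large `batchSize` trades fewer round trips for a higher OOM risk.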
### 3.3 Type Conversion

KingbaseesWriter currently supports most KingbaseES types, but some are not supported, so please check your column types.

The KingbaseES type mappings used by KingbaseesWriter are listed below:

| DataX internal type | KingbaseES data type |
| -------- | ----- |
| Long | bigint, bigserial, integer, smallint, serial |
| Double | double precision, money, numeric, real |
| String | varchar, char, text, bit |
| Date | date, time, timestamp |
| Boolean | bool |
| Bytes | bytea |

## FAQ

***

**Q: KingbaseesWriter reports an error while executing a postSql statement. Has the data already been imported into the target database?**

A: A DataX import consists of three stages: the pre operations, the import itself, and the post operations. If any of them fails, the DataX job fails. Because DataX cannot guarantee that these stages run in a single transaction, the data may already have landed in the target.

***

**Q: In that case, some dirty data may have been imported. What if it affects the production database?**

A: There are currently two approaches. First, configure a pre statement that cleans up the data imported that day, so that each DataX run clears out the previous run's data and then imports the complete data set. Second, import the data into a staging table and rename it to the production table once the import completes.

***
================================================ FILE: kingbaseeswriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT kingbaseeswriter kingbaseeswriter jar writer data into kingbasees database com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.kingbase8 kingbase8 8.2.0 system ${basedir}/src/main/libs/kingbase8-8.2.0.jar maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: kingbaseeswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/kingbaseeswriter target/ kingbaseeswriter-0.0.1-SNAPSHOT.jar plugin/writer/kingbaseeswriter src/main/libs *.* plugin/writer/kingbaseeswriter/libs false plugin/writer/kingbaseeswriter/libs runtime ================================================ FILE: kingbaseeswriter/src/main/java/com/alibaba/datax/plugin/writer/kingbaseeswriter/KingbaseesWriter.java
================================================ package com.alibaba.datax.plugin.writer.kingbaseeswriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import java.util.List; public class KingbaseesWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.KingbaseES; public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterMaster; @Override public void init() { this.originalConfig = super.getPluginJobConf(); /* warn: unlike MySQL, KingbaseES only supports insert mode, so writeMode must not be configured */ String writeMode = this.originalConfig.getString(Key.WRITE_MODE); if (null != writeMode) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, String.format("Invalid writeMode configuration: KingbaseES does not support the writeMode parameter (configured value: %s); KingbaseES writes data with insert SQL only. Please check and correct your configuration.", writeMode)); } this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE); this.commonRdbmsWriterMaster.init(this.originalConfig); } @Override public void prepare() { this.commonRdbmsWriterMaster.prepare(this.originalConfig); } @Override public List<Configuration> split(int mandatoryNumber) { return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber); } @Override public void post() { this.commonRdbmsWriterMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterMaster.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterSlave; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE){ @Override public String calcValueHolder(String columnType){ if("serial".equalsIgnoreCase(columnType)){ return "?::int"; }else if("bit".equalsIgnoreCase(columnType)){ return "?::bit varying"; } return "?::" + columnType; } }; this.commonRdbmsWriterSlave.init(this.writerSliceConfig); } @Override public void prepare() { this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig); } @Override public void startWrite(RecordReceiver recordReceiver) { this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); } @Override public void post() { this.commonRdbmsWriterSlave.post(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig); } } } ================================================ FILE: kingbaseeswriter/src/main/resources/plugin.json ================================================ { "name": "kingbaseeswriter", "class": "com.alibaba.datax.plugin.writer.kingbaseeswriter.KingbaseesWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql.
warn: The more you know about the database, the fewer problems you encounter.", "developer": "alibaba" } ================================================ FILE: kingbaseeswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "kingbaseeswriter", "parameter": { "username": "", "password": "", "column": [], "preSql": [], "connection": [ { "jdbcUrl": "", "table": [] } ], "postSql": [] } } ================================================ FILE: kuduwriter/README.md ================================================
# datax-kudu-plugin

DataX writer plugin for Kudu, tested only with Kudu 1.11.
================================================ FILE: kuduwriter/doc/kuduwirter.md ================================================
# datax-kudu-plugins

DataX writer plugin for Kudu.

Example:

```json
{
  "name": "kuduwriter",
  "parameter": {
    "kuduConfig": {
      "kudu.master_addresses": "***",
      "timeout": 60000,
      "sessionTimeout": 60000
    },
    "table": "",
    "replicaCount": 3,
    "truncate": false,
    "writeMode": "upsert",
    "partition": {
      "range": {
        "column1": [
          { "lower": "2020-08-25", "upper": "2020-08-26" },
          { "lower": "2020-08-26", "upper": "2020-08-27" },
          { "lower": "2020-08-27", "upper": "2020-08-28" }
        ]
      },
      "hash": {
        "column": ["column1"],
        "number": 3
      }
    },
    "column": [
      { "index": 0, "name": "c1", "type": "string", "primaryKey": true },
      { "index": 1, "name": "c2", "type": "string", "compress": "DEFAULT_COMPRESSION", "encoding": "AUTO_ENCODING", "comment": "comment xxxx" }
    ],
    "batchSize": 1024,
    "bufferSize": 2048,
    "skipFail": false,
    "encoding": "UTF-8"
  }
}
```

Required parameters:

```json
"writer": {
  "name": "kuduwriter",
  "parameter": {
    "kuduConfig": {
      "kudu.master_addresses": "***"
    },
    "table": "***",
    "column": [
      { "name": "c1", "type": "string", "primaryKey": true },
      { "name": "c2", "type": "string" },
      { "name": "c3", "type": "string" },
      { "name": "c4", "type": "string" }
    ]
  }
}
```

List the primary key columns first.

![image-20200901193148188](./image-20200901193148188.png)

##### Configuration Reference

| name | default | description | required |
| -------------- | ------------------- | ------------------------------------------------------------ | -------- |
| kuduConfig | | Kudu configuration (`kudu.master_addresses`, etc.) | yes |
| table | | name of the target table | yes |
| partition | | partitioning | no |
| column | | columns | yes |
| name | | column name | yes |
| type | string | column type; INT, FLOAT, STRING, BIGINT, DOUBLE, BOOLEAN, and LONG are currently supported | no |
| index | ascending order | column index position (set it either for all columns or for none). For example, if the reader yields fields in the order (name, id, age) but the target Kudu table is (id, name, age), set index to (1, 0, 2); the default order is (0, 1, 2) | no |
| primaryKey | false | whether the column belongs to the primary key (list all primary key columns first); if no primary key is marked, records are not checked or filtered as dirty data | no |
| compress | DEFAULT_COMPRESSION | compression algorithm | no |
| encoding | AUTO_ENCODING | column encoding | no |
| replicaCount | 3 | number of replicas to keep | no |
| hash | | hash partitioning | no |
| number | 3 | number of hash partitions | no |
| range | | range partitioning | no |
| lower | | range partition lower bound (e.g. for a table created in SQL with `partition value='haha'`, use `"lower": "haha", "upper": "haha\000"`) | no |
| upper | | range partition upper bound (e.g. for `partition "10" <= VALUES < "20"`, use `"lower": "10", "upper": "20"`) | no |
| truncate | false | whether to clear the table; this actually drops and recreates the table | no |
| writeMode | upsert | upsert, insert, update | no |
| batchSize | 512 | flush after every N rows (preferably no more than 1024) | no |
| bufferSize | 3072 | buffer size | no |
| skipFail | false | whether to skip rows that fail to insert | no |
| timeout | 60000 | client timeout for operations such as creating or deleting tables, in ms | no |
| sessionTimeout | 60000 | session timeout, in ms | no |
================================================ FILE: kuduwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 kuduwriter com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.apache.kudu kudu-client 1.11.1 junit junit 4.13.1 test com.alibaba.datax datax-core ${datax-project-version} com.alibaba.datax datax-service-face test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: kuduwriter/src/main/assembly/package.xml
================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/kuduwriter target/ kuduwriter-0.0.1-SNAPSHOT.jar plugin/writer/kuduwriter false plugin/writer/kuduwriter/libs runtime ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/ColumnType.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; import com.alibaba.datax.common.exception.DataXException; import java.util.Arrays; /** * @author daizihao * @create 2020-08-31 19:12 **/ public enum ColumnType { INT("int"), FLOAT("float"), STRING("string"), BIGINT("bigint"), DOUBLE("double"), BOOLEAN("boolean"), LONG("long"); private String mode; ColumnType(String mode) { this.mode = mode.toLowerCase(); } public String getMode() { return mode; } public static ColumnType getByTypeName(String modeName) { for (ColumnType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, String.format("Kuduwriter does not support the type: %s; currently supported types are: %s", modeName, Arrays.asList(values()))); } } ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Constant.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; /** * @author daizihao * @create 2020-08-31 14:42 **/ public class Constant { public static final String DEFAULT_ENCODING = "UTF-8"; /* public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; */ public static final String COMPRESSION = "DEFAULT_COMPRESSION"; public static final String ENCODING = "AUTO_ENCODING"; public static final Long ADMIN_TIMEOUTMS = 60000L; public static final Long SESSION_TIMEOUTMS = 60000L; public static final String INSERT_MODE = "upsert"; public static final long
DEFAULT_WRITE_BATCH_SIZE = 512L; public static final long DEFAULT_MUTATION_BUFFER_SPACE = 3072L; } ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/InsertModeType.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; import com.alibaba.datax.common.exception.DataXException; import java.util.Arrays; /** * @author daizihao * @create 2020-08-31 14:47 **/ public enum InsertModeType { Insert("insert"), Upsert("upsert"), Update("update"); private String mode; InsertModeType(String mode) { this.mode = mode.toLowerCase(); } public String getMode() { return mode; } public static InsertModeType getByTypeName(String modeName) { for (InsertModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, String.format("Kuduwriter does not support the mode :[%s], currently supported mode types are :%s", modeName, Arrays.asList(values()))); } } ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Key.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; /** * @author daizihao * @create 2020-08-31 14:17 **/ public class Key { public final static String KUDU_CONFIG = "kuduConfig"; public final static String KUDU_MASTER = "kudu.master_addresses"; public final static String KUDU_ADMIN_TIMEOUT = "timeout"; public final static String KUDU_SESSION_TIMEOUT = "sessionTimeout"; public final static String TABLE = "table"; public final static String PARTITION = "partition"; public final static String COLUMN = "column"; public static final String NAME = "name"; public static final String TYPE = "type"; public static final String INDEX = "index"; public static final String PRIMARYKEY = "primaryKey"; public static final String COMPRESSION = 
"compress"; public static final String COMMENT = "comment"; public final static String ENCODING = "encoding"; public static final String NUM_REPLICAS = "replicaCount"; public static final String HASH = "hash"; public static final String HASH_NUM = "number"; public static final String RANGE = "range"; public static final String LOWER = "lower"; public static final String UPPER = "upper"; public static final String TRUNCATE = "truncate"; public static final String INSERT_MODE = "writeMode"; public static final String WRITE_BATCH_SIZE = "batchSize"; public static final String MUTATION_BUFFER_SPACE = "bufferSize"; public static final String SKIP_FAIL = "skipFail"; } ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xHelper.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.kudu.ColumnSchema; import org.apache.kudu.Schema; import org.apache.kudu.Type; import org.apache.kudu.client.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.nio.charset.Charset; import java.util.*; import java.util.concurrent.SynchronousQueue; import java.util.concurrent.ThreadFactory; import java.util.concurrent.ThreadPoolExecutor; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; /** * @author daizihao * @create 2020-08-27 18:30 **/ public class Kudu11xHelper { private static final Logger LOG = LoggerFactory.getLogger(Kudu11xHelper.class); public static Map<String, Object> getKuduConfiguration(String kuduConfig) { if
(StringUtils.isBlank(kuduConfig)) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, "kuduConfig (connection configuration) is required."); } Map<String, Object> kConfiguration; try { kConfiguration = JSON.parseObject(kuduConfig, HashMap.class); Validate.isTrue(kConfiguration != null, "kuduConfig is null!"); kConfiguration.put(Key.KUDU_ADMIN_TIMEOUT, kConfiguration.getOrDefault(Key.KUDU_ADMIN_TIMEOUT, Constant.ADMIN_TIMEOUTMS)); kConfiguration.put(Key.KUDU_SESSION_TIMEOUT, kConfiguration.getOrDefault(Key.KUDU_SESSION_TIMEOUT, Constant.SESSION_TIMEOUTMS)); } catch (Exception e) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e); } return kConfiguration; } public static KuduClient getKuduClient(String kuduConfig) { Map<String, Object> conf = Kudu11xHelper.getKuduConfiguration(kuduConfig); KuduClient kuduClient = null; try { String masterAddress = (String) conf.get(Key.KUDU_MASTER); /* timeouts parsed from the JSON config may be Integer rather than Long, so read them as Number */ kuduClient = new KuduClient.KuduClientBuilder(masterAddress) .defaultAdminOperationTimeoutMs(((Number) conf.get(Key.KUDU_ADMIN_TIMEOUT)).longValue()) .defaultOperationTimeoutMs(((Number) conf.get(Key.KUDU_SESSION_TIMEOUT)).longValue()) .build(); } catch (Exception e) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e); } return kuduClient; } public static KuduTable getKuduTable(Configuration configuration, KuduClient kuduClient) { String tableName = configuration.getString(Key.TABLE); KuduTable table = null; try { if (kuduClient.tableExists(tableName)) { table = kuduClient.openTable(tableName); } else { synchronized (Kudu11xHelper.class) { if (!kuduClient.tableExists(tableName)) { Schema schema = Kudu11xHelper.getSchema(configuration); CreateTableOptions tableOptions = new CreateTableOptions(); Kudu11xHelper.setTablePartition(configuration, tableOptions, schema); /* number of replicas */ Integer numReplicas = configuration.getInt(Key.NUM_REPLICAS, 3); tableOptions.setNumReplicas(numReplicas); table = kuduClient.createTable(tableName, schema, tableOptions); } else {
table = kuduClient.openTable(tableName); } } } } catch (Exception e) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_TABLE_ERROR, e); } return table; } public static void createTable(Configuration configuration) { String tableName = configuration.getString(Key.TABLE); String kuduConfig = configuration.getString(Key.KUDU_CONFIG); KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig); try { Schema schema = Kudu11xHelper.getSchema(configuration); CreateTableOptions tableOptions = new CreateTableOptions(); Kudu11xHelper.setTablePartition(configuration, tableOptions, schema); /* number of replicas */ Integer numReplicas = configuration.getInt(Key.NUM_REPLICAS, 3); tableOptions.setNumReplicas(numReplicas); kuduClient.createTable(tableName, schema, tableOptions); } catch (Exception e) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.GREATE_KUDU_TABLE_ERROR, e); } finally { /* poll up to 10 times, 100 ms apart, until the table is reported as created, then close the client once */ AtomicInteger i = new AtomicInteger(10); while (i.get() > 0) { try { if (kuduClient.isCreateTableDone(tableName)) { LOG.info("Table " + tableName + " is created!"); break; } } catch (KuduException e) { LOG.info("Wait for the table to be created..... " + i); } i.decrementAndGet(); try { Thread.sleep(100L); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); break; } } if (i.get() == 0) { LOG.error("Timed out waiting for table " + tableName + " to be created!"); } Kudu11xHelper.closeClient(kuduClient); } } public static ThreadPoolExecutor createRowAddThreadPool(int coreSize) { return new ThreadPoolExecutor(coreSize, coreSize, 60L, TimeUnit.SECONDS, new SynchronousQueue(), new ThreadFactory() { private final ThreadGroup group = System.getSecurityManager() == null ?
Thread.currentThread().getThreadGroup() : System.getSecurityManager().getThreadGroup(); private final AtomicInteger threadNumber = new AtomicInteger(1); @Override public Thread newThread(Runnable r) { Thread t = new Thread(group, r, "pool-kudu_rows_add-thread-" + threadNumber.getAndIncrement(), 0); if (t.isDaemon()) t.setDaemon(false); if (t.getPriority() != Thread.NORM_PRIORITY) t.setPriority(Thread.NORM_PRIORITY); return t; } }, new ThreadPoolExecutor.CallerRunsPolicy()); } public static List<List<Configuration>> getColumnLists(List<Configuration> columns) { int quota = 8; int num = (columns.size() - 1) / quota + 1; int gap = columns.size() / num; List<List<Configuration>> columnLists = new ArrayList<>(num); for (int j = 0; j < num - 1; j++) { List<Configuration> destList = new ArrayList<>(columns.subList(j * gap, (j + 1) * gap)); columnLists.add(destList); } List<Configuration> destList = new ArrayList<>(columns.subList(gap * (num - 1), columns.size())); columnLists.add(destList); return columnLists; } public static boolean isTableExists(Configuration configuration) { String tableName = configuration.getString(Key.TABLE); String kuduConfig = configuration.getString(Key.KUDU_CONFIG); KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig); try { return kuduClient.tableExists(tableName); } catch (Exception e) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.GET_KUDU_CONNECTION_ERROR, e); } finally { Kudu11xHelper.closeClient(kuduClient); } } public static void closeClient(KuduClient kuduClient) { try { if (kuduClient != null) { kuduClient.close(); } } catch (KuduException e) { LOG.warn("The \"kudu client\" was not stopped gracefully.
!"); } } public static Schema getSchema(Configuration configuration) { List<Configuration> columns = configuration.getListConfiguration(Key.COLUMN); List<ColumnSchema> columnSchemas = new ArrayList<>(); Schema schema = null; if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, "column is not defined, e.g. column:[{\"name\": \"c1\",\"type\": \"string\",\"primaryKey\": true},{\"name\": \"c2\",\"type\": \"long\"}]"); } try { for (Configuration column : columns) { String rawType = column.getNecessaryValue(Key.TYPE, Kudu11xWriterErrorcode.REQUIRED_VALUE).toUpperCase(); /* map BIGINT/LONG to Kudu INT64 and INT to INT32; other names are passed through */ String type = "BIGINT".equals(rawType) || "LONG".equals(rawType) ? "INT64" : "INT".equals(rawType) ? "INT32" : rawType; String name = column.getNecessaryValue(Key.NAME, Kudu11xWriterErrorcode.REQUIRED_VALUE); Boolean key = column.getBool(Key.PRIMARYKEY, false); String encoding = column.getString(Key.ENCODING, Constant.ENCODING).toUpperCase(); String compression = column.getString(Key.COMPRESSION, Constant.COMPRESSION).toUpperCase(); String comment = column.getString(Key.COMMENT, ""); columnSchemas.add(new ColumnSchema.ColumnSchemaBuilder(name, Type.getTypeForName(type)) .key(key) .encoding(ColumnSchema.Encoding.valueOf(encoding)) .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.valueOf(compression)) .comment(comment) .build()); } schema = new Schema(columnSchemas); } catch (Exception e) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, e); } return schema; } public static Integer getPrimaryKeyIndexUntil(List<Configuration> columns) { int i = 0; while (i < columns.size()) { Configuration col = columns.get(i); if (!col.getBool(Key.PRIMARYKEY, false)) { break; } i++; } return i; } public static void setTablePartition(Configuration configuration,
CreateTableOptions tableOptions, Schema schema) { Configuration partition = configuration.getConfiguration(Key.PARTITION); if (partition == null) { ColumnSchema columnSchema = schema.getColumns().get(0); tableOptions.addHashPartitions(Collections.singletonList(columnSchema.getName()), 3); return; } /* range partitions */ Configuration range = partition.getConfiguration(Key.RANGE); if (range != null) { List<String> rangeColums = new ArrayList<>(range.getKeys()); tableOptions.setRangePartitionColumns(rangeColums); for (String rangeColum : rangeColums) { List<Configuration> lowerAndUppers = range.getListConfiguration(rangeColum); for (Configuration lowerAndUpper : lowerAndUppers) { PartialRow lower = schema.newPartialRow(); lower.addString(rangeColum, lowerAndUpper.getNecessaryValue(Key.LOWER, Kudu11xWriterErrorcode.REQUIRED_VALUE)); PartialRow upper = schema.newPartialRow(); upper.addString(rangeColum, lowerAndUpper.getNecessaryValue(Key.UPPER, Kudu11xWriterErrorcode.REQUIRED_VALUE)); tableOptions.addRangePartition(lower, upper); } } LOG.info("Set range partition complete!"); } /* hash partitions; "number" is read from inside the hash block, as documented */ Configuration hash = partition.getConfiguration(Key.HASH); if (hash != null) { List<String> hashColums = hash.getList(Key.COLUMN, String.class); Integer hashPartitionNum = hash.getInt(Key.HASH_NUM, 3); tableOptions.addHashPartitions(hashColums, hashPartitionNum); LOG.info("Set hash partition complete!"); } } public static void validateParameter(Configuration configuration) { LOG.info("Start validating parameters!"); configuration.getNecessaryValue(Key.KUDU_CONFIG, Kudu11xWriterErrorcode.REQUIRED_VALUE); configuration.getNecessaryValue(Key.TABLE, Kudu11xWriterErrorcode.REQUIRED_VALUE); String encoding = configuration.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); if (!Charset.isSupported(encoding)) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, String.format("Encoding is not supported: [%s].", encoding)); } configuration.set(Key.ENCODING, encoding); String insertMode =
configuration.getString(Key.INSERT_MODE, Constant.INSERT_MODE); try { InsertModeType.getByTypeName(insertMode); } catch (Exception e) { insertMode = Constant.INSERT_MODE; } configuration.set(Key.INSERT_MODE, insertMode); Long writeBufferSize = configuration.getLong(Key.WRITE_BATCH_SIZE, Constant.DEFAULT_WRITE_BATCH_SIZE); configuration.set(Key.WRITE_BATCH_SIZE, writeBufferSize); Long mutationBufferSpace = configuration.getLong(Key.MUTATION_BUFFER_SPACE, Constant.DEFAULT_MUTATION_BUFFER_SPACE); configuration.set(Key.MUTATION_BUFFER_SPACE, mutationBufferSpace); Boolean isSkipFail = configuration.getBool(Key.SKIP_FAIL, false); configuration.set(Key.SKIP_FAIL, isSkipFail); List<Configuration> columns = configuration.getListConfiguration(Key.COLUMN); if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.REQUIRED_VALUE, "\"column\" is required!"); } List<Configuration> goalColumns = new ArrayList<>(); /* validate the column parameters */ int indexFlag = 0; boolean primaryKey = true; int primaryKeyFlag = 0; for (int i = 0; i < columns.size(); i++) { Configuration col = columns.get(i); String index = col.getString(Key.INDEX); if (index == null) { index = String.valueOf(i); col.set(Key.INDEX, index); indexFlag++; } if(primaryKey != col.getBool(Key.PRIMARYKEY, false)){ primaryKey = col.getBool(Key.PRIMARYKEY, false); primaryKeyFlag++; } goalColumns.add(col); } if (indexFlag != 0 && indexFlag != columns.size()) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, "\"index\" must be set either for all columns or for none!"); } if (primaryKeyFlag > 1){ throw DataXException.asDataXException(Kudu11xWriterErrorcode.ILLEGAL_VALUE, "\"primaryKey\" columns must be listed before all other columns!"); } configuration.set(Key.COLUMN, goalColumns); /* LOG.info("------------------------------------"); LOG.info(configuration.toString()); LOG.info("------------------------------------"); */ LOG.info("validate parameter complete!"); } public static void truncateTable(Configuration configuration) { String kuduConfig = configuration.getString(Key.KUDU_CONFIG); String userTable = configuration.getString(Key.TABLE);
LOG.info(String.format("Because truncate is set to true, KuduWriter is truncating table %s.", userTable)); KuduClient kuduClient = Kudu11xHelper.getKuduClient(kuduConfig); try { if (kuduClient.tableExists(userTable)) { kuduClient.deleteTable(userTable); LOG.info(String.format("table %s has been deleted.", userTable)); } } catch (KuduException e) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.DELETE_KUDU_ERROR, e); } finally { Kudu11xHelper.closeClient(kuduClient); } } } ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriter.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; /** * @author daizihao * @create 2020-08-27 16:58 **/ public class Kudu11xWriter extends Writer { public static class Job extends Writer.Job{ private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration config = null; @Override public void init() { this.config = this.getPluginJobConf(); Kudu11xHelper.validateParameter(this.config); } @Override public void prepare() { Boolean truncate = config.getBool(Key.TRUNCATE,false); if(truncate){ Kudu11xHelper.truncateTable(this.config); } if (!Kudu11xHelper.isTableExists(config)){ Kudu11xHelper.createTable(config); } } @Override public List<Configuration> split(int i) { List<Configuration> splitResultConfigs = new ArrayList<>(); for (int j = 0; j < i; j++) { splitResultConfigs.add(config.clone()); } return splitResultConfigs; } @Override public void destroy() { } } public static class Task extends Writer.Task{ private Configuration taskConfig; private KuduWriterTask kuduTaskProxy;
private static final Logger LOG = LoggerFactory.getLogger(Task.class); @Override public void init() { this.taskConfig = super.getPluginJobConf(); this.kuduTaskProxy = new KuduWriterTask(this.taskConfig); } @Override public void startWrite(RecordReceiver lineReceiver) { this.kuduTaskProxy.startWriter(lineReceiver,super.getTaskPluginCollector()); } @Override public void destroy() { try { if (kuduTaskProxy.session != null) { kuduTaskProxy.session.close(); } }catch (Exception e){ LOG.warn("The \"kudu session\" was not stopped gracefully!"); } Kudu11xHelper.closeClient(kuduTaskProxy.kuduClient); } } } ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/Kudu11xWriterErrorcode.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; import com.alibaba.datax.common.spi.ErrorCode; /** * @author daizihao * @create 2020-08-27 19:25 **/ public enum Kudu11xWriterErrorcode implements ErrorCode { REQUIRED_VALUE("Kuduwriter-00", "You are missing a required parameter value."), ILLEGAL_VALUE("Kuduwriter-01", "The parameter value you provided is invalid."), GET_KUDU_CONNECTION_ERROR("Kuduwriter-02", "Error getting Kudu connection."), GET_KUDU_TABLE_ERROR("Kuduwriter-03", "Error getting Kudu table."), CLOSE_KUDU_CONNECTION_ERROR("Kuduwriter-04", "Error closing Kudu connection."), CLOSE_KUDU_SESSION_ERROR("Kuduwriter-06", "Error closing Kudu session."), PUT_KUDU_ERROR("Kuduwriter-07", "IO exception occurred when writing to Kudu."), DELETE_KUDU_ERROR("Kuduwriter-08", "An exception occurred while deleting the Kudu table."), GREATE_KUDU_TABLE_ERROR("Kuduwriter-09", "Error creating Kudu table."), PARAMETER_NUM_ERROR("Kuduwriter-10","The number of record fields does not match the number of configured columns.") ; private final String code; private final String description; Kudu11xWriterErrorcode(String code, String description) { this.code = code; this.description = description; } @Override public
String getCode() { return code; } @Override public String getDescription() { return description; } } ================================================ FILE: kuduwriter/src/main/java/com/q1/datax/plugin/writer/kudu11xwriter/KuduWriterTask.java ================================================ package com.q1.datax.plugin.writer.kudu11xwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RetryUtil; import org.apache.commons.lang3.StringUtils; import org.apache.kudu.client.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.Collections; import java.util.List; import java.util.concurrent.*; import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.atomic.LongAdder; /** * @author daizihao * @create 2020-08-31 16:55 **/ public class KuduWriterTask { private final static Logger LOG = LoggerFactory.getLogger(KuduWriterTask.class); private List<Configuration> columns; private List<List<Configuration>> columnLists; private ThreadPoolExecutor pool; private String encoding; private Double batchSize; private Boolean isUpsert; private Boolean isSkipFail; public KuduClient kuduClient; public KuduSession session; private KuduTable table; private Integer primaryKeyIndexUntil; private final Object lock = new Object(); public KuduWriterTask(Configuration configuration) { columns = configuration.getListConfiguration(Key.COLUMN); columnLists = Kudu11xHelper.getColumnLists(columns); pool = Kudu11xHelper.createRowAddThreadPool(columnLists.size()); this.encoding = configuration.getString(Key.ENCODING); this.batchSize = configuration.getDouble(Key.WRITE_BATCH_SIZE); this.isUpsert =
!configuration.getString(Key.INSERT_MODE).equalsIgnoreCase("insert"); this.isSkipFail = configuration.getBool(Key.SKIP_FAIL); long mutationBufferSpace = configuration.getLong(Key.MUTATION_BUFFER_SPACE); this.kuduClient = Kudu11xHelper.getKuduClient(configuration.getString(Key.KUDU_CONFIG)); this.table = Kudu11xHelper.getKuduTable(configuration, kuduClient); this.session = kuduClient.newSession(); session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH); session.setMutationBufferSpace((int) mutationBufferSpace); this.primaryKeyIndexUntil = Kudu11xHelper.getPrimaryKeyIndexUntil(columns); } public void startWriter(RecordReceiver lineReceiver, TaskPluginCollector taskPluginCollector) { LOG.info("KuduWriter started writing."); Record record; LongAdder counter = new LongAdder(); try { while ((record = lineReceiver.getFromReader()) != null) { if (record.getColumnNumber() != columns.size()) { throw DataXException.asDataXException(Kudu11xWriterErrorcode.PARAMETER_NUM_ERROR, "number of record fields: " + record.getColumnNumber() + ", number of configured columns: " + columns.size()); } boolean isDirtyRecord = false; for (int i = 0; i < primaryKeyIndexUntil && !isDirtyRecord; i++) { Column column = record.getColumn(i); isDirtyRecord = StringUtils.isBlank(column.asString()); } if (isDirtyRecord) { taskPluginCollector.collectDirtyRecord(record, "primary key field is null or blank"); continue; } CountDownLatch countDownLatch = new CountDownLatch(columnLists.size()); Upsert upsert = table.newUpsert(); Insert insert = table.newInsert(); PartialRow row; if (isUpsert) { /* upsert: overwrite existing rows */ row = upsert.getRow(); } else { /* insert: add new rows only */ row = insert.getRow(); } List<Future<?>> futures = new ArrayList<>(); for (List<Configuration> columnList : columnLists) { Record finalRecord = record; Future<?> future = pool.submit(() -> { try { for (Configuration col : columnList) { String name = col.getString(Key.NAME); ColumnType type = ColumnType.getByTypeName(col.getString(Key.TYPE, "string"));
Column column = finalRecord.getColumn(col.getInt(Key.INDEX)); String rawData = column.asString(); if (rawData == null) { synchronized (lock) { row.setNull(name); } continue; } switch (type) { case INT: synchronized (lock) { row.addInt(name, Integer.parseInt(rawData)); } break; case LONG: case BIGINT: synchronized (lock) { row.addLong(name, Long.parseLong(rawData)); } break; case FLOAT: synchronized (lock) { row.addFloat(name, Float.parseFloat(rawData)); } break; case DOUBLE: synchronized (lock) { row.addDouble(name, Double.parseDouble(rawData)); } break; case BOOLEAN: synchronized (lock) { row.addBoolean(name, Boolean.parseBoolean(rawData)); } break; case STRING: default: synchronized (lock) { row.addString(name, rawData); } } } } finally { countDownLatch.countDown(); } }); futures.add(future); } countDownLatch.await(); for (Future<?> future : futures) { /* surfaces (as ExecutionException) any error raised inside a worker thread */ future.get(); } try { RetryUtil.executeWithRetry(() -> { if (isUpsert) { /* upsert: overwrite existing rows */ session.apply(upsert); } else { /* insert: add new rows only */ session.apply(insert); } /* flush once more than 80% of batchSize operations have been applied since the last flush */ if (counter.longValue() > (batchSize * 0.8)) { session.flush(); counter.reset(); } counter.increment(); return true; }, 5, 500L, true); } catch (Exception e) { LOG.error("Failed to write record!", e); if (isSkipFail) { LOG.warn("Because \"skipFail\" is set to true, this record will be skipped."); taskPluginCollector.collectDirtyRecord(record, e.getMessage()); } else { throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e); } } } } catch (Exception e) { LOG.error("Write failed; the task will exit!", e); throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e); } AtomicInteger i = new AtomicInteger(10); try { while (i.get() > 0) { if (session.hasPendingOperations()) { session.flush(); break; } Thread.sleep(20L); i.decrementAndGet(); } } catch (Exception e) { LOG.info("Waiting for data to be written to kudu...... 
" + i + "s"); } finally { try { pool.shutdown(); /* force a final flush of any buffered operations */ session.flush(); } catch (KuduException e) { LOG.error("KuduWriter flush failed! The results may be incomplete!", e); throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e); } } } } ================================================ FILE: kuduwriter/src/main/java/com/q1/kudu/conf/KuduConfig.java ================================================ package com.q1.kudu.conf; /** * @author daizihao * @create 2020-09-16 11:39 **/ public class KuduConfig { } ================================================ FILE: kuduwriter/src/main/resources/plugin.json ================================================ { "name": "kuduwriter", "class": "com.q1.datax.plugin.writer.kudu11xwriter.Kudu11xWriter", "description": "useScene: prod. mechanism: uses the Kudu Java API to put data.", "developer": "com.q1.daizihao" } ================================================ FILE: kuduwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "kuduwriter", "parameter": { "kuduConfig": { "kudu.master_addresses": "***", "timeout": 60000, "sessionTimeout": 60000 }, "table": "", "replicaCount": 3, "truncate": false, "writeMode": "upsert", "partition": { "range": { "column1": [ { "lower": "2020-08-25", "upper": "2020-08-26" }, { "lower": "2020-08-26", "upper": "2020-08-27" }, { "lower": "2020-08-27", "upper": "2020-08-28" } ] }, "hash": { "column": [ "column1" ], "number": 3 } }, "column": [ { "index": 0, "name": "c1", "type": "string", "primaryKey": true }, { "index": 1, "name": "c2", "type": "string", "compress": "DEFAULT_COMPRESSION", "encoding": "AUTO_ENCODING", "comment": "column comment xxxx" } ], "batchSize": 1024, "bufferSize": 2048, "skipFail": false, "encoding": "UTF-8" } } ================================================ FILE: kuduwriter/src/test/java/com/dai/test.java ================================================ package com.dai; import
com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.RetryUtil; import com.q1.datax.plugin.writer.kudu11xwriter.*; import static org.apache.kudu.client.AsyncKuduClient.LOG; /** * @author daizihao * @create 2020-08-28 11:03 **/ public class test { static boolean isSkipFail; public static void main(String[] args) { try { while (true) { try { RetryUtil.executeWithRetry(()->{ throw new RuntimeException(); },5,1000L,true); } catch (Exception e) { LOG.error("Data write failed!", e); System.out.println(isSkipFail); if (isSkipFail) { LOG.warn("Because skipFail is set to true, this record will be skipped!"); }else { System.out.println("Exception thrown"); throw e; } } } } catch (Exception e) { LOG.error("Write failed; the task will exit!"); throw DataXException.asDataXException(Kudu11xWriterErrorcode.PUT_KUDU_ERROR, e); } } } ================================================ FILE: license.txt ================================================ Copyright 1999-2022 Alibaba Group Holding Ltd. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
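`LogHubReader.Task#getUnixTimeFromDateTime` in the LogHub reader accepts `beginDateTime` and `endDateTime` either as `yyyyMMddHHmmss` or as `yyyyMMdd`. It divides the parsed epoch milliseconds by 1000, so the value it passes to `GetCursor` is Unix time in seconds. Below is a minimal standalone sketch of that two-pattern fallback. It is written with `java.time` rather than the plugin's `SimpleDateFormat`, which is a swapped-in choice. The class name `LogHubTimeParser` is hypothetical. The time zone is an explicit parameter here, whereas the plugin's `SimpleDateFormat` uses the JVM default zone.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of DataX: mirrors the reader's fallback logic.
final class LogHubTimeParser {
    // Same patterns as Constant.DATETIME_FORMAT and Constant.DATE_FORMAT.
    private static final DateTimeFormatter DATETIME = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");

    private LogHubTimeParser() {
    }

    /**
     * Converts "yyyyMMddHHmmss" or "yyyyMMdd" text into Unix time in seconds.
     * The full date-time pattern is tried first; a date-only value maps to midnight.
     * Throws DateTimeParseException if neither pattern matches.
     */
    static long toUnixSeconds(String text, ZoneId zone) {
        try {
            return LocalDateTime.parse(text, DATETIME).atZone(zone).toEpochSecond();
        } catch (DateTimeParseException ignored) {
            // Not a full date-time value; fall back to the date-only pattern.
        }
        return LocalDate.parse(text, DATE).atStartOfDay(zone).toEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(toUnixSeconds("20200825", ZoneOffset.UTC));
        System.out.println(toUnixSeconds("20200825120000", ZoneOffset.UTC));
    }
}
```

Unlike a lenient `SimpleDateFormat`, the `java.time` formatter may reject out-of-range values (such as month 13) instead of rolling them over. This is one reason a port would need to be checked against existing job configurations.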
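================================================ FILE: loghubreader/pom.xml ================================================
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <parent>
    <artifactId>datax-all</artifactId>
    <groupId>com.alibaba.datax</groupId>
    <version>0.0.1-SNAPSHOT</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>

  <artifactId>loghubreader</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>com.alibaba.datax</groupId>
      <artifactId>datax-common</artifactId>
      <version>${datax-project-version}</version>
      <exclusions>
        <exclusion>
          <artifactId>slf4j-log4j12</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
    </dependency>
    <dependency>
      <groupId>com.aliyun.openservices</groupId>
      <artifactId>aliyun-log</artifactId>
      <version>0.6.22</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>${jdk-version}</source>
          <target>${jdk-version}</target>
          <encoding>${project-sourceEncoding}</encoding>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptors>
            <descriptor>src/main/assembly/package.xml</descriptor>
          </descriptors>
          <finalName>datax</finalName>
        </configuration>
        <executions>
          <execution>
            <id>dwzip</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
================================================ FILE: loghubreader/src/main/assembly/package.xml ================================================
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
  <id></id>
  <formats>
    <format>dir</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <fileSet>
      <directory>src/main/resources</directory>
      <includes>
        <include>plugin.json</include>
      </includes>
      <outputDirectory>plugin/reader/loghubreader</outputDirectory>
    </fileSet>
    <fileSet>
      <directory>target/</directory>
      <includes>
        <include>loghubreader-0.0.1-SNAPSHOT.jar</include>
      </includes>
      <outputDirectory>plugin/reader/loghubreader</outputDirectory>
    </fileSet>
  </fileSets>
  <dependencySets>
    <dependencySet>
      <useProjectArtifact>false</useProjectArtifact>
      <outputDirectory>plugin/reader/loghubreader/libs</outputDirectory>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
================================================ FILE: loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.loghubreader; public class Constant { public static String DATETIME_FORMAT = "yyyyMMddHHmmss"; public static String DATE_FORMAT = "yyyyMMdd"; static String META_COL_SOURCE = "__source__"; static String META_COL_TOPIC = "__topic__"; static String META_COL_CATEGORY = "__category__"; static String META_COL_MACHINEUUID = "__machineUUID__"; static String META_COL_HOSTNAME = "__hostname__"; static String META_COL_PATH = "__path__"; static String META_COL_LOGTIME = "__logtime__"; public static String META_COL_RECEIVE_TIME = "__receive_time__"; /** * All data columns other than those configured manually by the user are read into a single column as JSON. */ static String COL_EXTRACT_OTHERS = "C__extract_others__"; /** * All metadata columns are read into a single column as JSON. */ static String COL_EXTRACT_ALL_META = "C__extract_all_meta__"; } ================================================ FILE: loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/Key.java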
================================================ package com.alibaba.datax.plugin.reader.loghubreader; public final class Key { /** * Configuration keys used by the plugin that must be supplied by the plugin user are declared here. */ public static final String ENDPOINT = "endpoint"; public static final String ACCESSKEYID = "accessId"; public static final String ACCESSKEYSECRET = "accessKey"; public static final String PROJECT = "project"; public static final String LOGSTORE = "logstore"; public static final String TOPIC = "topic"; public static final String COLUMN = "column"; public static final String BATCHSIZE = "batchSize"; public static final String BEGINTIMESTAMPMILLIS = "beginTimestampMillis"; public static final String ENDTIMESTAMPMILLIS = "endTimestampMillis"; public static final String BEGINDATETIME = "beginDateTime"; public static final String ENDDATETIME = "endDateTime"; public static final String TIMEFORMAT = "timeformat"; public static final String SOURCE = "source"; public static final String SHARD = "shard"; } ================================================ FILE: loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReader.java ================================================ package com.alibaba.datax.plugin.reader.loghubreader; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.DataXCaseEnvUtil; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.fastjson2.JSONObject; import com.aliyun.openservices.log.Client; import com.aliyun.openservices.log.common.Consts.CursorMode; import com.aliyun.openservices.log.common.*; import com.aliyun.openservices.log.exception.LogException; import com.aliyun.openservices.log.response.BatchGetLogResponse; import
com.aliyun.openservices.log.response.GetCursorResponse; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.*; import java.util.concurrent.Callable; public class LogHubReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Client client; private Configuration originalConfig; private Long beginTimestampMillis; private Long endTimestampMillis; @Override public void init() { LOG.info("loghub reader job init begin ..."); this.originalConfig = super.getPluginJobConf(); validateParameter(originalConfig); String endPoint = this.originalConfig.getString(Key.ENDPOINT); String accessKeyId = this.originalConfig.getString(Key.ACCESSKEYID); String accessKeySecret = this.originalConfig.getString(Key.ACCESSKEYSECRET); client = new Client(endPoint, accessKeyId, accessKeySecret); LOG.info("loghub reader job init end."); } private void validateParameter(Configuration conf){ conf.getNecessaryValue(Key.ENDPOINT,LogHubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.ACCESSKEYID,LogHubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.ACCESSKEYSECRET,LogHubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.PROJECT,LogHubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.LOGSTORE,LogHubReaderErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.COLUMN,LogHubReaderErrorCode.REQUIRE_VALUE); /* batchSize is optional; default to 128, the same default the Task uses */ int batchSize = conf.getInt(Key.BATCHSIZE, 128); if (batchSize <= 0 || batchSize > 1000) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid batchSize[" + batchSize + "], it must be in the range (0, 1000]!"); } beginTimestampMillis = this.originalConfig.getLong(Key.BEGINTIMESTAMPMILLIS); String beginDateTime = this.originalConfig.getString(Key.BEGINDATETIME); if (beginDateTime != null) { try { beginTimestampMillis =
getUnixTimeFromDateTime(beginDateTime); } catch (ParseException e) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid beginDateTime[" + beginDateTime + "], format [yyyyMMddHHmmss or yyyyMMdd]!"); } } if (beginTimestampMillis != null && beginTimestampMillis <= 0) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid beginTimestampMillis[" + beginTimestampMillis + "]!"); } endTimestampMillis = this.originalConfig.getLong(Key.ENDTIMESTAMPMILLIS); String endDateTime = this.originalConfig.getString(Key.ENDDATETIME); if (endDateTime != null) { try { endTimestampMillis = getUnixTimeFromDateTime(endDateTime); } catch (ParseException e) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid endDateTime[" + endDateTime + "], format [yyyyMMddHHmmss or yyyyMMdd]!"); } } if (endTimestampMillis != null && endTimestampMillis <= 0) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "Invalid endTimestampMillis[" + endTimestampMillis + "]!"); } if (beginTimestampMillis != null && endTimestampMillis != null && endTimestampMillis <= beginTimestampMillis) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "endTimestampMillis[" + endTimestampMillis + "] must be greater than beginTimestampMillis[" + beginTimestampMillis + "]!"); } } private long getUnixTimeFromDateTime(String dateTime) throws ParseException { /* try yyyyMMddHHmmss first, then fall back to yyyyMMdd; the result is Unix time in seconds */ try { SimpleDateFormat dateTimeFormat = new SimpleDateFormat(Constant.DATETIME_FORMAT); return dateTimeFormat.parse(dateTime).getTime() / 1000; } catch (ParseException ignored) { /* not a full date-time value; try the date-only format */ } SimpleDateFormat dateFormat = new SimpleDateFormat(Constant.DATE_FORMAT); return dateFormat.parse(dateTime).getTime() / 1000; } @Override public void prepare() { } @Override public List<Configuration> split(int adviceNumber) { LOG.info("split() begin..."); List<Configuration> readerSplitConfigs = new ArrayList<>(); final String project =
this.originalConfig.getString(Key.PROJECT); final String logstore = this.originalConfig.getString(Key.LOGSTORE); List<Shard> logStore = null; try { logStore = RetryUtil.executeWithRetry(new Callable<List<Shard>>() { @Override public List<Shard> call() throws Exception { return client.ListShard(project, logstore).GetShards(); } }, DataXCaseEnvUtil.getRetryTimes(7), DataXCaseEnvUtil.getRetryInterval(1000L), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "get LogStore[" + logstore + "] error, please check! Detailed error message: " + e.toString()); } if (logStore == null) { throw DataXException.asDataXException(LogHubReaderErrorCode.BAD_CONFIG_VALUE, "LogStore[" + logstore + "] does not exist, please check!"); } int splitNumber = logStore.size(); if (0 == splitNumber) { throw DataXException.asDataXException(LogHubReaderErrorCode.EMPTY_LOGSTORE_VALUE, "LogStore[" + logstore + "] has no shards, please check!"); } Collections.shuffle(logStore); for (int i = 0; i < logStore.size(); i++) { if (beginTimestampMillis != null && endTimestampMillis != null) { try { String beginCursor = getCursorWithRetry(client, project, logstore, logStore.get(i).GetShardId(), beginTimestampMillis).GetCursor(); String endCursor = getCursorWithRetry(client, project, logstore, logStore.get(i).GetShardId(), endTimestampMillis).GetCursor(); if (beginCursor.equals(endCursor)) { /* skip empty shards, but keep the last shard if none has been selected yet so that at least one task is produced */ if (!((i == logStore.size() - 1) && readerSplitConfigs.isEmpty())) { LOG.info("skip empty shard[" + logStore.get(i) + "]!"); continue; } } } catch (Exception e) { LOG.error("Failed to check Shard[" + logStore.get(i) + "], please check! "
+ e.toString()); throw DataXException.asDataXException(LogHubReaderErrorCode.LOG_HUB_ERROR, e); } } Configuration splitedConfig = this.originalConfig.clone(); splitedConfig.set(Key.SHARD, logStore.get(i).GetShardId()); readerSplitConfigs.add(splitedConfig); } LOG.info("split() ok and end..."); return readerSplitConfigs; } @Override public void post() { } @Override public void destroy() { } private GetCursorResponse getCursorWithRetry(final Client client, final String project, final String logstore, final int shard, final long fromTime) throws Exception { return RetryUtil.executeWithRetry(new Callable<GetCursorResponse>() { @Override public GetCursorResponse call() throws Exception { LOG.info("loghub get cursor with project: {} logstore: {} shard: {} time: {}", project, logstore, shard, fromTime); return client.GetCursor(project, logstore, shard, fromTime); } }, 7, 1000L, true); } } public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration taskConfig; private Client client; private String endPoint; private String accessKeyId; private String accessKeySecret; private String project; private String logstore; private long beginTimestampMillis; private long endTimestampMillis; private int batchSize; private int shard; private List<String> columns; @Override public void init() { this.taskConfig = super.getPluginJobConf(); endPoint = this.taskConfig.getString(Key.ENDPOINT); accessKeyId = this.taskConfig.getString(Key.ACCESSKEYID); accessKeySecret = this.taskConfig.getString(Key.ACCESSKEYSECRET); project = this.taskConfig.getString(Key.PROJECT); logstore = this.taskConfig.getString(Key.LOGSTORE); batchSize = this.taskConfig.getInt(Key.BATCHSIZE, 128); this.beginTimestampMillis = this.taskConfig.getLong(Key.BEGINTIMESTAMPMILLIS, -1); String beginDateTime =
this.taskConfig.getString(Key.BEGINDATETIME); if (beginDateTime != null) { try { beginTimestampMillis = getUnixTimeFromDateTime(beginDateTime); } catch (ParseException e) { /* already validated by the Job; keep the configured value */ } } this.endTimestampMillis = this.taskConfig.getLong(Key.ENDTIMESTAMPMILLIS, -1); String endDateTime = this.taskConfig.getString(Key.ENDDATETIME); if (endDateTime != null) { try { endTimestampMillis = getUnixTimeFromDateTime(endDateTime); } catch (ParseException e) { /* already validated by the Job; keep the configured value */ } } columns = this.taskConfig.getList(Key.COLUMN, String.class); shard = this.taskConfig.getInt(Key.SHARD); client = new Client(endPoint, accessKeyId, accessKeySecret); LOG.info("init loghub reader task finished. project:{} logstore:{} batchSize:{}", project, logstore, batchSize); } @Override public void prepare() { } private long getUnixTimeFromDateTime(String dateTime) throws ParseException { try { String format = Constant.DATETIME_FORMAT; SimpleDateFormat simpleDateFormat = new SimpleDateFormat(format); return simpleDateFormat.parse(dateTime).getTime() / 1000; } catch (ParseException ignored) { /* not a full date-time value; try the date-only format */ } String format = Constant.DATE_FORMAT; SimpleDateFormat simpleDateFormat = new SimpleDateFormat(format); return simpleDateFormat.parse(dateTime).getTime() / 1000; } private GetCursorResponse getCursorWithRetry(final Client client, final String project, final String logstore, final int shard, final long fromTime) throws Exception { return RetryUtil.executeWithRetry(new Callable<GetCursorResponse>() { @Override public GetCursorResponse call() throws Exception { LOG.info("loghub get cursor with project: {} logstore: {} shard: {} time: {}", project, logstore, shard, fromTime); return client.GetCursor(project, logstore, shard, fromTime); } }, 7, 1000L, true); } private GetCursorResponse getCursorWithRetry(final Client client, final String project, final String logstore, final int shard, final CursorMode mode) throws Exception { return RetryUtil.executeWithRetry(new Callable<GetCursorResponse>() { @Override public GetCursorResponse call() throws Exception { LOG.info("loghub get
cursor with project: {} logstore: {} shard: {} mode: {}", project, logstore, shard, mode); return client.GetCursor(project, logstore, shard, mode); } }, 7, 1000L, true); } private BatchGetLogResponse batchGetLogWithRetry(final Client client, final String project, final String logstore, final int shard, final int batchSize, final String curCursor, final String endCursor) throws Exception { return RetryUtil.executeWithRetry(new Callable<BatchGetLogResponse>() { @Override public BatchGetLogResponse call() throws Exception { return client.BatchGetLog(project, logstore, shard, batchSize, curCursor, endCursor); } }, 7, 1000L, true); } @Override public void startRead(RecordSender recordSender) { LOG.info("read start"); try { GetCursorResponse cursorRes; if (this.beginTimestampMillis != -1) { cursorRes = getCursorWithRetry(client, project, logstore, this.shard, beginTimestampMillis); } else { cursorRes = getCursorWithRetry(client, project, logstore, this.shard, CursorMode.BEGIN); } String beginCursor = cursorRes.GetCursor(); LOG.info("the begin cursor, loghub requestId: {} cursor: {}", cursorRes.GetRequestId(), cursorRes.GetCursor()); if (this.endTimestampMillis != -1) { cursorRes = getCursorWithRetry(client, project, logstore, this.shard, endTimestampMillis); } else { cursorRes = getCursorWithRetry(client, project, logstore, this.shard, CursorMode.END); } String endCursor = cursorRes.GetCursor(); LOG.info("the end cursor, loghub requestId: {} cursor: {}", cursorRes.GetRequestId(), cursorRes.GetCursor()); if (StringUtils.equals(beginCursor, endCursor)) { LOG.info("beginCursor:{} equals endCursor:{}, nothing to read, exiting.", beginCursor, endCursor); return; } String currentCursor = null; String nextCursor = beginCursor; HashMap<String, String> metaMap = new HashMap<>(); HashMap<String, String> dataMap = new HashMap<>(); JSONObject allMetaJson = new JSONObject(); while (!StringUtils.equals(currentCursor, nextCursor)) { currentCursor = nextCursor; BatchGetLogResponse logDataRes = batchGetLogWithRetry(client, project, logstore, this.shard,
this.batchSize, currentCursor, endCursor); List<LogGroupData> logGroups = logDataRes.GetLogGroups(); for(LogGroupData logGroup: logGroups) { metaMap.clear(); allMetaJson.clear(); FastLogGroup flg = logGroup.GetFastLogGroup(); metaMap.put("C_Category", flg.getCategory()); metaMap.put(Constant.META_COL_CATEGORY, flg.getCategory()); allMetaJson.put(Constant.META_COL_CATEGORY, flg.getCategory()); metaMap.put("C_Source", flg.getSource()); metaMap.put(Constant.META_COL_SOURCE, flg.getSource()); allMetaJson.put(Constant.META_COL_SOURCE, flg.getSource()); metaMap.put("C_Topic", flg.getTopic()); metaMap.put(Constant.META_COL_TOPIC, flg.getTopic()); allMetaJson.put(Constant.META_COL_TOPIC, flg.getTopic()); metaMap.put("C_MachineUUID", flg.getMachineUUID()); metaMap.put(Constant.META_COL_MACHINEUUID, flg.getMachineUUID()); allMetaJson.put(Constant.META_COL_MACHINEUUID, flg.getMachineUUID()); for (int tagIdx = 0; tagIdx < flg.getLogTagsCount(); ++tagIdx) { FastLogTag logtag = flg.getLogTags(tagIdx); String tagKey = logtag.getKey(); String tagValue = logtag.getValue(); if (tagKey.equals(Constant.META_COL_HOSTNAME)) { metaMap.put("C_HostName", logtag.getValue()); } else if (tagKey.equals(Constant.META_COL_PATH)) { metaMap.put("C_Path", logtag.getValue()); } metaMap.put(tagKey, tagValue); allMetaJson.put(tagKey, tagValue); } for (int lIdx = 0; lIdx < flg.getLogsCount(); ++lIdx) { dataMap.clear(); FastLog log = flg.getLogs(lIdx); String logTime = String.valueOf(log.getTime()); metaMap.put("C_LogTime", logTime); metaMap.put(Constant.META_COL_LOGTIME, logTime); allMetaJson.put(Constant.META_COL_LOGTIME, logTime); for (int cIdx = 0; cIdx < log.getContentsCount(); ++cIdx) { FastLogContent content = log.getContents(cIdx); dataMap.put(content.getKey(), content.getValue()); } Record record = recordSender.createRecord(); JSONObject extractOthers = new JSONObject(); if(columns.contains(Constant.COL_EXTRACT_OTHERS)){ List<String> keyList = Arrays.asList(dataMap.keySet().toArray(new
String[dataMap.keySet().size()])); for (String otherKey:keyList) { if (!columns.contains(otherKey)){ extractOthers.put(otherKey,dataMap.get(otherKey)); } } } if (null != this.columns && 1 == this.columns.size()) { String columnsInStr = columns.get(0).toString(); if ("\"*\"".equals(columnsInStr) || "*".equals(columnsInStr)) { List<String> keyList = Arrays.asList(dataMap.keySet().toArray(new String[dataMap.keySet().size()])); Collections.sort(keyList); for (String key : keyList) { record.addColumn(new StringColumn(key + ":" + dataMap.get(key))); } } else { if (dataMap.containsKey(columnsInStr)) { record.addColumn(new StringColumn(dataMap.get(columnsInStr))); } else if (metaMap.containsKey(columnsInStr)) { record.addColumn(new StringColumn(metaMap.get(columnsInStr))); } else if (Constant.COL_EXTRACT_OTHERS.equals(columnsInStr)){ record.addColumn(new StringColumn(extractOthers.toJSONString())); } else if (Constant.COL_EXTRACT_ALL_META.equals(columnsInStr)) { record.addColumn(new StringColumn(allMetaJson.toJSONString())); } } } else { for (String col : this.columns) { if (dataMap.containsKey(col)) { record.addColumn(new StringColumn(dataMap.get(col))); } else if (metaMap.containsKey(col)) { record.addColumn(new StringColumn(metaMap.get(col))); } else if (col != null && col.startsWith("'") && col.endsWith("'")){ String constant = col.substring(1, col.length()-1); record.addColumn(new StringColumn(constant)); }else if (Constant.COL_EXTRACT_OTHERS.equals(col)){ record.addColumn(new StringColumn(extractOthers.toJSONString())); } else if (Constant.COL_EXTRACT_ALL_META.equals(col)) { record.addColumn(new StringColumn(allMetaJson.toJSONString())); } else { record.addColumn(new StringColumn(null)); } } } recordSender.sendToWriter(record); } } nextCursor = logDataRes.GetNextCursor(); } } catch (LogException e) { if (e.GetErrorCode().equals("LogStoreNotExist")) { LOG.info("logStore[" + logstore + "] does not exist!
Detailed error message: " + e.toString()); } else { LOG.error("read LogStore[" + logstore + "] error, please check! Detailed error message: " + e.toString()); throw DataXException.asDataXException(LogHubReaderErrorCode.LOG_HUB_ERROR, e); } } catch (Exception e) { LOG.error("read LogStore[" + logstore + "] error, please check! Detailed error message: " + e.toString()); throw DataXException.asDataXException(LogHubReaderErrorCode.LOG_HUB_ERROR, e); } LOG.info("finished reading loghub shard."); } @Override public void post() { } @Override public void destroy() { } } } ================================================ FILE: loghubreader/src/main/java/com/alibaba/datax/plugin/reader/loghubreader/LogHubReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.loghubreader; import com.alibaba.datax.common.spi.ErrorCode; public enum LogHubReaderErrorCode implements ErrorCode { BAD_CONFIG_VALUE("LogHubReader-00", "The value you configured is invalid."), LOG_HUB_ERROR("LogHubReader-01","An exception occurred while accessing LogHub."), REQUIRE_VALUE("LogHubReader-02","A required parameter is missing."), EMPTY_LOGSTORE_VALUE("LogHubReader-03","There is no shard in this LogStore."); private final String code; private final String description; private LogHubReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].
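", this.code, this.description); } } ================================================ FILE: loghubreader/src/main/resources/plugin.json ================================================ { "name": "loghubreader", "class": "com.alibaba.datax.plugin.reader.loghubreader.LogHubReader", "description": "Applies to: reading data from SLS LogHub", "developer": "alibaba" } ================================================ FILE: loghubreader/src/main/resources/plugin_job_template.json ================================================ { "name": "loghubreader", "parameter": { "endpoint": "", "accessId": "", "accessKey": "", "project": "", "logstore": "", "batchSize":1000, "column": [] } } ================================================ FILE: loghubwriter/pom.xml ================================================
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <parent>
    <artifactId>datax-all</artifactId>
    <groupId>com.alibaba.datax</groupId>
    <version>0.0.1-SNAPSHOT</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>

  <artifactId>loghubwriter</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>com.alibaba.datax</groupId>
      <artifactId>datax-common</artifactId>
      <version>${datax-project-version}</version>
      <exclusions>
        <exclusion>
          <artifactId>slf4j-log4j12</artifactId>
          <groupId>org.slf4j</groupId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
    </dependency>
    <dependency>
      <groupId>com.aliyun.openservices</groupId>
      <artifactId>aliyun-log</artifactId>
      <version>0.6.12</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>${jdk-version}</source>
          <target>${jdk-version}</target>
          <encoding>${project-sourceEncoding}</encoding>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptors>
            <descriptor>src/main/assembly/package.xml</descriptor>
          </descriptors>
          <finalName>datax</finalName>
        </configuration>
        <executions>
          <execution>
            <id>dwzip</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
================================================ FILE: loghubwriter/src/main/assembly/package.xml ================================================
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
  <id></id>
  <formats>
    <format>dir</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <fileSet>
      <directory>src/main/resources</directory>
      <includes>
        <include>plugin.json</include>
      </includes>
      <outputDirectory>plugin/writer/loghubwriter</outputDirectory>
    </fileSet>
    <fileSet>
      <directory>target/</directory>
      <includes>
        <include>loghubwriter-0.0.1-SNAPSHOT.jar</include>
      </includes>
      <outputDirectory>plugin/writer/loghubwriter</outputDirectory>
    </fileSet>
  </fileSets>
  <dependencySets>
    <dependencySet>
      <useProjectArtifact>false</useProjectArtifact>
      <outputDirectory>plugin/writer/loghubwriter/libs</outputDirectory>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
================================================ FILE: loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.loghubwriter; /** * Configuration keys * @author */ public final class Key { /** * Configuration keys used by the plugin that must be supplied by the plugin user are declared here. */ public static final String ENDPOINT = "endpoint"; public static final String ACCESS_KEY_ID = "accessId"; public static final String ACCESS_KEY_SECRET =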
"accessKey"; public static final String PROJECT = "project"; public static final String LOG_STORE = "logstore"; public static final String TOPIC = "topic"; public static final String COLUMN = "column"; public static final String BATCH_SIZE = "batchSize"; public static final String TIME = "time"; public static final String TIME_FORMAT = "timeformat"; public static final String SOURCE = "source"; public static final String HASH_BY_KEY = "hashKey"; } ================================================ FILE: loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriter.java ================================================ package com.alibaba.datax.plugin.writer.loghubwriter; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.common.util.StrUtil; import com.aliyun.openservices.log.Client; import com.aliyun.openservices.log.common.LogItem; import com.aliyun.openservices.log.common.Shard; import com.aliyun.openservices.log.exception.LogException; import com.aliyun.openservices.log.request.ListShardRequest; import com.aliyun.openservices.log.request.PutLogsRequest; import com.aliyun.openservices.log.response.ListShardResponse; import com.aliyun.openservices.log.response.PutLogsResponse; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.DateFormat; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Date; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.concurrent.Callable; /** * SLS writer plugin * @author */ public class LogHubWriter extends Writer { public static class Job extends Writer.Job { private static final
Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration jobConfig = null; @Override public void init() { info(LOG, "loghub writer job init begin ..."); this.jobConfig = super.getPluginJobConf(); validateParameter(jobConfig); info(LOG, "loghub writer job init end."); } private void validateParameter(Configuration conf){ conf.getNecessaryValue(Key.ENDPOINT,LogHubWriterErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.ACCESS_KEY_ID,LogHubWriterErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.ACCESS_KEY_SECRET,LogHubWriterErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.PROJECT,LogHubWriterErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.LOG_STORE,LogHubWriterErrorCode.REQUIRE_VALUE); conf.getNecessaryValue(Key.COLUMN,LogHubWriterErrorCode.REQUIRE_VALUE); } @Override public List<Configuration> split(int mandatoryNumber) { info(LOG, "split begin..."); List<Configuration> configurationList = new ArrayList<Configuration>(); for (int i = 0; i < mandatoryNumber; i++) { configurationList.add(this.jobConfig.clone()); } info(LOG, "split end..."); return configurationList; } @Override public void post() { } @Override public void destroy() { } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration taskConfig; private com.aliyun.openservices.log.Client logHubClient; private String logStore; private String topic; private String project; private List<String> columnList; private int batchSize; private String timeCol; private String timeFormat; private String source; private boolean isHashKey; private List<Shard> shards; @Override public void init() { this.taskConfig = super.getPluginJobConf(); String endpoint = taskConfig.getString(Key.ENDPOINT); String accessKeyId = taskConfig.getString(Key.ACCESS_KEY_ID); String accessKeySecret = taskConfig.getString(Key.ACCESS_KEY_SECRET); project = taskConfig.getString(Key.PROJECT); logStore = taskConfig.getString(Key.LOG_STORE); 
topic = taskConfig.getString(Key.TOPIC,""); columnList = 
taskConfig.getList(Key.COLUMN,String.class); batchSize = taskConfig.getInt(Key.BATCH_SIZE,1024); timeCol = taskConfig.getString(Key.TIME,""); timeFormat = taskConfig.getString(Key.TIME_FORMAT,""); source = taskConfig.getString(Key.SOURCE,""); isHashKey = taskConfig.getBool(Key.HASH_BY_KEY,false); logHubClient = new Client(endpoint, accessKeyId, accessKeySecret); if (isHashKey) { listShard(); info(LOG, "init loghub writer with hash key mode."); } if (LOG.isInfoEnabled()) { LOG.info("init loghub writer task finished. project:{} logstore:{} topic:{} batchSize:{}",project,logStore,topic,batchSize); } } /** * Fetches the shard list of the logstore. */ private void listShard() { try { ListShardResponse response = logHubClient.ListShard(new ListShardRequest(project,logStore)); shards = response.GetShards(); if (LOG.isInfoEnabled()) { LOG.info("Get shard count:{}", shards.size()); } } catch (LogException e) { info(LOG, "Get shard failed!"); throw new RuntimeException("Get shard failed!", e); } } @Override public void prepare() { } private int getTime(String v) { try { if ("bigint".equalsIgnoreCase(timeFormat)) { return Integer.valueOf(v); } DateFormat sdf = new SimpleDateFormat(timeFormat); Date date = sdf.parse(v); return (int)(date.getTime()/1000); } catch (Exception e) { LOG.warn("Format time failed!", e); } /* fall back to the current time when the value cannot be parsed */ return (int)(((new Date())).getTime()/1000); } @Override public void startWrite(RecordReceiver recordReceiver) { info(LOG, "start to write....................."); /* route records to shards by hash key when hashKey is enabled */ if (isHashKey) { processDataWithHashKey(recordReceiver); } else { processDataWithoutHashKey(recordReceiver); } info(LOG, "finish to write........."); } private void processDataWithHashKey(RecordReceiver receiver) { Record record; Map<String, List<LogItem>> logMap = new HashMap<String, List<LogItem>>(shards.size()); int count = 0; try { while ((record = 
receiver.getFromReader()) != null) { LogItem logItem = new LogItem(); if (record.getColumnNumber() != columnList.size()) { this.getTaskPluginCollector().collectDirtyRecord(record, "column not match"); continue; } String id 
= ""; for (int i = 0; i < record.getColumnNumber(); i++) { String colName = columnList.get(i); String colValue = record.getColumn(i).asString(); if (colName.endsWith("_id")) { id = colValue; } logItem.PushBack(colName, colValue); if (colName.equals(timeCol)) { logItem.SetTime(getTime(colValue)); } } String hashKey = getShardHashKey(StrUtil.getMd5(id), shards); if (!logMap.containsKey(hashKey)) { info(LOG, "Hash key:" + hashKey); logMap.put(hashKey, new ArrayList<LogItem>()); } logMap.get(hashKey).add(logItem); if (logMap.get(hashKey).size() % batchSize == 0) { PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, logMap.get(hashKey), hashKey); PutLogsResponse response = putLog(request); count += logMap.get(hashKey).size(); if (LOG.isDebugEnabled()) { LOG.debug("record count:{}, request id:{}", logMap.get(hashKey).size(), response.GetRequestId()); } logMap.get(hashKey).clear(); } } for (Map.Entry<String, List<LogItem>> entry : logMap.entrySet()) { if (!entry.getValue().isEmpty()) { /* send the remaining buffered records */ PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, entry.getValue(), entry.getKey()); PutLogsResponse response = putLog(request); count += entry.getValue().size(); if (LOG.isDebugEnabled()) { LOG.debug("record count:{}, request id:{}", entry.getValue().size(), response.GetRequestId()); } entry.getValue().clear(); } } LOG.info("{} records have been sent", count); } catch (LogException ex) { throw DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, ex.getMessage(), ex); } catch (Exception e) { throw DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, e.getMessage(), e); } } private void processDataWithoutHashKey(RecordReceiver receiver) { Record record; ArrayList<LogItem> logGroup = new ArrayList<LogItem>(); int count = 0; try { while ((record = receiver.getFromReader()) != null) { LogItem logItem = new LogItem(); if(record.getColumnNumber() != columnList.size()){ 
this.getTaskPluginCollector().collectDirtyRecord(record,"column not match"); continue; } 
for (int i = 0; i < record.getColumnNumber(); i++) { String colName = columnList.get(i); String colValue = record.getColumn(i).asString(); logItem.PushBack(colName, colValue); if(colName.equals(timeCol)){ logItem.SetTime(getTime(colValue)); } } logGroup.add(logItem); count++; if (count % batchSize == 0) { PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, logGroup); PutLogsResponse response = putLog(request); logGroup.clear(); if (LOG.isDebugEnabled()) { LOG.debug("record count:{}, request id:{}", count, response.GetRequestId()); } } } if (!logGroup.isEmpty()) { /* send the remaining buffered records */ PutLogsRequest request = new PutLogsRequest(project, logStore, topic, source, logGroup); PutLogsResponse response = putLog(request); logGroup.clear(); if (LOG.isDebugEnabled()) { LOG.debug("record count:{}, request id:{}", count, response.GetRequestId()); } } LOG.info("{} records have been sent", count); } catch (LogException ex) { throw DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, ex.getMessage(), ex); } catch (Exception e) { throw DataXException.asDataXException(LogHubWriterErrorCode.LOG_HUB_ERROR, e.getMessage(), e); } } private PutLogsResponse putLog(final PutLogsRequest request) throws Exception{ final Client client = this.logHubClient; return RetryUtil.executeWithRetry(new Callable<PutLogsResponse>() { @Override public PutLogsResponse call() throws LogException{ return client.PutLogs(request); } }, 3, 1000L, false); } /* returns the begin key of the shard whose [begin, end) range contains hashKey, or the first shard's begin key as a fallback */ private String getShardHashKey(String hashKey, List<Shard> shards) { for (Shard shard : shards) { if (hashKey.compareTo(shard.getExclusiveEndKey()) < 0 && hashKey.compareTo(shard.getInclusiveBeginKey()) >= 0) { return shard.getInclusiveBeginKey(); } } return shards.get(0).getInclusiveBeginKey(); } @Override public void post() { } @Override public void destroy() { } } /** * Logs the 
message at INFO level only when INFO logging is enabled. * * @param logger * @param message */ public static void info(Logger logger, String message) { if (logger.isInfoEnabled()) { logger.info(message); } } } 
================================================ FILE: loghubwriter/src/main/java/com/alibaba/datax/plugin/writer/loghubwriter/LogHubWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.loghubwriter; import com.alibaba.datax.common.spi.ErrorCode; public enum LogHubWriterErrorCode implements ErrorCode { BAD_CONFIG_VALUE("LogHubWriter-00", "The value you configured is invalid."), LOG_HUB_ERROR("LogHubWriter-01","LogHub access encountered an exception"), REQUIRE_VALUE("LogHubWriter-02","Missing parameters"); private final String code; private final String description; private LogHubWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: loghubwriter/src/main/resources/plugin.json ================================================ { "name": "loghubwriter", "class": "com.alibaba.datax.plugin.writer.loghubwriter.LogHubWriter", "description": "Applies to: importing data into SLS LogHub", "developer": "alibaba" } ================================================ FILE: loghubwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "loghubwriter", "parameter": { "endpoint": "", "accessId": "", "accessKey": "", "project": "", "logstore": "", "topic": "", "batchSize":1024, "column": [] } } ================================================ FILE: milvuswriter/doc/milvuswriter.md ================================================
# DataX milvuswriter

---

## 1 Quick Introduction

The milvuswriter plugin writes data into a Milvus collection. It is aimed at ETL developers who import data from a data warehouse into Milvus, and it can also serve DBAs and other users as a data migration tool.

## 2 How It Works

milvuswriter receives the protocol data produced by the Reader through the DataX framework, writes it to Milvus with `upsert` or `insert`, and commits the data in batches, accumulating `batchSize` records before each commit.
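The accumulate-then-commit flow and the difference between the two write modes can be illustrated without a Milvus dependency. This is a minimal sketch, not the plugin's code: `InMemoryCollection` and `BatchBuffer` are hypothetical stand-ins for a Milvus collection keyed by a `long` primary key (no autoID) and for the plugin's `MilvusBufferWriter`. Here `upsert` replaces any row with the same key while `insert` always appends, which is why `insert` into a collection without autoID can duplicate data. The final `commit()` after the loop mirrors the plugin flushing a partial last batch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriteSketch {

    /** Hypothetical stand-in for a collection whose primary key is a long (no autoID). */
    static class InMemoryCollection {
        final List<long[]> rows = new ArrayList<>(); // each row is {id, value}

        void insert(List<long[]> batch) {
            // insert appends every row, so writing the same id twice leaves two rows
            rows.addAll(batch);
        }

        void upsert(List<long[]> batch) {
            for (long[] row : batch) {
                // upsert drops any row with the same primary key, then appends the new one
                rows.removeIf(existing -> existing[0] == row[0]);
                rows.add(row);
            }
        }
    }

    /** Hypothetical buffer that commits once batchSize rows have accumulated. */
    static class BatchBuffer {
        private final InMemoryCollection target;
        private final int batchSize;
        private final boolean useUpsert;
        private List<long[]> cache = new ArrayList<>();
        int commits = 0;

        BatchBuffer(InMemoryCollection target, int batchSize, boolean useUpsert) {
            this.target = target;
            this.batchSize = batchSize;
            this.useUpsert = useUpsert;
        }

        void add(long[] row) {
            cache.add(row);
            if (cache.size() >= batchSize) {
                commit();
            }
        }

        /** Also called once after the last record so that a partial batch is not lost. */
        void commit() {
            if (cache.isEmpty()) {
                return;
            }
            if (useUpsert) {
                target.upsert(cache);
            } else {
                target.insert(cache);
            }
            commits++;
            cache = new ArrayList<>(); // start a fresh buffer for the next batch
        }
    }

    public static void main(String[] args) {
        long[][] input = {{1, 10}, {2, 20}, {1, 11}, {3, 30}, {2, 21}};
        for (boolean useUpsert : new boolean[] {true, false}) {
            InMemoryCollection collection = new InMemoryCollection();
            BatchBuffer buffer = new BatchBuffer(collection, 2, useUpsert);
            for (long[] row : input) {
                buffer.add(row);
            }
            buffer.commit(); // flush the trailing partial batch
            System.out.println((useUpsert ? "upsert" : "insert") + ": rows="
                    + collection.rows.size() + ", commits=" + buffer.commits);
        }
    }
}
```

With a batch size of 2 and five input rows the buffer commits three times; upsert ends with three distinct ids, while insert keeps all five rows, including the repeated ids 1 and 2.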
Note:

* `upsert` (recommended): for a collection without autoID, updates the Entity in the Collection that has the same primary key; for an autoID collection, replaces the primary key in the Entity with an auto-generated one and inserts the data.
* `insert`: mostly used with autoID collections, where Milvus generates the primary key; using `insert` on a collection without autoID results in duplicate data.

## 3 Features

### 3.1 Sample Configuration

* The following sample configuration generates data in memory and imports it into Milvus.

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "value": 1, "type": "long" },
              { "value": "[1.1,1.2,1.3]", "type": "string" },
              { "value": 100, "type": "long" },
              { "value": 200, "type": "long" },
              { "value": 300, "type": "long" },
              { "value": 3.14159, "type": "double" },
              { "value": 3.1415926, "type": "double" },
              { "value": "testvarcharvalue", "type": "string" },
              { "value": true, "type": "bool" },
              { "value": "[1.123,1.2456,1.3789]", "type": "string" },
              { "value": "[2.123,2.2456,2.3789]", "type": "string" },
              { "value": "12345678", "type": "string" },
              { "value": "{\"a\":1,\"b\":2,\"c\":3}", "type": "string" },
              { "value": "[1,2,3,4]", "type": "string" }
            ],
            "sliceRecordCount": 1
          }
        },
        "writer": {
          "parameter": {
            "schemaCreateMode": "createIfNotExist",
            "connectTimeoutMs": 60000,
            "writeMode": "upsert",
            "collection": "demo01",
            "type": "milvus",
            "token": "xxxxxxx",
            "endpoint": "https://xxxxxxxx.com:443",
            "batchSize": 1024,
            "column": [
              { "name": "id", "type": "Int64", "primaryKey": "true" },
              { "name": "floatvector", "type": "FloatVector", "dimension": "3" },
              { "name": "int8col", "type": "Int8" },
              { "name": "int16col", "type": "Int16" },
              { "name": "int32col", "type": "Int32" },
              { "name": "floatcol", "type": "Float" },
              { "name": "doublecol", "type": "Double" },
              { "name": "varcharcol", "type": "VarChar" },
              { "name": "boolcol", "type": "Bool" },
              { "name": "bfloat16vectorcol", "type": "BFloat16Vector", "dimension": "3" },
              { "name": "float16vectorcol", "type": "Float16Vector", "dimension": "3" },
              { "name": "binaryvectorcol", "type": "BinaryVector", "dimension": "64" },
              { "name": "jsoncol", "type": "JSON" },
              { "name": "intarraycol", "maxCapacity": "8", "type": "Array", "elementType": "Int32" }
            ]
          },
          "name": "milvuswriter"
        }
      }
    ],
    "setting": {
      "errorLimit": { "record": "0" },
      "speed": { "concurrent": 2, "channel": 2 }
    }
  }
}
```

### 3.2 Parameters

* **endpoint**
  * Description: connection information for the Milvus database, including address and port, for example https://xxxxxxxx.com:443. Note: (1) only one endpoint value can be configured for a database; (2) a Milvus write task can be configured with only one endpoint.
  * Required: yes
  * Default: none
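* **schemaCreateMode**
  * Description: collection creation mode, i.e. how the job handles the target Milvus collection during synchronization. When a collection is created, its schema is built from the configured `column` attributes.
  * Values:
    * createIfNotExist: if the collection does not exist, create it; if it exists, do nothing.
    * ignore: if the collection does not exist, the job fails with an error; if it exists, do nothing.
    * recreate: if the collection does not exist, create it; if it exists, drop it and create it again.
  * Required: no
  * Default: createIfNotExist
* **connectTimeoutMs**
  * Description: client connection timeout when communicating with Milvus, in milliseconds.
  * Required: no
  * Default: 10000
* **token**
  * Description: token used to authenticate with the Milvus instance. Configure either `token` or `username`/`password`.
  * Required: no
  * Default: none
* **username**
  * Description: user name for the target Milvus database. Configure either this or `token`.
  * Required: no
  * Default: none
* **password**
  * Description: password for the target Milvus database.
  * Required: no
  * Default: none
* **writeMode**
  * Description: how data is written to the Milvus collection.
  * Values:
    * upsert (recommended): for a collection without autoID, updates the Entity in the Collection that has the same primary key; for an autoID collection, replaces the primary key in the Entity with an auto-generated one and inserts the data.
    * insert: mostly used with autoID collections, where Milvus generates the primary key; using insert on a collection without autoID results in duplicate data.
  * Required: yes
  * Default: upsert
* **collection**
  * Description: name of the target collection. Only one Milvus collection name can be configured.
  * Required: yes
  * Default: none
* **batchSize**
  * Description: number of records committed in one batch. A larger value greatly reduces the number of network round trips between DataX and Milvus and improves overall throughput, but a value that is too large may cause the DataX process to run out of memory (OOM).
  * Required: no
  * Default: 100
* **column**
  * Description: fields of the target collection to write, each described as a JSON object and separated by commas. Field values are taken from the Reader column at the same position. `name` and `type` are required for every field; other attributes are needed only when the collection is created through `schemaCreateMode`, for example: `"column": [ { "name": "id", "type": "Int64", "primaryKey": "true" }, { "name": "floatvector", "type": "FloatVector", "dimension": "3" } ]`
  * Required: yes
  * Default: none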
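The three `schemaCreateMode` values reduce to a small decision that the job makes once, in its `prepare` phase, before any task writes data; in the plugin this lives in `MilvusCreateCollection.createCollectionByMode`. The sketch below is illustrative only: the `SchemaCreateMode` enum and the `plan` method are hypothetical, and instead of calling Milvus it returns the planned actions as strings and signals the `ignore` failure case with an `IllegalStateException`.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaCreateModeSketch {

    /** Hypothetical mirror of the createIfNotExist / ignore / recreate settings. */
    enum SchemaCreateMode { CREATE_IF_NOT_EXIST, IGNORE, RECREATE }

    /** Returns the actions to run against the target collection before writing. */
    static List<String> plan(SchemaCreateMode mode, boolean collectionExists) {
        List<String> actions = new ArrayList<>();
        switch (mode) {
            case CREATE_IF_NOT_EXIST:
                // create only when missing; an existing collection is used as-is
                if (!collectionExists) {
                    actions.add("create");
                }
                break;
            case RECREATE:
                // drop an existing collection, then always create it from the column config
                if (collectionExists) {
                    actions.add("drop");
                }
                actions.add("create");
                break;
            case IGNORE:
                // never create; a missing collection fails the job
                if (!collectionExists) {
                    throw new IllegalStateException("Collection does not exist");
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
        return actions;
    }

    public static void main(String[] args) {
        for (SchemaCreateMode mode : SchemaCreateMode.values()) {
            for (boolean exists : new boolean[] {true, false}) {
                String outcome;
                try {
                    outcome = plan(mode, exists).toString();
                } catch (IllegalStateException e) {
                    outcome = "error: " + e.getMessage();
                }
                System.out.println(mode + ", exists=" + exists + " -> " + outcome);
            }
        }
    }
}
```

Keeping the decision separate from the client calls makes all six mode/existence combinations easy to check.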
### 3.3 Supported Milvus Field Types

Bool, Int8, Int16, Int32, Int64, Float, Double, String, VarChar, Array, JSON, BinaryVector, FloatVector, Float16Vector, BFloat16Vector, SparseFloatVector

`FloatVector`, `Float16Vector`, and `BFloat16Vector` values are supplied as JSON array strings, for example `[1.1,1.2,1.3]` for a 3-dimensional vector, as in the sample configuration.
================================================ FILE: milvuswriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT milvuswriter UTF-8 official 1.8 com.alibaba.fastjson2 fastjson2 2.0.49 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.projectlombok lombok 1.18.30 guava com.google.guava 32.0.1-jre io.milvus milvus-sdk-java 2.5.2 org.mockito mockito-core 3.3.3 test junit junit 4.11 test org.jetbrains.kotlin kotlin-stdlib 2.0.0 org.powermock powermock-module-junit4 2.0.9 test org.powermock powermock-api-mockito2 2.0.9 test src/main/resources **/*.* true maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: milvuswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/milvuswriter target/ milvuswriter-0.0.1-SNAPSHOT.jar plugin/writer/milvuswriter false plugin/writer/milvuswriter/libs runtime ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/KeyConstant.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter; public class KeyConstant { public static final String USERNAME = "username"; public static final String PASSWORD = "password"; public static final String ENDPOINT = "endpoint"; public static final String TOKEN = "token"; public static final String DATABASE = "database"; public static final String COLLECTION = "collection"; public static final String BATCH_SIZE = "batchSize"; public static final String COLUMN = 
"column"; public static final String SCHAME_CREATE_MODE = "schemaCreateMode"; public static final String WRITE_MODE = "writeMode"; public static final String PARTITION = "partition"; public static final String CONNECT_TIMEOUT_MS = "connectTimeoutMs"; public static final String ENABLE_DYNAMIC_SCHEMA = "enableDynamicSchema"; } ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/MilvusBufferWriter.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.plugin.writer.milvuswriter.enums.WriteModeEnum; import com.alibaba.fastjson2.JSONArray; import com.google.gson.Gson; import com.google.gson.JsonObject; import io.milvus.v2.common.DataType; import io.milvus.v2.service.vector.request.data.BFloat16Vec; import io.milvus.v2.service.vector.request.data.Float16Vec; import lombok.extern.slf4j.Slf4j; import java.nio.ByteBuffer; import java.util.ArrayList; import java.util.List; import java.util.TreeMap; import java.util.stream.Collectors; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; @Slf4j public class MilvusBufferWriter { private final MilvusClient milvusClient; private final String collection; private final Integer batchSize; private List<JsonObject> dataCache; private List<MilvusColumn> milvusColumnMeta; private WriteModeEnum writeMode; private String partition; public MilvusBufferWriter(MilvusClient milvusClient, Configuration writerSliceConfig) { this.milvusClient = milvusClient; this.collection = writerSliceConfig.getString(KeyConstant.COLLECTION); this.batchSize = writerSliceConfig.getInt(KeyConstant.BATCH_SIZE, 100); this.dataCache = new ArrayList<>(batchSize); this.milvusColumnMeta = 
JSON.parseObject(writerSliceConfig.getString(KeyConstant.COLUMN), new TypeReference<List<MilvusColumn>>() { }); this.writeMode = WriteModeEnum.getEnum(writerSliceConfig.getString(KeyConstant.WRITE_MODE)); this.partition = writerSliceConfig.getString(KeyConstant.PARTITION); } public void add(Record record, TaskPluginCollector taskPluginCollector) { try { JsonObject data = this.convertByType(milvusColumnMeta, record); dataCache.add(data); } catch (Exception e) { taskPluginCollector.collectDirtyRecord(record, String.format("failed to parse record, errorMessage: %s", e.getMessage())); } } public Boolean needCommit() { return dataCache.size() >= batchSize; } public void commit() { if (dataCache.isEmpty()) { log.info("dataCache is empty, skip commit"); return; } if (writeMode == WriteModeEnum.INSERT) { milvusClient.insert(collection, partition, dataCache); } else { milvusClient.upsert(collection, partition, dataCache); } dataCache = new ArrayList<>(batchSize); } public int getDataCacheSize() { return dataCache.size(); } private JsonObject convertByType(List<MilvusColumn> milvusColumnMeta, Record record) { JsonObject data = new JsonObject(); Gson gson = new Gson(); for (int i = 0; i < record.getColumnNumber(); i++) { MilvusColumn milvusColumn = milvusColumnMeta.get(i); DataType fieldType = milvusColumn.getMilvusTypeEnum(); String fieldName = milvusColumn.getName(); Column column = record.getColumn(i); try { Object field = convertToMilvusField(fieldType, column, milvusColumn); data.add(fieldName, gson.toJsonTree(field)); } catch (Exception e) { log.error("parse error for column: {} errorMessage: {}", fieldName, e.getMessage(), e); throw e; } } return data; } /* the returned values must match what io.milvus.param.ParamUtils#checkFieldData(io.milvus.param.collection.FieldType, java.util.List, boolean) accepts */ private Object convertToMilvusField(DataType type, Column column, MilvusColumn milvusColumn) { if (column.getRawData() == null) { return null; } switch (type) { case Int8: case Int16: case 
Int32: case Int64: return column.asLong(); case 
Float: case Double: return column.asDouble(); case String: case VarChar: return column.asString(); case Bool: return column.asBoolean(); case BFloat16Vector: JSONArray bFloat16ArrayJson = JSON.parseArray(column.asString()); List<Float> bfloat16Vector = new ArrayList<>(); for (int i = 0; i < bFloat16ArrayJson.size(); i++) { Float value = Float.parseFloat(bFloat16ArrayJson.getString(i)); bfloat16Vector.add(value); } BFloat16Vec bFloat16Vec = new BFloat16Vec(bfloat16Vector); ByteBuffer byteBuffer = (ByteBuffer) bFloat16Vec.getData(); return byteBuffer.array(); case Float16Vector: JSONArray float16ArrayJson = JSON.parseArray(column.asString()); List<Float> float16Vector = new ArrayList<>(); for (int i = 0; i < float16ArrayJson.size(); i++) { Float floatValue = Float.parseFloat(float16ArrayJson.getString(i)); float16Vector.add(floatValue); } Float16Vec float16Vec = new Float16Vec(float16Vector); ByteBuffer data = (ByteBuffer) float16Vec.getData(); return data.array(); case BinaryVector: return column.asBytes(); case FloatVector: JSONArray arrayJson = JSON.parseArray(column.asString()); return arrayJson.stream().map(item -> Float.parseFloat(String.valueOf(item))).collect(Collectors.toList()); case SparseFloatVector: /* expected format: [3:0.5, 24:0.8, 76:0.2] */ try { JSONArray sparseFloatArray = JSON.parseArray(column.asString()); TreeMap<Long, Float> mapValue = new TreeMap<>(); for (int i = 0; i < sparseFloatArray.size(); i++) { String value = sparseFloatArray.getString(i); String[] split = value.split(":"); Long key = Long.parseLong(split[0]); Float val = Float.parseFloat(split[1]); mapValue.put(key, val); } return mapValue; } catch (Exception e) { log.error("parse column[{}] SparseFloatVector value error, value should look like [3:0.5, 24:0.8, 76:0.2], but got:{}", milvusColumn.getName(), column.asString()); throw e; } case JSON: return column.asString(); case Array: JSONArray parseArray = JSON.parseArray(column.asString()); return parseArray.stream().map(item -> 
String.valueOf(item)).collect(Collectors.toList()); 
default: throw new RuntimeException(String.format("Unsupported data type[%s]", type)); } } } ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/MilvusClient.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter; import java.util.List; import com.alibaba.datax.common.util.Configuration; import com.google.gson.JsonObject; import io.milvus.v2.client.ConnectConfig; import io.milvus.v2.client.MilvusClientV2; import io.milvus.v2.service.collection.request.CreateCollectionReq; import io.milvus.v2.service.collection.request.DropCollectionReq; import io.milvus.v2.service.collection.request.HasCollectionReq; import io.milvus.v2.service.partition.request.CreatePartitionReq; import io.milvus.v2.service.partition.request.HasPartitionReq; import io.milvus.v2.service.vector.request.InsertReq; import io.milvus.v2.service.vector.request.UpsertReq; import lombok.extern.slf4j.Slf4j; import org.apache.commons.lang3.StringUtils; /** * @author ziming(子茗) * @date 12/27/24 * @description */ @Slf4j public class MilvusClient { private MilvusClientV2 milvusClientV2; public MilvusClient(Configuration conf) { // connect to milvus ConnectConfig connectConfig = ConnectConfig.builder().uri(conf.getString(KeyConstant.ENDPOINT)).build(); String token = null; if (conf.getString(KeyConstant.TOKEN) != null) { token = conf.getString(KeyConstant.TOKEN); } else { token = conf.getString(KeyConstant.USERNAME) + ":" + conf.getString(KeyConstant.PASSWORD); } connectConfig.setToken(token); String database = conf.getString(KeyConstant.DATABASE); if (StringUtils.isNotBlank(database)) { log.info("use database {}", database); connectConfig.setDbName(conf.getString(KeyConstant.DATABASE)); } Integer connectTimeOut = conf.getInt(KeyConstant.CONNECT_TIMEOUT_MS); if (connectTimeOut != null) { connectConfig.setConnectTimeoutMs(connectTimeOut); } this.milvusClientV2 = new 
MilvusClientV2(connectConfig); } public void upsert(String collection, String partition, List<JsonObject> data) { UpsertReq upsertReq = UpsertReq.builder().collectionName(collection).data(data).build(); if (StringUtils.isNotEmpty(partition)) { upsertReq.setPartitionName(partition); } milvusClientV2.upsert(upsertReq); } public void insert(String collection, String partition, List<JsonObject> data) { InsertReq insertReq = InsertReq.builder().collectionName(collection).data(data).build(); if (StringUtils.isNotEmpty(partition)) { insertReq.setPartitionName(partition); } milvusClientV2.insert(insertReq); } public Boolean hasCollection(String collection) { HasCollectionReq build = HasCollectionReq.builder().collectionName(collection).build(); return milvusClientV2.hasCollection(build); } public void createCollection(String collection, CreateCollectionReq.CollectionSchema schema) { CreateCollectionReq createCollectionReq = CreateCollectionReq.builder().collectionName(collection).collectionSchema(schema).build(); milvusClientV2.createCollection(createCollectionReq); } public void dropCollection(String collection) { DropCollectionReq request = DropCollectionReq.builder().collectionName(collection).build(); milvusClientV2.dropCollection(request); } public Boolean hasPartition(String collection, String partition) { HasPartitionReq hasPartitionReq = HasPartitionReq.builder().collectionName(collection).partitionName(partition).build(); return milvusClientV2.hasPartition(hasPartitionReq); } public void createPartition(String collectionName, String partitionName) { CreatePartitionReq createPartitionReq = CreatePartitionReq.builder().collectionName(collectionName).partitionName(partitionName).build(); milvusClientV2.createPartition(createPartitionReq); } public void close() { log.info("Closing Milvus client"); milvusClientV2.close(); } } ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/MilvusColumn.java 
================================================ package com.alibaba.datax.plugin.writer.milvuswriter; import io.milvus.v2.common.DataType; import java.util.Arrays; /** * @author ziming(子茗) * @date 12/27/24 * @description */ public class MilvusColumn { private String name; private String type; private DataType milvusTypeEnum; private Boolean isPrimaryKey; private Integer dimension; private Boolean isPartitionKey; private Integer maxLength; private Boolean isAutoId; private Integer maxCapacity; private String elementType; public String getName() { return name; } public void setName(String name) { this.name = name; } public String getType() { return type; } public void setType(String type) { this.type = type; for (DataType item : DataType.values()) { if (item.name().equalsIgnoreCase(type)) { this.milvusTypeEnum = item; break; } } if (this.milvusTypeEnum == null) { throw new RuntimeException("Unsupported type: " + type + " supported types: " + Arrays.toString(DataType.values())); } } public Integer getDimension() { return dimension; } public void setDimension(Integer dimension) { this.dimension = dimension; } public Integer getMaxLength() { return maxLength; } public void setMaxLength(Integer maxLength) { this.maxLength = maxLength; } public Boolean getPrimaryKey() { return isPrimaryKey; } public Boolean getPartitionKey() { return isPartitionKey; } public void setPartitionKey(Boolean partitionKey) { isPartitionKey = partitionKey; } public void setPrimaryKey(Boolean primaryKey) { isPrimaryKey = primaryKey; } public Boolean getAutoId() { return isAutoId; } public void setAutoId(Boolean autoId) { isAutoId = autoId; } public Integer getMaxCapacity() { return maxCapacity; } public void setMaxCapacity(Integer maxCapacity) { this.maxCapacity = maxCapacity; } public String getElementType() { return elementType; } public void setElementType(String elementType) { this.elementType = elementType; } public DataType getMilvusTypeEnum() { return milvusTypeEnum; } public void 
setMilvusTypeEnum(DataType milvusTypeEnum) { this.milvusTypeEnum = milvusTypeEnum; } } ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/MilvusCreateCollection.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter; import java.util.List; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.milvuswriter.enums.SchemaCreateModeEnum; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; import io.milvus.v2.common.DataType; import io.milvus.v2.service.collection.request.AddFieldReq; import io.milvus.v2.service.collection.request.CreateCollectionReq; import lombok.extern.slf4j.Slf4j; /** * @author ziming(子茗) * @date 12/27/24 * @description */ @Slf4j public class MilvusCreateCollection { private Configuration config; MilvusCreateCollection(Configuration originalConfig) { this.config = originalConfig; } public void createCollectionByMode(MilvusClient milvusClient) { String collection = this.config.getString(KeyConstant.COLLECTION); SchemaCreateModeEnum schemaCreateMode = SchemaCreateModeEnum.getEnum(this.config.getString(KeyConstant.SCHAME_CREATE_MODE)); List<MilvusColumn> milvusColumnMeta = JSON.parseObject(config.getString(KeyConstant.COLUMN), new TypeReference<List<MilvusColumn>>() { }); Boolean hasCollection = milvusClient.hasCollection(collection); if (schemaCreateMode == SchemaCreateModeEnum.CREATEIFNOTEXIT) { /* create the collection only if it does not exist */ if (hasCollection) { log.info("collection[{}] already exists, skip creation", collection); } else { log.info("creating collection[{}]", collection); CreateCollectionReq.CollectionSchema collectionSchema = prepareCollectionSchema(milvusColumnMeta); milvusClient.createCollection(collection, collectionSchema); } } else if (schemaCreateMode == SchemaCreateModeEnum.RECREATE) { if (hasCollection) { log.info("collection[{}] already exists, 
trying to drop it", collection); milvusClient.dropCollection(collection); } log.info("creating collection[{}]", collection); CreateCollectionReq.CollectionSchema collectionSchema = prepareCollectionSchema(milvusColumnMeta); milvusClient.createCollection(collection, collectionSchema); } else if (schemaCreateMode == SchemaCreateModeEnum.IGNORE && !hasCollection) { log.error("collection[{}] does not exist and schemaCreateMode is ignore", collection); throw new RuntimeException("Collection does not exist: " + collection); } } private CreateCollectionReq.CollectionSchema prepareCollectionSchema(List<MilvusColumn> milvusColumnMeta) { CreateCollectionReq.CollectionSchema collectionSchema = CreateCollectionReq.CollectionSchema.builder().build(); for (int i = 0; i < milvusColumnMeta.size(); i++) { MilvusColumn milvusColumn = milvusColumnMeta.get(i); /* use the type resolved case-insensitively by MilvusColumn.setType instead of the case-sensitive DataType.valueOf */ AddFieldReq addFieldReq = AddFieldReq.builder() .fieldName(milvusColumn.getName()) .dataType(milvusColumn.getMilvusTypeEnum()) .build(); if (milvusColumn.getPrimaryKey() != null) { addFieldReq.setIsPrimaryKey(milvusColumn.getPrimaryKey()); } if (milvusColumn.getDimension() != null) { addFieldReq.setDimension(milvusColumn.getDimension()); } if (milvusColumn.getPartitionKey() != null) { addFieldReq.setIsPartitionKey(milvusColumn.getPartitionKey()); } if (milvusColumn.getMaxLength() != null) { addFieldReq.setMaxLength(milvusColumn.getMaxLength()); } if (milvusColumn.getAutoId() != null) { addFieldReq.setAutoID(milvusColumn.getAutoId()); } if (milvusColumn.getMaxCapacity() != null) { addFieldReq.setMaxCapacity(milvusColumn.getMaxCapacity()); } if (milvusColumn.getElementType() != null) { addFieldReq.setElementType(DataType.valueOf(milvusColumn.getElementType())); } try { collectionSchema.addField(addFieldReq); } catch (Exception e) { log.error("add field[{}] error", milvusColumn.getName()); throw e; } } Boolean enableDynamic = 
config.getBool(KeyConstant.ENABLE_DYNAMIC_SCHEMA); if (enableDynamic != null) { collectionSchema.setEnableDynamicField(enableDynamic); } return collectionSchema; } } 
================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/MilvusWriter.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import lombok.extern.slf4j.Slf4j; import java.util.ArrayList; import java.util.List; @Slf4j public class MilvusWriter extends Writer { public static class Job extends Writer.Job { private Configuration originalConfig = null; @Override public void init() { this.originalConfig = super.getPluginJobConf(); originalConfig.getNecessaryValue(KeyConstant.ENDPOINT, MilvusWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(KeyConstant.COLUMN, MilvusWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(KeyConstant.COLLECTION, MilvusWriterErrorCode.REQUIRED_VALUE); } @Override public void prepare() { /* create the collection (according to the schema create mode) and the partition, if needed */ MilvusClient milvusClient = new MilvusClient(originalConfig); try { MilvusCreateCollection milvusCreateCollection = new MilvusCreateCollection(originalConfig); milvusCreateCollection.createCollectionByMode(milvusClient); String collection = originalConfig.getString(KeyConstant.COLLECTION); String partition = originalConfig.getString(KeyConstant.PARTITION); if (partition != null && !milvusClient.hasPartition(collection, partition)) { log.info("collection[{}] does not contain partition[{}], creating it", collection, partition); milvusClient.createPartition(collection, partition); } } catch (Exception e) { throw DataXException.asDataXException(MilvusWriterErrorCode.MILVUS_COLLECTION, e.getMessage(), e); } finally { milvusClient.close(); } } /** * Splits the job into tasks.
* * @param mandatoryNumber to keep the Reader and Writer task counts equal, the Writer plugin must split into exactly the number of tasks produced by the source side; otherwise the framework reports an error. */ @Override public List<Configuration> split(int mandatoryNumber) { List<Configuration> configList = new ArrayList<>(); for (int i = 0; i < mandatoryNumber; i++) { configList.add(this.originalConfig.clone()); } return configList; } @Override public void destroy() { } } public static class Task extends Writer.Task { private MilvusBufferWriter milvusBufferWriter; MilvusClient milvusClient; @Override public void init() { log.info("Initializing Milvus writer"); /* get configuration */ Configuration writerSliceConfig = this.getPluginJobConf(); this.milvusClient = new MilvusClient(writerSliceConfig); this.milvusBufferWriter = new MilvusBufferWriter(this.milvusClient, writerSliceConfig); log.info("Milvus writer initialized"); } @Override public void startWrite(RecordReceiver lineReceiver) { Record record = null; while ((record = lineReceiver.getFromReader()) != null) { milvusBufferWriter.add(record, this.getTaskPluginCollector()); if (milvusBufferWriter.needCommit()) { log.info("begin committing data size[{}]", milvusBufferWriter.getDataCacheSize()); milvusBufferWriter.commit(); } } /* flush whatever is left in the buffer */ if (milvusBufferWriter.getDataCacheSize() > 0) { log.info("begin committing data size[{}]", milvusBufferWriter.getDataCacheSize()); milvusBufferWriter.commit(); } } @Override public void prepare() { super.prepare(); } @Override public void destroy() { if (this.milvusClient != null) { this.milvusClient.close(); } } } } ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/MilvusWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter; import com.alibaba.datax.common.spi.ErrorCode; /** * @author ziming(子茗) * @date 12/27/24 * @description */ public enum MilvusWriterErrorCode implements ErrorCode { MILVUS_COLLECTION("MilvusWriter-01", "collection process error"), REQUIRED_VALUE("MilvusWriter-02", "missing
required parameter"); private final String code; private final String description; MilvusWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/enums/SchemaCreateModeEnum.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter.enums; import lombok.extern.slf4j.Slf4j; /** * @author ziming(子茗) * @date 12/27/24 * @description */ @Slf4j public enum SchemaCreateModeEnum { CREATEIFNOTEXIT("createIfNotExist"), IGNORE("ignore"), RECREATE("recreate"); String type; SchemaCreateModeEnum(String type) { this.type = type; } public String getType() { return type; } public static SchemaCreateModeEnum getEnum(String name) { for (SchemaCreateModeEnum value : SchemaCreateModeEnum.values()) { if (value.getType().equalsIgnoreCase(name)) { return value; } } log.info("using default schema create mode CREATEIFNOTEXIT"); return CREATEIFNOTEXIT; } } ================================================ FILE: milvuswriter/src/main/java/com/alibaba/datax/plugin/writer/milvuswriter/enums/WriteModeEnum.java ================================================ package com.alibaba.datax.plugin.writer.milvuswriter.enums; import lombok.extern.slf4j.Slf4j; @Slf4j public enum WriteModeEnum { INSERT("insert"), UPSERT("upsert"); String mode; public String getMode() { return mode; } WriteModeEnum(String mode) { this.mode = mode; } public static WriteModeEnum getEnum(String mode) { for (WriteModeEnum writeModeEnum : WriteModeEnum.values()) { if (writeModeEnum.getMode().equalsIgnoreCase(mode)) { return writeModeEnum; } } log.info("using default write
mode upsert"); } } ================================================ FILE: milvuswriter/src/main/resources/plugin.json ================================================ { "name": "milvuswriter", "class": "com.alibaba.datax.plugin.writer.milvuswriter.MilvusWriter", "description": "useScene: prod. mechanism: via milvusclient connect milvus write data concurrent.", "developer": "nianliuu" } ================================================ FILE: milvuswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "milvuswriter", "parameter": { "endpoint": "", "username": "", "password": "", "database": "", "collection": "", "column": [], "enableDynamicSchema": "" } } ================================================ FILE: mongodbreader/doc/mongodbreader.md ================================================ ### DataX MongoDBReader #### 1 Quick Introduction The MongoDBReader plugin reads from MongoDB using MongoClient, the MongoDB Java client. Recent MongoDB versions have reduced lock granularity from the database level to the document level. Combined with MongoDB's indexing, this generally makes high-performance reads from MongoDB possible. #### 2 How It Works MongoDBReader reads MongoDB data in parallel through the DataX framework. The Job splits the collection into `_id` ranges, the Tasks read those ranges concurrently, and each MongoDB value is checked and converted into the corresponding DataX type. #### 3 Features * This example reads data from MongoDB and writes it to ODPS. { "job": { "setting": { "speed": { "channel": 2 } }, "content": [ { "reader": { "name": "mongodbreader", "parameter": { "address": ["127.0.0.1:27017"], "userName": "", "userPassword": "", "dbName": "tag_per_data", "collectionName": "tag_data12", "column": [ { "name": "unique_id", "type": "string" }, { "name": "sid", "type": "string" }, { "name": "user_id", "type": "string" }, { "name": "auction_id", "type": "string" }, { "name": "content_type", "type": "string" }, { "name": "pool_type", "type": "string" }, { "name": "frontcat_id", "type": "array", "splitter": " " }, { "name": "categoryid", "type": "array", "splitter": " " }, { "name": "gmt_create", "type": "string" }, { "name": "taglist", "type": "array", "splitter": " " }, { "name": "property", "type": "string" }, { "name": "scorea",
"type": "int" }, { "name": "scoreb", "type": "int" }, { "name": "scorec", "type": "int" } ] } }, "writer": { "name": "odpswriter", "parameter": { "project": "tb_ai_recommendation", "table": "jianying_tag_datax_read_test01", "column": [ "unique_id", "sid", "user_id", "auction_id", "content_type", "pool_type", "frontcat_id", "categoryid", "gmt_create", "taglist", "property", "scorea", "scoreb" ], "accessId": "**************", "accessKey": "********************", "truncate": true, "odpsServer": "xxx/api", "tunnelServer": "xxx" } } } ] } } #### 4 Parameters * address: the MongoDB addresses. Because MongoDB may be a cluster, the ip:port entries must be given as a JSON array. [Required] * userName (alias: username): the MongoDB user name. [Optional] * userPassword (alias: password): the MongoDB password. [Optional] * authDb: the MongoDB authentication database; defaults to dbName. [Optional] * dbName (alias: database): the MongoDB database name. [Required] * collectionName: the MongoDB collection name. [Required] * column: the MongoDB document fields to read. [Required] * name: the column name. For a type starting with document, a dotted path such as a.b reads a nested field. [Required] * type: the column type. [Optional] * splitter: MongoDB supports arrays but the DataX framework does not, so an array read from MongoDB is joined into a single string with this separator. [Optional; required when type is array or document.array] * query: an additional MongoDB query filter, combined with each task's `_id` range using $and. [Optional] #### 5 Type Conversion | DataX internal type | MongoDB data type | | -------- | ----- | | Long | int, Long | | Double | double | | String | string, array | | Date | date | | Boolean | boolean | | Bytes | bytes | #### 6 Performance Report #### 7 Test Report ================================================ FILE: mongodbreader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 mongodbreader com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic org.mongodb mongo-java-driver 3.2.2 com.google.guava guava 16.0.1 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: mongodbreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/mongodbreader
target/ mongodbreader-0.0.1-SNAPSHOT.jar plugin/reader/mongodbreader false plugin/reader/mongodbreader/libs runtime ================================================ FILE: mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/KeyConstant.java ================================================ package com.alibaba.datax.plugin.reader.mongodbreader; /** * Created by jianying.wcj on 2015/3/17 0017. */ public class KeyConstant { /** * array type */ public static final String ARRAY_TYPE = "array"; /** * embedded document array type */ public static final String DOCUMENT_ARRAY_TYPE = "document.array"; /** * embedded document type */ public static final String DOCUMENT_TYPE = "document"; /** * MongoDB host addresses */ public static final String MONGO_ADDRESS = "address"; /** * MongoDB user name */ public static final String MONGO_USER_NAME = "userName"; public static final String MONGO_USERNAME = "username"; /** * MongoDB password */ public static final String MONGO_USER_PASSWORD = "userPassword"; public static final String MONGO_PASSWORD = "password"; /** * MongoDB database name */ public static final String MONGO_DB_NAME = "dbName"; public static final String MONGO_DATABASE = "database"; public static final String MONGO_AUTHDB = "authDb"; /** * MongoDB collection name */ public static final String MONGO_COLLECTION_NAME = "collectionName"; /** * MongoDB query filter */ public static final String MONGO_QUERY = "query"; /** * MongoDB columns */ public static final String MONGO_COLUMN = "column"; /** * name of each column */ public static final String COLUMN_NAME = "name"; /** * type of each column */ public static final String COLUMN_TYPE = "type"; /** * column splitter (separator used to join array elements) */ public static final String COLUMN_SPLITTER = "splitter"; /** * number of columns to skip */ public static final String SKIP_COUNT = "skipCount"; public static final String LOWER_BOUND = "lowerBound"; public static final String UPPER_BOUND = "upperBound"; public static final String IS_OBJECTID = "isObjectId"; /** * number of records fetched per batch */ public static final String BATCH_SIZE = "batchSize"; /** * MongoDB's _id field */ public static final String
MONGO_PRIMARY_ID = "_id"; /** * MongoDB error codes (13: Unauthorized, 20: IllegalOperation) */ public static final int MONGO_UNAUTHORIZED_ERR_CODE = 13; public static final int MONGO_ILLEGALOP_ERR_CODE = 20; /** * Checks whether the given type is an array type. * @param type the data type * @return true for "array" or "document.array" */ public static boolean isArrayType(String type) { return ARRAY_TYPE.equals(type) || DOCUMENT_ARRAY_TYPE.equals(type); } /** * Checks whether the given type is an embedded document type; a missing (null) type is not. */ public static boolean isDocumentType(String type) { return type != null && type.startsWith(DOCUMENT_TYPE); } } ================================================ FILE: mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReader.java ================================================ package com.alibaba.datax.plugin.reader.mongodbreader; import java.util.ArrayList; import java.util.Arrays; import java.util.Date; import java.util.Iterator; import java.util.List; import com.alibaba.datax.common.element.BoolColumn; import com.alibaba.datax.common.element.DateColumn; import com.alibaba.datax.common.element.DoubleColumn; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.mongodbreader.util.CollectionSplitUtil; import com.alibaba.datax.plugin.reader.mongodbreader.util.MongoUtil; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import com.google.common.base.Joiner; import com.google.common.base.Strings; import com.mongodb.MongoClient; import com.mongodb.client.MongoCollection; import com.mongodb.client.MongoCursor; import com.mongodb.client.MongoDatabase; import org.bson.Document; import org.bson.types.ObjectId; /** * Created by jianying.wcj on 2015/3/19 0019. * Modified by mingyan.zc on 2016/6/13. * Modified by mingyan.zc on 2017/7/5.
*/ public class MongoDBReader extends Reader { public static class Job extends Reader.Job { private Configuration originalConfig = null; private MongoClient mongoClient; private String userName = null; private String password = null; @Override public List<Configuration> split(int adviceNumber) { return CollectionSplitUtil.doSplit(originalConfig,adviceNumber,mongoClient); } @Override public void init() { this.originalConfig = super.getPluginJobConf(); this.userName = originalConfig.getString(KeyConstant.MONGO_USER_NAME, originalConfig.getString(KeyConstant.MONGO_USERNAME)); this.password = originalConfig.getString(KeyConstant.MONGO_USER_PASSWORD, originalConfig.getString(KeyConstant.MONGO_PASSWORD)); String database = originalConfig.getString(KeyConstant.MONGO_DB_NAME, originalConfig.getString(KeyConstant.MONGO_DATABASE)); String authDb = originalConfig.getString(KeyConstant.MONGO_AUTHDB, database); if(!Strings.isNullOrEmpty(this.userName) && !Strings.isNullOrEmpty(this.password)) { this.mongoClient = MongoUtil.initCredentialMongoClient(originalConfig,userName,password,authDb); } else { this.mongoClient = MongoUtil.initMongoClient(originalConfig); } } @Override public void destroy() { if (this.mongoClient != null) { this.mongoClient.close(); } } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private MongoClient mongoClient; private String userName = null; private String password = null; private String authDb = null; private String database = null; private String collection = null; private String query = null; private JSONArray mongodbColumnMeta = null; private Object lowerBound = null; private Object upperBound = null; private boolean isObjectId = true; @Override public void startRead(RecordSender recordSender) { if(lowerBound == null || upperBound == null || mongoClient == null || database == null || collection == null || mongodbColumnMeta == null) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_VALUE, MongoDBReaderErrorCode.ILLEGAL_VALUE.getDescription()); } MongoDatabase
db = mongoClient.getDatabase(database); MongoCollection<Document> col = db.getCollection(this.collection); MongoCursor<Document> dbCursor = null; Document filter = new Document(); if (lowerBound.equals("min")) { if (!upperBound.equals("max")) { filter.append(KeyConstant.MONGO_PRIMARY_ID, new Document("$lt", isObjectId ? new ObjectId(upperBound.toString()) : upperBound)); } } else if (upperBound.equals("max")) { filter.append(KeyConstant.MONGO_PRIMARY_ID, new Document("$gte", isObjectId ? new ObjectId(lowerBound.toString()) : lowerBound)); } else { filter.append(KeyConstant.MONGO_PRIMARY_ID, new Document("$gte", isObjectId ? new ObjectId(lowerBound.toString()) : lowerBound).append("$lt", isObjectId ? new ObjectId(upperBound.toString()) : upperBound)); } if(!Strings.isNullOrEmpty(query)) { Document queryFilter = Document.parse(query); filter = new Document("$and", Arrays.asList(filter, queryFilter)); } dbCursor = col.find(filter).iterator(); while (dbCursor.hasNext()) { Document item = dbCursor.next(); Record record = recordSender.createRecord(); Iterator<Object> columnItera = mongodbColumnMeta.iterator(); while (columnItera.hasNext()) { JSONObject column = (JSONObject)columnItera.next(); Object tempCol = item.get(column.getString(KeyConstant.COLUMN_NAME)); if (tempCol == null) { if (KeyConstant.isDocumentType(column.getString(KeyConstant.COLUMN_TYPE))) { String[] name = column.getString(KeyConstant.COLUMN_NAME).split("\\."); if (name.length > 1) { /* walk down the parent documents of the dotted path; stop if an intermediate value is missing or is not a Document */ Document nestedDocument = item; for (int k = 0; k < name.length - 1 && nestedDocument != null; k++) { Object obj = nestedDocument.get(name[k]); nestedDocument = (obj instanceof Document) ? (Document) obj : null; } if (nestedDocument != null) { tempCol = nestedDocument.get(name[name.length - 1]); } } } } if (tempCol == null) { /* do not simply continue here: skipping the column would shift the remaining columns out of place at the writer */ record.addColumn(new StringColumn(null)); } else if (tempCol instanceof Double) { /* TODO: handle Double.isNaN() */ record.addColumn(new DoubleColumn((Double) tempCol)); } else if (tempCol instanceof Boolean) {
record.addColumn(new BoolColumn((Boolean) tempCol)); } else if (tempCol instanceof Date) { record.addColumn(new DateColumn((Date) tempCol)); } else if (tempCol instanceof Integer) { record.addColumn(new LongColumn((Integer) tempCol)); } else if (tempCol instanceof Long) { record.addColumn(new LongColumn((Long) tempCol)); } else { if(KeyConstant.isArrayType(column.getString(KeyConstant.COLUMN_TYPE))) { String splitter = column.getString(KeyConstant.COLUMN_SPLITTER); if(Strings.isNullOrEmpty(splitter)) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_VALUE, "column[" + column.getString(KeyConstant.COLUMN_NAME) + "] has an array type but no splitter is configured"); } else { /* join the array elements into one string, because DataX has no array column type */ List<?> array = (List<?>) tempCol; String tempArrayStr = Joiner.on(splitter).join(array); record.addColumn(new StringColumn(tempArrayStr)); } } else { record.addColumn(new StringColumn(tempCol.toString())); } } } recordSender.sendToWriter(record); } } @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.userName = readerSliceConfig.getString(KeyConstant.MONGO_USER_NAME, readerSliceConfig.getString(KeyConstant.MONGO_USERNAME)); this.password = readerSliceConfig.getString(KeyConstant.MONGO_USER_PASSWORD, readerSliceConfig.getString(KeyConstant.MONGO_PASSWORD)); this.database = readerSliceConfig.getString(KeyConstant.MONGO_DB_NAME, readerSliceConfig.getString(KeyConstant.MONGO_DATABASE)); this.authDb = readerSliceConfig.getString(KeyConstant.MONGO_AUTHDB, this.database); if(!Strings.isNullOrEmpty(userName) && !Strings.isNullOrEmpty(password)) { mongoClient = MongoUtil.initCredentialMongoClient(readerSliceConfig,userName,password,authDb); } else { mongoClient = MongoUtil.initMongoClient(readerSliceConfig); } this.collection = readerSliceConfig.getString(KeyConstant.MONGO_COLLECTION_NAME); this.query = readerSliceConfig.getString(KeyConstant.MONGO_QUERY); this.mongodbColumnMeta = JSON.parseArray(readerSliceConfig.getString(KeyConstant.MONGO_COLUMN)); this.lowerBound =
readerSliceConfig.get(KeyConstant.LOWER_BOUND); this.upperBound = readerSliceConfig.get(KeyConstant.UPPER_BOUND); this.isObjectId = readerSliceConfig.getBool(KeyConstant.IS_OBJECTID); } @Override public void destroy() { if (mongoClient != null) { mongoClient.close(); } } } } ================================================ FILE: mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/MongoDBReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.mongodbreader; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by jianying.wcj on 2015/3/19 0019. */ public enum MongoDBReaderErrorCode implements ErrorCode { ILLEGAL_VALUE("ILLEGAL_PARAMETER_VALUE","Illegal parameter value"), ILLEGAL_ADDRESS("ILLEGAL_ADDRESS","Illegal Mongo address"), UNEXCEPT_EXCEPTION("UNEXCEPT_EXCEPTION","Unexpected exception"); private final String code; private final String description; private MongoDBReaderErrorCode(String code,String description) { this.code = code; this.description = description; } @Override public String getCode() { return code; } @Override public String getDescription() { return description; } } ================================================ FILE: mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/util/CollectionSplitUtil.java ================================================ package com.alibaba.datax.plugin.reader.mongodbreader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.mongodbreader.KeyConstant; import com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReaderErrorCode; import com.google.common.base.Strings; import com.mongodb.MongoClient; import com.mongodb.MongoCommandException; import com.mongodb.client.MongoCollection; import com.mongodb.client.MongoDatabase; import org.bson.Document; import org.bson.types.ObjectId; import java.util.ArrayList; import java.util.Arrays; import java.util.List; /** * Created by jianying.wcj on 2015/3/19 0019.
* Modified by mingyan.zc on 2016/6/13. * Modified by mingyan.zc on 2017/7/5. */ public class CollectionSplitUtil { public static List<Configuration> doSplit( Configuration originalSliceConfig, int adviceNumber, MongoClient mongoClient) { List<Configuration> confList = new ArrayList<Configuration>(); String dbName = originalSliceConfig.getString(KeyConstant.MONGO_DB_NAME, originalSliceConfig.getString(KeyConstant.MONGO_DATABASE)); String collName = originalSliceConfig.getString(KeyConstant.MONGO_COLLECTION_NAME); if(Strings.isNullOrEmpty(dbName) || Strings.isNullOrEmpty(collName) || mongoClient == null) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_VALUE, MongoDBReaderErrorCode.ILLEGAL_VALUE.getDescription()); } boolean isObjectId = isPrimaryIdObjectId(mongoClient, dbName, collName); List<Range> rangeList = doSplitCollection(adviceNumber, mongoClient, dbName, collName, isObjectId); for(Range range : rangeList) { Configuration conf = originalSliceConfig.clone(); conf.set(KeyConstant.LOWER_BOUND, range.lowerBound); conf.set(KeyConstant.UPPER_BOUND, range.upperBound); conf.set(KeyConstant.IS_OBJECTID, isObjectId); confList.add(conf); } return confList; } private static boolean isPrimaryIdObjectId(MongoClient mongoClient, String dbName, String collName) { MongoDatabase database = mongoClient.getDatabase(dbName); MongoCollection<Document> col = database.getCollection(collName); Document doc = col.find().limit(1).first(); /* an empty collection has no _id to inspect */ if (doc == null) { return false; } Object id = doc.get(KeyConstant.MONGO_PRIMARY_ID); return id instanceof ObjectId; } /* split the collection into multiple chunks; each chunk covers one _id range */ private static List<Range> doSplitCollection(int adviceNumber, MongoClient mongoClient, String dbName, String collName, boolean isObjectId) { MongoDatabase database = mongoClient.getDatabase(dbName); List<Range> rangeList = new ArrayList<Range>(); if (adviceNumber == 1) { Range range = new Range(); range.lowerBound = "min"; range.upperBound = "max"; return Arrays.asList(range); } Document result = database.runCommand(new
Document("collStats", collName)); /* collStats may report the count as a 32- or 64-bit number, so read it through Number */ int docCount = ((Number) result.get("count")).intValue(); if (docCount == 0) { return rangeList; } int avgObjSize = 1; Object avgObjSizeObj = result.get("avgObjSize"); if (avgObjSizeObj instanceof Integer) { avgObjSize = ((Integer) avgObjSizeObj).intValue(); } else if (avgObjSizeObj instanceof Double) { avgObjSize = ((Double) avgObjSizeObj).intValue(); } int splitPointCount = adviceNumber - 1; int chunkDocCount = docCount / adviceNumber; ArrayList<Object> splitPoints = new ArrayList<Object>(); /* check whether the user may run splitVector (requires the clusterManager role) */ boolean supportSplitVector = true; try { database.runCommand(new Document("splitVector", dbName + "." + collName) .append("keyPattern", new Document(KeyConstant.MONGO_PRIMARY_ID, 1)) .append("force", true)); } catch (MongoCommandException e) { if (e.getErrorCode() == KeyConstant.MONGO_UNAUTHORIZED_ERR_CODE || e.getErrorCode() == KeyConstant.MONGO_ILLEGALOP_ERR_CODE) { supportSplitVector = false; } } if (supportSplitVector) { boolean forceMedianSplit = false; /* maxChunkSize is in MB; compute in long to avoid int overflow on large collections */ int maxChunkSize = (int) ((long) (docCount / splitPointCount - 1) * 2 * avgObjSize / (1024 * 1024)); /* alternative: int maxChunkSize = (chunkDocCount - 1) * 2 * avgObjSize / (1024 * 1024); */ if (maxChunkSize < 1) { forceMedianSplit = true; } if (!forceMedianSplit) { result = database.runCommand(new Document("splitVector", dbName + "." + collName) .append("keyPattern", new Document(KeyConstant.MONGO_PRIMARY_ID, 1)) .append("maxChunkSize", maxChunkSize) .append("maxSplitPoints", adviceNumber - 1)); } else { result = database.runCommand(new Document("splitVector", dbName + "."
+ collName) .append("keyPattern", new Document(KeyConstant.MONGO_PRIMARY_ID, 1)) .append("force", true)); } ArrayList<Document> splitKeys = result.get("splitKeys", ArrayList.class); for (int i = 0; i < splitKeys.size(); i++) { Document splitKey = splitKeys.get(i); Object id = splitKey.get(KeyConstant.MONGO_PRIMARY_ID); if (isObjectId) { ObjectId oid = (ObjectId)id; splitPoints.add(oid.toHexString()); } else { splitPoints.add(id); } } } else { /* without splitVector, sample split points by skipping through the collection in ascending _id order */ int skipCount = chunkDocCount; MongoCollection<Document> col = database.getCollection(collName); for (int i = 0; i < splitPointCount; i++) { Document doc = col.find().sort(new Document(KeyConstant.MONGO_PRIMARY_ID, 1)).skip(skipCount).limit(1).first(); if (doc == null) { break; } Object id = doc.get(KeyConstant.MONGO_PRIMARY_ID); if (isObjectId) { ObjectId oid = (ObjectId)id; splitPoints.add(oid.toHexString()); } else { splitPoints.add(id); } skipCount += chunkDocCount; } } Object lastObjectId = "min"; for (Object splitPoint : splitPoints) { Range range = new Range(); range.lowerBound = lastObjectId; lastObjectId = splitPoint; range.upperBound = lastObjectId; rangeList.add(range); } Range range = new Range(); range.lowerBound = lastObjectId; range.upperBound = "max"; rangeList.add(range); return rangeList; } } class Range { Object lowerBound; Object upperBound; } ================================================ FILE: mongodbreader/src/main/java/com/alibaba/datax/plugin/reader/mongodbreader/util/MongoUtil.java ================================================ package com.alibaba.datax.plugin.reader.mongodbreader.util; import java.net.UnknownHostException; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.mongodbreader.KeyConstant; import com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReaderErrorCode; import com.mongodb.MongoClient; import com.mongodb.MongoCredential; import com.mongodb.ServerAddress; /** * Created by jianying.wcj on
2015/3/17 0017. * Modified by mingyan.zc on 2016/6/13. */ public class MongoUtil { public static MongoClient initMongoClient(Configuration conf) { List<Object> addressList = conf.getList(KeyConstant.MONGO_ADDRESS); if(addressList == null || addressList.size() <= 0) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_VALUE,"address must be a non-empty list of host:port strings"); } try { return new MongoClient(parseServerAddress(addressList)); } catch (UnknownHostException e) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_ADDRESS,"invalid address", e); } catch (NumberFormatException e) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_VALUE,"invalid parameter", e); } catch (Exception e) { throw DataXException.asDataXException(MongoDBReaderErrorCode.UNEXCEPT_EXCEPTION,"unexpected exception", e); } } public static MongoClient initCredentialMongoClient(Configuration conf, String userName, String password, String database) { List<Object> addressList = conf.getList(KeyConstant.MONGO_ADDRESS); if(addressList == null || addressList.size() <= 0 || !isHostPortPattern(addressList)) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_VALUE,"address must be a non-empty list of host:port strings"); } try { MongoCredential credential = MongoCredential.createCredential(userName, database, password.toCharArray()); return new MongoClient(parseServerAddress(addressList), Arrays.asList(credential)); } catch (UnknownHostException e) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_ADDRESS,"invalid address", e); } catch (NumberFormatException e) { throw DataXException.asDataXException(MongoDBReaderErrorCode.ILLEGAL_VALUE,"invalid parameter", e); } catch (Exception e) { throw DataXException.asDataXException(MongoDBReaderErrorCode.UNEXCEPT_EXCEPTION,"unexpected exception", e); } } /** * Checks that every address matches the host:port pattern. * @param addressList the configured addresses * @return true if all addresses are valid */ private static boolean isHostPortPattern(List<Object> addressList) { for(Object address : addressList) { String regex = "(\\S+):([0-9]+)"; if(!((String)address).matches(regex)) { return false; } } return true; } /** * Converts the raw host:port strings into MongoDB ServerAddress objects. * @param rawAddressList the configured addresses * @return the parsed server addresses */ private static List<ServerAddress> parseServerAddress(List<Object>
rawAddressList) throws UnknownHostException{ List<ServerAddress> addressList = new ArrayList<ServerAddress>(); for(Object address : rawAddressList) { String[] tempAddress = ((String)address).split(":"); try { ServerAddress sa = new ServerAddress(tempAddress[0],Integer.valueOf(tempAddress[1])); addressList.add(sa); } catch (Exception e) { throw new UnknownHostException("invalid address: " + address); } } return addressList; } } ================================================ FILE: mongodbreader/src/main/resources/plugin.json ================================================ { "name": "mongodbreader", "class": "com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader", "description": "useScene: prod. mechanism: via mongoclient connect mongodb reader data concurrent.", "developer": "alibaba" } ================================================ FILE: mongodbreader/src/main/resources/plugin_job_template.json ================================================ { "name": "mongodbreader", "parameter": { "address": [], "userName": "", "userPassword": "", "dbName": "", "collectionName": "", "column": [] } } ================================================ FILE: mongodbwriter/doc/mongodbwriter.md ================================================ ### DataX MongoDBWriter #### 1 Quick Introduction The MongoDBWriter plugin writes to MongoDB using MongoClient, the MongoDB Java client. Recent MongoDB versions have reduced lock granularity from the database level to the document level. Combined with MongoDB's indexing, this generally meets the needs of writing data from a source into MongoDB. Updating existing data is also supported by configuring a business key. #### 2 How It Works MongoDBWriter receives the records produced by the Reader through the DataX framework and converts each DataX type into the corresponding MongoDB type. One point worth noting: DataX itself has no array type, but MongoDB does, and its array indexing is quite powerful. To use MongoDB arrays, configure a splitter so that a string is split into a MongoDB array. After type conversion, the data is written to MongoDB in parallel by the DataX framework. #### 3 Features * This example reads data from ODPS and writes it to MongoDB. { "job": { "setting": { "speed": { "channel": 2 } }, "content": [ { "reader": { "name": "odpsreader", "parameter": { "accessId": "********", "accessKey": "*********", "project": "tb_ai_recommendation", "table": "jianying_tag_datax_test", "column": [ "unique_id", "sid", "user_id", "auction_id", "content_type", "pool_type",
"frontcat_id", "categoryid", "gmt_create", "taglist", "property", "scorea", "scoreb" ], "splitMode": "record", "odpsServer": "http://xxx/api" } }, "writer": { "name": "mongodbwriter", "parameter": { "address": [ "127.0.0.1:27017" ], "userName": "", "userPassword": "", "dbName": "tag_per_data", "collectionName": "tag_data", "column": [ { "name": "unique_id", "type": "string" }, { "name": "sid", "type": "string" }, { "name": "user_id", "type": "string" }, { "name": "auction_id", "type": "string" }, { "name": "content_type", "type": "string" }, { "name": "pool_type", "type": "string" }, { "name": "frontcat_id", "type": "Array", "splitter": " " }, { "name": "categoryid", "type": "Array", "splitter": " " }, { "name": "gmt_create", "type": "string" }, { "name": "taglist", "type": "Array", "splitter": " " }, { "name": "property", "type": "string" }, { "name": "scorea", "type": "int" }, { "name": "scoreb", "type": "int" }, { "name": "scorec", "type": "int" } ], "writeMode": { "isReplace": "true", "replaceKey": "unique_id" } } } } ] } } #### 4 Parameters * address: the MongoDB addresses. Because MongoDB may be a cluster, the ip:port entries must be given as a JSON array. [Required] * userName: the MongoDB user name. [Optional] * userPassword: the MongoDB password. [Optional] * dbName: the MongoDB database name. [Required] * collectionName: the MongoDB collection name. [Required] * column: the MongoDB document fields to write. [Required] * name: the column name. [Required] * type: the column type. [Required] * splitter: a separator used only when a string should be split into an array; the string is split on this separator and stored as a MongoDB array. [Optional] * writeMode: controls how existing data is updated during the transfer. [Optional] * isReplace: when set to true, records with the same replaceKey are updated (replaced). [Optional] * replaceKey: the business key of each record, used for updates. [Optional] #### 5 Type Conversion | DataX internal type | MongoDB data type | | -------- | ----- | | Long | int, Long | | Double | double | | String | string, array | | Date | date | | Boolean | boolean | | Bytes | bytes | #### 6 Performance Report #### 7 Test Report ================================================ FILE: mongodbwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 mongodbwriter com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j
com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic org.mongodb mongo-java-driver 3.2.2 com.google.guava guava 16.0.1 com.alibaba.datax plugin-rdbms-util ${datax-project-version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: mongodbwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/mongodbwriter target/ mongodbwriter-0.0.1-SNAPSHOT.jar plugin/writer/mongodbwriter false plugin/writer/mongodbwriter/libs runtime ================================================ FILE: mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/KeyConstant.java ================================================ package com.alibaba.datax.plugin.writer.mongodbwriter; public class KeyConstant { /** * mongodb 的 host 地址 */ public static final String MONGO_ADDRESS = "address"; /** * 数组类型 */ public static final String ARRAY_TYPE = "array"; /** * ObjectId类型 */ public static final String OBJECT_ID_TYPE = "objectid"; /** * mongodb 的用户名 */ public static final String MONGO_USER_NAME = "userName"; /** * mongodb 密码 */ public static final String MONGO_USER_PASSWORD = "userPassword"; /** * mongodb 数据库名 */ public static final String MONGO_DB_NAME = "dbName"; /** * mongodb 集合名 */ public static final String MONGO_COLLECTION_NAME = "collectionName"; /** * mongodb 的列 */ public static final String MONGO_COLUMN = "column"; /** * 每个列的名字 */ public static final String COLUMN_NAME = "name"; /** * 每个列的类型 */ public static final String COLUMN_TYPE = "type"; /** * 数组中每个元素的类型 */ public static final String ITEM_TYPE = "itemtype"; /** * 列分隔符 */ public static final String COLUMN_SPLITTER = "splitter"; /** * 数据更新列信息 */ public static final String WRITE_MODE = 
"writeMode"; /** * 有相同的记录是否覆盖,默认为false */ public static final String IS_REPLACE = "isReplace"; /** * 指定用来判断是否覆盖的 业务主键 */ public static final String UNIQUE_KEY = "replaceKey"; /** * 判断是否为数组类型 * @param type 数据类型 * @return */ public static boolean isArrayType(String type) { return ARRAY_TYPE.equals(type); } /** * 判断是否为ObjectId类型 * @param type 数据类型 * @return */ public static boolean isObjectIdType(String type) { return OBJECT_ID_TYPE.equals(type); } /** * 判断一个值是否为true * @param value * @return */ public static boolean isValueTrue(String value){ return "true".equals(value); } } ================================================ FILE: mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/MongoDBWriter.java ================================================ package com.alibaba.datax.plugin.writer.mongodbwriter; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.writer.mongodbwriter.util.MongoUtil; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import com.google.common.base.Strings; import com.mongodb.*; import com.mongodb.client.MongoCollection; import com.mongodb.client.MongoDatabase; import com.mongodb.client.model.BulkWriteOptions; import com.mongodb.client.model.ReplaceOneModel; import com.mongodb.client.model.UpdateOptions; import org.bson.types.ObjectId; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; public class MongoDBWriter extends Writer{ public static class Job extends Writer.Job { private Configuration originalConfig = null; @Override public List split(int mandatoryNumber) { List configList = new ArrayList(); for(int i = 0; i < 
mandatoryNumber; i++) { configList.add(this.originalConfig.clone()); } return configList; } @Override public void init() { this.originalConfig = super.getPluginJobConf(); } @Override public void prepare() { super.prepare(); } @Override public void destroy() { } } public static class Task extends Writer.Task { private static final Logger logger = LoggerFactory.getLogger(Task.class); private Configuration writerSliceConfig; private MongoClient mongoClient; private String userName = null; private String password = null; private String database = null; private String collection = null; private Integer batchSize = null; private JSONArray mongodbColumnMeta = null; private JSONObject writeMode = null; private static int BATCH_SIZE = 1000; @Override public void prepare() { super.prepare(); //获取presql配置,并执行 String preSql = writerSliceConfig.getString(Key.PRE_SQL); if(Strings.isNullOrEmpty(preSql)) { return; } Configuration conConf = Configuration.from(preSql); if(Strings.isNullOrEmpty(database) || Strings.isNullOrEmpty(collection) || mongoClient == null || mongodbColumnMeta == null || batchSize == null) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE, MongoDBWriterErrorCode.ILLEGAL_VALUE.getDescription()); } MongoDatabase db = mongoClient.getDatabase(database); MongoCollection col = db.getCollection(this.collection); String type = conConf.getString("type"); if (Strings.isNullOrEmpty(type)){ return; } if (type.equals("drop")){ col.drop(); } else if (type.equals("remove")){ String json = conConf.getString("json"); BasicDBObject query; if (Strings.isNullOrEmpty(json)) { query = new BasicDBObject(); List items = conConf.getList("item", Object.class); for (Object con : items) { Configuration _conf = Configuration.from(con.toString()); if (Strings.isNullOrEmpty(_conf.getString("condition"))) { query.put(_conf.getString("name"), _conf.get("value")); } else { query.put(_conf.getString("name"), new BasicDBObject(_conf.getString("condition"), 
_conf.get("value"))); } } // and { "pv" : { "$gt" : 200 , "$lt" : 3000} , "pid" : { "$ne" : "xxx"}} // or { "$or" : [ { "age" : { "$gt" : 27}} , { "age" : { "$lt" : 15}}]} } else { query = (BasicDBObject) com.mongodb.util.JSON.parse(json); } col.deleteMany(query); } if(logger.isDebugEnabled()) { logger.debug("After job prepare(), originalConfig now is:[\n{}\n]", writerSliceConfig.toJSON()); } } @Override public void startWrite(RecordReceiver lineReceiver) { if(Strings.isNullOrEmpty(database) || Strings.isNullOrEmpty(collection) || mongoClient == null || mongodbColumnMeta == null || batchSize == null) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE, MongoDBWriterErrorCode.ILLEGAL_VALUE.getDescription()); } MongoDatabase db = mongoClient.getDatabase(database); MongoCollection col = db.getCollection(this.collection, BasicDBObject.class); List writerBuffer = new ArrayList(this.batchSize); Record record = null; while((record = lineReceiver.getFromReader()) != null) { writerBuffer.add(record); if(writerBuffer.size() >= this.batchSize) { doBatchInsert(col,writerBuffer,mongodbColumnMeta); writerBuffer.clear(); } } if(!writerBuffer.isEmpty()) { doBatchInsert(col,writerBuffer,mongodbColumnMeta); writerBuffer.clear(); } } private void doBatchInsert(MongoCollection collection, List writerBuffer, JSONArray columnMeta) { List dataList = new ArrayList(); for(Record record : writerBuffer) { BasicDBObject data = new BasicDBObject(); for(int i = 0; i < record.getColumnNumber(); i++) { String type = columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_TYPE); //空记录处理 if (Strings.isNullOrEmpty(record.getColumn(i).asString())) { if (KeyConstant.isArrayType(type.toLowerCase())) { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), new Object[0]); } else { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asString()); } continue; } if (Column.Type.INT.name().equalsIgnoreCase(type)) { 
//int是特殊类型, 其他类型按照保存时Column的类型进行处理 try { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), Integer.parseInt( String.valueOf(record.getColumn(i).getRawData()))); } catch (Exception e) { super.getTaskPluginCollector().collectDirtyRecord(record, e); } } else if(record.getColumn(i) instanceof StringColumn){ //处理ObjectId和数组类型 try { if (KeyConstant.isObjectIdType(type.toLowerCase())) { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), new ObjectId(record.getColumn(i).asString())); } else if (KeyConstant.isArrayType(type.toLowerCase())) { String splitter = columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_SPLITTER); if (Strings.isNullOrEmpty(splitter)) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE, MongoDBWriterErrorCode.ILLEGAL_VALUE.getDescription()); } String itemType = columnMeta.getJSONObject(i).getString(KeyConstant.ITEM_TYPE); if (itemType != null && !itemType.isEmpty()) { //如果数组指定类型不为空,将其转换为指定类型 String[] item = record.getColumn(i).asString().split(splitter); if (itemType.equalsIgnoreCase(Column.Type.DOUBLE.name())) { ArrayList list = new ArrayList(); for (String s : item) { list.add(Double.parseDouble(s)); } data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), list.toArray(new Double[0])); } else if (itemType.equalsIgnoreCase(Column.Type.INT.name())) { ArrayList list = new ArrayList(); for (String s : item) { list.add(Integer.parseInt(s)); } data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), list.toArray(new Integer[0])); } else if (itemType.equalsIgnoreCase(Column.Type.LONG.name())) { ArrayList list = new ArrayList(); for (String s : item) { list.add(Long.parseLong(s)); } data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), list.toArray(new Long[0])); } else if (itemType.equalsIgnoreCase(Column.Type.BOOL.name())) { ArrayList list = new ArrayList(); for (String s : item) { list.add(Boolean.parseBoolean(s)); } 
data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), list.toArray(new Boolean[0])); } else if (itemType.equalsIgnoreCase(Column.Type.BYTES.name())) { ArrayList list = new ArrayList(); for (String s : item) { list.add(Byte.parseByte(s)); } data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), list.toArray(new Byte[0])); } else { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asString().split(splitter)); } } else { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asString().split(splitter)); } } else if(type.toLowerCase().equalsIgnoreCase("json")) { //如果是json类型,将其进行转换 Object mode = com.mongodb.util.JSON.parse(record.getColumn(i).asString()); data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME),JSON.toJSON(mode)); } else { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asString()); } } catch (Exception e) { super.getTaskPluginCollector().collectDirtyRecord(record, e); } } else if(record.getColumn(i) instanceof LongColumn) { if (Column.Type.LONG.name().equalsIgnoreCase(type)) { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME),record.getColumn(i).asLong()); } else { super.getTaskPluginCollector().collectDirtyRecord(record, "record's [" + i + "] column's type should be: " + type); } } else if(record.getColumn(i) instanceof DateColumn) { if (Column.Type.DATE.name().equalsIgnoreCase(type)) { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asDate()); } else { super.getTaskPluginCollector().collectDirtyRecord(record, "record's [" + i + "] column's type should be: " + type); } } else if(record.getColumn(i) instanceof DoubleColumn) { if (Column.Type.DOUBLE.name().equalsIgnoreCase(type)) { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asDouble()); } else { 
super.getTaskPluginCollector().collectDirtyRecord(record, "record's [" + i + "] column's type should be: " + type); } } else if(record.getColumn(i) instanceof BoolColumn) { if (Column.Type.BOOL.name().equalsIgnoreCase(type)) { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asBoolean()); } else { super.getTaskPluginCollector().collectDirtyRecord(record, "record's [" + i + "] column's type should be: " + type); } } else if(record.getColumn(i) instanceof BytesColumn) { if (Column.Type.BYTES.name().equalsIgnoreCase(type)) { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME), record.getColumn(i).asBytes()); } else { super.getTaskPluginCollector().collectDirtyRecord(record, "record's [" + i + "] column's type should be: " + type); } } else { data.put(columnMeta.getJSONObject(i).getString(KeyConstant.COLUMN_NAME),record.getColumn(i).asString()); } } dataList.add(data); } /** * 如果存在重复的值覆盖 */ if(this.writeMode != null && this.writeMode.getString(KeyConstant.IS_REPLACE) != null && KeyConstant.isValueTrue(this.writeMode.getString(KeyConstant.IS_REPLACE))) { String uniqueKey = this.writeMode.getString(KeyConstant.UNIQUE_KEY); if(!Strings.isNullOrEmpty(uniqueKey)) { List> replaceOneModelList = new ArrayList>(); for(BasicDBObject data : dataList) { BasicDBObject query = new BasicDBObject(); if(uniqueKey != null) { query.put(uniqueKey,data.get(uniqueKey)); } ReplaceOneModel replaceOneModel = new ReplaceOneModel(query, data, new UpdateOptions().upsert(true)); replaceOneModelList.add(replaceOneModel); } collection.bulkWrite(replaceOneModelList, new BulkWriteOptions().ordered(false)); } else { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE, MongoDBWriterErrorCode.ILLEGAL_VALUE.getDescription()); } } else { collection.insertMany(dataList); } } @Override public void init() { this.writerSliceConfig = this.getPluginJobConf(); this.userName = 
writerSliceConfig.getString(KeyConstant.MONGO_USER_NAME); this.password = writerSliceConfig.getString(KeyConstant.MONGO_USER_PASSWORD); this.database = writerSliceConfig.getString(KeyConstant.MONGO_DB_NAME); if(!Strings.isNullOrEmpty(userName) && !Strings.isNullOrEmpty(password)) { this.mongoClient = MongoUtil.initCredentialMongoClient(this.writerSliceConfig,userName,password,database); } else { this.mongoClient = MongoUtil.initMongoClient(this.writerSliceConfig); } this.collection = writerSliceConfig.getString(KeyConstant.MONGO_COLLECTION_NAME); this.batchSize = BATCH_SIZE; this.mongodbColumnMeta = JSON.parseArray(writerSliceConfig.getString(KeyConstant.MONGO_COLUMN)); this.writeMode = JSON.parseObject(writerSliceConfig.getString(KeyConstant.WRITE_MODE)); } @Override public void destroy() { mongoClient.close(); } } } ================================================ FILE: mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/MongoDBWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.mongodbwriter; import com.alibaba.datax.common.spi.ErrorCode; public enum MongoDBWriterErrorCode implements ErrorCode { ILLEGAL_VALUE("ILLEGAL_PARAMETER_VALUE","参数不合法"), ILLEGAL_ADDRESS("ILLEGAL_ADDRESS","不合法的Mongo地址"), JSONCAST_EXCEPTION("JSONCAST_EXCEPTION","json类型转换异常"), UNEXCEPT_EXCEPTION("UNEXCEPT_EXCEPTION","未知异常"); private final String code; private final String description; private MongoDBWriterErrorCode(String code,String description) { this.code = code; this.description = description; } @Override public String getCode() { return code; } @Override public String getDescription() { return description; } } ================================================ FILE: mongodbwriter/src/main/java/com/alibaba/datax/plugin/writer/mongodbwriter/util/MongoUtil.java ================================================ package com.alibaba.datax.plugin.writer.mongodbwriter.util; import 
com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.mongodbwriter.KeyConstant; import com.alibaba.datax.plugin.writer.mongodbwriter.MongoDBWriterErrorCode; import com.mongodb.MongoClient; import com.mongodb.MongoCredential; import com.mongodb.ServerAddress; import java.net.UnknownHostException; import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class MongoUtil { public static MongoClient initMongoClient(Configuration conf) { List addressList = conf.getList(KeyConstant.MONGO_ADDRESS); if(addressList == null || addressList.size() <= 0) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE,"不合法参数"); } try { return new MongoClient(parseServerAddress(addressList)); } catch (UnknownHostException e) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_ADDRESS,"不合法的地址"); } catch (NumberFormatException e) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE,"不合法参数"); } catch (Exception e) { throw DataXException.asDataXException(MongoDBWriterErrorCode.UNEXCEPT_EXCEPTION,"未知异常"); } } public static MongoClient initCredentialMongoClient(Configuration conf,String userName,String password,String database) { List addressList = conf.getList(KeyConstant.MONGO_ADDRESS); if(!isHostPortPattern(addressList)) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE,"不合法参数"); } try { MongoCredential credential = MongoCredential.createCredential(userName, database, password.toCharArray()); return new MongoClient(parseServerAddress(addressList), Arrays.asList(credential)); } catch (UnknownHostException e) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_ADDRESS,"不合法的地址"); } catch (NumberFormatException e) { throw DataXException.asDataXException(MongoDBWriterErrorCode.ILLEGAL_VALUE,"不合法参数"); } catch (Exception e) { throw 
DataXException.asDataXException(MongoDBWriterErrorCode.UNEXCEPT_EXCEPTION,"未知异常"); } } /** * 判断地址类型是否符合要求 * @param addressList * @return */ private static boolean isHostPortPattern(List addressList) { for(Object address : addressList) { String regex = "(\\S+):([0-9]+)"; if(!((String)address).matches(regex)) { return false; } } return true; } /** * 转换为mongo地址协议 * @param rawAddressList * @return */ private static List parseServerAddress(List rawAddressList) throws UnknownHostException{ List addressList = new ArrayList(); for(Object address : rawAddressList) { String[] tempAddress = ((String)address).split(":"); try { ServerAddress sa = new ServerAddress(tempAddress[0],Integer.valueOf(tempAddress[1])); addressList.add(sa); } catch (Exception e) { throw new UnknownHostException(); } } return addressList; } public static void main(String[] args) { try { ArrayList hostAddress = new ArrayList(); hostAddress.add("127.0.0.1:27017"); System.out.println(MongoUtil.isHostPortPattern(hostAddress)); } catch (Exception e) { e.printStackTrace(); } } } ================================================ FILE: mongodbwriter/src/main/resources/plugin.json ================================================ { "name": "mongodbwriter", "class": "com.alibaba.datax.plugin.writer.mongodbwriter.MongoDBWriter", "description": "useScene: prod. 
mechanism: via mongoclient connect mongodb write data concurrent.", "developer": "alibaba" } ================================================ FILE: mongodbwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "mongodbwriter", "parameter": { "address": [], "userName": "", "userPassword": "", "dbName": "", "collectionName": "", "column": [], "upsertInfo": { "isUpsert": "", "upsertKey": "" } } } ================================================ FILE: mysqlreader/doc/mysqlreader.md ================================================ # MysqlReader 插件文档 ___ ## 1 快速介绍 MysqlReader插件实现了从Mysql读取数据。在底层实现上,MysqlReader通过JDBC连接远程Mysql数据库,并执行相应的sql语句将数据从mysql库中SELECT出来。 **不同于其他关系型数据库,MysqlReader不支持FetchSize.** ## 2 实现原理 简而言之,MysqlReader通过JDBC连接器连接到远程的Mysql数据库,并根据用户配置的信息生成查询SELECT SQL语句,然后发送到远程Mysql数据库,并将该SQL执行返回结果使用DataX自定义的数据类型拼装为抽象的数据集,并传递给下游Writer处理。 对于用户配置Table、Column、Where的信息,MysqlReader将其拼接为SQL语句发送到Mysql数据库;对于用户配置querySql信息,MysqlReader直接将其发送到Mysql数据库。 ## 3 功能说明 ### 3.1 配置样例 * 配置一个从Mysql数据库同步抽取数据到本地的作业: ``` { "job": { "setting": { "speed": { "channel": 3 }, "errorLimit": { "record": 0, "percentage": 0.02 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "root", "password": "root", "column": [ "id", "name" ], "splitPk": "db_id", "connection": [ { "table": [ "table" ], "jdbcUrl": [ "jdbc:mysql://127.0.0.1:3306/database" ] } ] } }, "writer": { "name": "streamwriter", "parameter": { "print":true } } } ] } } ``` * 配置一个自定义SQL的数据库同步任务到本地内容的作业: ``` { "job": { "setting": { "speed": { "channel":1 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "root", "password": "root", "connection": [ { "querySql": [ "select db_id,on_line_flag from db_info where db_id < 10;" ], "jdbcUrl": [ "jdbc:mysql://bad_ip:3306/database", "jdbc:mysql://127.0.0.1:bad_port/database", "jdbc:mysql://127.0.0.1:3306/database" ] } ] } }, "writer": { "name": "streamwriter", "parameter": { "print": 
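false, "encoding": "UTF-8" } } } ] } }
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the source database, given as a JSON array. One database may list several connection addresses. A JSON array is used because Alibaba Group's internal deployment supports probing multiple IPs: when several are configured, MysqlReader probes them in order until it finds one that accepts a connection. If all of them fail, MysqlReader reports an error. Note that jdbcUrl must be placed inside the `connection` block. Outside Alibaba Group, a single JDBC URL in the array is sufficient. jdbcUrl follows the official MySQL format and may carry additional connection properties; see the [MySQL documentation](http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html).
    * Required: yes
    * Default: none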
* **username**
    * Description: username for the data source
    * Required: yes
    * Default: none
* **password**
    * Description: password for the configured username
    * Required: yes
    * Default: none
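* **table**
    * Description: the table(s) to synchronize, given as a JSON array so that several tables can be extracted in one job. When multiple tables are configured, you must make sure they share the same schema; MysqlReader does not check whether they belong to the same logical table. Note that table must be placed inside the `connection` block.
    * Required: yes
    * Default: none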
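* **column**
    * Description: the columns of the configured table(s) to synchronize, given as a JSON array. Use `*` to select all columns, e.g. `["*"]`. Column pruning is supported: you may export only a subset of the columns. Column reordering is supported: columns need not follow the order of the table schema. Constants are supported and must follow MySQL SQL syntax: `` ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"] ``. Here id is an ordinary column name, `` `table` `` is a column name that is also a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean.
    * Required: yes
    * Default: none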
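* **splitPk**
    * Description: when splitPk is set, MysqlReader shards the data on that column and DataX launches concurrent tasks to synchronize the shards, which can greatly improve throughput. Using the table's primary key as splitPk is recommended, because primary keys are usually evenly distributed, so the resulting shards are less likely to create hot spots. Currently splitPk only supports integer columns; `floating-point, string, date and other types are not supported`. If an unsupported type is specified, MysqlReader reports an error. If splitPk is omitted (either the key is missing or its value is empty), DataX synchronizes the table over a single channel.
    * Required: no
    * Default: empty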
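* **where**
    * Description: filter condition. MysqlReader assembles the SQL from the configured column, table and where settings and extracts data with it. A common case is syncing only the current day's data by setting where to `gmt_create > $bizdate`. Note: where cannot be set to `limit 10`, because limit is not a valid WHERE clause. where is an effective way to implement incremental synchronization. If where is not provided (either the key is missing or its value is empty), DataX synchronizes the full table.
    * Required: no
    * Default: none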
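* **querySql**
    * Description: in some cases the where option is not expressive enough to describe the filter, so querySql lets you supply a custom SELECT. When it is set, DataX ignores the table and column options and selects data with this query directly, for example when the data to synchronize comes from a multi-table join: `select a,b from table_a join table_b on table_a.id = table_b.id`. `When querySql is configured, MysqlReader ignores the table, column and where settings`; querySql takes precedence over all three.
    * Required: no
    * Default: none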
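The jdbcUrl probing behavior described above (try each configured URL in order, keep the first one that connects, fail if none does) can be sketched in Java as follows. This is a minimal illustration, not MysqlReader's actual code: the class `JdbcUrlProbeSketch` and the method `pickFirstReachable` are hypothetical, and the reachability check is passed in as a `Predicate` so the example runs without a database. A real probe might call `java.sql.DriverManager.getConnection(url, user, password)` after setting a short `DriverManager.setLoginTimeout`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "probe each jdbcUrl in order"; not DataX's implementation.
class JdbcUrlProbeSketch {
    /** Returns the first URL accepted by the probe, or throws if none is reachable. */
    static String pickFirstReachable(List<String> jdbcUrls, Predicate<String> probe) {
        for (String url : jdbcUrls) {
            if (probe.test(url)) {
                return url; // stop at the first URL that accepts a connection
            }
        }
        throw new IllegalStateException("None of the configured jdbcUrl values is reachable: " + jdbcUrls);
    }

    public static void main(String[] args) {
        // The three URLs from the querySql sample configuration in section 3.1.
        List<String> urls = Arrays.asList(
                "jdbc:mysql://bad_ip:3306/database",
                "jdbc:mysql://127.0.0.1:bad_port/database",
                "jdbc:mysql://127.0.0.1:3306/database");
        // Stand-in probe for illustration only: any URL containing "bad_" counts as unreachable.
        System.out.println(pickFirstReachable(urls, url -> !url.contains("bad_")));
    }
}
```

Injecting the probe keeps the ordering logic separate from the network call, so the "first reachable wins, otherwise error" rule can be checked in isolation.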
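To make the splitPk description concrete, here is a minimal sketch of dividing an integer key range into shards. Each shard is rendered as a WHERE fragment that one concurrent task could read. The sketch assumes the reader already knows `min(splitPk)` and `max(splitPk)` and splits the range into roughly equal widths. The class and method names are hypothetical, and the exact boundaries DataX computes may differ. The extra `IS NULL` fragment covers rows whose key is NULL, since no range would otherwise match them.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical equal-width range split on an integer splitPk; not DataX's exact algorithm.
class SplitPkSketch {
    static List<String> splitRanges(String pk, long min, long max, int parts) {
        if (parts < 1 || min > max) {
            throw new IllegalArgumentException("need parts >= 1 and min <= max");
        }
        BigInteger lo = BigInteger.valueOf(min);
        // Number of integer key values in [min, max]; BigInteger avoids long overflow.
        BigInteger span = BigInteger.valueOf(max).subtract(lo).add(BigInteger.ONE);
        // Never create more shards than there are distinct key values.
        int n = span.compareTo(BigInteger.valueOf(parts)) < 0 ? span.intValue() : parts;
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            BigInteger start = lo.add(span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            if (i == n - 1) {
                // Last shard is closed on the right so max itself is included.
                clauses.add(pk + " >= " + start + " AND " + pk + " <= " + max);
            } else {
                BigInteger end = lo.add(span.multiply(BigInteger.valueOf(i + 1)).divide(BigInteger.valueOf(n)));
                clauses.add(pk + " >= " + start + " AND " + pk + " < " + end);
            }
        }
        // Rows whose split key is NULL match none of the ranges, so read them separately.
        clauses.add(pk + " IS NULL");
        return clauses;
    }

    public static void main(String[] args) {
        // Key range 1..10 split three ways prints:
        // id >= 1 AND id < 4 / id >= 4 AND id < 7 / id >= 7 AND id <= 10 / id IS NULL
        for (String clause : splitRanges("id", 1, 10, 3)) {
            System.out.println(clause);
        }
    }
}
```

Equal-width ranges only produce balanced shards when keys are spread evenly. This is why the primary key is recommended; a skewed key would yield shards of very different sizes.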
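The precedence rules for where and querySql can be summarized in a small Java sketch. This is an illustration under stated assumptions, not MysqlReader's code. `buildQuery` is a hypothetical helper that returns querySql verbatim when it is set. Otherwise it assembles `select <columns> from <table>` with an optional WHERE clause, and it treats a missing or blank where as a full-table sync.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how querySql, table, column and where interact; not DataX's code.
class QueryBuilderSketch {
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** querySql wins outright; otherwise build a SELECT from columns, table and an optional where. */
    static String buildQuery(String querySql, String table, List<String> columns, String where) {
        if (!isBlank(querySql)) {
            return querySql; // table, column and where are ignored
        }
        StringBuilder sql = new StringBuilder("select ")
                .append(String.join(",", columns))
                .append(" from ")
                .append(table);
        if (!isBlank(where)) {
            sql.append(" where (").append(where).append(")");
        }
        return sql.toString(); // no where => full-table sync
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("db_id", "on_line_flag");
        // Prints: select db_id,on_line_flag from db_info where (db_id < 10)
        System.out.println(buildQuery(null, "db_info", cols, "db_id < 10"));
        // Prints the join query unchanged; "ignored" is never used.
        System.out.println(buildQuery("select a,b from table_a join table_b on table_a.id = table_b.id",
                "db_info", cols, "ignored"));
    }
}
```

Wrapping the user's condition in parentheses keeps it intact if another condition, such as a splitPk range, is later combined with it using AND.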
### 3.3 类型转换 目前MysqlReader支持大部分Mysql类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 下面列出MysqlReader针对Mysql类型转换列表: | DataX 内部类型| Mysql 数据类型 | | -------- | ----- | | Long |int, tinyint, smallint, mediumint, int, bigint| | Double |float, double, decimal| | String |varchar, char, tinytext, text, mediumtext, longtext, year | | Date |date, datetime, timestamp, time | | Boolean |bit, bool | | Bytes |tinyblob, mediumblob, blob, longblob, varbinary | 请注意: * `除上述罗列字段类型外,其他类型均不支持`。 * `tinyint(1) DataX视作为整形`。 * `year DataX视作为字符串类型` * `bit DataX属于未定义行为`。 ## 4 性能报告 ### 4.1 环境准备 #### 4.1.1 数据特征 建表语句: CREATE TABLE `tc_biz_vertical_test_0000` ( `biz_order_id` bigint(20) NOT NULL COMMENT 'id', `key_value` varchar(4000) NOT NULL COMMENT 'Key-value的内容', `gmt_create` datetime NOT NULL COMMENT '创建时间', `gmt_modified` datetime NOT NULL COMMENT '修改时间', `attribute_cc` int(11) DEFAULT NULL COMMENT '防止并发修改的标志', `value_type` int(11) NOT NULL DEFAULT '0' COMMENT '类型', `buyer_id` bigint(20) DEFAULT NULL COMMENT 'buyerid', `seller_id` bigint(20) DEFAULT NULL COMMENT 'seller_id', PRIMARY KEY (`biz_order_id`,`value_type`), KEY `idx_biz_vertical_gmtmodified` (`gmt_modified`) ) ENGINE=InnoDB DEFAULT CHARSET=gbk COMMENT='tc_biz_vertical' 单行记录类似于: biz_order_id: 888888888 key_value: ;orderIds:20148888888,2014888888813800; gmt_create: 2011-09-24 11:07:20 gmt_modified: 2011-10-24 17:56:34 attribute_cc: 1 value_type: 3 buyer_id: 8888888 seller_id: 1 #### 4.1.2 机器参数 * 执行DataX的机器参数为: 1. cpu: 24核 Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz 2. mem: 48GB 3. net: 千兆双网卡 4. disc: DataX 数据不落磁盘,不统计此项 * Mysql数据库机器参数为: 1. cpu: 32核 Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz 2. mem: 256GB 3. net: 千兆双网卡 4. 
disc: BTWL419303E2800RGN INTEL SSDSC2BB800G4 D2010370 #### 4.1.3 DataX jvm 参数 -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError ### 4.2 测试报告 #### 4.2.1 单表测试报告 | 通道数| 是否按照主键切分| DataX速度(Rec/s)|DataX流量(MB/s)| DataX机器网卡进入流量(MB/s)|DataX机器运行负载|DB网卡流出流量(MB/s)|DB运行负载| |--------|--------| --------|--------|--------|--------|--------|--------| |1| 否 | 183185 | 18.11 | 29| 0.6 | 31| 0.6 | |1| 是 | 183185 | 18.11 | 29| 0.6 | 31| 0.6 | |4| 否 | 183185 | 18.11 | 29| 0.6 | 31| 0.6 | |4| 是 | 329733 | 32.60 | 58| 0.8 | 60| 0.76 | |8| 否 | 183185 | 18.11 | 29| 0.6 | 31| 0.6 | |8| 是 | 549556 | 54.33 | 115| 1.46 | 120| 0.78 | 说明: 1. 这里的单表,主键类型为 bigint(20),范围为:190247559466810-570722244711460,从主键范围划分看,数据分布均匀。 2. 对单表如果没有安装主键切分,那么配置通道个数不会提升速度,效果与1个通道一样。 #### 4.2.2 分表测试报告(2个分库,每个分库16张分表,共计32张分表) | 通道数| DataX速度(Rec/s)|DataX流量(MB/s)| DataX机器网卡进入流量(MB/s)|DataX机器运行负载|DB网卡流出流量(MB/s)|DB运行负载| |--------| --------|--------|--------|--------|--------|--------| |1| 202241 | 20.06 | 31.5| 1.0 | 32 | 1.1 | |4| 726358 | 72.04 | 123.9 | 3.1 | 132 | 3.6 | |8|1074405 | 106.56| 197 | 5.5 | 205| 5.1| |16| 1227892 | 121.79 | 229.2 | 8.1 | 233 | 7.3 | ## 5 约束限制 ### 5.1 主备同步数据恢复问题 主备同步问题指Mysql使用主从灾备,备库从主库不间断通过binlog恢复数据。由于主备数据同步存在一定的时间差,特别在于某些特定情况,例如网络延迟等问题,导致备库同步恢复的数据与主库有较大差别,导致从备库同步的数据不是一份当前时间的完整镜像。 针对这个问题,我们提供了preSql功能,该功能待补充。 ### 5.2 一致性约束 Mysql在数据存储划分中属于RDBMS系统,对外可以提供强一致性数据查询接口。例如当一次同步任务启动运行过程中,当该库存在其他数据写入方写入数据时,MysqlReader完全不会获取到写入更新数据,这是由于数据库本身的快照特性决定的。关于数据库快照特性,请参看[MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) 上述是在MysqlReader单线程模型下数据同步一致性的特性,由于MysqlReader可以根据用户配置信息使用了并发数据抽取,因此不能严格保证数据一致性:当MysqlReader根据splitPk进行数据切分后,会先后启动多个并发任务完成数据同步。由于多个并发任务相互之间不属于同一个读事务,同时多个并发任务存在时间间隔。因此这份数据并不是`完整的`、`一致的`数据快照信息。 针对多线程的一致性快照需求,在技术上目前无法实现,只能从工程角度解决,工程化的方式存在取舍,我们提供几个解决思路给用户,用户可以自行选择: 1. 使用单线程同步,即不再进行数据切片。缺点是速度比较慢,但是能够很好保证一致性。 2. 
关闭其他数据写入方,保证当前数据为静态数据,例如,锁表、关闭备库同步等等。缺点是可能影响在线业务。 ### 5.3 数据库编码问题 Mysql本身的编码设置非常灵活,包括指定编码到库、表、字段级别,甚至可以均不同编码。优先级从高到低为字段、表、库、实例。我们不推荐数据库用户设置如此混乱的编码,最好在库级别就统一到UTF-8。 MysqlReader底层使用JDBC进行数据抽取,JDBC天然适配各类编码,并在底层进行了编码转换。因此MysqlReader不需用户指定编码,可以自动获取编码并转码。 对于Mysql底层写入编码和其设定的编码不一致的混乱情况,MysqlReader对此无法识别,对此也无法提供解决方案,对于这类情况,`导出有可能为乱码`。 ### 5.4 增量数据同步 MysqlReader使用JDBC SELECT语句完成数据抽取工作,因此可以使用SELECT...WHERE...进行增量数据抽取,方式有多种: * 数据库在线应用写入数据库时,填充modify字段为更改时间戳,包括新增、更新、删除(逻辑删)。对于这类应用,MysqlReader只需要WHERE条件跟上一同步阶段时间戳即可。 * 对于新增流水型数据,MysqlReader可以WHERE条件后跟上一阶段最大自增ID即可。 对于业务上无字段区分新增、修改数据情况,MysqlReader也无法进行增量数据同步,只能同步全量数据。 ### 5.5 Sql安全性 MysqlReader提供querySql语句交给用户自己实现SELECT抽取语句,MysqlReader本身对querySql不做任何安全性校验。这块交由DataX用户方自己保证。 ## 6 FAQ *** **Q: MysqlReader同步报错,报错信息为XXX** A: 网络或者权限问题,请使用mysql命令行测试: mysql -u -p -h -D -e "select * from <表名>" 如果上述命令也报错,那可以证实是环境问题,请联系你的DBA。 ================================================ FILE: mysqlreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT mysqlreader mysqlreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java ${mysql.driver.version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: mysqlreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/mysqlreader target/ mysqlreader-0.0.1-SNAPSHOT.jar plugin/reader/mysqlreader false plugin/reader/mysqlreader/libs runtime ================================================ FILE: mysqlreader/src/main/java/com/alibaba/datax/plugin/reader/mysqlreader/MysqlReader.java 
================================================ package com.alibaba.datax.plugin.reader.mysqlreader; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public class MysqlReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.MySql; public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private Configuration originalConfig = null; private CommonRdbmsReader.Job commonRdbmsReaderJob; @Override public void init() { this.originalConfig = super.getPluginJobConf(); Integer userConfigedFetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE); if (userConfigedFetchSize != null) { LOG.warn("对 mysqlreader 不需要配置 fetchSize, mysqlreader 将会忽略这项配置. 
如果您不想再看到此警告,请去除fetchSize 配置."); } this.originalConfig.set(Constant.FETCH_SIZE, Integer.MIN_VALUE); this.commonRdbmsReaderJob = new CommonRdbmsReader.Job(DATABASE_TYPE); this.commonRdbmsReaderJob.init(this.originalConfig); } @Override public void preCheck(){ init(); this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE); } @Override public List split(int adviceNumber) { return this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber); } @Override public void post() { this.commonRdbmsReaderJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderJob.destroy(this.originalConfig); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderTask; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderTask = new CommonRdbmsReader.Task(DATABASE_TYPE,super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderTask.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE); this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderTask.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); } } } ================================================ FILE: mysqlreader/src/main/java/com/alibaba/datax/plugin/reader/mysqlreader/MysqlReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.mysqlreader; import com.alibaba.datax.common.spi.ErrorCode; public enum MysqlReaderErrorCode implements ErrorCode { ; private final String code; private final String description; private MysqlReaderErrorCode(String code, String description) { this.code = code; this.description 
= description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: mysqlreader/src/main/resources/plugin.json ================================================ { "name": "mysqlreader", "class": "com.alibaba.datax.plugin.reader.mysqlreader.MysqlReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: mysqlreader/src/main/resources/plugin_job_template.json ================================================ { "name": "mysqlreader", "parameter": { "username": "", "password": "", "column": [], "connection": [ { "jdbcUrl": [], "table": [] } ], "where": "" } } ================================================ FILE: mysqlwriter/doc/mysqlwriter.md ================================================ # DataX MysqlWriter --- ## 1 快速介绍 MysqlWriter 插件实现了写入数据到 Mysql 主库的目的表的功能。在底层实现上, MysqlWriter 通过 JDBC 连接远程 Mysql 数据库,并执行相应的 insert into ... 或者 ( replace into ...) 的 sql 语句将数据写入 Mysql,内部会分批次提交入库,需要数据库本身采用 InnoDB 引擎。 MysqlWriter 面向ETL开发工程师,他们使用 MysqlWriter 从数仓导入数据到 Mysql。同时 MysqlWriter 亦可以作为数据迁移工具为DBA等用户提供服务。 ## 2 实现原理 MysqlWriter 通过 DataX 框架获取 Reader 生成的协议数据,根据你配置的 `writeMode` 生成 * `insert into...`(当主键/唯一性索引冲突时会写不进去冲突的行) ##### 或者 * `replace into...`(没有遇到主键/唯一性索引冲突时,与 insert into 行为一致,冲突时会用新行替换原有行所有字段) 的语句写入数据到 Mysql。出于性能考虑,采用了 `PreparedStatement + Batch`,并且设置了:`rewriteBatchedStatements=true`,将数据缓冲到线程上下文 Buffer 中,当 Buffer 累计到预定阈值时,才发起写入请求。
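Note: the database that holds the target table must be the primary, otherwise data cannot be written. The job needs at least the privilege to run `insert/replace into...`; whether other privileges are required depends on the statements you configure in `preSql` and `postSql`.

## 3 Features

### 3.1 Sample Configuration

* The following configuration writes data generated in memory into MySQL.

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column" : [
                            {
                                "value": 19880808,
                                "type": "long"
                            },
                            {
                                "value": "DataX",
                                "type": "string"
                            }
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "root",
                        "column": [
                            "id",
                            "name"
                        ],
                        "session": [
                            "set session sql_mode='ANSI'"
                        ],
                        "preSql": [
                            "delete from test"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=gbk",
                                "table": [
                                    "test"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**

    * Description: JDBC connection information for the target database. At runtime, DataX appends the following properties to the jdbcUrl you provide: `yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true`. Notes: 1. Only one jdbcUrl may be configured per database. Unlike MysqlReader, which can probe multiple replicas, MysqlWriter does not support a database with multiple primaries (dual-primary loading). 2. The jdbcUrl follows the official MySQL format and may carry additional connection properties. For example, to use gbk as the connection encoding, append `useUnicode=true&characterEncoding=gbk`. For details, see the official MySQL documentation or consult your DBA.

    * Required: yes

    * Default: none

* **username**

    * Description: username for the target database

    * Required: yes

    * Default: none

* **password**

    * Description: password for the target database

    * Required: yes

    * Default: none

* **table**

    * Description: name of the target table. One or more tables can be written; when several tables are configured, all of them must have the same schema. Note: table and jdbcUrl must be placed inside the connection block.

    * Required: yes

    * Default: none

* **column**

    * Description: columns of the target table to write, separated by commas, e.g. `"column": ["id","name","age"]`. To write all columns in order, use `*`, e.g. `"column": ["*"]`. **column must be specified and cannot be left empty!** Notes: 1. We strongly discourage using `*`, because the job may run incorrectly or fail when the number or types of the target table's columns change. 2. column must not contain constant values.

    * Required: yes

    * Default: none

* **session**

    * Description: SQL statements that DataX executes right after obtaining a MySQL connection, to modify the session attributes of that connection

    * Required: no

    * Default: empty

* **preSql**

    * Description: standard SQL statements executed before data is written to the target table. If a statement needs to refer to the table being written, use `@table`; it is replaced with the actual table name when the statement runs. For example, if the job writes to 100 sharded tables with the same schema (named datax_00, datax_01, ..., datax_98, datax_99) and you want to delete their existing data before loading, configure `"preSql":["delete from @table"]`; before data is written to each table, the corresponding `delete from <table name>` is executed.

    * Required: no

    * Default: none

* **postSql**

    * Description: standard SQL statements executed after data is written to the target table (works the same way as preSql)

    * Required: no

    * Default: none

* **writeMode**

    * Description: controls whether data is written to the target table with `insert into`, `replace into`, or `INSERT ... ON DUPLICATE KEY UPDATE` statements

    * Required: no

    * Options: insert/replace/update

    * Default: insert

* **batchSize**

    * Description: number of records submitted per batch. A larger value greatly reduces the number of network round trips between DataX and MySQL and improves overall throughput, but a value that is too large may cause the DataX process to run out of memory (OOM).

    * Required: no

    * Default: 1024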
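To make the three `writeMode` options concrete, the sketch below builds the parameterized statement template each mode corresponds to. It is a hypothetical helper (`WriteModeSqlSketch.template` is not a DataX API) and assumes that `update` overwrites every configured column on a key conflict; DataX's own template builder may differ in details such as identifier quoting. A missing `writeMode` falls back to `insert`, matching the documented default.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

/** Hypothetical sketch: maps writeMode values to MySQL statement templates. */
final class WriteModeSqlSketch {

    static String template(String writeMode, String table, List<String> columns) {
        String columnList = String.join(",", columns);
        // One JDBC placeholder per column, bound later through a PreparedStatement.
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        String insertPart = " INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")";
        String mode = writeMode == null ? "insert" : writeMode.trim().toLowerCase(Locale.ROOT);
        switch (mode) {
            case "insert":
                return "INSERT" + insertPart;
            case "replace":
                return "REPLACE" + insertPart;
            case "update": {
                // On a primary-key/unique-index conflict, overwrite each written column with the new value.
                StringJoiner updates = new StringJoiner(",");
                for (String column : columns) {
                    updates.add(column + "=VALUES(" + column + ")");
                }
                return "INSERT" + insertPart + " ON DUPLICATE KEY UPDATE " + updates;
            }
            default:
                throw new IllegalArgumentException("unsupported writeMode: " + writeMode);
        }
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name");
        for (String mode : Arrays.asList("insert", "replace", "update")) {
            System.out.println(template(mode, "test", columns));
        }
    }
}
```

The `update` branch uses the `VALUES(col)` form; MySQL 8.0.20 and later deprecate it in favor of a row alias (`... VALUES (?,?) AS new ON DUPLICATE KEY UPDATE name = new.name`).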
### 3.3 Type Conversion

As with MysqlReader, MysqlWriter currently supports most MySQL types, but a few types are not supported, so please check the types you use.

The table below lists how MysqlWriter converts MySQL types:

| DataX internal type | MySQL data type |
| -------- | ----- |
| Long | int, tinyint, smallint, mediumint, bigint, year |
| Double | float, double, decimal |
| String | varchar, char, tinytext, text, mediumtext, longtext |
| Date | date, datetime, timestamp, time |
| Boolean | bit, bool |
| Bytes | tinyblob, mediumblob, blob, longblob, varbinary |

* Conversion of the `bit` type is currently undefined.

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table DDL:

    CREATE TABLE `datax_mysqlwriter_perf_00` (
      `biz_order_id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'id',
      `key_value` varchar(4000) NOT NULL COMMENT 'key-value content',
      `gmt_create` datetime NOT NULL COMMENT 'creation time',
      `gmt_modified` datetime NOT NULL COMMENT 'modification time',
      `attribute_cc` int(11) DEFAULT NULL COMMENT 'flag to prevent concurrent modification',
      `value_type` int(11) NOT NULL DEFAULT '0' COMMENT 'type',
      `buyer_id` bigint(20) DEFAULT NULL COMMENT 'buyerid',
      `seller_id` bigint(20) DEFAULT NULL COMMENT 'seller_id',
      PRIMARY KEY (`biz_order_id`,`value_type`),
      KEY `idx_biz_vertical_gmtmodified` (`gmt_modified`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='datax perf test'

A single row looks like this:

    key_value: ;orderIds:20148888888,2014888888813800;
    gmt_create: 2011-09-24 11:07:20
    gmt_modified: 2011-10-24 17:56:34
    attribute_cc: 1
    value_type: 3
    buyer_id: 8888888
    seller_id: 1

#### 4.1.2 Machine Specifications

* Machine running DataX:
    1. cpu: 24 cores, Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
    2. mem: 48GB
    3. net: dual gigabit NICs
    4. disc: DataX does not write data to disk, so this item is not measured

* MySQL database machine:
    1. cpu: 32 cores, Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
    2. mem: 256GB
    3. net: dual gigabit NICs
    4. disc: BTWL419303E2800RGN INTEL SSDSC2BB800G4 D2010370

#### 4.1.3 DataX JVM Parameters

    -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError

### 4.2 Test Results

#### 4.2.1 Single-Table Results

| Channels | Batch size (rows) | DataX speed (rec/s) | DataX throughput (MB/s) | DataX host NIC outbound (MB/s) | DataX host load | DB NIC inbound (MB/s) | DB load | DB TPS |
|--------|--------| --------|--------|--------|--------|--------|--------|--------|
| 1 | 128 | 5319 | 0.260 | 0.580 | 0.05 | 0.620 | 0.5 | 50 |
| 1 | 512 | 14285 | 0.697 | 1.6 | 0.12 | 1.6 | 0.6 | 28 |
| 1 | 1024 | 17241 | 0.842 | 1.9 | 0.20 | 1.9 | 0.6 | 16 |
| 1 | 2048 | 31250 | 1.49 | 2.8 | 0.15 | 3.0 | 0.8 | 15 |
| 1 | 4096 | 31250 | 1.49 | 3.5 | 0.20 | 3.6 | 0.8 | 8 |
| 4 | 128 | 11764 | 0.574 | 1.5 | 0.21 | 1.6 | 0.8 | 112 |
| 4 | 512 | 30769 | 1.47 | 3.5 | 0.3 | 3.6 | 0.9 | 88 |
| 4 | 1024 | 50000 | 2.38 | 5.4 | 0.3 | 5.5 | 1.0 | 66 |
| 4 | 2048 | 66666 | 3.18 | 7.0 | 0.3 | 7.1 | 1.37 | 46 |
| 4 | 4096 | 80000 | 3.81 | 7.3 | 0.5 | 7.3 | 1.40 | 26 |
| 8 | 128 | 17777 | 0.868 | 2.9 | 0.28 | 2.9 | 0.8 | 200 |
| 8 | 512 | 57142 | 2.72 | 8.5 | 0.5 | 8.5 | 0.70 | 159 |
| 8 | 1024 | 88888 | 4.24 | 12.2 | 0.9 | 12.4 | 1.0 | 108 |
| 8 | 2048 | 133333 | 6.36 | 14.7 | 0.9 | 14.7 | 1.0 | 81 |
| 8 | 4096 | 166666 | 7.95 | 19.5 | 0.9 | 19.5 | 3.0 | 45 |
| 16 | 128 | 32000 | 1.53 | 3.3 | 0.6 | 3.4 | 0.88 | 401 |
| 16 | 512 | 106666 | 5.09 | 16.1 | 0.9 | 16.2 | 2.16 | 260 |
| 16 | 1024 | 173913 | 8.29 | 22.1 | 1.5 | 22.2 | 4.5 | 200 |
| 16 | 2048 | 228571 | 10.90 | 28.6 | 1.61 | 28.7 | 4.60 | 128 |
| 16 | 4096 | 246153 | 11.74 | 31.1 | 1.65 | 31.2 | 4.66 | 57 |
| 32 | 1024 | 246153 | 11.74 | 30.5 | 3.17 | 30.7 | 12.10 | 270 |

Notes:

1. The single table here has a bigint(20) auto-increment primary key.
2. batchSize and the number of channels have a large impact on performance.
3. With 16 channels and a batch size of 4096, two full GCs occurred.

#### 4.2.2 Sharded-Table Results (2 databases, 4 tables each, 8 tables in total)

| Channels | Batch size (rows) | DataX speed (rec/s) | DataX throughput (MB/s) | DataX host NIC outbound (MB/s) | DataX host load | DB NIC inbound (MB/s) | DB load | DB TPS |
|--------|--------| --------|--------|--------|--------|--------|--------|--------|
| 8 | 128 | 26764 | 1.28 | 2.9 | 0.5 | 3.0 | 0.8 | 209 |
| 8 | 512 | 95180 | 4.54 | 10.5 | 0.7 | 10.9 | 0.8 | 188 |
| 8 | 1024 | 94117 | 4.49 | 12.3 | 0.6 | 12.4 | 1.09 | 120 |
| 8 | 2048 | 133333 | 6.36 | 19.4 | 0.9 | 19.5 | 1.35 | 85 |
| 8 | 4096 | 191692 | 9.14 | 22.1 | 1.0 | 22.2 | 1.45 | 45 |

#### 4.2.3 Sharded-Table Results (2 databases, 8 tables each, 16 tables in total)

| Channels | Batch size (rows) | DataX speed (rec/s) | DataX throughput (MB/s) | DataX host NIC outbound (MB/s) | DataX host load | DB NIC inbound (MB/s) | DB load | DB TPS |
|--------|--------| --------|--------|--------|--------|--------|--------|--------|
| 16 | 128 | 50124 | 2.39 | 5.6 | 0.40 | 6.0 | 2.42 | 378 |
| 16 | 512 | 155084 | 7.40 | 18.6 | 1.30 | 18.9 | 2.82 | 325 |
| 16 | 1024 | 177777 | 8.48 | 24.1 | 1.43 | 25.5 | 3.5 | 233 |
| 16 | 2048 | 289382 | 13.8 | 33.1 | 2.5 | 33.5 | 4.5 | 150 |
| 16 | 4096 | 326451 | 15.52 | 33.7 | 1.5 | 33.9 | 4.3 | 80 |

#### 4.2.4 Summary

1. The batch size (batchSize) has a large impact on performance: once `batchSize>=512`, a single channel can write about 10,000 rows per second.
2. With `batchSize>=512`, speed grows roughly linearly as the number of channels increases (below 32 channels).
3. Using more than 32 channels when writing to a database is generally not recommended.

## 5 Constraints and Limitations

## FAQ

***

**Q: If MysqlWriter fails while executing postSql, has the data already been imported into the target database?**

A: A DataX import consists of three stages: the pre operation, the import itself, and the post operation. If any stage fails, the DataX job fails. Because DataX cannot guarantee that these stages run in a single transaction, the data may already have been written to the target.

***

**Q: In that case, partially imported (dirty) data may end up in the database. What if it affects the production database?**

A: There are currently two approaches. First, configure a pre statement that deletes the data imported for the current day, so that every DataX run cleans up the previous attempt and then imports the complete data set. Second, import into a temporary table and rename it to the production table once the import has finished.

***

**Q: The second approach avoids affecting production data. How do I implement it?**

A: Configure the job to import into a temporary table.

================================================
FILE: mysqlwriter/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT mysqlwriter mysqlwriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java ${mysql.driver.version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: mysqlwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/mysqlwriter target/ mysqlwriter-0.0.1-SNAPSHOT.jar plugin/writer/mysqlwriter false plugin/writer/mysqlwriter/libs runtime

================================================
FILE: mysqlwriter/src/main/java/com/alibaba/datax/plugin/writer/mysqlwriter/MysqlWriter.java
================================================
package com.alibaba.datax.plugin.writer.mysqlwriter;

import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.rdbms.writer.Key;

import java.util.List;

//TODO writeProxy
public class MysqlWriter extends Writer {
    private static final DataBaseType DATABASE_TYPE = DataBaseType.MySql;

    public static class Job extends Writer.Job {
        private Configuration originalConfig = null;
        private CommonRdbmsWriter.Job commonRdbmsWriterJob;

        @Override
        public void preCheck() {
            this.init();
            this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE);
        }

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE);
            this.commonRdbmsWriterJob.init(this.originalConfig);
        }

        // In general, pre statements have to be deferred to the tasks (single-table jobs are the exception)
        @Override
        public void prepare() {
            // Privilege validation is not supported in actual runs yet
            //this.commonRdbmsWriterJob.privilegeValid(this.originalConfig, DATABASE_TYPE);
            this.commonRdbmsWriterJob.prepare(this.originalConfig);
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber);
        }

        // In general, post statements have to be deferred to the tasks (single-table jobs are the exception)
        @Override
        public void post() {
            this.commonRdbmsWriterJob.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterJob.destroy(this.originalConfig);
        }
    }

    public static class Task extends Writer.Task {
        private Configuration writerSliceConfig;
        private CommonRdbmsWriter.Task commonRdbmsWriterTask;

        @Override
        public void init() {
            this.writerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task(DATABASE_TYPE);
            this.commonRdbmsWriterTask.init(this.writerSliceConfig);
        }

        @Override
        public void prepare() {
            this.commonRdbmsWriterTask.prepare(this.writerSliceConfig);
        }

        // TODO switch to a connection pool so that every acquired connection is usable
        // (note: each connection may need its session re-initialized)
        @Override
        public void startWrite(RecordReceiver recordReceiver) {
            this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig,
                    super.getTaskPluginCollector());
        }

        @Override
        public void post() {
            this.commonRdbmsWriterTask.post(this.writerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsWriterTask.destroy(this.writerSliceConfig);
        }

        // Failover (re-running a failed task) is only enabled in replace mode, where rewriting
        // rows that were already written overwrites them instead of causing key conflicts
        @Override
        public boolean supportFailOver() {
            String writeMode = writerSliceConfig.getString(Key.WRITE_MODE);
            return "replace".equalsIgnoreCase(writeMode);
        }
    }
}

================================================
FILE: mysqlwriter/src/main/resources/plugin.json
================================================
{
    "name": "mysqlwriter",
    "class": "com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter",
    "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.",
    "developer": "alibaba"
}

================================================
FILE: mysqlwriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "mysqlwriter",
    "parameter": {
        "username": "",
        "password": "",
        "writeMode": "",
        "column": [],
        "session": [],
        "preSql": [],
        "connection": [
            {
                "jdbcUrl": "",
                "table": []
            }
        ]
    }
}

================================================
FILE: neo4jwriter/doc/neo4jwriter.md
================================================
# DataX neo4jWriter Plugin Documentation

## Overview

The main ways to bulk-load data into Neo4j today are Cypher `CREATE`, `LOAD CSV`, and the official or third-party Batch Import tools. `LOAD CSV` is suited to fewer than about 100,000 nodes, and Batch Import requires taking the database offline. To write data without downtime, Cypher is the best option.

## Supported Versions

Neo4j 4 and Neo4j 5 are supported. For Neo4j 3, downgrade the driver to a matching version and compile the plugin yourself.

## How It Works

DataX records are converted into objects that the Neo4j driver understands, and are inserted in batches with the `UNWIND` clause.

## Configuration

### Options

| Option | Description | Required | Default | Example |
|:-------------------------------|--------------------| -------- | ------ | ---------------------------------------------------- |
| database | Database name | yes | - | neo4j |
| uri | Database connection URI | yes | - | bolt://localhost:7687 |
| username | Username | yes | - | neo4j |
| password | Password | yes | - | neo4j |
| bearerToken | Authentication (bearer token) | no | - | - |
| kerberosTicket | Authentication (Kerberos ticket) | no | - | - |
| cypher | Cypher statement used to write the data | yes | - | unwind $batch as row create(p) set p.name = row.name |
| batchDataVariableName | Name of the variable that carries the batch data for `unwind` | no | batch | batch |
| properties | Names and types of the Neo4j properties | yes | - | see the examples below |
| batchSize | Number of records written per batch | no | 1000 | |
| maxTransactionRetryTimeSeconds | Maximum time spent retrying a transaction (seconds) | no | 30 | 30 |
| maxConnectionTimeoutSeconds | Maximum time for the driver to establish a connection (seconds) | no | 30 | 30 |
| retryTimes | Number of retries when an error occurs | no | 3 | 3 |
| retrySleepMills | Wait time before a retry (milliseconds) | no | 3000 | 3000 |

### Supported Data Types

> Type names are case-insensitive.

```
BOOLEAN,
STRING,
LONG,
SHORT,
INTEGER,
DOUBLE,
FLOAT,
LOCAL_DATE,
LOCAL_TIME,
LOCAL_DATE_TIME,
LIST,
// MAP values can be read with the . property accessor
MAP,
CHAR_ARRAY,
BYTE_ARRAY,
BOOLEAN_ARRAY,
STRING_ARRAY,
LONG_ARRAY,
INT_ARRAY,
SHORT_ARRAY,
DOUBLE_ARRAY,
FLOAT_ARRAY,
Object_ARRAY
```

### Writing Nodes

The following example writes nodes whose properties cover many types. You can run it in the plugin's test methods.

```json
"writer": {
    "name": "neo4jWriter",
    "parameter": {
        "uri": "neo4j://localhost:7687",
        "username": "neo4j",
        "password": "Test@12343",
        "database": "neo4j",
        "cypher": "unwind $batch as row create(p:Person) set p.pbool = row.pbool,p.pstring = row.pstring,p.plong = row.plong,p.pshort = row.pshort,p.pdouble=row.pdouble,p.pstringarr=row.pstringarr,p.plocaldate=row.plocaldate",
        "batchDataVariableName": "batch",
        "batchSize": "33",
        "properties": [
            {
                "name": "pbool",
                "type": "BOOLEAN"
            },
            {
                "name": "pstring",
                "type": "STRING"
            },
            {
                "name": "plong",
                "type": "LONG"
            },
            {
                "name": "pshort",
                "type": "SHORT"
            },
            {
                "name": "pdouble",
                "type": "DOUBLE"
            },
            {
                "name": "pstringarr",
                "type": "STRING_ARRAY",
                "split": ","
            },
            {
                "name": "plocaldate",
                "type": "LOCAL_DATE",
                "dateFormat": "yyyy-MM-dd"
            }
        ]
    }
}
```

### Writing Relationships

```json
"writer": {
    "name": "neo4jWriter",
    "parameter": {
        "uri": "neo4j://localhost:7687",
        "username": "neo4j",
        "password": "Test@12343",
        "database": "neo4j",
        "cypher": "unwind $batch as row match(p1:Person) where p1.id = row.startNodeId match(p2:Person) where p2.id = row.endNodeId create (p1)-[:LINK]->(p2)",
        "batchDataVariableName": "batch",
        "batchSize": "33",
        "properties": [
            {
                "name": "startNodeId",
                "type": "STRING"
            },
            {
                "name": "endNodeId",
                "type": "STRING"
            }
        ]
    }
}
```

### Writing Nodes/Relationships with Dynamic Labels or Types

> Requires the APOC extension; if your database does not have it, install APOC first.

```json
"writer": {
    "name": "neo4jWriter",
    "parameter": {
        "uri": "bolt://localhost:7687",
        "username": "yourUserName",
        "password": "yourPassword",
        "database": "yourDataBase",
        "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ",
        "batchDataVariableName": "batch",
        "batchSize": "1",
        "properties": [
            {
                "name": "Label",
                "type": "STRING"
            },
            {
                "name": "id",
                "type": "STRING"
            }
        ]
    }
}
```

## Notes

* The order of the entries in properties must match the column order on the reader side one-to-one.
* Using the MAP type can save a lot of data preprocessing: in Cypher, nested values can be read with the `.` property accessor. For example, in `unwind $batch as row create (p) set p.name = row.prop.name, p.age = row.prop.age`, prop is a MAP containing the properties name and age.
* If you see transaction timeouts, increase maxTransactionRetryTimeSeconds or decrease batchSize.
* For update workloads where deadlocks interfere with writes, consider modifying the source to detect deadlock exceptions and retry.

## Performance Report

**JVM settings**

16 GB, G1 garbage collector, 8 cores

**Neo4j database**

32 cores, 256 GB

**DataX configuration**

* Channel 20, batchsize = 1000
* Average job throughput: 15.23 MB/s
* Record write speed: 44440 rec/s
* Total records read: 2222013

================================================
FILE: neo4jwriter/pom.xml
================================================
com.alibaba.datax datax-all 0.0.1-SNAPSHOT 4.0.0 neo4jwriter neo4jwriter jar 8 8 UTF-8 4.4.9 4.13.2 1.17.6 org.slf4j slf4j-api ch.qos.logback logback-classic org.neo4j.driver neo4j-java-driver ${neo4j-java-driver.version} com.alibaba.datax datax-common ${datax-project-version} org.testcontainers testcontainers ${test.container.version} junit junit ${junit4.version} test src/main/resources **/*.* true maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: neo4jwriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/neo4jwriter target/ neo4jwriter-0.0.1-SNAPSHOT.jar plugin/writer/neo4jwriter false plugin/writer/neo4jwriter/libs runtime

================================================
FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jClient.java
================================================ package com.alibaba.datax.plugin.writer.neo4jwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.writer.neo4jwriter.adapter.DateAdapter; import com.alibaba.datax.plugin.writer.neo4jwriter.adapter.ValueAdapter; import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; import com.alibaba.datax.plugin.writer.neo4jwriter.exception.Neo4jErrorCode; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.neo4j.driver.*; import org.neo4j.driver.exceptions.Neo4jException; import org.neo4j.driver.internal.value.MapValue; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; import java.util.concurrent.TimeUnit; import static com.alibaba.datax.plugin.writer.neo4jwriter.config.ConfigConstants.*; import static com.alibaba.datax.plugin.writer.neo4jwriter.exception.Neo4jErrorCode.DATABASE_ERROR; public class Neo4jClient { private static final Logger LOGGER = LoggerFactory.getLogger(Neo4jClient.class); private Driver driver; private WriteConfig writeConfig; private RetryConfig retryConfig; private TaskPluginCollector taskPluginCollector; private Session session; private List writerBuffer; public Neo4jClient(Driver driver, WriteConfig writeConfig, RetryConfig retryConfig, TaskPluginCollector taskPluginCollector) { this.driver = driver; this.writeConfig = writeConfig; this.retryConfig = retryConfig; this.taskPluginCollector = taskPluginCollector; this.writerBuffer = new ArrayList<>(writeConfig.batchSize); } public void init() { String database = writeConfig.database; //neo4j 3.x 没有数据库 if (null != database && !"".equals(database)) { this.session = 
driver.session(SessionConfig.forDatabase(database)); } else { this.session = driver.session(); } } public static Neo4jClient build(Configuration config, TaskPluginCollector taskPluginCollector) { Driver driver = buildNeo4jDriver(config); String cypher = checkCypher(config); String database = config.getString(DATABASE.getKey()); String batchVariableName = config.getString(BATCH_DATA_VARIABLE_NAME.getKey(), BATCH_DATA_VARIABLE_NAME.getDefaultValue()); List neo4jProperties = JSON.parseArray(config.getString(NEO4J_PROPERTIES.getKey()), Neo4jProperty.class); int batchSize = config.getInt(BATCH_SIZE.getKey(), BATCH_SIZE.getDefaultValue()); int retryTimes = config.getInt(RETRY_TIMES.getKey(), RETRY_TIMES.getDefaultValue()); return new Neo4jClient(driver, new WriteConfig(cypher, database, batchVariableName, neo4jProperties, batchSize), new RetryConfig(retryTimes, config.getLong(RETRY_SLEEP_MILLS.getKey(), RETRY_SLEEP_MILLS.getDefaultValue())), taskPluginCollector ); } private static String checkCypher(Configuration config) { String cypher = config.getString(CYPHER.getKey()); if (StringUtils.isBlank(cypher)) { throw DataXException.asDataXException(Neo4jErrorCode.CONFIG_INVALID, "cypher must not null or empty"); } return cypher; } private static Driver buildNeo4jDriver(Configuration config) { Config.ConfigBuilder configBuilder = Config.builder().withMaxConnectionPoolSize(1); String uri = checkUriConfig(config); //connection timeout //连接超时时间 Long maxConnTime = config.getLong(MAX_CONNECTION_TIMEOUT_SECONDS.getKey(), MAX_TRANSACTION_RETRY_TIME.getDefaultValue()); configBuilder .withConnectionAcquisitionTimeout( maxConnTime * 2, TimeUnit.SECONDS) .withConnectionTimeout(maxConnTime, TimeUnit.SECONDS); //transaction timeout //事务运行超时时间 Long txRetryTime = config.getLong(MAX_TRANSACTION_RETRY_TIME.getKey(), MAX_TRANSACTION_RETRY_TIME.getDefaultValue()); configBuilder.withMaxTransactionRetryTime(txRetryTime, TimeUnit.SECONDS); String username = config.getString(USERNAME.getKey()); 
String password = config.getString(PASSWORD.getKey()); String bearerToken = config.getString(BEARER_TOKEN.getKey()); String kerberosTicket = config.getString(KERBEROS_TICKET.getKey()); if (StringUtils.isNotBlank(username) && StringUtils.isNotBlank(password)) { return GraphDatabase.driver(uri, AuthTokens.basic(username, password), configBuilder.build()); } else if (StringUtils.isNotBlank(bearerToken)) { return GraphDatabase.driver(uri, AuthTokens.bearer(bearerToken), configBuilder.build()); } else if (StringUtils.isNotBlank(kerberosTicket)) { return GraphDatabase.driver(uri, AuthTokens.kerberos(kerberosTicket), configBuilder.build()); } throw DataXException.asDataXException(Neo4jErrorCode.CONFIG_INVALID, "Invalid Auth config."); } private static String checkUriConfig(Configuration config) { String uri = config.getString(URI.getKey()); if (null == uri || uri.length() == 0) { throw DataXException.asDataXException(Neo4jErrorCode.CONFIG_INVALID, "Invalid uri configuration"); } return uri; } public void destroy() { tryFlushBuffer(); if (driver != null) { driver.close(); } if (session != null) { session.close(); } DateAdapter.destroy(); } private void tryFlushBuffer() { if (!writerBuffer.isEmpty()) { doWrite(writerBuffer); writerBuffer.clear(); } } private void tryBatchWrite() { if (!writerBuffer.isEmpty() && writerBuffer.size() >= writeConfig.batchSize) { doWrite(writerBuffer); writerBuffer.clear(); } } private void doWrite(List values) { Value batchValues = Values.parameters(this.writeConfig.batchVariableName, values); Query query = new Query(this.writeConfig.cypher, batchValues); // LOGGER.debug("query:{}", query.text()); // LOGGER.debug("batch:{}", toUnwindStr(values)); try { RetryUtil.executeWithRetry(() -> { session.writeTransaction(tx -> tx.run(query)); return null; }, this.retryConfig.retryTimes, retryConfig.retrySleepMills, true, Collections.singletonList(Neo4jException.class)); } catch (Exception e) { LOGGER.error("an exception occurred while writing to the 
database,message:{}", e.getMessage()); throw DataXException.asDataXException(DATABASE_ERROR, e.getMessage()); } } private String toUnwindStr(List values) { StringJoiner joiner = new StringJoiner(","); for (MapValue value : values) { joiner.add(value.toString()); } return "[" + joiner + "]"; } public void tryWrite(Record record) { MapValue neo4jValue = checkAndConvert(record); writerBuffer.add(neo4jValue); tryBatchWrite(); } private MapValue checkAndConvert(Record record) { int sourceColNum = record.getColumnNumber(); List neo4jProperties = writeConfig.neo4jProperties; if (neo4jProperties == null || neo4jProperties.size() != sourceColNum) { throw new DataXException(Neo4jErrorCode.CONFIG_INVALID, "the read and write columns do not match!"); } Map data = new HashMap<>(sourceColNum * 4 / 3); for (int i = 0; i < sourceColNum; i++) { Column column = record.getColumn(i); Neo4jProperty neo4jProperty = neo4jProperties.get(i); try { Value value = ValueAdapter.column2Value(column, neo4jProperty); data.put(neo4jProperty.getName(), value); } catch (Exception e) { LOGGER.info("dirty record:{},message :{}", column, e.getMessage()); this.taskPluginCollector.collectDirtyRecord(record, e.getMessage()); } } return new MapValue(data); } public List getNeo4jFields() { return this.writeConfig.neo4jProperties; } static class RetryConfig { int retryTimes; long retrySleepMills; RetryConfig(int retryTimes, long retrySleepMills) { this.retryTimes = retryTimes; this.retrySleepMills = retrySleepMills; } } static class WriteConfig { String cypher; String database; String batchVariableName; List neo4jProperties; int batchSize; public WriteConfig(String cypher, String database, String batchVariableName, List neo4jProperties, int batchSize) { this.cypher = cypher; this.database = database; this.batchVariableName = batchVariableName; this.neo4jProperties = neo4jProperties; this.batchSize = batchSize; } } } ================================================ FILE: 
neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/Neo4jWriter.java ================================================ package com.alibaba.datax.plugin.writer.neo4jwriter; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.element.Record; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; public class Neo4jWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOGGER = LoggerFactory.getLogger(Job.class); private Configuration jobConf = null; @Override public void init() { LOGGER.info("Neo4jWriter Job init success"); this.jobConf = getPluginJobConf(); } @Override public void destroy() { LOGGER.info("Neo4jWriter Job destroyed"); } @Override public List split(int mandatoryNumber) { List configurations = new ArrayList(mandatoryNumber); for (int i = 0; i < mandatoryNumber; i++) { configurations.add(this.jobConf.clone()); } return configurations; } } public static class Task extends Writer.Task { private static final Logger TASK_LOGGER = LoggerFactory.getLogger(Task.class); private Neo4jClient neo4jClient; @Override public void init() { Configuration taskConf = super.getPluginJobConf(); this.neo4jClient = Neo4jClient.build(taskConf,getTaskPluginCollector()); this.neo4jClient.init(); TASK_LOGGER.info("neo4j writer task init success."); } @Override public void destroy() { this.neo4jClient.destroy(); TASK_LOGGER.info("neo4j writer task destroyed."); } @Override public void startWrite(RecordReceiver receiver) { Record record; while ((record = receiver.getFromReader()) != null){ this.neo4jClient.tryWrite(record); } } } } ================================================ FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/DateAdapter.java ================================================ package 
com.alibaba.datax.plugin.writer.neo4jwriter.adapter; import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; import org.testcontainers.shaded.com.google.common.base.Supplier; import java.time.LocalDate; import java.time.LocalDateTime; import java.time.LocalTime; import java.time.format.DateTimeFormatter; /** * @author fuyouj */ public class DateAdapter { private static final ThreadLocal LOCAL_DATE_FORMATTER_MAP = new ThreadLocal<>(); private static final ThreadLocal LOCAL_TIME_FORMATTER_MAP = new ThreadLocal<>(); private static final ThreadLocal LOCAL_DATE_TIME_FORMATTER_MAP = new ThreadLocal<>(); private static final String DEFAULT_LOCAL_DATE_FORMATTER = "yyyy-MM-dd"; private static final String DEFAULT_LOCAL_TIME_FORMATTER = "HH:mm:ss"; private static final String DEFAULT_LOCAL_DATE_TIME_FORMATTER = "yyyy-MM-dd HH:mm:ss"; public static LocalDate localDate(String text, Neo4jProperty neo4jProperty) { if (LOCAL_DATE_FORMATTER_MAP.get() != null) { return LocalDate.parse(text, LOCAL_DATE_FORMATTER_MAP.get()); } String format = getOrDefault(neo4jProperty::getDateFormat, DEFAULT_LOCAL_DATE_FORMATTER); DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern(format); LOCAL_DATE_FORMATTER_MAP.set(dateTimeFormatter); return LocalDate.parse(text, dateTimeFormatter); } public static String getOrDefault(Supplier dateFormat, String defaultFormat) { String format = dateFormat.get(); if (null == format || "".equals(format)) { return defaultFormat; } else { return format; } } public static void destroy() { LOCAL_DATE_FORMATTER_MAP.remove(); LOCAL_TIME_FORMATTER_MAP.remove(); LOCAL_DATE_TIME_FORMATTER_MAP.remove(); } public static LocalTime localTime(String text, Neo4jProperty neo4JProperty) { if (LOCAL_TIME_FORMATTER_MAP.get() != null) { return LocalTime.parse(text, LOCAL_TIME_FORMATTER_MAP.get()); } String format = getOrDefault(neo4JProperty::getDateFormat, DEFAULT_LOCAL_TIME_FORMATTER); DateTimeFormatter dateTimeFormatter = 
DateTimeFormatter.ofPattern(format); LOCAL_TIME_FORMATTER_MAP.set(dateTimeFormatter); return LocalTime.parse(text, dateTimeFormatter); } public static LocalDateTime localDateTime(String text, Neo4jProperty neo4JProperty) { if (LOCAL_DATE_TIME_FORMATTER_MAP.get() != null){ return LocalDateTime.parse(text,LOCAL_DATE_TIME_FORMATTER_MAP.get()); } String format = getOrDefault(neo4JProperty::getDateFormat, DEFAULT_LOCAL_DATE_TIME_FORMATTER); DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern(format); LOCAL_DATE_TIME_FORMATTER_MAP.set(dateTimeFormatter); return LocalDateTime.parse(text, dateTimeFormatter); } } ================================================ FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/adapter/ValueAdapter.java ================================================ package com.alibaba.datax.plugin.writer.neo4jwriter.adapter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; import com.alibaba.datax.plugin.writer.neo4jwriter.element.PropertyType; import com.alibaba.fastjson2.JSON; import org.neo4j.driver.Value; import org.neo4j.driver.Values; import org.neo4j.driver.internal.value.NullValue; import java.util.ArrayList; import java.util.Collections; import java.util.List; import java.util.Map; import java.util.function.Function; /** * @author fuyouj */ public class ValueAdapter { public static Value column2Value(final Column column, final Neo4jProperty neo4JProperty) { String typeStr = neo4JProperty.getType(); PropertyType type = PropertyType.fromStrIgnoreCase(typeStr); if (column.asString() == null) { return NullValue.NULL; } switch (type) { case NULL: return NullValue.NULL; case MAP: return Values.value(JSON.parseObject(column.asString(), Map.class)); case BOOLEAN: return Values.value(column.asBoolean()); case STRING: return Values.value(column.asString()); case INTEGER: case LONG: return Values.value(column.asLong()); case SHORT: return 
Values.value(Short.valueOf(column.asString())); case FLOAT: case DOUBLE: return Values.value(column.asDouble()); case BYTE_ARRAY: return Values.value(parseArrayType(neo4JProperty, column.asString(), Byte::valueOf)); case CHAR_ARRAY: return Values.value(parseArrayType(neo4JProperty, column.asString(), (s) -> s.charAt(0))); case BOOLEAN_ARRAY: return Values.value(parseArrayType(neo4JProperty, column.asString(), Boolean::valueOf)); case STRING_ARRAY: case Object_ARRAY: case LIST: return Values.value(parseArrayType(neo4JProperty, column.asString(), Function.identity())); case LONG_ARRAY: return Values.value(parseArrayType(neo4JProperty, column.asString(), Long::valueOf)); case INT_ARRAY: return Values.value(parseArrayType(neo4JProperty, column.asString(), Integer::valueOf)); case SHORT_ARRAY: return Values.value(parseArrayType(neo4JProperty, column.asString(), Short::valueOf)); case DOUBLE_ARRAY: case FLOAT_ARRAY: return Values.value(parseArrayType(neo4JProperty, column.asString(), Double::valueOf)); case LOCAL_DATE: return Values.value(DateAdapter.localDate(column.asString(), neo4JProperty)); case LOCAL_TIME: return Values.value(DateAdapter.localTime(column.asString(), neo4JProperty)); case LOCAL_DATE_TIME: return Values.value(DateAdapter.localDateTime(column.asString(), neo4JProperty)); default: return Values.value(column.getRawData()); } } private static List parseArrayType(final Neo4jProperty neo4JProperty, final String strValue, final Function convertFunc) { if (null == strValue || "".equals(strValue)) { return Collections.emptyList(); } String split = neo4JProperty.getSplitOrDefault(); String[] strArr = strValue.split(split); List ans = new ArrayList<>(); for (String s : strArr) { ans.add(convertFunc.apply(s)); } return ans; } } ================================================ FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/ConfigConstants.java ================================================ package 
com.alibaba.datax.plugin.writer.neo4jwriter.config; import java.util.List; /** * @author fuyouj */ public final class ConfigConstants { public static final Long DEFAULT_MAX_TRANSACTION_RETRY_SECONDS = 30L; public static final Long DEFAULT_MAX_CONNECTION_SECONDS = 30L; public static final Option<Integer> RETRY_TIMES = Option.<Integer>builder() .key("retryTimes") .defaultValue(3) .desc("The number of retries when an error occurs") .build(); public static final Option<Long> RETRY_SLEEP_MILLS = Option.<Long>builder() .key("retrySleepMills") .defaultValue(3000L) .build(); /** * for cluster mode, please refer to * how to connect cluster mode */ public static final Option<String> URI = Option.<String>builder() .key("uri") .noDefaultValue() .desc("URI of the neo4j database") .build(); public static final Option<String> USERNAME = Option.<String>builder() .key("username") .noDefaultValue() .desc("username for accessing the neo4j database") .build(); public static final Option<String> PASSWORD = Option.<String>builder() .key("password") .noDefaultValue() .desc("password for accessing the neo4j database") .build(); public static final Option<String> BEARER_TOKEN = Option.<String>builder() .key("bearerToken") .noDefaultValue() .desc("base64-encoded bearer token for Neo4j authentication.") .build(); public static final Option<String> KERBEROS_TICKET = Option.<String>builder() .key("kerberosTicket") .noDefaultValue() .desc("base64-encoded kerberos ticket for Neo4j authentication.") .build(); public static final Option<String> DATABASE = Option.<String>builder() .key("database") .noDefaultValue() .desc("database name.") .build(); public static final Option<String> CYPHER = Option.<String>builder() .key("cypher") .noDefaultValue() .desc("cypher query.") .build(); public static final Option<Long> MAX_TRANSACTION_RETRY_TIME = Option.<Long>builder() .key("maxTransactionRetryTimeSeconds") .defaultValue(DEFAULT_MAX_TRANSACTION_RETRY_SECONDS) .desc("maximum transaction retry time (seconds); the transaction fails if it is exceeded.") .build(); public static final Option<Long> MAX_CONNECTION_TIMEOUT_SECONDS = Option.<Long>builder() .key("maxConnectionTimeoutSeconds") .defaultValue(DEFAULT_MAX_CONNECTION_SECONDS) .desc("The maximum amount of time to wait for a TCP connection to be established (seconds).") .build(); public static final Option<String> BATCH_DATA_VARIABLE_NAME = Option.<String>builder() .key("batchDataVariableName") .defaultValue("batch") .desc("the variable name that represents a batch of data in the cypher statement") .build(); public static final Option<List<Neo4jProperty>> NEO4J_PROPERTIES = Option.<List<Neo4jProperty>>builder() .key("properties") .noDefaultValue() .desc("properties of the neo4j node or relationship") .build(); public static final Option<Integer> BATCH_SIZE = Option.<Integer>builder() .key("batchSize") .defaultValue(1000) .desc("max batch size") .build(); } ================================================ FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Neo4jProperty.java ================================================ package com.alibaba.datax.plugin.writer.neo4jwriter.config; /** * DataX does not transmit column metadata, * so the name of each column must be defined on the neo4j writer side * * @author fuyouj */ public class Neo4jProperty { public static final String DEFAULT_SPLIT = ","; /** * name of the neo4j field */ private String name; /** * neo4j type * see org.neo4j.driver.Values */ private String type; /** * date format, for date/time types */ private String dateFormat; /** * element separator, for array types */ private String split; public Neo4jProperty() { } public Neo4jProperty(String name, String type, String format, String split) { this.name = name; this.type = type; this.dateFormat = format; this.split = split; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getType() { return type; } public void setType(String type) { this.type = type; } public String getDateFormat() { return dateFormat; } public void
setDateFormat(String dateFormat) { this.dateFormat = dateFormat; } public String getSplit() { return getSplitOrDefault(); } public String getSplitOrDefault() { if (split == null || "".equals(split)) { return DEFAULT_SPLIT; } return split; } public void setSplit(String split) { this.split = split; } } ================================================ FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/config/Option.java ================================================ package com.alibaba.datax.plugin.writer.neo4jwriter.config; public class Option<T> { public static class Builder<T> { private String key; private String desc; private T defaultValue; public Builder<T> key(String key) { this.key = key; return this; } public Builder<T> desc(String desc) { this.desc = desc; return this; } public Builder<T> defaultValue(T defaultValue) { this.defaultValue = defaultValue; return this; } public Builder<T> noDefaultValue() { return this; } public Option<T> build() { return new Option<>(this.key, this.desc, this.defaultValue); } } private final String key; private final String desc; private final T defaultValue; public Option(String key, String desc, T defaultValue) { this.key = key; this.desc = desc; this.defaultValue = defaultValue; } public static <T> Builder<T> builder(){ return new Builder<>(); } public String getKey() { return key; } public String getDesc() { return desc; } public T getDefaultValue() { if (defaultValue == null){ throw new IllegalStateException(key + ":defaultValue is null"); } return defaultValue; } } ================================================ FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/element/PropertyType.java ================================================ package com.alibaba.datax.plugin.writer.neo4jwriter.element; import java.util.Arrays; /** * @see org.neo4j.driver.Values * @author fuyouj */ public enum PropertyType { NULL, BOOLEAN, STRING, LONG, SHORT, INTEGER, DOUBLE, FLOAT, LOCAL_DATE, LOCAL_TIME,
LOCAL_DATE_TIME, LIST, MAP, CHAR_ARRAY, BYTE_ARRAY, BOOLEAN_ARRAY, STRING_ARRAY, LONG_ARRAY, INT_ARRAY, SHORT_ARRAY, DOUBLE_ARRAY, FLOAT_ARRAY, Object_ARRAY; public static PropertyType fromStrIgnoreCase(String typeStr) { return Arrays.stream(PropertyType.values()) .filter(e -> e.name().equalsIgnoreCase(typeStr)) .findFirst() .orElse(PropertyType.STRING); } } ================================================ FILE: neo4jwriter/src/main/java/com/alibaba/datax/plugin/writer/neo4jwriter/exception/Neo4jErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.neo4jwriter.exception; import com.alibaba.datax.common.spi.ErrorCode; public enum Neo4jErrorCode implements ErrorCode { /** * Invalid configuration * the configuration failed validation */ CONFIG_INVALID("NEO4J_ERROR_01","invalid configuration"), /** * database error * Thrown while writing to the database. Possible causes include a permission error, a connection timeout, or a configuration that points at a secondary (non-leader) node. * Update operations may also fail with deadlocks. Determine the specific cause from the error message; it is unrelated to DataX itself. */ DATABASE_ERROR("NEO4J_ERROR_02","database error"); private final String code; private final String description; @Override public String getCode() { return code; } @Override public String getDescription() { return description; } Neo4jErrorCode(String code, String description) { this.code = code; this.description = description; } } ================================================ FILE: neo4jwriter/src/main/resources/plugin.json ================================================ { "name": "neo4jWriter", "class": "com.alibaba.datax.plugin.writer.neo4jwriter.Neo4jWriter", "description": "dataX neo4j writer plugin", "developer": "付有杰" } ================================================ FILE: neo4jwriter/src/main/resources/plugin_job_template.json ================================================ { "uri": "neo4j://localhost:7687", "username": "neo4j", "password": "Test@12343", "database": "neo4j", "cypher": "unwind $batch as row create(p:Person) set p.pbool = row.pbool,p.pstring = row.pstring,p.plong = row.plong,p.pshort =
row.pshort,p.pdouble=row.pdouble,p.pstringarr=row.pstringarr,p.plocaldate=row.plocaldate", "batchDataVariableName": "batch", "batchSize": "33", "properties": [ { "name": "pbool", //type is case-insensitive "type": "BOOLEAN" }, { "name": "pstring", "type": "STRING" }, { "name": "plong", "type": "LONG" }, { "name": "pshort", "type": "SHORT" }, { "name": "pdouble", "type": "DOUBLE" }, { "name": "pstringarr", "type": "STRING_ARRAY", "split": "," }, { "name": "plocaldate", "type": "LOCAL_DATE", "dateFormat": "yyyy-MM-dd" } ] } ================================================ FILE: neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/Neo4jWriterTest.java ================================================ package com.alibaba.datax.plugin.writer; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.mock.MockRecord; import com.alibaba.datax.plugin.writer.mock.MockUtil; import com.alibaba.datax.plugin.writer.neo4jwriter.Neo4jClient; import com.alibaba.datax.plugin.writer.neo4jwriter.config.Neo4jProperty; import com.alibaba.datax.plugin.writer.neo4jwriter.element.PropertyType; import org.junit.After; import org.junit.Before; import org.junit.Test; import org.neo4j.driver.*; import org.neo4j.driver.types.Node; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.testcontainers.containers.GenericContainer; import org.testcontainers.containers.Network; import org.testcontainers.containers.output.Slf4jLogConsumer; import org.testcontainers.lifecycle.Startables; import org.testcontainers.shaded.org.awaitility.Awaitility; import org.testcontainers.utility.DockerImageName; import org.testcontainers.utility.DockerLoggerFactory; import java.io.File; import java.net.URI; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import java.util.concurrent.TimeUnit; import java.util.stream.Stream; import static
org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; public class Neo4jWriterTest { private static final Logger LOGGER = LoggerFactory.getLogger(Neo4jWriterTest.class); private static final int MOCK_NUM = 100; private static final String CONTAINER_IMAGE = "neo4j:5.9.0"; private static final String CONTAINER_HOST = "neo4j-host"; private static final int HTTP_PORT = 7474; private static final int BOLT_PORT = 7687; private static final String CONTAINER_NEO4J_USERNAME = "neo4j"; private static final String CONTAINER_NEO4J_PASSWORD = "Test@12343"; private static final URI CONTAINER_URI = URI.create("neo4j://localhost:" + BOLT_PORT); protected static final Network NETWORK = Network.newNetwork(); private GenericContainer container; private Driver neo4jDriver; private Session neo4jSession; @Before public void init() { DockerImageName imageName = DockerImageName.parse(CONTAINER_IMAGE); container = new GenericContainer<>(imageName) .withNetwork(NETWORK) .withNetworkAliases(CONTAINER_HOST) .withExposedPorts(HTTP_PORT, BOLT_PORT) .withEnv( "NEO4J_AUTH", CONTAINER_NEO4J_USERNAME + "/" + CONTAINER_NEO4J_PASSWORD) .withEnv("apoc.export.file.enabled", "true") .withEnv("apoc.import.file.enabled", "true") .withEnv("apoc.import.file.use_neo4j_config", "true") .withEnv("NEO4J_PLUGINS", "[\"apoc\"]") .withLogConsumer( new Slf4jLogConsumer( DockerLoggerFactory.getLogger(CONTAINER_IMAGE))); container.setPortBindings( Arrays.asList( String.format("%s:%s", HTTP_PORT, HTTP_PORT), String.format("%s:%s", BOLT_PORT, BOLT_PORT))); Startables.deepStart(Stream.of(container)).join(); LOGGER.info("container started"); Awaitility.given() .ignoreExceptions() .await() .atMost(30, TimeUnit.SECONDS) .untilAsserted(this::initConnection); } @Test public void testCreateNodeAllTypeField() { final Result checkExists = neo4jSession.run("MATCH (p:Person) RETURN p limit 1"); if (checkExists.hasNext()) { neo4jSession.run("MATCH (p:Person) delete p"); } Configuration configuration = 
Configuration.from(new File("src/test/resources/allTypeFieldNode.json")); Neo4jClient neo4jClient = Neo4jClient.build(configuration, null); neo4jClient.init(); for (int i = 0; i < MOCK_NUM; i++) { neo4jClient.tryWrite(mockAllTypeFieldTestNode(neo4jClient.getNeo4jFields())); } neo4jClient.destroy(); Result result = neo4jSession.run("MATCH (p:Person) return p"); // nodes assertTrue(result.hasNext()); int cnt = 0; while (result.hasNext()) { org.neo4j.driver.Record record = result.next(); record.get("p").get("pbool").asBoolean(); record.get("p").get("pstring").asString(); record.get("p").get("plong").asLong(); record.get("p").get("pshort").asInt(); record.get("p").get("pdouble").asDouble(); List list = (List) record.get("p").get("pstringarr").asObject(); record.get("p").get("plocaldate").asLocalDate(); cnt++; } assertEquals(MOCK_NUM, cnt); } /** * Creating a relationship requires both nodes to exist, * so the nodes are created first and the relationships are mocked afterwards */ @Test public void testCreateRelation() { final Result checkExists = neo4jSession.run("MATCH (p1:Person)-[r:LINK]->(p2:Person) return r limit 1"); if (checkExists.hasNext()) { neo4jSession.run("MATCH (p1:Person)-[r:LINK]->(p2:Person) delete r,p1,p2"); } String createNodeCql = "create (p:Person) set p.id = '%s'"; Configuration configuration = Configuration.from(new File("src/test/resources/relationship.json")); Neo4jClient neo4jClient = Neo4jClient.build(configuration, null); neo4jClient.init(); //Create the nodes in preparation for writing the relationships for (int i = 0; i < MOCK_NUM; i++) { neo4jSession.run(String.format(createNodeCql, i + "start")); neo4jSession.run(String.format(createNodeCql, i + "end")); Record record = new MockRecord(); record.addColumn(new StringColumn(i + "start")); record.addColumn(new StringColumn(i + "end")); neo4jClient.tryWrite(record); } neo4jClient.destroy(); Result result = neo4jSession.run("MATCH (start:Person)-[r:LINK]->(end:Person) return r,start,end"); // relationships assertTrue(result.hasNext()); int cnt = 0; while (result.hasNext()) {
org.neo4j.driver.Record record = result.next(); Node startNode = record.get("start").asNode(); assertTrue(startNode.hasLabel("Person")); assertTrue(startNode.asMap().containsKey("id")); Node endNode = record.get("end").asNode(); assertTrue(endNode.hasLabel("Person")); assertTrue(endNode.asMap().containsKey("id")); String name = record.get("r").type().name(); assertEquals("RELATIONSHIP", name); cnt++; } assertEquals(MOCK_NUM, cnt); } /** * In Neo4j, writing labels and relationship types dynamically requires APOC procedures */ @Test public void testUseApocCreateDynamicLabel() { List<String> dynamicLabel = new ArrayList<>(); for (int i = 0; i < MOCK_NUM; i++) { dynamicLabel.add("Label" + i); } //remove existing test data, if any //this String.format placeholder approach does not support dynamic batch writes; the queries could be combined with UNION, but that performs poorly String query = "match (p:%s) return p"; String delete = "match (p:%s) delete p"; for (String label : dynamicLabel) { Result result = neo4jSession.run(String.format(query, label)); if (result.hasNext()) { neo4jSession.run(String.format(delete, label)); } } Configuration configuration = Configuration.from(new File("src/test/resources/dynamicLabel.json")); Neo4jClient neo4jClient = Neo4jClient.build(configuration, null); neo4jClient.init(); for (int i = 0; i < dynamicLabel.size(); i++) { Record record = new MockRecord(); record.addColumn(new StringColumn(dynamicLabel.get(i))); record.addColumn(new StringColumn(String.valueOf(i))); neo4jClient.tryWrite(record); } neo4jClient.destroy(); //verify that the batch writes performed by the script are correct int cnt = 0; for (int i = 0; i < dynamicLabel.size(); i++) { String label = dynamicLabel.get(i); Result result = neo4jSession.run(String.format(query, label)); while (result.hasNext()) { org.neo4j.driver.Record record = result.next(); Node node = record.get("p").asNode(); assertTrue(node.hasLabel(label)); assertEquals(i + "", node.asMap().get("id")); cnt++; } } assertEquals(MOCK_NUM, cnt); } private Record mockAllTypeFieldTestNode(List<Neo4jProperty> neo4JProperties) { Record mock = new MockRecord(); for (Neo4jProperty field : neo4JProperties) {
mock.addColumn(MockUtil.mockColumnByType(PropertyType.fromStrIgnoreCase(field.getType()))); } return mock; } @After public void destroy() { if (neo4jSession != null) { neo4jSession.close(); } if (neo4jDriver != null) { neo4jDriver.close(); } if (container != null) { container.close(); } } private void initConnection() { neo4jDriver = GraphDatabase.driver( CONTAINER_URI, AuthTokens.basic(CONTAINER_NEO4J_USERNAME, CONTAINER_NEO4J_PASSWORD)); neo4jSession = neo4jDriver.session(SessionConfig.forDatabase("neo4j")); } } ================================================ FILE: neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockRecord.java ================================================ package com.alibaba.datax.plugin.writer.mock; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.fastjson2.JSON; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; public class MockRecord implements Record { private static final int RECORD_AVERAGE_COLUMN_NUMBER = 16; private List<Column> columns; private int byteSize; private Map meta; public MockRecord() { this.columns = new ArrayList<>(RECORD_AVERAGE_COLUMN_NUMBER); } @Override public void addColumn(Column column) { columns.add(column); incrByteSize(column); } @Override public Column getColumn(int i) { if (i < 0 || i >= columns.size()) { return null; } return columns.get(i); } @Override public void setColumn(int i, final Column column) { if (i < 0) { throw new IllegalArgumentException("cannot set a column value at an index less than 0"); } if (i >= columns.size()) { expandCapacity(i + 1); } decrByteSize(getColumn(i)); this.columns.set(i, column); incrByteSize(getColumn(i)); } @Override public String toString() { Map<String, Object> json = new HashMap<>(); json.put("size", this.getColumnNumber()); json.put("data", this.columns); return JSON.toJSONString(json); } @Override public int getColumnNumber() { return this.columns.size(); } @Override public int getByteSize() {
return byteSize; } public int getMemorySize() { throw new UnsupportedOperationException(); } @Override public void setMeta(Map meta) { } @Override public Map getMeta() { return null; } private void decrByteSize(final Column column) { } private void incrByteSize(final Column column) { } private void expandCapacity(int totalSize) { if (totalSize <= 0) { return; } int needToExpand = totalSize - columns.size(); while (needToExpand-- > 0) { this.columns.add(null); } } } ================================================ FILE: neo4jwriter/src/test/java/com/alibaba/datax/plugin/writer/mock/MockUtil.java ================================================ package com.alibaba.datax.plugin.writer.mock; import com.alibaba.datax.common.element.*; import com.alibaba.datax.plugin.writer.neo4jwriter.element.PropertyType; import com.alibaba.fastjson2.JSON; import java.time.LocalDate; import java.time.format.DateTimeFormatter; import java.util.HashMap; import java.util.Map; import java.util.Random; public class MockUtil { public static Column mockColumnByType(PropertyType type) { Random random = new Random(); switch (type) { case SHORT: return new StringColumn("1"); case BOOLEAN: return new BoolColumn(random.nextInt() % 2 == 0); case INTEGER: case LONG: return new LongColumn(random.nextInt(Integer.MAX_VALUE)); case FLOAT: case DOUBLE: return new DoubleColumn(random.nextDouble()); case NULL: return null; case BYTE_ARRAY: return new BytesColumn(new byte[]{(byte) (random.nextInt() % 2)}); case LOCAL_DATE: return new StringColumn(LocalDate.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd"))); case MAP: return new StringColumn(JSON.toJSONString(propmap())); case STRING_ARRAY: return new StringColumn("[1,1,1,1,1,1,1]"); default: return new StringColumn("randomStr" + random.nextInt(Integer.MAX_VALUE)); } } public static Map propmap() { Map prop = new HashMap<>(); prop.put("name", "neo4jWriter"); prop.put("age", "1"); return prop; } } ================================================ FILE: 
neo4jwriter/src/test/resources/allTypeFieldNode.json ================================================ { "uri": "neo4j://localhost:7687", "username":"neo4j", "password":"Test@12343", "database":"neo4j", "cypher": "unwind $batch as row create(p:Person) set p.pbool = row.pbool,p.pstring = row.pstring,p.plong = row.plong,p.pshort = row.pshort,p.pdouble=row.pdouble,p.pstringarr=row.pstringarr,p.plocaldate=row.plocaldate", "batchDataVariableName": "batch", "batchSize": "33", "properties": [ { "name": "pbool", "type": "BOOLEAN" }, { "name": "pstring", "type": "STRING" }, { "name": "plong", "type": "LONG" }, { "name": "pshort", "type": "SHORT" }, { "name": "pdouble", "type": "DOUBLE" }, { "name": "pstringarr", "type": "STRING_ARRAY", "split": "," }, { "name": "plocaldate", "type": "LOCAL_DATE", "dateFormat": "yyyy-MM-dd" } ] } ================================================ FILE: neo4jwriter/src/test/resources/dynamicLabel.json ================================================ { "uri": "bolt://localhost:7687", "username":"neo4j", "password":"Test@12343", "database":"neo4j", "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ", "batchDataVariableName": "batch", "batchSize": "33", "properties": [ { "name": "Label", "type": "string" }, { "name": "id", "type": "STRING" } ] } ================================================ FILE: neo4jwriter/src/test/resources/relationship.json ================================================ { "uri": "neo4j://localhost:7687", "username":"neo4j", "password":"Test@12343", "database":"neo4j", "cypher": "unwind $batch as row match(p1:Person) where p1.id = row.startNodeId match(p2:Person) where p2.id = row.endNodeId create (p1)-[:LINK]->(p2)", "batchDataVariableName": "batch", "batchSize": "33", "properties": [ { "name": "startNodeId", "type": "STRING" }, { "name": "endNodeId", "type": "STRING" } ] } ================================================ FILE: 
neo4jwriter/src/test/resources/streamreader2neo4j.json ================================================ { "job": { "content": [ { "reader": { "name": "streamreader", "parameter": { "sliceRecordCount": 10, "column": [ { "type": "string", "value": "StreamReader" }, { "type": "string", "value": "1997" } ] } }, "writer": { "name": "neo4jWriter", "parameter": { "uri": "bolt://localhost:7687", "username":"neo4j", "password":"Test@12343", "database":"neo4j", "cypher": "unwind $batch as row CALL apoc.cypher.doIt( 'create (n:`' + row.Label + '`{id:$id})' ,{id: row.id} ) YIELD value RETURN 1 ", "batchDataVariableName": "batch", "batchSize": "3", "properties": [ { "name": "Label", "type": "string" }, { "name": "id", "type": "STRING" } ] } } } ], "setting": { "speed": { "channel": 5 } } } } ================================================ FILE: obhbasereader/doc/obhbasereader.md ================================================ The OceanBase table API provides applications with an ObHBase access interface, so the reader for the OceanBase table API is similar to HBase Reader in structure and configuration. The obhbasereader plugin supports two read modes, SQL and HBase API, which differ as follows: 1. The SQL mode can split the data by partition or by K value, whereas with the HBase API mode the data splits must be set manually by the user. 2. The SQL mode converts the kqtv-format data read from obhbase into single flat rows, whereas the HBase API mode performs no row/column conversion and passes the data downstream in kqtv form as-is. 3. The SQL mode requires the column property to be configured; the HBase API mode does not, because the data always consists of the four fixed kqtv columns. 4.
The SQL mode can only retrieve the latest or the oldest version of the data, whereas the HBase API mode can retrieve multiple versions. #### Job configuration ```json { "job": { "setting": { "speed": { "channel": 3, "byte": 104857600 }, "errorLimit": { "record": 10 } }, "content": [ { "reader": { "name": "obhbasereader", "parameter": { "username": "username", "password": "password", "encoding": "utf8", "column": [ { "name": "f1:column1_1", "type": "string" }, { "name": "f1:column2_2", "type": "string" }, { "name": "f1:column1_1", "type": "string" }, { "name": "f1:column2_2", "type": "string" } ], "range": [ { "startRowkey": "aaa", "endRowkey": "ccc", "isBinaryRowkey": false }, { "startRowkey": "eee", "endRowkey": "zzz", "isBinaryRowkey": false } ], "mode": "normal", "readByPartition": "true", "scanCacheSize": "", "readerHint": "", "readBatchSize": "1000", "connection": [ { "table": [ "htable1", "htable2" ], "jdbcUrl": [ "||_dsc_ob10_dsc_||cluster:tenant||_dsc_ob10_dsc_||jdbc:mysql://ip:port/dbName1" ], "username": "username", "password": "password" }, { "table": [ "htable1", "htable2" ], "jdbcUrl": [ "jdbc:mysql://ip:port/database" ] } ] } }, "writer": { "name": "txtfilewriter", "parameter": { "path": "/Users/xujing/datax/txtfile", "charset": "UTF-8", "fieldDelimiter": ",", "fileName": "hbase", "nullFormat": "null", "writeMode": "truncate" } } } ] } } ``` ##### Parameters - **connection** - Description: configures the jdbcUrl of each database shard and the names of its table shards. Multiple table shards in one database can be separated by commas, or written as tableName[startIndex-endIndex]. - Required: yes - Default: none - **jdbcUrl** - Description: the JDBC URL used to connect to OceanBase. Two formats are supported: - jdbc:mysql://obproxyIp:obproxyPort/db - with this format, username must be written in the three-part format (user@tenant#cluster) - ||_dsc_ob10_dsc_||clusterName:tenantName||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/db - with this format, username is only the user name itself; the three-part format is not needed - Required: yes - Default: none - **table** - Description: the tables to synchronize, given as a JSON array, so multiple tables can be extracted at once. When multiple tables are configured, the user must ensure that they share the same schema; obhbasereader does not check whether they belong to the same logical table. Note that table must be placed inside the connection block. - Required: yes - Default: none - **readByPartition** - Description: when reading in SQL mode, split the job **only** by partition. - Required: no - Default: false - **partitionName** - Description: when reading in SQL mode, read only the data in the named partition. The user must ensure that the configured partition name actually exists in the table schema (case-sensitive). - Required: no - Default: none - **readBatchSize** - Description: the page size when reading in SQL mode. - Required: no - Default: 100000 - 
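**fetchSize** - Description: when reading in SQL mode, the number of rows fetched from the result set on each read. - Required: no - Default: -2147483648 (Integer.MIN_VALUE) - **scanCacheSize** - Description: when reading via the HBase API, the number of rows read from the server per RPC. - Required: no - Default: 256 - **readerHint** - Description: the hint obhbasereader uses when reading in SQL mode. - Required: no - Default: /*+READ_CONSISTENCY(weak),QUERY_TIMEOUT(86400000000)*/ - **column** - Description: when reading in SQL mode, the set of column names to synchronize from the configured table, described as a JSON array. - Column pruning is supported: you can select a subset of the columns to export. - Column reordering is supported: columns can be exported in an order different from the table schema. The wildcard * is also supported; check the column information carefully before using it. - Required: yes, when reading in SQL mode - Default: none - **range** - Description: the rowkey range read by obhbasereader. - Required: no - Default: none - **username** - Description: the user name for accessing OceanBase. - Required: yes - Default: none - **mode** - Description: the obhbase read mode. normal mode reads only one version of the data. - Required: yes - Default: normal - **version** - Description: the version of the obhbase data to read. oldest and latest are currently supported, meaning the oldest and the newest data respectively. - Required: yes - Default: oldest Notes: Note: if **partitionName** is configured, readByPartition does not need to be configured; even if it is, the readByPartition option is ignored and only the data in the specified partition is read. Note: if **readByPartition** is configured, the job is split only by partition, not by K value. For a non-partitioned table, the whole table is treated as a single task and is not split further. ================================================ FILE: obhbasereader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT obhbasereader com.alibaba.datax obhbasereader 0.0.1-SNAPSHOT com.alibaba.datax datax-core ${datax-project-version} provided com.alibaba.datax oceanbasev10reader 0.0.1-SNAPSHOT guava com.google.guava org.apache.zookeeper zookeeper 3.3.2 log4j log4j commons-collections commons-collections 3.2.1 com.oceanbase obkv-hbase-client 0.1.4.2 guava com.google.guava com.google.guava guava ${guava-version} org.json json 20160810 junit junit 4.11 test org.powermock powermock-module-junit4 1.4.10 test org.powermock powermock-api-mockito 1.4.10 test org.mockito mockito-core 1.8.5 test src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: obhbasereader/src/main/assembly/package.xml ================================================ dir false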
src/main/resources plugin.json plugin_job_template.json plugin/reader/obhbasereader target/ obhbasereader-0.0.1-SNAPSHOT.jar plugin/reader/obhbasereader false plugin/reader/obhbasereader/libs runtime ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader; import ch.qos.logback.classic.Level; public final class Constant { public static final String ROWKEY_FLAG = "rowkey"; public static final int DEFAULT_SCAN_CACHE = 256; public static final int DEFAULT_FETCH_SIZE = Integer.MIN_VALUE; public static final int DEFAULT_READ_BATCH_SIZE = 100000; // timeout:24 * 3600 = 86400s public static final String OB_READ_HINT = "/*+READ_CONSISTENCY(weak),QUERY_TIMEOUT(86400000000)*/"; public static final String DEFAULT_DATE_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static final String DEFAULT_ENCODING = "UTF-8"; public static final String DEFAULT_TIMEZONE = "UTC"; public static final boolean DEFAULT_USE_SQLREADER = true; public static final boolean DEFAULT_USE_ODPMODE = true; public static final String OB_TABLE_CLIENT_PROPERTY = "logging.path.com.alipay.oceanbase-table-client"; public static final String OB_TABLE_HBASE_PROPERTY = "logging.path.com.alipay.oceanbase-table-hbase"; public static final String OB_TABLE_CLIENT_LOG_LEVEL = "logging.level.oceanbase-table-client"; public static final String OB_TABLE_HBASE_LOG_LEVEL = "logging.level.oceanbase-table-hbase"; public static final String OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL = "logging.level.com.alipay.oceanbase-table-client"; public static final String OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL = "logging.level.com.alipay.oceanbase-table-hbase"; public static final String OB_HBASE_LOG_PATH = System.getProperty("datax.home") + "/log/"; public static final String DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL = Level.OFF.toString(); public static final String 
DEFAULT_OB_TABLE_HBASE_LOG_LEVEL = Level.OFF.toString(); public static final String OBMYSQL_KEYWORDS = "CUME_DIST,DENSE_RANK,EMPTY,FIRST_VALUE,GROUPING,GROUPS,INTERSECT,JSON_TABLE,LAG,LAST_VALUE,LATERAL,LEAD,NTH_VALUE,NTILE,OF,OVER,PERCENT_RANK,RANK,RECURSIVE,ROW_NUMBER,SYSTEM,WINDOW,ACCESSIBLE,ACCOUNT,ACTION,ADD,AFTER,AGAINST,AGGREGATE,ALGORITHM,ALL,ALTER,ALWAYS,ANALYSE,AND,ANY,AS,ASC,ASCII,ASENSITIVE,AT,AUTO_INCREMENT,AUTOEXTEND_SIZE,AVG,AVG_ROW_LENGTH,BACKUP,BEFORE,BEGIN,BETWEEN,BIGINT,BINARY,BINLOG,BIT,BLOB,BLOCK,BOOL,BOOLEAN,BOTH,BTREE,BY,BYTE,CACHE,CALL,CASCADE,CASCADED,CASE,CATALOG_NAME,CHAIN,CHANGE,CHANGED,CHANNEL,CHAR,CHARACTER,CHARSET,CHECK,CHECKSUM,CIPHER,CLASS_ORIGIN,CLIENT,CLOSE,COALESCE,CODE,COLLATE,COLLATION,COLUMN,COLUMN_FORMAT,COLUMN_NAME,COLUMNS,COMMENT,COMMIT,COMMITTED,COMPACT,COMPLETION,COMPRESSED,COMPRESSION,CONCURRENT,CONDITION,CONNECTION,CONSISTENT,CONSTRAINT,CONSTRAINT_CATALOG,CONSTRAINT_NAME,CONSTRAINT_SCHEMA,CONTAINS,CONTEXT,CONTINUE,CONVERT,CPU,CREATE,CROSS,CUBE,CURRENT,CURRENT_DATE,CURRENT_TIME,CURRENT_TIMESTAMP,CURRENT_USER,CURSOR," + 
"CURSOR_NAME,DATA,DATABASE,DATABASES,DATAFILE,DATE,DATETIME,DAY,DAY_HOUR,DAY_MICROSECOND,DAY_MINUTE,DAY_SECOND,DEALLOCATE,DEC,DECIMAL,DECLARE,DEFAULT,DEFAULT_AUTH,DEFINER,DELAY_KEY_WRITE,DELAYED,DELETE,DES_KEY_FILE,DESC,DESCRIBE,DETERMINISTIC,DIAGNOSTICS,DIRECTORY,DISABLE,DISCARD,DISK,DISTINCT,DISTINCTROW,DIV,DO,DOUBLE,DROP,DUAL,DUMPFILE,DUPLICATE,DYNAMIC,EACH,ELSE,ELSEIF,ENABLE,ENCLOSED,ENCRYPTION,END,ENDS,ENGINE,ENGINES,ENUM,ERROR,ERRORS,ESCAPE,ESCAPED,EVENT,EVENTS,EVERY,EXCHANGE,EXECUTE,EXISTS,EXIT,EXPANSION,EXPIRE,EXPLAIN,EXPORT,EXTENDED,EXTENT_SIZE,FAST,FAULTS,FETCH,FIELDS,FILE,FILE_BLOCK_SIZE,FILTER,FIRST,FIXED,FLOAT,FLOAT4,FLOAT8,FLUSH,FOLLOWS,FOR,FORCE,FOREIGN,FORMAT,FOUND,FROM,FULL,FULLTEXT,FUNCTION,GENERAL,GENERATED,GEOMETRY,GEOMETRYCOLLECTION,GET,GET_FORMAT,GLOBAL,GRANT,GRANTS,GROUP,GROUP_REPLICATION,HANDLER,HASH,HAVING,HELP,HIGH_PRIORITY,HOST,HOSTS,HOUR,HOUR_MICROSECOND,HOUR_MINUTE,HOUR_SECOND,IDENTIFIED,IF,IGNORE,IGNORE_SERVER_IDS,IMPORT,IN,INDEX," + "INDEXES," + "INFILE,INITIAL_SIZE,INNER,INOUT,INSENSITIVE,INSERT,INSERT_METHOD,INSTALL,INSTANCE,INT,INT1,INT2,INT3,INT4,INT8,INTEGER,INTERVAL,INTO,INVOKE,INVOKER,IO,IO_AFTER_GTIDS,IO_BEFORE_GTIDS,IO_THREAD,IPC,IS,ISOLATION,ISSUER,ITERATE,JOIN,JSON,KEY,KEY_BLOCK_SIZE,KEYS,KILL,LANGUAGE,LAST,LEADING,LEAVE,LEAVES,LEFT,LESS,LEVEL,LIKE,LIMIT,LINEAR,LINES,LINESTRING,LIST,LOAD,LOCAL,LOCALTIME,LOCALTIMESTAMP,LOCK,LOCKS,LOGFILE,LOGS,LONG,LONGBLOB,LONGTEXT,LOOP,LOW_PRIORITY,MASTER,MASTER_AUTO_POSITION,MASTER_BIND,MASTER_CONNECT_RETRY,MASTER_DELAY,MASTER_HEARTBEAT_PERIOD,MASTER_HOST,MASTER_LOG_FILE,MASTER_LOG_POS,MASTER_PASSWORD,MASTER_PORT,MASTER_RETRY_COUNT,MASTER_SERVER_ID,MASTER_SSL,MASTER_SSL_CA,MASTER_SSL_CAPATH,MASTER_SSL_CERT,MASTER_SSL_CIPHER,MASTER_SSL_CRL,MASTER_SSL_CRLPATH,MASTER_SSL_KEY,MASTER_SSL_VERIFY_SERVER_CERT,MASTER_TLS_VERSION,MASTER_USER,MATCH,MAX_CONNECTIONS_PER_HOUR,MAX_QUERIES_PER_HOUR,MAX_ROWS,MAX_SIZE,MAX_STATEMENT_TIME,MAX_UPDATES_PER_HOUR," + "MAX_USER_CONNECTIONS," + 
"MAXVALUE,MEDIUM,MEDIUMBLOB,MEDIUMINT,MEDIUMTEXT,MEMORY,MERGE,MESSAGE_TEXT,MICROSECOND,MIDDLEINT,MIGRATE,MIN_ROWS,MINUTE,MINUTE_MICROSECOND,MINUTE_SECOND,MOD,MODE,MODIFIES,MODIFY,MONTH,MULTILINESTRING,MULTIPOINT,MULTIPOLYGON,MUTEX,MYSQL_ERRNO,NAME,NAMES,NATIONAL,NATURAL,NCHAR,NDB,NDBCLUSTER,NEVER,NEW,NEXT,NO,NO_WAIT,NO_WRITE_TO_BINLOG,NODEGROUP,NONBLOCKING,NONE,NOT,NUMBER,NUMERIC,NVARCHAR,OFFSET,OLD_PASSWORD,ON,ONE,ONLY,OPEN,OPTIMIZE,OPTIMIZER_COSTS,OPTION,OPTIONALLY,OPTIONS,OR,ORDER,OUT,OUTER,OUTFILE,OWNER,PACK_KEYS,PAGE,PARSE_GCOL_EXPR,PARSER,PARTIAL,PARTITION,PARTITIONING,PARTITIONS,PASSWORD,PHASE,PLUGIN,PLUGIN_DIR,PLUGINS,POINT,POLYGON,PORT,PRECEDES,PRECISION,PREPARE,PRESERVE,PREV,PRIMARY,PRIVILEGES,PROCEDURE,PROCESSLIST,PROFILE,PROFILES,PROXY,PURGE,QUARTER,QUERY,QUICK,RANGE,READ,READ_ONLY,READ_WRITE,READS,REAL,REBUILD,RECOVER,REDO_BUFFER_SIZE,REDOFILE,REDUNDANT,REFERENCES,REGEXP,RELAY,RELAY_LOG_FILE,RELAY_LOG_POS,RELAY_THREAD,RELAYLOG,RELEASE,RELOAD,REMOVE," + "RENAME,REORGANIZE,REPAIR,REPEAT,REPEATABLE,REPLACE,REPLICATE_DO_DB,REPLICATE_DO_TABLE,REPLICATE_IGNORE_DB,REPLICATE_IGNORE_TABLE,REPLICATE_REWRITE_DB,REPLICATE_WILD_DO_TABLE,REPLICATE_WILD_IGNORE_TABLE,REPLICATION,REQUIRE,RESET,RESIGNAL,RESTORE,RESTRICT,RESUME,RETURN,RETURNED_SQLSTATE,RETURNS,REVERSE,REVOKE,RIGHT,RLIKE,ROLLBACK,ROLLUP,ROTATE,ROUTINE,ROW,ROW_COUNT,ROW_FORMAT,ROWS,RTREE,SAVEPOINT,SCHEDULE,SCHEMA,SCHEMA_NAME,SCHEMAS,SECOND,SECOND_MICROSECOND,SECURITY,SELECT,SENSITIVE,SEPARATOR,SERIAL,SERIALIZABLE,SERVER,SESSION,SET,SHARE,SHOW,SHUTDOWN,SIGNAL,SIGNED,SIMPLE,SLAVE,SLOW,SMALLINT,SNAPSHOT,SOCKET,SOME,SONAME,SOUNDS,SOURCE,SPATIAL,SPECIFIC,SQL,SQL_AFTER_GTIDS,SQL_AFTER_MTS_GAPS,SQL_BEFORE_GTIDS,SQL_BIG_RESULT,SQL_BUFFER_RESULT,SQL_CACHE,SQL_CALC_FOUND_ROWS,SQL_NO_CACHE,SQL_SMALL_RESULT,SQL_THREAD,SQL_TSI_DAY,SQL_TSI_HOUR,SQL_TSI_MINUTE,SQL_TSI_MONTH,SQL_TSI_QUARTER,SQL_TSI_SECOND,SQL_TSI_WEEK,SQL_TSI_YEAR,SQLEXCEPTION,SQLSTATE,SQLWARNING,SSL,STACKED," + "START," + 
"STARTING,STARTS,STATS_AUTO_RECALC,STATS_PERSISTENT,STATS_SAMPLE_PAGES,STATUS,STOP,STORAGE,STORED,STRAIGHT_JOIN,STRING,SUBCLASS_ORIGIN,SUBJECT,SUBPARTITION,SUBPARTITIONS,SUPER,SUSPEND,SWAPS,SWITCHES,TABLE,TABLE_CHECKSUM,TABLE_NAME,TABLES,TABLESPACE,TEMPORARY,TEMPTABLE,TERMINATED,TEXT,THAN,THEN,TIME,TIMESTAMP,TIMESTAMPADD,TIMESTAMPDIFF,TINYBLOB,TINYINT,TINYTEXT,TO,TRAILING,TRANSACTION,TRIGGER,TRIGGERS,TRUNCATE,TYPE,TYPES,UNCOMMITTED,UNDEFINED,UNDO,UNDO_BUFFER_SIZE,UNDOFILE,UNICODE,UNINSTALL,UNION,UNIQUE,UNKNOWN,UNLOCK,UNSIGNED,UNTIL,UPDATE,UPGRADE,USAGE,USE,USE_FRM,USER,USER_RESOURCES,USING,UTC_DATE,UTC_TIME,UTC_TIMESTAMP,VALIDATION,VALUE,VALUES,VARBINARY,VARCHAR,VARCHARACTER,VARIABLES,VARYING,VIEW,VIRTUAL,WAIT,WARNINGS,WEEK,WEIGHT_STRING,WHEN,WHERE,WHILE,WITH,WITHOUT,WORK,WRAPPER,WRITE,X509,XA,XID,XML,XOR,YEAR,YEAR_MONTH,ZEROFILL,FALSE,TRUE"; } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/HTableManager.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader; import com.alipay.oceanbase.hbase.OHTable; import org.apache.hadoop.conf.Configuration; import java.io.IOException; public final class HTableManager { public static OHTable createHTable(Configuration config, String tableName) throws IOException { return new OHTable(config, tableName); } public static void closeHTable(OHTable hTable) throws IOException { if (hTable != null) { hTable.close(); } } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/HbaseColumnCell.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader; import com.alibaba.datax.common.base.BaseObject; import com.alibaba.datax.plugin.reader.obhbasereader.enums.ColumnType; import com.alibaba.datax.plugin.reader.obhbasereader.util.ObHbaseReaderUtil; import 
org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.hbase.util.Bytes; /** * Describes a single item of the column configuration in the hbasereader plugin. */ public class HbaseColumnCell extends BaseObject { private ColumnType columnType; /* columnName format: family:qualifier */ private String columnName; private byte[] cf; private byte[] qualifier; /* For a constant column, the constant value is stored in columnValue. */ private String columnValue; /* isConstant is true when columnValue is configured; it lets callers easily check whether this is a constant column. */ private boolean isConstant; /* Only set for date columns; there is no default. Example format: yyyy-MM-dd HH:mm:ss */ private String dateformat; private HbaseColumnCell(Builder builder) { this.columnType = builder.columnType; /* At least one of columnName and columnValue must be null. */ Validate.isTrue(builder.columnName == null || builder.columnValue == null, "In obhbasereader, column cannot configure both column name and column value. Choose one of them."); /* columnName and columnValue cannot both be null. */ Validate.isTrue(builder.columnName != null || builder.columnValue != null, "In obhbasereader, column must configure either column name or column value."); if (builder.columnName != null) { this.isConstant = false; this.columnName = builder.columnName; /* If columnName is not the rowkey, it must use the family:qualifier format. */ if (!ObHbaseReaderUtil.isRowkeyColumn(this.columnName)) {
String promptInfo = "In obhbasereader, the column configuration format of column should be: 'family:column'. The column you configured is invalid: " + this.columnName; String[] cfAndQualifier = this.columnName.split(":"); Validate.isTrue(cfAndQualifier.length == 2 && StringUtils.isNotBlank(cfAndQualifier[0]) && StringUtils.isNotBlank(cfAndQualifier[1]), promptInfo); this.cf = Bytes.toBytes(cfAndQualifier[0].trim()); this.qualifier = Bytes.toBytes(cfAndQualifier[1].trim()); } } else { this.isConstant = true; this.columnValue = builder.columnValue; } if (builder.dateformat != null) { this.dateformat = builder.dateformat; } } public ColumnType getColumnType() { return columnType; } public String getColumnName() { return columnName; } public byte[] getCf() { return cf; } public byte[] getQualifier() { return qualifier; } public String getDateformat() { return dateformat; } public String getColumnValue() { return columnValue; } public boolean isConstant() { return isConstant; } /* Inner builder class */ public static class Builder { private ColumnType columnType; private String columnName; private String columnValue; private String dateformat; public Builder(ColumnType columnType) { this.columnType = columnType; } public Builder columnName(String columnName) { this.columnName = columnName; return this; } public Builder columnValue(String columnValue) { this.columnValue = columnValue; return this; } public Builder dateformat(String dateformat) { this.dateformat = dateformat; return this; } public HbaseColumnCell build() { return new HbaseColumnCell(this); } } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/HbaseReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader; import com.alibaba.datax.common.spi.ErrorCode; public enum HbaseReaderErrorCode implements ErrorCode { REQUIRED_VALUE("ObHbaseReader-00", "Missing required parameters."), 
ILLEGAL_VALUE("ObHbaseReader-01", "Illegal configuration."),
PREPAR_READ_ERROR("ObHbaseReader-02", "Preparing to read ObHBase error."), SPLIT_ERROR("ObHbaseReader-03", "Splitting ObHBase table error."), INIT_TABLE_ERROR("ObHbaseReader-04", "Initializing ObHBase extraction table error"), PARSE_COLUMN_ERROR("ObHbaseReader-05", "Parse column failed."), READ_ERROR("ObHbaseReader-06", "Read ObHBase error."); private final String code; private final String description; private HbaseReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/Key.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader; public final class Key { public final static String HBASE_CONFIG = "hbaseConfig"; /** * mode can be one of three values: normal, multiVersionFixedColumn, or multiVersionDynamicColumn; there is no default. * <p> * normal is used together with column (a Map structure). * <p> * multiVersionFixedColumn is used together with maxVersion, tetradType, and column (a List structure). * <p> * multiVersionDynamicColumn is used together with maxVersion, tetradType, and columnFamily (a List structure). */ public final static String MODE = "mode"; /** * Used with mode = multiVersion to specify how many versions to read; there is no default. * -1 means read all versions. * 0 and 1 are not allowed. * A value greater than 1 means read at most that many versions (it cannot exceed Integer.MAX_VALUE). */ public final static String MAX_VERSION = "maxVersion"; /** * In multi-version mode, the types of the tetrad (rowkey, column, timestamp, value) must be configured. 
*/ public final static String TETRAD_TYPE = "tetradType"; /** * Defaults to utf8. */ public final static String ENCODING = "encoding"; public final static String TABLE = "table"; public final static String USERNAME = "username"; public final static String OB_SYS_USERNAME = "obSysUser"; public final static String CONFIG_URL = "obConfigUrl"; public final static String ODP_HOST = "odpHost"; public final static String ODP_PORT = "odpPort"; public final static String DB_NAME = "dbName"; public final static String PASSWORD = "password"; public final static String OB_SYS_PASSWORD = "obSysPassword"; public final static String COLUMN_FAMILY = "columnFamily"; public final static String COLUMN = "column"; public final static String START_ROWKEY = "startRowkey"; public final static String END_ROWKEY = "endRowkey"; public final static String IS_BINARY_ROWKEY = "isBinaryRowkey"; public final static String SCAN_CACHE = "scanCache"; public final static String RS_URL = "rsUrl"; public final static String MAX_ACTIVE_CONNECTION = "maxActiveConnection"; public final static int DEFAULT_MAX_ACTIVE_CONNECTION = 2000; public final static String TIMEOUT = "timeout"; public final static long DEFAULT_TIMEOUT = 30; public final static String PARTITION_NAME = "partitionName"; public final static String JDBC_URL = "jdbcUrl"; public final static String TIMEZONE = "timezone"; public final static String FETCH_SIZE = "fetchSize"; public final static String READ_BATCH_SIZE = "readBatchSize"; public final static String SESSION = "session"; public final static String READER_HINT = "readerHint"; public final static String QUERY_SQL = "querySql";
public final static String SAMPLE_PERCENTAGE = "samplePercentage"; /* Whether to use a dedicated password */ public final static String USE_SPECIAL_SECRET = "useSpecialSecret"; public final static String USE_SQL_READER = "useSqlReader"; public final static String USE_ODP_MODE = "useOdpMode"; public final static String RANGE = "range"; public final static String READ_BY_PARTITION = "readByPartition"; } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/LocalStrings.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/LocalStrings_en_US.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/LocalStrings_ja_JP.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/LocalStrings_zh_CN.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/LocalStrings_zh_HK.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/LocalStrings_zh_TW.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/ObHbaseReader.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL; import static
com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_OB_TABLE_HBASE_LOG_LEVEL; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_USE_ODPMODE; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_HBASE_LOG_PATH; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_TABLE_CLIENT_LOG_LEVEL; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_TABLE_CLIENT_PROPERTY; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_TABLE_HBASE_LOG_LEVEL; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_TABLE_HBASE_PROPERTY; import static org.apache.commons.lang3.StringUtils.EMPTY; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.TableExpandUtil; import com.alibaba.datax.plugin.reader.obhbasereader.enums.ModeType; import com.alibaba.datax.plugin.reader.obhbasereader.ext.ServerConnectInfo; import com.alibaba.datax.plugin.reader.obhbasereader.task.AbstractHbaseTask; import com.alibaba.datax.plugin.reader.obhbasereader.task.SQLNormalModeReader; import com.alibaba.datax.plugin.reader.obhbasereader.task.ScanMultiVersionReader; import com.alibaba.datax.plugin.reader.obhbasereader.task.ScanNormalModeReader; import 
com.alibaba.datax.plugin.reader.obhbasereader.util.HbaseSplitUtil; import com.alibaba.datax.plugin.reader.obhbasereader.util.ObHbaseReaderUtil; import com.alibaba.datax.plugin.reader.obhbasereader.util.SqlReaderSplitUtil; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; import com.google.common.base.Preconditions; import java.sql.PreparedStatement; import java.sql.ResultSet; import org.apache.commons.collections.CollectionUtils; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.Set; import java.util.concurrent.TimeUnit; import java.util.stream.Collectors; /** * ObHbaseReader supports reading from sharded databases and tables. * Only OceanBase 3.x and later are supported. */ public class ObHbaseReader extends Reader { public static class Job extends Reader.Job { static private final String ACCESS_DENIED_ERROR = "Access denied for user"; private static Logger LOG = LoggerFactory.getLogger(ObHbaseReader.class); private Configuration originalConfig; @Override public void init() { if (System.getProperty(OB_TABLE_CLIENT_PROPERTY) == null) { LOG.info(OB_TABLE_CLIENT_PROPERTY + " not set"); System.setProperty(OB_TABLE_CLIENT_PROPERTY, OB_HBASE_LOG_PATH); } if (System.getProperty(OB_TABLE_HBASE_PROPERTY) == null) { LOG.info(OB_TABLE_HBASE_PROPERTY + " not set"); System.setProperty(OB_TABLE_HBASE_PROPERTY, OB_HBASE_LOG_PATH); } if (System.getProperty(OB_TABLE_CLIENT_LOG_LEVEL) == null) { LOG.info(OB_TABLE_CLIENT_LOG_LEVEL + " not set"); System.setProperty(OB_TABLE_CLIENT_LOG_LEVEL, DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL); } if (System.getProperty(OB_TABLE_HBASE_LOG_LEVEL) == null) { LOG.info(OB_TABLE_HBASE_LOG_LEVEL + " not set"); System.setProperty(OB_TABLE_HBASE_LOG_LEVEL, DEFAULT_OB_TABLE_HBASE_LOG_LEVEL); } if 
(System.getProperty(OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL) == null) {
LOG.info(OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL + " not set"); System.setProperty(OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL, DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL); } if (System.getProperty(OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL) == null) { LOG.info(OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL + " not set"); System.setProperty(OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL, DEFAULT_OB_TABLE_HBASE_LOG_LEVEL); } /* Log the effective values, which may have been set externally rather than defaulted above. */ LOG.info("{} is set to {}, {} is set to {}", OB_TABLE_CLIENT_PROPERTY, System.getProperty(OB_TABLE_CLIENT_PROPERTY), OB_TABLE_HBASE_PROPERTY, System.getProperty(OB_TABLE_HBASE_PROPERTY)); this.originalConfig = super.getPluginJobConf(); ObHbaseReaderUtil.doPretreatment(originalConfig); List<Object> conns = originalConfig.getList(Constant.CONN_MARK, Object.class); /* Logical table configuration */ Preconditions.checkArgument(CollectionUtils.isNotEmpty(conns), "connection information is empty."); dealLogicConnAndTable(conns); if (LOG.isDebugEnabled()) { LOG.debug("After init(), now originalConfig is:\n{}\n", this.originalConfig); } } @Override public void destroy() { } private void dealLogicConnAndTable(List<Object> conns) { String unifiedUsername = originalConfig.getString(Key.USERNAME); String unifiedPassword = originalConfig.getString(Key.PASSWORD); boolean useSqlReader = originalConfig.getBool(Key.USE_SQL_READER, com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_USE_SQLREADER); boolean checkSlave = originalConfig.getBool(com.alibaba.datax.plugin.rdbms.reader.Key.CHECK_SLAVE, false); Set<String> keywords = Arrays.stream(com.alibaba.datax.plugin.reader.obhbasereader.Constant.OBMYSQL_KEYWORDS.split(",")).collect(Collectors.toSet()); List<String> preSql = originalConfig.getList(com.alibaba.datax.plugin.rdbms.reader.Key.PRE_SQL, String.class); int tableNum = 0; for (int i = 0, len = conns.size(); i < len; i++) { Configuration connConf = Configuration.from(conns.get(i).toString()); 
String curUsername = connConf.getString(Key.USERNAME, unifiedUsername); Preconditions.checkArgument(StringUtils.isNotEmpty(curUsername), "username is empty."); String curPassword = connConf.getString(Key.PASSWORD,
unifiedPassword); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.USERNAME), curUsername); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.PASSWORD), curPassword); List<String> jdbcUrls = connConf.getList(Key.JDBC_URL, new ArrayList<>(), String.class); String jdbcUrl; if (useSqlReader) { /* In SQL mode jdbcUrl is required; the address is only checked when SQL mode is used. */ Preconditions.checkArgument(CollectionUtils.isNotEmpty(jdbcUrls), "if using sql mode, jdbcUrl is needed"); jdbcUrl = DBUtil.chooseJdbcUrlWithoutRetry(DataBaseType.MySql, jdbcUrls, curUsername, curPassword, preSql, checkSlave); jdbcUrl = DataBaseType.MySql.appendJDBCSuffixForReader(jdbcUrl); /* Write the chosen URL back to connection[i].jdbcUrl */ originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.JDBC_URL), jdbcUrl); LOG.info("Available jdbcUrl:{}.", jdbcUrl); } else { jdbcUrl = jdbcUrls.get(0); jdbcUrl = StringUtils.isNotBlank(jdbcUrl) ? DataBaseType.MySql.appendJDBCSuffixForReader(jdbcUrl) : EMPTY; checkAndSetHbaseConnConf(jdbcUrl, curUsername, curPassword, connConf, i); } /* Table mode: parse the table items configured on each connection (table names are backtick-quoted where required). 
*/ List<String> tables = connConf.getList(Key.TABLE, String.class); List<String> expandedTables = TableExpandUtil.expandTableConf(DataBaseType.MySql, tables); if (expandedTables.isEmpty()) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, "The specified table list is empty."); } for (int ti = 0; ti < expandedTables.size(); ti++) { String tableName = expandedTables.get(ti); if (keywords.contains(tableName.toUpperCase())) { expandedTables.set(ti, "`" + tableName + "`"); } } tableNum += expandedTables.size(); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.TABLE), expandedTables); } if (tableNum == 0) { /* Sharded read: no tables matched for extraction */ LOG.error("sharding rule result is empty."); throw DataXException.asDataXException("No tables were matched"); } originalConfig.set(Constant.TABLE_NUMBER_MARK, tableNum); } /** * In public cloud, only odp mode can be used.
* In private cloud, both odp mode and ocp mode can be used. * * @param jdbcUrl * @param curUsername * @param curPassword * @param connConf * @param curIndex index of the connection entry being processed */ private void checkAndSetHbaseConnConf(String jdbcUrl, String curUsername, String curPassword, Configuration connConf, int curIndex) { ServerConnectInfo serverConnectInfo = new ServerConnectInfo(jdbcUrl, curUsername, curPassword); if (!originalConfig.getBool(Key.USE_ODP_MODE, false)) { /* Normally, this only needs to be queried the first time. In ocp mode, dbName, configUrl, sysUser and sysPass are needed. */ String sysUser = connConf.getString(Key.OB_SYS_USERNAME, originalConfig.getString(Key.OB_SYS_USERNAME)); String sysPass = connConf.getString(Key.OB_SYS_PASSWORD, originalConfig.getString(Key.OB_SYS_PASSWORD)); serverConnectInfo.setSysUser(sysUser); serverConnectInfo.setSysPass(sysPass); String configUrl = connConf.getString(Key.CONFIG_URL, originalConfig.getString(Key.CONFIG_URL)); if (StringUtils.isBlank(configUrl)) { configUrl = queryRsUrl(serverConnectInfo); } originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, curIndex, Key.USERNAME), curUsername); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, curIndex, Key.OB_SYS_USERNAME), serverConnectInfo.sysUser); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, curIndex, Key.OB_SYS_PASSWORD), serverConnectInfo.sysPass); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, curIndex, Key.CONFIG_URL), configUrl); } else { /* In odp mode, dbName, odp host and odp port are needed. */
String odpHost = connConf.getString(Key.ODP_HOST, serverConnectInfo.host); String odpPort = connConf.getString(Key.ODP_PORT, serverConnectInfo.port); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, curIndex, Key.ODP_HOST), odpHost); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, curIndex, Key.ODP_PORT), odpPort); } originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, curIndex, Key.DB_NAME), serverConnectInfo.databaseName); } private String queryRsUrl(ServerConnectInfo serverInfo) { Preconditions.checkArgument(checkVersionAfterV3(serverInfo.jdbcUrl, serverInfo.getFullUserName(), serverInfo.password), "ob before 3.x is not supported."); String configUrl = originalConfig.getString(Key.CONFIG_URL, null); if (configUrl == null) { try { Connection conn = null; int retry = 0; final String sysJDBCUrl = serverInfo.jdbcUrl.replace(serverInfo.databaseName, "oceanbase"); do { try { if (retry > 0) { int sleep = retry > 9 ? 500 : 1 << retry; try { TimeUnit.SECONDS.sleep(sleep); } catch (InterruptedException e) { } LOG.warn("retry fetch RsUrl the {} times", retry); } conn = DBUtil.getConnection(DataBaseType.OceanBase, sysJDBCUrl, serverInfo.sysUser, serverInfo.sysPass); String sql = "show parameters like 'obconfig_url'"; LOG.info("query param: {}", sql); PreparedStatement stmt = conn.prepareStatement(sql); ResultSet result = stmt.executeQuery(); if (result.next()) { configUrl = result.getString("Value"); } if (StringUtils.isNotBlank(configUrl)) { break; } } catch (Exception e) { LOG.warn("fetch root server list(rsList) error {}", e.getMessage()); } finally { DBUtil.closeDBResources(null, conn); } /* Count a blank result as an attempt too; otherwise a successful query that returns no value loops forever. 
*/ ++retry; } while (retry < 3); LOG.info("configure url is: " + configUrl); originalConfig.set(Key.CONFIG_URL, configUrl); } catch (Exception e) { LOG.error("Fail to get configure url: {}", e.getMessage(), e); throw DataXException.asDataXException(HbaseReaderErrorCode.REQUIRED_VALUE, "obConfigUrl is not configured and could not be fetched automatically."); } } return configUrl;
} @Override public void prepare() { } @Override public void post() { } @Override public List<Configuration> split(int adviceNumber) { Map hbaseColumnCells = ObHbaseReaderUtil.parseColumn(originalConfig.getList(Key.COLUMN, Map.class)); if (hbaseColumnCells.size() == 0) { LOG.error("no column cells specified."); throw new RuntimeException("no column cells specified"); } String columnFamily = ObHbaseReaderUtil.parseColumnFamily(hbaseColumnCells.values()); Preconditions.checkArgument(StringUtils.isNotEmpty(columnFamily), "column family is empty."); List<Object> conns = originalConfig.getList(Constant.CONN_MARK, Object.class); Preconditions.checkArgument(conns != null && !conns.isEmpty(), "connection information is necessary."); return splitLogicTables(adviceNumber, conns, columnFamily); } private List<Configuration> splitLogicTables(int adviceNumber, List<Object> conns, String columnFamily) { /* adviceNumber is the channel count, i.e. the number of concurrent DataX tasks; eachTableShouldSplittedNumber is the number of splits for each table. */ int eachTableShouldSplittedNumber = (int) Math.ceil(1.0 * adviceNumber / originalConfig.getInt(Constant.TABLE_NUMBER_MARK)); boolean useSqlReader = originalConfig.getBool(Key.USE_SQL_READER, com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_USE_SQLREADER); boolean odpMode = originalConfig.getBool(Key.USE_ODP_MODE, DEFAULT_USE_ODPMODE); boolean readByPartition = originalConfig.getBool(Key.READ_BY_PARTITION, false); List<Configuration> splittedConfigs = new ArrayList<>(); for (int i = 0, len = conns.size(); i < len; i++) { Configuration sliceConfig = originalConfig.clone(); Configuration connConf = Configuration.from(conns.get(i).toString()); copyConnConfByMode(useSqlReader, odpMode, sliceConfig, connConf); /* Tables are configured via the table option; they were already expanded and backtick-quoted in init(), so they can be used directly. 
*/ List<String> tables = connConf.getList(Key.TABLE, String.class); Validate.isTrue(null != tables && !tables.isEmpty(), "error in your configuration for the reading database table."); int tempEachTableShouldSplittedNumber = eachTableShouldSplittedNumber; if (tables.size() == 1) { Integer splitFactor =
originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Key.SPLIT_FACTOR, Constant.SPLIT_FACTOR); tempEachTableShouldSplittedNumber = eachTableShouldSplittedNumber * splitFactor; } for (String table : tables) { Configuration tempSlice; tempSlice = sliceConfig.clone(); tempSlice.set(Key.TABLE, table); splittedConfigs.addAll( useSqlReader ? SqlReaderSplitUtil.splitSingleTable(tempSlice, table, columnFamily, tempEachTableShouldSplittedNumber, readByPartition) : HbaseSplitUtil.split(tempSlice)); } } return splittedConfigs; } private void copyConnConfByMode(boolean useSqlReader, boolean odpMode, Configuration targetConf, Configuration sourceConnConf) { String username = sourceConnConf.getNecessaryValue(Key.USERNAME, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.USERNAME, username); String password = sourceConnConf.getNecessaryValue(Key.PASSWORD, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.PASSWORD, password); if (useSqlReader) { String jdbcUrl = sourceConnConf.getNecessaryValue(Key.JDBC_URL, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.JDBC_URL, jdbcUrl); } else if (odpMode) { String dbName = sourceConnConf.getNecessaryValue(Key.DB_NAME, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.DB_NAME, dbName); String odpHost = sourceConnConf.getNecessaryValue(Key.ODP_HOST, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.ODP_HOST, odpHost); String odpPort = sourceConnConf.getNecessaryValue(Key.ODP_PORT, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.ODP_PORT, odpPort); } else { String dbName = sourceConnConf.getNecessaryValue(Key.DB_NAME, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.DB_NAME, dbName); String sysUser = sourceConnConf.getNecessaryValue(Key.OB_SYS_USERNAME, DBUtilErrorCode.REQUIRED_VALUE); targetConf.set(Key.OB_SYS_USERNAME, sysUser); String sysPass = sourceConnConf.getString(Key.OB_SYS_PASSWORD); targetConf.set(Key.OB_SYS_PASSWORD, sysPass); } targetConf.remove(Constant.CONN_MARK); } private boolean 
checkVersionAfterV3(String jdbcUrl, String username, String password) { int retryLimit = 3; int retryCount = 0; Connection conn = null; while (retryCount++ <= retryLimit) { try { conn = DBUtil.getConnectionWithoutRetry(DataBaseType.MySql, jdbcUrl, username, password); ObVersion obVersion = ObReaderUtils.getObVersion(conn); return ObVersion.V3.compareTo(obVersion) <= 0; } catch (Exception e) { String message = e.getMessage(); LOG.error("fail to check ob version, will retry: " + message); /* getMessage() may be null, so guard before matching the access-denied text. */ if (message != null && message.contains(ACCESS_DENIED_ERROR)) { throw new RuntimeException(e); } try { TimeUnit.SECONDS.sleep(1); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); LOG.error("interrupted while waiting for retry."); } } finally { DBUtil.closeDBResources(null, conn); } } return false; } } public static class Task extends Reader.Task { private static Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration taskConfig; private AbstractHbaseTask hbaseTaskProxy; @Override public void init() { this.taskConfig = super.getPluginJobConf(); String mode = this.taskConfig.getString(Key.MODE); ModeType modeType = ModeType.getByTypeName(mode); boolean useSqlReader = this.taskConfig.getBool(Key.USE_SQL_READER, com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_USE_SQLREADER); LOG.info("init reader with mode: " + modeType); switch (modeType) { case Normal: this.hbaseTaskProxy = useSqlReader ?
new SQLNormalModeReader(this.taskConfig) : new ScanNormalModeReader(this.taskConfig); break; case MultiVersionFixedColumn: this.hbaseTaskProxy = new ScanMultiVersionReader(this.taskConfig); break; default: throw DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, "This type of mode is not supported by hbasereader:" + modeType); } } @Override public void destroy() { if (this.hbaseTaskProxy != null) { try { this.hbaseTaskProxy.close(); } catch (Exception e) { /* ignore failures while closing */ } } } @Override public void prepare() { try { this.hbaseTaskProxy.prepare(); } catch (Exception e) { throw DataXException.asDataXException(HbaseReaderErrorCode.PREPAR_READ_ERROR, e); } } @Override public void post() { super.post(); } @Override public void startRead(RecordSender recordSender) { Record record = recordSender.createRecord(); boolean fetchOK; int retryTimes = 0; int maxRetryTimes = 3; while (true) { try { /* TODO check exception */ fetchOK = this.hbaseTaskProxy.fetchLine(record); } catch (Exception e) {
LOG.info("fetch record failed. reason: {}.", e.getMessage(), e); super.getTaskPluginCollector().collectDirtyRecord(record, e); /* Fail once more than maxRetryTimes fetches have failed. */ if (++retryTimes > maxRetryTimes) { throw DataXException.asDataXException(HbaseReaderErrorCode.READ_ERROR, "read from obhbase failed", e); } record = recordSender.createRecord(); continue; } if (fetchOK) { recordSender.sendToWriter(record); record = recordSender.createRecord(); } else { break; } } recordSender.flush(); } } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/enums/ColumnType.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.enums; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseReaderErrorCode; import java.util.Arrays; /** * Only used when reading in normal mode; multi-version reads have no column types. */ public enum ColumnType { STRING("string"), BINARY_STRING("binarystring"), BYTES("bytes"), BOOLEAN("boolean"), SHORT("short"), INT("int"), LONG("long"), FLOAT("float"), DOUBLE("double"), DATE("date"); private String typeName; ColumnType(String typeName) { this.typeName = typeName; } public static ColumnType getByTypeName(String typeName) { for (ColumnType columnType : values()) { if (columnType.typeName.equalsIgnoreCase(typeName)) { return columnType; } } throw DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, String.format("The type %s is not supported by hbasereader; currently supported types are: %s.", typeName, Arrays.asList(values()))); } @Override public String toString() { return this.typeName; } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/enums/FetchVersion.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.enums; import 
com.alibaba.datax.common.exception.DataXException; import
com.alibaba.datax.plugin.reader.obhbasereader.HbaseReaderErrorCode; import java.util.Arrays; import java.util.Optional; import java.util.stream.Stream; public enum FetchVersion { OLDEST("oldest"), LATEST("latest"); private final String version; FetchVersion(String version) { this.version = version; } public static FetchVersion getByDesc(String name) { Optional<FetchVersion> result = Stream.of(values()).filter(v -> v.version.equalsIgnoreCase(name)).findFirst(); return result.orElseThrow(() -> DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, String.format("obHBasereader does not support the type: %s; currently supported types are: %s", name, Arrays.asList(values())))); } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/enums/ModeType.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.enums; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseReaderErrorCode; import java.util.Arrays; public enum ModeType { Normal("normal"), MultiVersionFixedColumn("multiVersionFixedColumn"), MultiVersionDynamicColumn("multiVersionDynamicColumn"), ; private String mode; ModeType(String mode) { this.mode = mode.toLowerCase(); } public static ModeType getByTypeName(String modeName) { for (ModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException( HbaseReaderErrorCode.ILLEGAL_VALUE, String.format("The mode type is not supported by hbasereader:%s, and the currently supported mode type is:%s", modeName, Arrays.asList(values()))); } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/ext/ServerConnectInfo.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.ext; 
import
com.google.common.base.Preconditions; import java.util.regex.Matcher; import java.util.regex.Pattern; import static org.apache.commons.lang3.StringUtils.EMPTY; public class ServerConnectInfo { public String clusterName; public String tenantName; // userName doesn't contain tenantName or clusterName public String userName; public String password; public String databaseName; public String ipPort; public String jdbcUrl; public String host; public String port; public boolean publicCloud; public int rpcPort; public String sysUser; public String sysPass; /** * * @param jdbcUrl format is jdbc:oceanbase://ip:port/database?... * @param username format is cluster:tenant:username or username@tenant#cluster or user@tenant or user * @param password the user's password */ public ServerConnectInfo(final String jdbcUrl, final String username, final String password) { this(jdbcUrl, username, password, null, null); } public ServerConnectInfo(final String jdbcUrl, final String username, final String password, final String sysUser, final String sysPass) { if (jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); Preconditions.checkArgument(ss.length == 3, "jdbc url format is not correct:" + jdbcUrl); this.userName = username; this.clusterName = ss[1].trim().split(":")[0]; this.tenantName = ss[1].trim().split(":")[1]; this.jdbcUrl = ss[2]; } else { this.jdbcUrl = jdbcUrl; } this.password = password; this.sysUser = sysUser; this.sysPass = sysPass; parseJdbcUrl(jdbcUrl); parseFullUserName(username); } private void parseJdbcUrl(final String jdbcUrl) { Pattern pattern = Pattern.compile("//([\\w\\.\\-]+:\\d+)/([\\w-]+)\\?"); Matcher matcher = pattern.matcher(jdbcUrl); if (matcher.find()) { String ipPort = matcher.group(1); String dbName = matcher.group(2); this.ipPort = ipPort; String[] hostPort = ipPort.split(":"); this.host = hostPort[0]; this.port = hostPort[1]; this.databaseName = 
dbName; this.publicCloud = host.endsWith("aliyuncs.com"); } else { throw new RuntimeException("Invalid argument:" + jdbcUrl); } } private void parseFullUserName(final String fullUserName) { int tenantIndex = fullUserName.indexOf("@"); int clusterIndex = fullUserName.indexOf("#"); // For jdbcUrls that start with ||_dsc_ob10_dsc_ if (fullUserName.contains(":") && tenantIndex < 0) { String[] names = fullUserName.split(":"); if (names.length != 3) { throw new RuntimeException("invalid argument: " + fullUserName); } else { this.clusterName = names[0]; this.tenantName = names[1]; this.userName = names[2]; } } else if (tenantIndex < 0) { // For a short jdbcUrl whose username contains no tenant name (mainly the public-cloud case; partitions are not computed in this case) this.userName = fullUserName; this.clusterName = EMPTY; this.tenantName = EMPTY; } else { // For a short jdbcUrl whose username contains the tenant name this.userName = fullUserName.substring(0, tenantIndex); if (clusterIndex < 0) { this.clusterName = EMPTY; this.tenantName = fullUserName.substring(tenantIndex + 1); } else { this.clusterName = fullUserName.substring(clusterIndex + 1); this.tenantName = fullUserName.substring(tenantIndex + 1, clusterIndex); } } } @Override public String toString() { return "ServerConnectInfo{" + "clusterName='" + clusterName + '\'' + ", tenantName='" + tenantName + '\'' + ", userName='" + userName + '\'' + ", password='******'" + ", databaseName='" + databaseName + '\'' + ", ipPort='" + ipPort + '\'' + ", jdbcUrl='" + jdbcUrl + '\'' + ", publicCloud=" + publicCloud + ", rpcPort=" + rpcPort + '}'; } public String getFullUserName() { StringBuilder builder = new StringBuilder(); builder.append(userName); if (publicCloud || (rpcPort != 0 && EMPTY.equals(clusterName))) { return builder.toString(); } if (!EMPTY.equals(tenantName)) { builder.append("@").append(tenantName); } if (!EMPTY.equals(clusterName)) { builder.append("#").append(clusterName); } if (EMPTY.equals(this.clusterName) && EMPTY.equals(this.tenantName)) { return this.userName; } return builder.toString(); } public void 
setRpcPort(int rpcPort) { this.rpcPort = rpcPort; } public void setSysUser(String sysUser) { this.sysUser = sysUser; } public void setSysPass(String sysPass) { this.sysPass = sysPass; } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/task/AbstractHbaseTask.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.task; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.obhbasereader.Constant; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseColumnCell; import com.alibaba.datax.plugin.reader.obhbasereader.Key; import com.alibaba.datax.plugin.reader.obhbasereader.enums.ModeType; import com.alibaba.datax.plugin.reader.obhbasereader.util.ObHbaseReaderUtil; import java.io.IOException; import java.util.HashMap; import java.util.Map; public abstract class AbstractHbaseTask { protected String encoding; protected String timezone = null; protected Map<String, HbaseColumnCell> hbaseColumnCellMap; // constant (fixed-value) columns protected Map<String, Column> constantMap; protected ModeType modeType; public AbstractHbaseTask() { } public AbstractHbaseTask(Configuration configuration) { this.timezone = configuration.getString(Key.TIMEZONE, Constant.DEFAULT_TIMEZONE); this.encoding = configuration.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); String mode = configuration.getString(Key.MODE, "Normal"); this.modeType = ModeType.getByTypeName(mode); this.constantMap = new HashMap<>(); this.hbaseColumnCellMap = ObHbaseReaderUtil.parseColumn(configuration.getList(Key.COLUMN, Map.class), constantMap, encoding, timezone); } public abstract void prepare() throws Exception; public abstract boolean fetchLine(Record record) throws Exception; public abstract void close() throws IOException; } ================================================ FILE: 
obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/task/AbstractScanReader.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.task; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.obhbasereader.Constant; import com.alibaba.datax.plugin.reader.obhbasereader.HTableManager; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseColumnCell; import com.alibaba.datax.plugin.reader.obhbasereader.Key; import com.alibaba.datax.plugin.reader.obhbasereader.util.ObHbaseReaderUtil; import com.alipay.oceanbase.hbase.OHTable; import org.apache.hadoop.hbase.client.Result; import org.apache.hadoop.hbase.client.ResultScanner; import org.apache.hadoop.hbase.client.Scan; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; public abstract class AbstractScanReader extends AbstractHbaseTask { private static Logger LOG = LoggerFactory.getLogger(AbstractScanReader.class); protected OHTable ohtable; protected Result lastResult = null; protected Scan scan; protected ResultScanner resultScanner; protected int maxVersion; private int scanCache; private byte[] startKey = null; private byte[] endKey = null; public AbstractScanReader(Configuration configuration) { super(configuration); this.maxVersion = configuration.getInt(Key.MAX_VERSION, 1); this.scanCache = configuration.getInt(Key.SCAN_CACHE, Constant.DEFAULT_SCAN_CACHE); this.ohtable = ObHbaseReaderUtil.initOHtable(configuration); this.startKey = ObHbaseReaderUtil.convertInnerStartRowkey(configuration); this.endKey = ObHbaseReaderUtil.convertInnerEndRowkey(configuration); LOG.info("The task set startRowkey=[{}], endRowkey=[{}].", Bytes.toStringBinary(this.startKey), Bytes.toStringBinary(this.endKey)); } @Override public void prepare() throws Exception { this.scan = new Scan(); this.scan.setSmall(false); this.scan.setCacheBlocks(false); 
this.scan.setStartRow(startKey); this.scan.setStopRow(endKey); LOG.info("The task set startRowkey=[{}], endRowkey=[{}].", Bytes.toStringBinary(this.startKey), Bytes.toStringBinary(this.endKey)); this.scan.setCaching(this.scanCache); if (this.maxVersion == -1 || this.maxVersion == Integer.MAX_VALUE) { this.scan.setMaxVersions(); } else { this.scan.setMaxVersions(this.maxVersion); } initScanColumns(); this.resultScanner = this.ohtable.getScanner(this.scan); } @Override public void close() throws IOException { if (this.resultScanner != null) { this.resultScanner.close(); } HTableManager.closeHTable(this.ohtable); } protected void initScanColumns() { boolean isConstant; boolean isRowkeyColumn; for (HbaseColumnCell cell : this.hbaseColumnCellMap.values()) { isConstant = cell.isConstant(); isRowkeyColumn = ObHbaseReaderUtil.isRowkeyColumn(cell.getColumnName()); if (!isConstant && !isRowkeyColumn) { LOG.info("columnFamily: " + new String(cell.getCf()) + ", qualifier: " + new String(cell.getQualifier())); this.scan.addColumn(cell.getCf(), cell.getQualifier()); } } } protected Result getNextHbaseRow() throws Exception { Result result = null; try { result = resultScanner.next(); } catch (Exception e) { LOG.error("failed to get result, reopening the scanner from the last row read", e); if (lastResult != null) { scan.setStartRow(lastResult.getRow()); } resultScanner = this.ohtable.getScanner(scan); result = resultScanner.next(); // the restarted scan is inclusive of the last row, so skip it if it comes back again (result is null at end of scan) if (lastResult != null && result != null && Bytes.equals(lastResult.getRow(), result.getRow())) { result = resultScanner.next(); } } lastResult = result; // may be null return result; } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/task/SQLNormalModeReader.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.task; import static com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_READ_HINT; import com.alibaba.datax.common.element.Column; import
com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.reader.obhbasereader.Constant; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseColumnCell; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseReaderErrorCode; import com.alibaba.datax.plugin.reader.obhbasereader.Key; import com.alibaba.datax.plugin.reader.obhbasereader.enums.FetchVersion; import com.alibaba.datax.plugin.reader.obhbasereader.util.ObHbaseReaderUtil; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; import com.google.common.collect.Lists; import com.google.common.collect.Maps; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import java.util.Map; import java.util.Set; import java.util.function.Predicate; import java.util.stream.Collectors; public class SQLNormalModeReader extends AbstractHbaseTask { private final static String QUERY_SQL_TEMPLATE = "select %s K, Q, T, V, hex(K) as `hex` from %s %s"; private static Logger LOG = LoggerFactory.getLogger(SQLNormalModeReader.class); private final Map<String, byte[]> columnMap; private final Map<String, Long> versionMap; private final FetchVersion fetchVersion; private Set<String> columnNames; private boolean noMoreData = false; private String querySQL = null; private Connection conn = null; private PreparedStatement stmt = null; private ResultSet rs = null; private String jdbcUrl = null; private String columnFamily = null; private String username = null; private String password = null; private int fetchSize =
com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_FETCH_SIZE; private long readBatchSize = com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_READ_BATCH_SIZE; private Configuration configuration; private boolean hasRange = false; private String[] savepoint = new String[3]; // only used by unit test protected boolean reuseConn = false; public SQLNormalModeReader(Configuration configuration) { this.configuration = configuration; this.hbaseColumnCellMap = ObHbaseReaderUtil.parseColumn(configuration.getList(Key.COLUMN, Map.class)); if (hbaseColumnCellMap.size() == 0) { LOG.error("no column cells specified."); throw new RuntimeException("no column cells specified"); } columnFamily = ObHbaseReaderUtil.parseColumnFamily(hbaseColumnCellMap.values()); this.columnNames = hbaseColumnCellMap.keySet().stream().map(e -> ObHbaseReaderUtil.isRowkeyColumn(e) ? Constant.ROWKEY_FLAG : e.substring((columnFamily + ":").length())).collect(Collectors.toSet()); String partInfo = ""; String partName = configuration.getString(Key.PARTITION_NAME, null); if (partName != null) { partInfo = "partition(" + partName + ")"; } String tableName = configuration.getString(Key.TABLE, null); String hint = configuration.getString(Key.READER_HINT, OB_READ_HINT); this.hasRange = !StringUtils.isEmpty(configuration.getString(Key.RANGE, null)); this.querySQL = String.format(QUERY_SQL_TEMPLATE, hint, tableName + "$" + columnFamily, partInfo); if (hasRange) { this.querySQL = querySQL + " where (" + configuration.getString(Key.RANGE) + ")"; } this.jdbcUrl = configuration.getString(Key.JDBC_URL, null); this.username = configuration.getString(Key.USERNAME, null); this.password = configuration.getString(Key.PASSWORD, null); this.columnMap = Maps.newHashMap(); this.versionMap = Maps.newHashMap(); this.fetchVersion = FetchVersion.getByDesc(configuration.getString("version", FetchVersion.LATEST.name())); this.timezone = configuration.getString(Key.TIMEZONE, "UTC"); this.encoding = 
configuration.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); this.fetchSize = configuration.getInt(Key.FETCH_SIZE, com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_FETCH_SIZE); this.readBatchSize = configuration.getLong(Key.READ_BATCH_SIZE, com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_READ_BATCH_SIZE); LOG.info("read from jdbcUrl {} with fetchSize {}, readBatchSize {}", jdbcUrl, fetchSize, readBatchSize); } private boolean notFinished(String currentKey) throws SQLException { boolean updateSuccess = updateResultSet(); if (updateSuccess) { String newKey = rs.getString("K"); return newKey.equals(currentKey); } else { noMoreData = true; Arrays.fill(savepoint, null); return false; } } private boolean updateResultSet() throws SQLException { if (rs != null && rs.next()) { return true; } if (savepoint[0] != null) { int retryLimit = 10; int retryCount = 0; String tempQuery = querySQL + (hasRange ? " and " : " where ") + "(K,Q,T) > (unhex(?),?,?) order by K,Q,T limit " + readBatchSize; while (retryCount < retryLimit) { retryCount++; try { resetConnection(); DBUtil.closeDBResources(rs, stmt, null); stmt = conn.prepareStatement(tempQuery, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY); stmt.setFetchSize(fetchSize); for (int i = 0; i < savepoint.length; i++) { stmt.setObject(i + 1, savepoint[i]); } rs = stmt.executeQuery(); if (rs.next()) { LOG.info("execute sql: {}, savepoint:[{}]", tempQuery, Arrays.stream(savepoint).map(e -> "'" + e + "'").collect(Collectors.joining(","))); return true; } // all data for this task has been read break; } catch (Exception ex) { LOG.error("failed to query sql (attempt {}/{})", retryCount, retryLimit, ex); DBUtil.closeDBResources(rs, stmt, conn); // throw on the last attempt; falling out of the loop would return false and silently end the task early if (retryCount >= retryLimit) { LOG.error("Sql: [{}] failed to execute, savepoint:[{}], reason: {}", tempQuery, Arrays.stream(savepoint).map(e -> "'" + e + "'").collect(Collectors.joining(",")), ex.getMessage()); throw new RuntimeException(ex); } } } } return false; }
@Override public void prepare() { int retryLimit = 10; int retryCount = 0; while (true) { retryCount++; try { resetConnection(); String tempQuery = querySQL + " order by K,Q,T limit " + readBatchSize; stmt = conn.prepareStatement(tempQuery, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY); stmt.setFetchSize(fetchSize); LOG.info("execute sql : {}", tempQuery); rs = stmt.executeQuery(); if (!rs.next()) { noMoreData = true; } break; } catch (Exception e) { LOG.error("failed to query sql (attempt {}/{})", retryCount, retryLimit, e); DBUtil.closeDBResources(rs, stmt, conn); if (retryCount >= retryLimit) { LOG.error("Sql: [{}] failed to execute, reason: {}", querySQL, e.getMessage()); throw new RuntimeException(e); } } } } @Override public boolean fetchLine(Record record) throws Exception { try { if (noMoreData) { return false; } String currentKey = rs.getString("K"); savepoint[0] = rs.getString("hex"); columnMap.put(Constant.ROWKEY_FLAG, currentKey.getBytes()); do { String columnName = rs.getString("Q"); savepoint[1] = columnName; if (!this.columnNames.contains(columnName)) { continue; } Long version = rs.getLong("T"); savepoint[2] = String.valueOf(version); byte[] value = rs.getBytes("V"); Predicate<Long> predicate; switch (this.fetchVersion) { case OLDEST: predicate = v -> v.compareTo(versionMap.getOrDefault(columnName, Long.MIN_VALUE)) > 0; break; case LATEST: predicate = v -> v.compareTo(versionMap.getOrDefault(columnName, Long.MAX_VALUE)) < 0; break; default: throw DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, "Not support version: " + this.fetchVersion); } if (predicate.test(version)) { versionMap.put(columnName, version); columnMap.put(columnName, value); } } while (notFinished(currentKey)); for (HbaseColumnCell cell : this.hbaseColumnCellMap.values()) { Column column = null; if (cell.isConstant()) { // constant column: use the preconfigured value column = this.constantMap.get(cell.getColumnName()); } else { String columnName = ObHbaseReaderUtil.isRowkeyColumn(cell.getColumnName()) ?
Constant.ROWKEY_FLAG : cell.getColumnName().substring((columnFamily + ":").length()); byte[] value = null; if (!columnMap.containsKey(columnName)) { LOG.debug("{} is not present in the row with K value={}; the column is read as null.", columnName, currentKey); } else { value = columnMap.get(columnName); } column = ObHbaseReaderUtil.buildColumn(value, cell.getColumnType(), encoding, cell.getDateformat(), timezone); } record.addColumn(column); } } finally { this.columnMap.clear(); this.versionMap.clear(); } return true; } @Override public void close() throws IOException { DBUtil.closeDBResources(rs, stmt, conn); } private void resetConnection() throws SQLException { if (reuseConn && conn != null && !conn.isClosed()) { return; } // close the previous connection before opening a new one so that it is not leaked DBUtil.closeDBResources(null, null, conn); // set ob_query_timeout and ob_trx_timeout to a large value (48 hours) so that long-running reads do not time out int queryTimeoutSeconds = 60 * 60 * 48; String setQueryTimeout = "set ob_query_timeout=" + (queryTimeoutSeconds * 1000 * 1000L); String setTrxTimeout = "set ob_trx_timeout=" + ((queryTimeoutSeconds + 5) * 1000 * 1000L); List<String> newSessionConfig = Lists.newArrayList(setQueryTimeout, setTrxTimeout); List<String> sessionConfig = configuration.getList(Key.SESSION, new ArrayList<>(), String.class); newSessionConfig.addAll(sessionConfig); configuration.set(Key.SESSION, newSessionConfig); conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, this.username, this.password); } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/task/ScanMultiVersionReader.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.task; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.obhbasereader.Constant; import
com.alibaba.datax.plugin.reader.obhbasereader.HbaseColumnCell; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseReaderErrorCode; import com.alibaba.datax.plugin.reader.obhbasereader.enums.ColumnType; import com.alibaba.datax.plugin.reader.obhbasereader.util.ObHbaseReaderUtil; import org.apache.hadoop.hbase.KeyValue; import org.apache.hadoop.hbase.client.Result; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.UnsupportedEncodingException; import java.util.ArrayList; import java.util.List; public class ScanMultiVersionReader extends AbstractScanReader { private final static Logger LOG = LoggerFactory.getLogger(ScanMultiVersionReader.class); private static byte[] COLON_BYTE; private List<KeyValue> kvList = new ArrayList<>(); private int currentReadPosition = 0; // read-out type of the rowkey private ColumnType rowkeyReadoutType = null; public ScanMultiVersionReader(Configuration configuration) { super(configuration); HbaseColumnCell rowKey = hbaseColumnCellMap.get(Constant.ROWKEY_FLAG); if (rowKey != null && rowKey.getColumnType() != null) { this.rowkeyReadoutType = rowKey.getColumnType(); } else { this.rowkeyReadoutType = ColumnType.BYTES; } try { ScanMultiVersionReader.COLON_BYTE = ":".getBytes(encoding); } catch (UnsupportedEncodingException e) { throw DataXException.asDataXException(HbaseReaderErrorCode.PREPAR_READ_ERROR, "Failed to encode the ':' separator between column family and qualifier with the configured encoding.", e); } } private void convertKVToLine(KeyValue keyValue, Record record) throws Exception { byte[] rawRowkey = keyValue.getRow(); long timestamp = keyValue.getTimestamp(); byte[] cfAndQualifierName = Bytes.add(keyValue.getFamily(), ScanMultiVersionReader.COLON_BYTE, keyValue.getQualifier()); record.addColumn(convertBytesToAssignType(this.rowkeyReadoutType, rawRowkey)); record.addColumn(convertBytesToAssignType(ColumnType.STRING, cfAndQualifierName)); // the user-configured type for the timestamp is ignored; it is always emitted as a long record.addColumn(new
LongColumn(timestamp)); String cfAndQualifierNameStr = Bytes.toString(cfAndQualifierName); HbaseColumnCell currentCell = hbaseColumnCellMap.get(cfAndQualifierNameStr); ColumnType valueReadoutType = currentCell != null ? currentCell.getColumnType() : ColumnType.BYTES; String dateFormat = currentCell != null ? currentCell.getDateformat() : null; record.addColumn(convertBytesToAssignType(valueReadoutType, keyValue.getValue(), dateFormat)); } private Column convertBytesToAssignType(ColumnType columnType, byte[] byteArray) throws Exception { return convertBytesToAssignType(columnType, byteArray, null); } private Column convertBytesToAssignType(ColumnType columnType, byte[] byteArray, String dateFormat) throws Exception { return ObHbaseReaderUtil.buildColumn(byteArray, columnType, encoding, dateFormat, timezone); } @Override public boolean fetchLine(Record record) throws Exception { Result result; if (this.kvList.size() == this.currentReadPosition) { result = getNextHbaseRow(); if (result == null) { return false; } this.kvList = result.list(); if (this.kvList == null) { return false; } this.currentReadPosition = 0; } try { KeyValue keyValue = this.kvList.get(this.currentReadPosition); convertKVToLine(keyValue, record); } finally { this.currentReadPosition++; } return true; } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/task/ScanNormalModeReader.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.task; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseColumnCell; import com.alibaba.datax.plugin.reader.obhbasereader.enums.ColumnType; import com.alibaba.datax.plugin.reader.obhbasereader.util.ObHbaseReaderUtil; import 
org.apache.hadoop.hbase.client.Result; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class ScanNormalModeReader extends AbstractScanReader { private static Logger LOG = LoggerFactory.getLogger(ScanNormalModeReader.class); public ScanNormalModeReader(Configuration configuration) { super(configuration); this.maxVersion = 1; } @Override public boolean fetchLine(Record record) throws Exception { Result result = getNextHbaseRow(); if (null == result) { return false; } try { byte[] hbaseColumnValue; String columnName; ColumnType columnType; byte[] cf; byte[] qualifier; for (HbaseColumnCell cell : this.hbaseColumnCellMap.values()) { columnType = cell.getColumnType(); Column column = null; if (cell.isConstant()) { // constant column: use the preconfigured value column = constantMap.get(cell.getColumnName()); } else { // look up the value by column name columnName = cell.getColumnName(); if (ObHbaseReaderUtil.isRowkeyColumn(columnName)) { hbaseColumnValue = result.getRow(); } else { cf = cell.getCf(); qualifier = cell.getQualifier(); hbaseColumnValue = result.getValue(cf, qualifier); } column = ObHbaseReaderUtil.buildColumn(hbaseColumnValue, columnType, super.encoding, cell.getDateformat(), timezone); } record.addColumn(column); } } catch (Exception e) { // Note: the exception caught here is expected to come from a failed byte-array conversion. In practice, converting a string's bytes to an integer type rarely fails, whereas converting to double fails more easily. record.setColumn(0, new StringColumn(Bytes.toStringBinary(result.getRow()))); throw e; } return true; } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/HbaseSplitUtil.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseReaderErrorCode; import com.alibaba.datax.plugin.reader.obhbasereader.Key; import
com.google.common.collect.Lists; import org.apache.commons.collections.CollectionUtils; import org.apache.commons.lang3.StringUtils; import org.apache.hadoop.hbase.HConstants; import org.apache.hadoop.hbase.util.Bytes; import org.apache.hadoop.hbase.util.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; public final class HbaseSplitUtil { private final static Logger LOG = LoggerFactory.getLogger(HbaseSplitUtil.class); public static List<Configuration> split(Configuration configuration) { final List<Configuration> ranges = configuration.getListConfiguration(Key.RANGE); if (CollectionUtils.isEmpty(ranges)) { return Lists.newArrayList(configuration); } //TODO(yuez) once the HBase API can query regions, add logic here to look up the table's regions and intersect them with the user-specified ranges List<Configuration> sliceConfs = new ArrayList<>(ranges.size()); for (Configuration range : ranges) { byte[] startRowKey = convertUserRowkey(range, true); byte[] endRowKey = convertUserRowkey(range, false); if (startRowKey.length != 0 && endRowKey.length != 0 && Bytes.compareTo(startRowKey, endRowKey) > 0) { throw DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, "The startRowkey in obhbasereader must not be greater than the endRowkey."); } Configuration sliceConf = configuration.clone(); sliceConf.remove(Key.RANGE); String startKeyStr = Bytes.toStringBinary(startRowKey); String endRowKeyStr = Bytes.toStringBinary(endRowKey); sliceConf.set(Key.START_ROWKEY, startKeyStr); sliceConf.set(Key.END_ROWKEY, endRowKeyStr); sliceConfs.add(sliceConf); } return sliceConfs; } public static byte[] convertUserRowkey(Configuration configuration, boolean isStart) { String keyName = isStart ?
Key.START_ROWKEY : Key.END_ROWKEY; String rowkey = configuration.getString(keyName); if (StringUtils.isBlank(rowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } else { boolean isBinaryRowkey = configuration.getBool(Key.IS_BINARY_ROWKEY, false); return stringToBytes(rowkey, isBinaryRowkey); } } private static byte[] stringToBytes(String rowkey, boolean isBinaryRowkey) { if (isBinaryRowkey) { return Bytes.toBytesBinary(rowkey); } else { return Bytes.toBytes(rowkey); } } /** * Only needed once the HBase API supports querying table regions. * * @param config the slice configuration to clone for each region * @param startRowkeyByte the user-specified start rowkey (empty means unbounded) * @param endRowkeyByte the user-specified end rowkey (empty means unbounded) * @param regionRanges the start keys (first) and end keys (second) of the table's regions * @return one configuration per region that overlaps the user-specified range */ private static List<Configuration> doSplit(Configuration config, byte[] startRowkeyByte, byte[] endRowkeyByte, Pair<byte[][], byte[][]> regionRanges) { List<Configuration> configurations = new ArrayList<Configuration>(); for (int i = 0; i < regionRanges.getFirst().length; i++) { byte[] regionStartKey = regionRanges.getFirst()[i]; byte[] regionEndKey = regionRanges.getSecond()[i]; // The current region is the last region. // If the last region's start key is greater than the user-specified userEndKey, the last region must be excluded. // Note: if userEndKey is "", this condition must not hold; an empty userEndKey means reading through the last region. if (Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0 && (endRowkeyByte.length != 0 && (Bytes.compareTo(regionStartKey, endRowkeyByte) > 0))) { continue; } // If the current region is not the last region // and the user-configured userStartKey is greater than or equal to the region's end key, this region must be excluded. if ((Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) != 0) && (Bytes.compareTo(startRowkeyByte, regionEndKey) >= 0)) { continue; } // If the user-configured userEndKey is less than or equal to the region's start key, this region must be excluded. // Note: if userEndKey is "", this condition must not hold; an empty userEndKey means reading through the last region. if (endRowkeyByte.length != 0 && (Bytes.compareTo(endRowkeyByte, regionStartKey) <= 0)) { continue; } String thisStartKey = getStartKey(startRowkeyByte, regionStartKey); String thisEndKey = getEndKey(endRowkeyByte, regionEndKey); Configuration p = config.clone(); p.set(Key.START_ROWKEY, thisStartKey); p.set(Key.END_ROWKEY, thisEndKey); LOG.debug("startRowkey:[{}], endRowkey:[{}] .",
thisStartKey, thisEndKey); configurations.add(p); } return configurations; } private static String getEndKey(byte[] endRowkeyByte, byte[] regionEndKey) { if (endRowkeyByte == null) { // already normalized upstream, so userEndKey cannot be null here throw new IllegalArgumentException("userEndKey should not be null!"); } byte[] tempEndRowkeyByte; if (endRowkeyByte.length == 0) { tempEndRowkeyByte = regionEndKey; } else if (Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0) { // this is the last region tempEndRowkeyByte = endRowkeyByte; } else { if (Bytes.compareTo(endRowkeyByte, regionEndKey) > 0) { tempEndRowkeyByte = regionEndKey; } else { tempEndRowkeyByte = endRowkeyByte; } } return Bytes.toStringBinary(tempEndRowkeyByte); } private static String getStartKey(byte[] startRowkeyByte, byte[] regionStartKey) { if (startRowkeyByte == null) { // already normalized upstream, so userStartKey cannot be null here throw new IllegalArgumentException("userStartKey should not be null!"); } byte[] tempStartRowkeyByte; if (Bytes.compareTo(startRowkeyByte, regionStartKey) < 0) { tempStartRowkeyByte = regionStartKey; } else { tempStartRowkeyByte = startRowkeyByte; } return Bytes.toStringBinary(tempStartRowkeyByte); } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/LocalStrings.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/LocalStrings_en_US.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/LocalStrings_ja_JP.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/LocalStrings_zh_CN.properties
================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/LocalStrings_zh_HK.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/LocalStrings_zh_TW.properties ================================================ ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/ObHbaseReaderUtil.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.util; import static com.alibaba.datax.plugin.reader.obhbasereader.enums.ModeType.MultiVersionFixedColumn; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_DATABASE; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_FULL_USER_NAME; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_ODP_ADDR; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_ODP_MODE; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_ODP_PORT; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_PARAM_URL; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_PASSWORD; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_SYS_PASSWORD; import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_SYS_USER_NAME; import com.alibaba.datax.common.element.BoolColumn; import com.alibaba.datax.common.element.BytesColumn; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.DateColumn; import com.alibaba.datax.common.element.DoubleColumn; import com.alibaba.datax.common.element.LongColumn; import 
com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.obhbasereader.Constant; import com.alibaba.datax.plugin.reader.obhbasereader.HTableManager; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseColumnCell; import com.alibaba.datax.plugin.reader.obhbasereader.HbaseReaderErrorCode; import com.alibaba.datax.plugin.reader.obhbasereader.Key; import com.alibaba.datax.plugin.reader.obhbasereader.enums.ColumnType; import com.alibaba.datax.plugin.reader.obhbasereader.enums.ModeType; import com.alibaba.fastjson.JSON; import com.alibaba.fastjson.TypeReference; import com.alipay.oceanbase.hbase.OHTable; import org.apache.commons.collections.MapUtils; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.commons.lang3.time.DateUtils; import org.apache.hadoop.hbase.HConstants; import org.apache.hadoop.hbase.util.Bytes; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.nio.charset.Charset; import java.text.SimpleDateFormat; import java.util.Collection; import java.util.Date; import java.util.LinkedHashMap; import java.util.List; import java.util.Map; import java.util.regex.Matcher; import java.util.regex.Pattern; public final class ObHbaseReaderUtil { private static Logger LOG = LoggerFactory.getLogger(ObHbaseReaderUtil.class); public static void doPretreatment(Configuration originalConfig) { String mode = ObHbaseReaderUtil.dealMode(originalConfig); originalConfig.set(Key.MODE, mode); String encoding = originalConfig.getString(Key.ENCODING, Constant.DEFAULT_ENCODING); if (!Charset.isSupported(encoding)) { throw DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, String.format("The encoding you configured is not supported by hbasereader:[%s]", encoding)); } originalConfig.set(Key.ENCODING, encoding); // Extra check: isBinaryRowkey must not be configured at 
the same level as hbaseConfig and the other top-level settings
Boolean isBinaryRowkey = originalConfig.getBool(Key.IS_BINARY_ROWKEY); if (isBinaryRowkey != null) { throw DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, String.format("%s cannot be configured here. It should be configured in range.", Key.IS_BINARY_ROWKEY)); } } /** * Validates the mode and the settings that must be paired with it */ private static String dealMode(Configuration originalConfig) { String mode = originalConfig.getString(Key.MODE); ModeType modeType = ModeType.getByTypeName(mode); List column = originalConfig.getList(Key.COLUMN, Map.class); if (column == null || column.isEmpty()) { throw DataXException.asDataXException(HbaseReaderErrorCode.REQUIRED_VALUE, "You have configured the normal mode to read the data in HBase, so you must configure the column in the form of:column:[{\"name\": \"cf0:column0\",\"type\": \"string\"}," + "{\"name\": \"cf1:column1\",\"type\": \"long\"}]"); } // Parse the column list to further validate its format ObHbaseReaderUtil.parseColumn(column); if (MultiVersionFixedColumn.equals(modeType)) { Integer maxVersion = originalConfig.getInt(Key.MAX_VERSION); Validate.notNull(maxVersion, String.format("You have configured the mode %s to read the data in HBase, so you must configure: maxVersion", mode)); boolean isMaxVersionValid = maxVersion == -1 || maxVersion > 1; Validate.isTrue(isMaxVersionValid, String.format( "You have configured the mode %s to read the data in HBase, but the configured maxVersion value is wrong. maxVersion semantics: -1 reads all versions, and it cannot be " + "configured as 0 or 1 (a value of 0 or 1 suggests you want to read in normal mode rather than in mode %s, which behaves very differently).
A value greater " + "than 1 reads that many of the most recent versions.", mode, mode)); } return mode; } /** * Note the difference between convertUserStartRowkey and convertInnerStartRowkey: the former is affected by isBinaryRowkey and is used only once, to convert the user-configured String rowkey to binary; the latter is, by convention, used when the binary rowkeys produced by splitting are written back into the configuration */ public static byte[] convertInnerStartRowkey(Configuration configuration) { String startRowkey = configuration.getString(Key.START_ROWKEY); if (StringUtils.isBlank(startRowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } return Bytes.toBytesBinary(startRowkey); } public static byte[] convertInnerEndRowkey(Configuration configuration) { String endRowkey = configuration.getString(Key.END_ROWKEY); if (StringUtils.isBlank(endRowkey)) { return HConstants.EMPTY_BYTE_ARRAY; } return Bytes.toBytesBinary(endRowkey); } private static void setObHBaseConfig(com.alibaba.datax.common.util.Configuration confFile, org.apache.hadoop.conf.Configuration oHbaseConf) { boolean odpMode = confFile.getBool(Key.USE_ODP_MODE, false); String username = confFile.getString(Key.USERNAME); String password = confFile.getString(Key.PASSWORD); String dbName = confFile.getString(Key.DB_NAME); // oHbaseConf.set(RS_LIST_ACQUIRE_CONNECT_TIMEOUT.getKey(), "500"); // oHbaseConf.set(RS_LIST_ACQUIRE_READ_TIMEOUT.getKey(), "5000"); oHbaseConf.set(HBASE_OCEANBASE_FULL_USER_NAME, username); oHbaseConf.set(HBASE_OCEANBASE_PASSWORD, password); // oHbaseConf.set(HBASE_, META_SCANNER_CACHING); if (odpMode) { oHbaseConf.setBoolean(HBASE_OCEANBASE_ODP_MODE, true); oHbaseConf.set(HBASE_OCEANBASE_DATABASE, dbName); oHbaseConf.set(HBASE_OCEANBASE_ODP_ADDR, confFile.getString(Key.ODP_HOST)); oHbaseConf.setInt(HBASE_OCEANBASE_ODP_PORT, confFile.getInt(Key.ODP_PORT)); } else { String clusterName = null; final Pattern pattern = Pattern.compile("([\\w]+)@([\\w]+)#([\\w]+)"); Matcher matcher = 
pattern.matcher(username); if (matcher.find()) { clusterName = matcher.group(3); } else { throw new RuntimeException("user name is not in the correct format:
user@tenant#cluster"); } String configUrl = confFile.getString(Key.CONFIG_URL); if (!configUrl.contains("ObRegion")) { if (configUrl.contains("?")) { configUrl += "&ObRegion=" + clusterName; } else { configUrl += "?ObRegion=" + clusterName; } } if (!configUrl.contains("database")) { configUrl += "&database=" + dbName; } oHbaseConf.set(HBASE_OCEANBASE_PARAM_URL, configUrl); oHbaseConf.set(HBASE_OCEANBASE_SYS_USER_NAME, confFile.getString(Key.OB_SYS_USERNAME)); oHbaseConf.set(HBASE_OCEANBASE_SYS_PASSWORD, confFile.getString(Key.OB_SYS_PASSWORD)); } String hbaseConf = confFile.getString(Key.HBASE_CONFIG); Map map = JSON.parseObject(hbaseConf, new TypeReference>() { }); if (MapUtils.isNotEmpty(map)) { for (Map.Entry entry : map.entrySet()) { oHbaseConf.set(entry.getKey(), entry.getValue()); } } } /** * Creates a new HTable on every call. Note: HTable itself is not thread-safe */ public static OHTable initOHtable(com.alibaba.datax.common.util.Configuration configuration) { String tableName = configuration.getString(Key.TABLE); try { org.apache.hadoop.conf.Configuration oHbaseConf = new org.apache.hadoop.conf.Configuration(); setObHBaseConfig(configuration, oHbaseConf); return HTableManager.createHTable(oHbaseConf, tableName); } catch (Exception e) { LOG.error("init ohTable error, reason: {}", e.getMessage(), e); throw DataXException.asDataXException(HbaseReaderErrorCode.INIT_TABLE_ERROR, e); } } public static boolean isRowkeyColumn(String columnName) { return Constant.ROWKEY_FLAG.equalsIgnoreCase(columnName); } public static String parseColumnFamily(Collection hbaseColumnCells) { for (HbaseColumnCell columnCell : hbaseColumnCells) { if (ObHbaseReaderUtil.isRowkeyColumn(columnCell.getColumnName())) { continue; } if (columnCell.getColumnName() == null || columnCell.getColumnName().split(":").length != 2) { LOG.error("column cell format is unknown: {}", columnCell); throw new RuntimeException("Column cell format is unknown: " + columnCell); } return 
columnCell.getColumnName().split(":")[0]; } throw new
RuntimeException("parse column family failed."); } /** * Parses the column configuration */ public static LinkedHashMap parseColumn(List column) { return parseColumn(column, null, Constant.DEFAULT_ENCODING, Constant.DEFAULT_TIMEZONE); } public static LinkedHashMap parseColumn(List column, Map constantMap, String encoding, String timezone) { LinkedHashMap hbaseColumnCells = new LinkedHashMap<>(column.size()); boolean cacheConstantValue = constantMap != null; HbaseColumnCell oneColumnCell; try { for (Map aColumn : column) { ColumnType type = ColumnType.getByTypeName(aColumn.get("type")); boolean isRowKey = isRowkeyColumn(aColumn.get("name")); String columnName = isRowKey ? Constant.ROWKEY_FLAG : aColumn.get("name"); String columnValue = aColumn.get("value"); String dateFormat = aColumn.getOrDefault("format", Constant.DEFAULT_DATE_FORMAT); Validate.isTrue(StringUtils.isNotBlank(columnName) || StringUtils.isNotBlank(columnValue), "Each column must be configured either as type + name + format or as type + value + format. Your configuration matches neither.
Please check and modify it."); if (type == ColumnType.DATE) { if (StringUtils.isBlank(dateFormat)) { LOG.warn("date format for {} is empty, use default date format 'yyyy-MM-dd HH:mm:ss' instead.", columnName); } oneColumnCell = new HbaseColumnCell.Builder(type).columnName(columnName).columnValue(columnValue).dateformat(dateFormat).build(); } else { oneColumnCell = new HbaseColumnCell.Builder(type).columnName(columnName).columnValue(columnValue).build(); } hbaseColumnCells.put(columnName, oneColumnCell); if (cacheConstantValue && oneColumnCell.isConstant()) { constantMap.put(columnName, buildColumn(columnValue, type, encoding, dateFormat, timezone)); } } return hbaseColumnCells; } catch (Exception e) { LOG.error("parse column failed, reason:{}", e.getMessage(), e); throw DataXException.asDataXException(HbaseReaderErrorCode.PARSE_COLUMN_ERROR, e.getMessage()); } } public static Column buildColumn(String columnValue, ColumnType columnType, String encoding, String dateformat, String timezone) throws Exception { return buildColumn(columnValue.getBytes(encoding), columnType, encoding, dateformat, timezone); } public static Column buildColumn(byte[] columnValue, ColumnType columnType, String encoding, String dateformat, String timezone) throws Exception { switch (columnType) { case BOOLEAN: return new BoolColumn(columnValue == null ? null : Bytes.toBoolean(columnValue)); case SHORT: return new LongColumn(columnValue == null ? null : String.valueOf(Bytes.toShort(columnValue))); case INT: return new LongColumn(columnValue == null ? null : Bytes.toInt(columnValue)); case LONG: return new LongColumn(columnValue == null ? null : Bytes.toLong(columnValue)); case BYTES: return new BytesColumn(columnValue == null ? null : columnValue); case FLOAT: return new DoubleColumn(columnValue == null ? null : Bytes.toFloat(columnValue)); case DOUBLE: return new DoubleColumn(columnValue == null ? null : Bytes.toDouble(columnValue)); case STRING: return new StringColumn(columnValue == null ? 
null : new String(columnValue, encoding)); case BINARY_STRING: return new StringColumn(columnValue == null ? null : Bytes.toStringBinary(columnValue)); case DATE: if (columnValue == null) { return null; } String dateValue = Bytes.toStringBinary(columnValue); String timestamp = null; try { long milliSec = Long.parseLong(dateValue); Date date = new java.util.Date(milliSec); SimpleDateFormat sdf = new java.text.SimpleDateFormat(dateformat); sdf.setTimeZone(java.util.TimeZone.getTimeZone(timezone)); timestamp = sdf.format(date); } catch (Exception e) { // not epoch milliseconds: treat the value as an already formatted timestamp timestamp = dateValue; } return new DateColumn(DateUtils.parseDate(timestamp, dateformat)); default: throw DataXException.asDataXException(HbaseReaderErrorCode.ILLEGAL_VALUE, "obHbasereader does not support the column type you configured: " + columnType); } } } ================================================ FILE: obhbasereader/src/main/java/com/alibaba/datax/plugin/reader/obhbasereader/util/SqlReaderSplitUtil.java ================================================ package com.alibaba.datax.plugin.reader.obhbasereader.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.SplitedSlice; import com.alibaba.datax.plugin.reader.obhbasereader.Key; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ExecutorTemplate; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.PartInfo; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.PartitionSplitUtil; import com.google.common.base.Preconditions; import com.google.common.collect.Lists; import java.sql.Connection; import java.sql.ResultSet; import 
java.sql.SQLException; import java.sql.Statement; import
java.util.ArrayList; import java.util.Collection; import java.util.HashSet; import java.util.List; import java.util.Set; import java.util.stream.Collectors; import org.apache.commons.collections.CollectionUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class SqlReaderSplitUtil { public static final String SAMPLE_SQL_TEMPLATE = "SELECT `hex` FROM (SELECT `hex`,K , bucket, ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY K) rn FROM(SELECT %s `hex`, K ,NTILE(%s) OVER " + "(ORDER BY K ) bucket FROM (SELECT hex(K) as `hex`, K FROM %s SAMPLE BLOCK(%s)) a) b) c WHERE rn = 1 GROUP BY K ORDER BY K"; public static final String MIDDLE_RANGE_TEMPLATE = "((K) > (unhex('%s'))) AND ((K) <= (unhex('%s')))"; public static final String MIN_MAX_RANGE_TEMPLATE = "((K)<= (unhex('%s'))) or ((K) > (unhex('%s')))"; private static final Logger LOG = LoggerFactory.getLogger(SqlReaderSplitUtil.class); public static List splitSingleTable(Configuration configuration, String tableName, String columnFamily, int eachTableShouldSplittedNumber, boolean readByPartition) { List partitionList = Lists.newArrayList(); String tableNameWithCf = tableName + "$" + columnFamily; PartInfo partInfo = PartitionSplitUtil.getObMySQLPartInfoBySQL(configuration, tableNameWithCf); if (partInfo.isPartitionTable()) { partitionList.addAll(partInfo.getPartList()); } // read all partitions and split job only by partition if (readByPartition) { LOG.info("table: [{}] will read only by partition", tableNameWithCf); return splitSingleTableByPartition(configuration, partitionList); } if (eachTableShouldSplittedNumber <= 1) { LOG.info("total enable splitted number of table: [{}] is {}, no need to split", tableNameWithCf, eachTableShouldSplittedNumber); return Lists.newArrayList(configuration); } // If the user specified partitions to read, validate them and read only those List userSetPartitions = configuration.getList(Key.PARTITION_NAME, String.class); if (CollectionUtils.isNotEmpty(userSetPartitions)) { Set partSet 
= new
HashSet<>(partitionList); // If partition name does not exist in the table, throw exception directly. Case is sensitive. userSetPartitions.forEach(e -> Preconditions.checkArgument(partSet.contains(e), "partition %s does not exist in table: %s", e, tableNameWithCf)); partitionList.clear(); partitionList.addAll(userSetPartitions); } if (partitionList.isEmpty()) { LOG.info("table: [{}] is not partitioned, just split table by rowKey.", tableNameWithCf); List splitConfs = splitSingleTableByRowKey(configuration, tableNameWithCf, eachTableShouldSplittedNumber); LOG.info("total split count of non-partitioned table :[{}] is {}", tableNameWithCf, splitConfs.size()); return splitConfs; } else { ExecutorTemplate> template = new ExecutorTemplate<>("split-rows-by-rowkey-" + tableNameWithCf + "-", eachTableShouldSplittedNumber); int splitNumPerPartition = (int) Math.ceil(1.0d * eachTableShouldSplittedNumber / partitionList.size()); LOG.info("table: [{}] is partitioned, split table by rowKey in parallel. 
splitNumPerPartition is {}", tableNameWithCf, splitNumPerPartition); for (String partName : partitionList) { try { template.submit(() -> { Configuration tempConf = configuration.clone(); tempConf.set(Key.PARTITION_NAME, partName); return splitSingleTableByRowKey(tempConf, tableNameWithCf, splitNumPerPartition); }); } catch (Throwable th) { LOG.error("submit split task of table: [{}-{}] failed, reason: {}", tableNameWithCf, partName, th.getMessage(), th); } } List splitConfs = template.waitForResult().stream().flatMap(Collection::stream).collect(Collectors.toList()); LOG.info("total split count of partitioned table :[{}] is {}", tableNameWithCf, splitConfs.size()); return splitConfs; } } private static List splitSingleTableByPartition(Configuration configuration, List partList) { if (partList == null || partList.isEmpty()) { return Lists.newArrayList(configuration); } List confList = new ArrayList<>(); for (String partName : partList) { LOG.info("read sub task: reading from partition " + partName); Configuration conf = configuration.clone(); conf.set(Key.PARTITION_NAME, partName); confList.add(conf); } return confList; } /** * @param configuration * @param tableNameWithCf * @param eachTableShouldSplittedNumber * @return */ public static List splitSingleTableByRowKey(Configuration configuration, String tableNameWithCf, int eachTableShouldSplittedNumber) { String jdbcURL = configuration.getString(Key.JDBC_URL); String username = configuration.getString(Key.USERNAME); String password = configuration.getString(Key.PASSWORD); String hint = configuration.getString(Key.READER_HINT, com.alibaba.datax.plugin.reader.obhbasereader.Constant.OB_READ_HINT); String partInfo = ""; String partName = configuration.getString(Key.PARTITION_NAME, null); if (partName != null) { partInfo = " partition(" + partName + ")"; } tableNameWithCf += partInfo; int fetchSize = configuration.getInt(Constant.FETCH_SIZE, com.alibaba.datax.plugin.reader.obhbasereader.Constant.DEFAULT_FETCH_SIZE); 
Double percentage = configuration.getDouble(Key.SAMPLE_PERCENTAGE, 0.1); List slices = new ArrayList<>(); List pluginParams = new ArrayList<>(); // set ob_query_timeout and ob_trx_timeout to a large value (48 hours) to avoid timeouts int queryTimeoutSeconds = 60 * 60 * 48; try (Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcURL, username, password)) { String setQueryTimeout = "set ob_query_timeout=" + (queryTimeoutSeconds * 1000 * 1000L); String setTrxTimeout = "set ob_trx_timeout=" + ((queryTimeoutSeconds + 5) * 1000 * 1000L); try (Statement stmt = conn.createStatement()) { stmt.execute(setQueryTimeout); stmt.execute(setTrxTimeout); } catch (Exception e) { LOG.warn("set ob_query_timeout and set ob_trx_timeout failed. reason: {}", e.getMessage(), e); } slices = getSplitSqlBySample(conn, tableNameWithCf, fetchSize, percentage, eachTableShouldSplittedNumber, hint); } catch (Throwable e) { LOG.warn("failed to query the rowkey range of table: {}. reason: {}. The table will not be split.", tableNameWithCf, e.getMessage(), e); } if (!slices.isEmpty()) { for (SplitedSlice slice : slices) { Configuration tempConfig = configuration.clone(); tempConfig.set(Key.RANGE, slice.getRange()); pluginParams.add(tempConfig); } } else { Configuration tempConfig = configuration.clone(); pluginParams.add(tempConfig); } return pluginParams; } /** * Splits the table by sampling. Splitting sequentially is not safe, because it could cut the cells of one logical row across two splits. * * @param conn * @param tableName * @param fetchSize * @param percentage * @param adviceNum * @param hint * @return List * @throws SQLException */ private static List getSplitSqlBySample(Connection conn, String tableName, int fetchSize, double percentage, int adviceNum, String hint) throws SQLException { String splitSql = String.format(SAMPLE_SQL_TEMPLATE, hint, adviceNum, tableName, percentage); LOG.info("split pk [sql={}] is running...
", splitSql); List boundList = new ArrayList<>(); try (ResultSet rs = DBUtil.query(conn, splitSql, fetchSize)) { while (rs.next()) { boundList.add(rs.getString(1)); } } if (boundList.size() == 0) { return new ArrayList<>(); } List rangeSql = new ArrayList<>(); for (int i = 0; i < boundList.size() - 1; i++) { String range = String.format(MIDDLE_RANGE_TEMPLATE, boundList.get(i), boundList.get(i + 1)); SplitedSlice slice = new SplitedSlice(boundList.get(i), boundList.get(i + 1), range); rangeSql.add(slice); } String range = String.format(MIN_MAX_RANGE_TEMPLATE, boundList.get(0), boundList.get(boundList.size() - 1)); SplitedSlice slice = new SplitedSlice(null, null, range); rangeSql.add(slice); return rangeSql; } } ================================================ FILE: obhbasereader/src/main/resources/plugin.json ================================================ { "name": "obhbasereader", "class": "com.alibaba.datax.plugin.reader.obhbasereader.ObHbaseReader", "description": "useScene: prod. mechanism: Scan to read data.", "developer": "alibaba" } ================================================ FILE: obhbasereader/src/main/resources/plugin_job_template.json ================================================ { "name": "obhbasereader", "parameter": { "hbaseConfig": {}, "table": "", "encoding": "", "mode": "", "column": [], "range": { "startRowkey": "", "endRowkey": "", "isBinaryRowkey": true } } } ================================================ FILE: obhbasewriter/doc/obhbasewriter.md ================================================ OceanBase's table API provides applications with an ObHBase access interface, so the OceanBase table API writer is similar to the HBase writer in structure and configuration. 1 Quick introduction The obhbaseWriter plugin writes data into ObHbase. Under the hood, obhbaseWriter connects to the remote HBase service through the HBase Java client and writes to obHbase with put operations. 1.1 Supported features 1. obhbasewriter currently supports obHbase on OceanBase 3.x and 4.x. 
2. obhbasewriter can concatenate multiple source fields to form the ObHbase table rowkey; see the rowkeyColumn setting for details. 3. The timestamp (version) written to obhbase can be set in three ways: the current time, a specified source column, or a fixed time. #### Job script configuration ```json { "job": { "setting": { "speed": {
"channel": 5 } }, "content": [ { "reader": { "name": "txtfilereader", "parameter": { "path": "/normal.txt", "charset": "UTF-8", "column": [ { "index": 0, "type": "String" }, { "index": 1, "type": "string" }, { "index": 2, "type": "string" }, { "index": 3, "type": "string" }, { "index": 4, "type": "string" }, { "index": 5, "type": "string" }, { "index": 6, "type": "string" } ], "fieldDelimiter": "," } }, "writer": { "name": "obhbasewriter", "parameter": { "username": "username", "password": "password", "writerThreadCount": "20", "writeBufferHighMark": "2147483647", "rpcExecuteTimeout": "30000", "useOdpMode": "false", "obSysUser": "root", "obSysPassword": "", "column": [ { "index": 0, "name": "family1:c1", "type": "string" }, { "index": 1, "name": "family1:c2", "type": "string" }, { "index": 2, "name": "family1:c3", "type": "string" }, { "index": 3, "name": "family1:c4", "type": "string" }, { "index": 4, "name": "family1:c5", "type": "string" }, { "index": 5, "name": "family1:c6", "type": "string" }, { "index": 6, "name": "family1:c7", "type": "string" } ], "mode": "normal", "rowkeyColumn": [ { "index": 0, "type": "string" }, { "index": 3, "type": "string" }, { "index": 2, "type": "string" }, { "index": 1, "type": "string" } ], "table": "htable3", "batchSize": "200", "dbName": "database", "jdbcUrl": "jdbc:mysql://ip:port/database?" 
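} } } ] } } ``` ##### Parameter description - **connection** The required settings differ between public cloud and private cloud deployments: Public cloud: - database username (configured once at the outer level) - password (configured once at the outer level) - proxy JDBC URL - database name (dbName) Private cloud: - database username (configured once at the outer level) - password (configured once at the outer level) - proxy JDBC URL - obSysUser: username of the sys tenant - obSysPassword: password of the sys tenant - obConfigUrl - Description: can be obtained with show parameters like 'obConfigUrl'. - Required: yes - Default: none - **jdbcUrl** - Description: the JDBC URL used to connect to OB. Two formats are supported: - jdbc:mysql://obproxyIp:obproxyPort/db - with this format, username must be written in the three-part form (user@tenant#cluster) - ||_dsc_ob10_dsc_||clusterName:tenantName||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/db - with this format, username is only the user name itself, without the three-part form - Required: yes - Default: none - **table** - Description: the table to synchronize. Do not add column family information. - Required: yes - Default: none - **username** - Description: the user name for accessing OceanBase - Required: yes - Default: none - **useOdpMode** - Description: whether to connect through the proxy (ODP). Must be set to true when the sys tenant credentials cannot be provided; in that case odpHost and odpPort are also required. - Required: no - Default: false - **column** - Description: the HBase columns to write. index: the index of the corresponding reader-side column, starting from 0; name: the column in the HBase table, which must be in the form columnFamily:qualifier; type: the data type to write, used to convert the value to HBase byte[]. The format is: ```json "column": [ { "index":1, "name": "cf1:q1", "type": "string" }, { "index":2, "name": "cf1:q2", "type": "string" } ] ``` - Required: yes - Default: none - **rowkeyColumn** - Description: the rowkey columns to write to ObHbase. index: the index of the corresponding reader-side column, starting from 0, or -1 for a constant; type: the data type to write, used to convert the value to HBase byte[]; value: a constant, commonly used as the separator between fields. obhbasewriter concatenates all rowkeyColumn entries in the configured order to form the rowkey written to HBase; the entries cannot all be constants. The format is: ```json "rowkeyColumn": [ { "index":0, "type":"string" }, { "index":-1, "type":"string", "value":"_" } ] ``` - Required: yes - Default: none - **versionColumn** - Description: the timestamp written to obhbase. Choose one of three options: the current time, a source time column, or a fixed time. If it is not configured, the current time is used. index: the index of the corresponding reader-side column, starting from 0; the value must be convertible to long. If it is a Date, the writer tries to parse it with yyyy-MM-dd HH:mm:ss and yyyy-MM-dd HH:mm:ss SSS. For a fixed time, set index to -1; value: the fixed time as a long. The format is: ```json "versionColumn":{ "index":1 } ``` or ```json "versionColumn":{ "index":-1, "value":123456789 } ``` - Required: no - Default: none ================================================ FILE: obhbasewriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 obhbasewriter com.alibaba.datax 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j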
com.alibaba.datax plugin-rdbms-util ${datax-project-version} guava com.google.guava org.slf4j slf4j-api ch.qos.logback logback-classic org.springframework spring-test 4.0.4.RELEASE test com.google.guava guava 33.1.0-jre log4j log4j 1.2.16 org.json json 20160810 junit junit 4.11 test org.powermock powermock-module-junit4 1.4.10 test org.powermock powermock-api-mockito 1.4.10 test org.mockito mockito-core 1.8.5 test com.oceanbase obkv-hbase-client 0.1.4.2 guava com.google.guava org.apache.hadoop hadoop-core 1.0.3 src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: obhbasewriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/obhbasewriter target/ obhbasewriter-0.0.1-SNAPSHOT.jar plugin/writer/obhbasewriter false plugin/writer/obhbasewriter/libs runtime ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ColumnType.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.MessageSource; import java.util.Arrays; import org.apache.commons.lang.StringUtils; /** * Only meaningful in normal mode; multi-version mode has no column types */ public enum ColumnType { STRING("string"), BINARY_STRING("binarystring"), BYTES("bytes"), BOOLEAN("boolean"), SHORT("short"), INT("int"), LONG("long"), FLOAT("float"), DOUBLE("double"), DATE("date"), BINARY("binary"); private String typeName; ColumnType(String typeName) { this.typeName = typeName; } public static ColumnType getByTypeName(String typeName) { if (StringUtils.isBlank(typeName)) { throw
DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MessageSource.loadResourceBundle(ColumnType.class).message("columntype.1", typeName, Arrays.asList(values()))); } for (ColumnType columnType : values()) { if (StringUtils.equalsIgnoreCase(columnType.typeName, typeName.trim())) { return columnType; } } throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MessageSource.loadResourceBundle(ColumnType.class).message("columntype.1", typeName, Arrays.asList(values()))); } @Override public String toString() { return this.typeName; } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/Config.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; public interface Config { String MEMSTORE_THRESHOLD = "memstoreThreshold"; double DEFAULT_MEMSTORE_THRESHOLD = 0.9d; String MEMSTORE_CHECK_INTERVAL_SECOND = "memstoreCheckIntervalSecond"; long DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND = 30; String FAIL_TRY_COUNT = "failTryCount"; int DEFAULT_FAIL_TRY_COUNT = 10000; String WRITER_THREAD_COUNT = "writerThreadCount"; int DEFAULT_WRITER_THREAD_COUNT = 5; String CONCURRENT_WRITE = "concurrentWrite"; boolean DEFAULT_CONCURRENT_WRITE = true; String RS_URL = "rsUrl"; String OB_VERSION = "obVersion"; String TIMEOUT = "timeout"; String PRINT_COST = "printCost"; boolean DEFAULT_PRINT_COST = false; String COST_BOUND = "costBound"; long DEFAULT_COST_BOUND = 20; String MAX_ACTIVE_CONNECTION = "maxActiveConnection"; int DEFAULT_MAX_ACTIVE_CONNECTION = 2000; } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ConfigKey.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; public final class ConfigKey { public final static String HBASE_CONFIG = "hbaseConfig"; public final static String TABLE =
"table"; public final static String DBNAME = "dbName"; public final static String OBCONFIG_URL = "obConfigUrl"; public final static String JDBC_URL = "jdbcUrl"; /** * mode takes one of three values: normal, multiVersionFixedColumn, or multiVersionDynamicColumn. There is no default. *
* normal is used together with column (a Map structure) *
* multiVersion */ public final static String MODE = "mode"; public final static String ROWKEY_COLUMN = "rowkeyColumn"; public final static String VERSION_COLUMN = "versionColumn"; /** * Defaults to utf8 */ public final static String ENCODING = "encoding"; public final static String COLUMN = "column"; public static final String INDEX = "index"; public static final String NAME = "name"; public static final String TYPE = "type"; public static final String VALUE = "value"; public static final String FORMAT = "format"; /** * Defaults to EMPTY_BYTES */ public static final String NULL_MODE = "nullMode"; public static final String TRUNCATE = "truncate"; public static final String AUTO_FLUSH = "autoFlush"; public static final String WAL_FLAG = "walFlag"; public static final String WRITE_BUFFER_SIZE = "writeBufferSize"; public static final String MAX_RETRY_COUNT = "maxRetryCount"; public static final String USE_ODP_MODE = "useOdpMode"; public static final String OB_SYS_USER = "obSysUser"; public static final String OB_SYS_PASSWORD = "obSysPassword"; public static final String ODP_HOST = "odpHost"; public static final String ODP_PORT = "odpPort"; public static final String OBHBASE_HTABLE_CLIENT_WRITE_BUFFER = "obhbaseClientWriteBuffer"; public static final String OBHBASE_HTABLE_PUT_WRITE_BUFFER_CHECK = "obhbaseHtablePutWriteBufferCheck"; public static final String WRITE_BUFFER_LOW_MARK = "writeBufferLowMark"; public static final String WRITE_BUFFER_HIGH_MARK = "writeBufferHighMark"; public static final String TABLE_CLIENT_RPC_EXECUTE_TIMEOUT = "rpcExecuteTimeout"; } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ConfigValidator.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import
com.alibaba.datax.plugin.rdbms.writer.Key; import java.nio.charset.Charset; import java.util.List; /** * Created by johnxu.xj on Sept 30 2018 */ public class ConfigValidator { private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(ConfigValidator.class); public static void validateParameter(com.alibaba.datax.common.util.Configuration originalConfig) { originalConfig.getNecessaryValue(Key.USERNAME, Hbase094xWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.PASSWORD, Hbase094xWriterErrorCode.REQUIRED_VALUE); // originalConfig.getNecessaryValue(ConfigKey.OBCONFIG_URL, Hbase094xWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(ConfigKey.TABLE, Hbase094xWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(ConfigKey.DBNAME, Hbase094xWriterErrorCode.REQUIRED_VALUE); ConfigValidator.validateMode(originalConfig); String encoding = originalConfig.getString(ConfigKey.ENCODING, Constant.DEFAULT_ENCODING); if (!Charset.isSupported(encoding)) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.9", encoding)); } originalConfig.set(ConfigKey.ENCODING, encoding); } public static void validateMode(com.alibaba.datax.common.util.Configuration originalConfig) { String mode = originalConfig.getNecessaryValue(ConfigKey.MODE, Hbase094xWriterErrorCode.REQUIRED_VALUE); ModeType modeType = ModeType.getByTypeName(mode); if (ModeType.Normal.equals(modeType)) { validateRowkeyColumn(originalConfig); validateColumn(originalConfig); validateVersionColumn(originalConfig); } if (originalConfig.getBool(ConfigKey.USE_ODP_MODE, false)) { originalConfig.getNecessaryValue(ConfigKey.ODP_HOST, Hbase094xWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(ConfigKey.ODP_PORT, Hbase094xWriterErrorCode.REQUIRED_VALUE); } else { originalConfig.getNecessaryValue(ConfigKey.OBCONFIG_URL, Hbase094xWriterErrorCode.REQUIRED_VALUE);
originalConfig.getNecessaryValue(ConfigKey.OB_SYS_USER, Hbase094xWriterErrorCode.REQUIRED_VALUE); } } public static void validateColumn(com.alibaba.datax.common.util.Configuration originalConfig) { List<Configuration> columns = originalConfig.getListConfiguration(ConfigKey.COLUMN); if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.11")); } for (Configuration aColumn : columns) { Integer index = aColumn.getInt(ConfigKey.INDEX); String type = aColumn.getNecessaryValue(ConfigKey.TYPE, Hbase094xWriterErrorCode.REQUIRED_VALUE); String name = aColumn.getNecessaryValue(ConfigKey.NAME, Hbase094xWriterErrorCode.REQUIRED_VALUE); ColumnType.getByTypeName(type); if (name.split(":").length != 2) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.12", name)); } if (index == null || index < 0) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.13")); } } } public static void validateRowkeyColumn(com.alibaba.datax.common.util.Configuration originalConfig) { List<Configuration> rowkeyColumn = originalConfig.getListConfiguration(ConfigKey.ROWKEY_COLUMN); if (rowkeyColumn == null || rowkeyColumn.isEmpty()) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.14")); } int rowkeyColumnSize = rowkeyColumn.size(); // Each entry is either {"index":0,"type":"string"} or {"index":-1,"type":"string","value":"_"} for (Configuration aRowkeyColumn : rowkeyColumn) { Integer index = aRowkeyColumn.getInt(ConfigKey.INDEX); String type = aRowkeyColumn.getNecessaryValue(ConfigKey.TYPE, Hbase094xWriterErrorCode.REQUIRED_VALUE); ColumnType.getByTypeName(type); if (index == null) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, 
MESSAGE_SOURCE.message("hbase094xhelper.15")); } // The rowkey cannot consist of a single index -1 entry, i.e. only a constant connector string if 
(rowkeyColumnSize == 1 && index == -1) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.16")); } if (index == -1) { aRowkeyColumn.getNecessaryValue(ConfigKey.VALUE, Hbase094xWriterErrorCode.REQUIRED_VALUE); } } } public static void validateVersionColumn(com.alibaba.datax.common.util.Configuration originalConfig) { Configuration versionColumn = originalConfig.getConfiguration(ConfigKey.VERSION_COLUMN); // null means the current time is used; a specified column requires an index if (versionColumn != null) { Integer index = versionColumn.getInt(ConfigKey.INDEX); if (index == null) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.17")); } if (index == -1) { // a fixed timestamp requires index=-1 plus a value versionColumn.getNecessaryValue(ConfigKey.VALUE, Hbase094xWriterErrorCode.REQUIRED_VALUE); } else if (index < 0) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbase094xhelper.18")); } } } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/Constant.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; import ch.qos.logback.classic.Level; public final class Constant { public static final String DEFAULT_ENCODING = "UTF-8"; public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static final String DEFAULT_NULL_MODE = "skip"; public static final long DEFAULT_WRITE_BUFFER_SIZE = 8 * 1024 * 1024; public static final long DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND = 30; public static final double DEFAULT_MEMSTORE_THRESHOLD = 0.9d; public static final int DEFAULT_FAIL_TRY_COUNT = 10000; public static final String OB_TABLE_CLIENT_PROPERTY = "logging.path.com.alipay.oceanbase-table-client"; public static final String 
OB_TABLE_HBASE_PROPERTY = "logging.path.com.alipay.oceanbase-table-hbase"; public 
static final String OB_TABLE_CLIENT_LOG_LEVEL = "logging.level.oceanbase-table-client"; public static final String OB_TABLE_HBASE_LOG_LEVEL = "logging.level.oceanbase-table-hbase"; public static final String OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL = "logging.level.com.alipay.oceanbase-table-client"; public static final String OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL = "logging.level.com.alipay.oceanbase-table-hbase"; public static final String OB_HBASE_LOG_PATH = System.getProperty("datax.home") + "/log/"; public static final String DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL = Level.OFF.toString(); public static final String DEFAULT_OB_TABLE_HBASE_LOG_LEVEL = Level.OFF.toString(); public static final String DEFAULT_NETTY_BUFFER_LOW_WATERMARK = Integer.toString(512 * 1024); public static final String DEFAULT_NETTY_BUFFER_HIGH_WATERMARK = Integer.toString(1024 * 1024); public static final String DEFAULT_HBASE_HTABLE_CLIENT_WRITE_BUFFER = "2097152"; public static final String DEFAULT_HBASE_HTABLE_PUT_WRITE_BUFFER_CHECK = "10"; public static final String DEFAULT_RPC_EXECUTE_TIMEOUT = "3000"; } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/Hbase094xWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; import com.alibaba.datax.common.spi.ErrorCode; import com.alibaba.datax.common.util.MessageSource; /** * Created by shf on 16/3/8. 
*/ public enum Hbase094xWriterErrorCode implements ErrorCode { REQUIRED_VALUE("Hbasewriter-00", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.required_value")), ILLEGAL_VALUE("Hbasewriter-01", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.illegal_value")), GET_HBASE_CONFIG_ERROR("Hbasewriter-02", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.get_hbase_config_error")), GET_HBASE_TABLE_ERROR("Hbasewriter-03", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.get_hbase_table_error")), CLOSE_HBASE_AMIN_ERROR("Hbasewriter-05", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.close_hbase_amin_error")), CLOSE_HBASE_TABLE_ERROR("Hbasewriter-06", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.close_hbase_table_error")), PUT_HBASE_ERROR("Hbasewriter-07", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.put_hbase_error")), DELETE_HBASE_ERROR("Hbasewriter-08", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.delete_hbase_error")), TRUNCATE_HBASE_ERROR("Hbasewriter-09", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.truncate_hbase_error")), CONSTRUCT_ROWKEY_ERROR("Hbasewriter-10", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.construct_rowkey_error")), CONSTRUCT_VERSION_ERROR("Hbasewriter-11", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.construct_version_error")), INIT_ERROR("Hbasewriter-12", MessageSource.loadResourceBundle(Hbase094xWriterErrorCode.class).message("errorcode.init_error")); private final String code; private final String description; private Hbase094xWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override 
public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/LocalStrings.properties ================================================ ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/LocalStrings_en_US.properties ================================================ ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/LocalStrings_ja_JP.properties ================================================ ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/LocalStrings_zh_CN.properties ================================================ ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/LocalStrings_zh_HK.properties ================================================ ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/LocalStrings_zh_TW.properties ================================================ ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ModeType.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; import java.util.Arrays; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.MessageSource; public enum ModeType { Normal("normal"), MultiVersion("multiVersion"); private String mode; ModeType(String mode) { this.mode = 
mode.toLowerCase(); } public String getMode() { return mode; } public static ModeType getByTypeName(String modeName) { for (ModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MessageSource.loadResourceBundle(ModeType.class).message("modetype.1", modeName, Arrays.asList(values()))); } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/NullModeType.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; import java.util.Arrays; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.MessageSource; public enum NullModeType { Skip("skip"), Empty("empty"); private String mode; NullModeType(String mode) { this.mode = mode.toLowerCase(); } public String getMode() { return mode; } public static NullModeType getByTypeName(String modeName) { for (NullModeType modeType : values()) { if (modeType.mode.equalsIgnoreCase(modeName)) { return modeType; } } throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MessageSource.loadResourceBundle(NullModeType.class).message("nullmodetype.1", modeName, Arrays.asList(values()))); } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ObHTableInfo.java ================================================ /* * Copyright (c) 2021 OceanBase ob-loader-dumper is licensed under Mulan PSL v2. You can use this software according to * the terms and conditions of the Mulan PSL v2. 
You may obtain a copy of Mulan PSL v2 at: * * http://license.coscl.org.cn/MulanPSL2 * * THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING * BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE. See the Mulan PSL v2 for more * details. */ package com.alibaba.datax.plugin.writer.obhbasewriter; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Key; import java.util.ArrayList; import java.util.LinkedHashMap; import java.util.List; import java.util.Map; import org.apache.commons.lang3.tuple.Triple; /** * @author cjyyz * @date 2023/03/24 * @since */ public class ObHTableInfo { /** * Table name without the column family, used to build the OHTable */ String tableName; /** * Table name with the column family, used for partition calculation */ String fullHbaseTableName; NullModeType nullModeType; String encoding; List<Configuration> columns; /** * Column family, qualifier and column type of each configured column, cached so they are not re-parsed on every insert. * Triple left: column family; middle: qualifier; right: column type */ LinkedHashMap<Integer, Triple<String, String, ColumnType>> indexColumnInfoMap; /** * Index, constant value and column type of each configured rowkey element, cached so they are not re-parsed on every insert. 
* Triple left: index; middle: constant value; right: column type */ List<Triple<Integer, String, ColumnType>> rowKeyElementList; public ObHTableInfo(Configuration configuration) { this.nullModeType = NullModeType.getByTypeName(configuration.getString(ConfigKey.NULL_MODE, Constant.DEFAULT_NULL_MODE)); this.encoding = configuration.getString(ConfigKey.ENCODING, Constant.DEFAULT_ENCODING); this.columns = configuration.getListConfiguration(ConfigKey.COLUMN); this.indexColumnInfoMap = new LinkedHashMap<>(); configuration.getListConfiguration(ConfigKey.COLUMN).forEach(e -> { String[] name = e.getString(ConfigKey.NAME).split(":"); indexColumnInfoMap.put(e.getInt(ConfigKey.INDEX), Triple.of(name[0], name[1], ColumnType.getByTypeName(e.getString(ConfigKey.TYPE))) ); }); this.rowKeyElementList = new ArrayList<>(); configuration.getListConfiguration(ConfigKey.ROWKEY_COLUMN).forEach(e -> { Integer index = e.getInt(ConfigKey.INDEX); String constantValue = e.getString(ConfigKey.VALUE); ColumnType 
columnType = ColumnType.getByTypeName(e.getString(ConfigKey.TYPE)); rowKeyElementList.add(Triple.of(index, constantValue, columnType)); }); this.tableName = configuration.getString(Key.TABLE); this.fullHbaseTableName = tableName; if (!fullHbaseTableName.contains("$")) { String name = columns.get(0).getString(ConfigKey.NAME); String familyName = name.split(":")[0]; fullHbaseTableName = fullHbaseTableName + "$" + familyName; } } public String getTableName() { return tableName; } public String getFullHbaseTableName() { return fullHbaseTableName; } public NullModeType getNullModeType() { return nullModeType; } public String getEncoding() { return encoding; } public Map<Integer, Triple<String, String, ColumnType>> getIndexColumnInfoMap() { return indexColumnInfoMap; } public List<Triple<Integer, String, ColumnType>> getRowKeyElementList() { return rowKeyElementList; } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ObHbaseWriter.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.DEFAULT_OB_TABLE_HBASE_LOG_LEVEL; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.OB_HBASE_LOG_PATH; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.OB_TABLE_CLIENT_LOG_LEVEL; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.OB_TABLE_CLIENT_PROPERTY; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.OB_TABLE_HBASE_LOG_LEVEL; import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.OB_TABLE_HBASE_PROPERTY; import 
com.alibaba.datax.common.exception.DataXException; 
import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.writer.obhbasewriter.ext.ServerConnectInfo; import com.alibaba.datax.plugin.writer.obhbasewriter.task.ObHBaseWriteTask; import com.google.common.base.Preconditions; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.util.List; import java.util.concurrent.TimeUnit; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; /** * */ public class ObHbaseWriter extends Writer { /** * Job methods run only once, while the framework starts multiple Task threads that run the Task methods in parallel. * <p/>
* The overall Writer execution flow is: * <pre>
     * Job: init-->prepare-->split
     *
     *                          Task: init-->prepare-->startWrite-->post-->destroy
     *                          Task: init-->prepare-->startWrite-->post-->destroy
     *
     *                                                                            Job: post-->destroy
     * </pre>
*/ public static class Job extends Writer.Job { private Configuration originalConfig = null; private static final Logger LOG = LoggerFactory.getLogger(Job.class); /** * Note: this method runs only once. Best practice: validate the user's configuration here (missing required items? invalid values? irrelevant items? ...) * and report clear errors/warnings. Validation is best done with a static utility class to keep this class clean. */ @Override public void init() { if (System.getProperty(OB_TABLE_CLIENT_PROPERTY) == null) { LOG.info(OB_TABLE_CLIENT_PROPERTY + " not set"); System.setProperty(OB_TABLE_CLIENT_PROPERTY, OB_HBASE_LOG_PATH); } if (System.getProperty(OB_TABLE_HBASE_PROPERTY) == null) { LOG.info(OB_TABLE_HBASE_PROPERTY + " not set"); System.setProperty(OB_TABLE_HBASE_PROPERTY, OB_HBASE_LOG_PATH); } if (System.getProperty(OB_TABLE_CLIENT_LOG_LEVEL) == null) { LOG.info(OB_TABLE_CLIENT_LOG_LEVEL + " not set"); System.setProperty(OB_TABLE_CLIENT_LOG_LEVEL, DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL); } if (System.getProperty(OB_TABLE_HBASE_LOG_LEVEL) == null) { LOG.info(OB_TABLE_HBASE_LOG_LEVEL + " not set"); System.setProperty(OB_TABLE_HBASE_LOG_LEVEL, DEFAULT_OB_TABLE_HBASE_LOG_LEVEL); } if (System.getProperty(OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL) == null) { LOG.info(OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL + " not set"); System.setProperty(OB_COM_ALIPAY_TABLE_CLIENT_LOG_LEVEL, DEFAULT_OB_TABLE_CLIENT_LOG_LEVEL); } if (System.getProperty(OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL) == null) { LOG.info(OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL + " not set"); System.setProperty(OB_COM_ALIPAY_TABLE_HBASE_LOG_LEVEL, DEFAULT_OB_TABLE_HBASE_LOG_LEVEL); } LOG.info("{} is set to {}, {} is set to {}", OB_TABLE_CLIENT_PROPERTY, System.getProperty(OB_TABLE_CLIENT_PROPERTY), OB_TABLE_HBASE_PROPERTY, System.getProperty(OB_TABLE_HBASE_PROPERTY)); this.originalConfig = super.getPluginJobConf(); boolean useOdpMode = originalConfig.getBool(ConfigKey.USE_ODP_MODE, false); String configUrl = 
originalConfig.getString(ConfigKey.OBCONFIG_URL, null); String jdbcUrl = originalConfig.getString(ConfigKey.JDBC_URL, null); jdbcUrl = DataBaseType.MySql.appendJDBCSuffixForReader(jdbcUrl); String user = 
originalConfig.getString(Key.USERNAME, null); String password = originalConfig.getString(Key.PASSWORD); ServerConnectInfo serverConnectInfo = new ServerConnectInfo(jdbcUrl, user, password); if (useOdpMode) { originalConfig.set(ConfigKey.ODP_HOST, serverConnectInfo.host); originalConfig.set(ConfigKey.ODP_PORT, serverConnectInfo.port); } else if (StringUtils.isBlank(configUrl)) { serverConnectInfo.setSysUser(originalConfig.getString(ConfigKey.OB_SYS_USER)); serverConnectInfo.setSysPass(originalConfig.getString(ConfigKey.OB_SYS_PASSWORD)); try { String fetchedConfigUrl = queryRsUrl(serverConnectInfo); originalConfig.set(ConfigKey.OBCONFIG_URL, fetchedConfigUrl); originalConfig.set(ConfigKey.OB_SYS_USER, serverConnectInfo.sysUser); originalConfig.set(ConfigKey.OB_SYS_PASSWORD, serverConnectInfo.sysPass); LOG.info("fetch configUrl success, configUrl is {}", fetchedConfigUrl); } catch (Exception e) { LOG.error("fail to get configure url: " + e.getMessage()); throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, "Missing obConfigUrl"); } } if (StringUtils.isBlank(originalConfig.getString(ConfigKey.DBNAME))) { originalConfig.set(ConfigKey.DBNAME, serverConnectInfo.databaseName); } ConfigValidator.validateParameter(this.originalConfig); } private String queryRsUrl(ServerConnectInfo serverInfo) { String configUrl = originalConfig.getString(ConfigKey.OBCONFIG_URL, null); if (StringUtils.isBlank(configUrl)) { try { int retry = 0; final String sysJDBCUrl = serverInfo.jdbcUrl.replace(serverInfo.databaseName, "oceanbase"); do { Connection conn = null; PreparedStatement stmt = null; ResultSet result = null; try { if (retry > 0) { int sleep = retry > 9 ? 
500 : 1 << retry; try { TimeUnit.SECONDS.sleep(sleep); } catch (InterruptedException e) { } LOG.warn("retry fetch RsUrl the {} times", retry); } conn = DBUtil.getConnection(DataBaseType.OceanBase, sysJDBCUrl, serverInfo.sysUser, serverInfo.sysPass); String sql = "show parameters like 'obconfig_url'"; LOG.info("query param: {}", sql); stmt = conn.prepareStatement(sql); result = stmt.executeQuery(); if (result.next()) { configUrl = result.getString("Value"); } if (StringUtils.isNotBlank(configUrl)) { break; } } catch (Exception e) { LOG.warn("fetch root server list(rsList) error {}", e.getMessage()); } finally { DBUtil.closeDBResources(result, stmt, conn); } /* count every unsuccessful attempt, including a query that returns no value, so the loop always ends */ ++retry; } while (retry < 3); LOG.info("configure url is: " + configUrl); originalConfig.set(ConfigKey.OBCONFIG_URL, configUrl); } catch (Exception e) { LOG.error("Fail to get configure url: {}", e.getMessage(), e); throw DataXException.asDataXException(Hbase094xWriterErrorCode.REQUIRED_VALUE, "obConfigUrl is not configured and could not be fetched"); } } return configUrl; } /** * Note: this method runs only once. Best practice: do any processing the Job needs before data synchronization here; remove it if it is not needed. */ // In general, pre-processing should be deferred to the Task (single-table jobs are the exception) @Override public void prepare() { } /** * Note: this method runs only once. Best practice: use a static utility class to split the Job configuration into multiple Task configurations. Here, * mandatoryNumber is the exact number of splits that must be produced. */ @Override public List<Configuration> split(int mandatoryNumber) { // This function does not need any change. 
Configuration simplifiedConf = this.originalConfig; List<Configuration> splitResultConfigs = new ArrayList<Configuration>(); for (int j = 0; j < mandatoryNumber; j++) { splitResultConfigs.add(simplifiedConf.clone()); } return splitResultConfigs; } /** * Note: this method runs only once. Best practice: do any post-processing the Job needs after data synchronization here. */ @Override public void post() { // No post supported } /** * Note: this method runs only once. Best practice: release the Job's resources here, usually together with the Job's post() method. */ @Override public void destroy() { } } public static class Task extends Writer.Task { private Configuration taskConfig; private CommonRdbmsWriter.Task writerTask; /** * Note: this method runs once per Task. Best practice: read taskConfig here and initialize the resources that * startWrite() needs. */ @Override public void init() { this.taskConfig = super.getPluginJobConf(); String mode = this.taskConfig.getString(ConfigKey.MODE); ModeType modeType = ModeType.getByTypeName(mode); switch (modeType) { case Normal: try { this.writerTask = new ObHBaseWriteTask(this.taskConfig); } catch (Exception e) { throw DataXException.asDataXException(Hbase094xWriterErrorCode.INIT_ERROR, "ObHbase writer init error:" + e.getMessage()); } break; default: throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, "ObHbase writer does not support mode type: " + modeType); } } /** * Note: this method runs once per Task. Best practice: do any processing the Task needs before data * synchronization here; remove it if it is not needed. */ @Override public void prepare() { this.writerTask.prepare(taskConfig); } /** * Note: this method runs once per Task. Best practice: keep the data-writing logic here concise and well encapsulated. */ @Override public void startWrite(RecordReceiver recordReceiver) { this.writerTask.startWrite(recordReceiver, taskConfig, super.getTaskPluginCollector()); } /** * Note: this method runs 
once per Task. Best practice: do any post-processing the Task needs after data synchronization here. */ @Override public void post() { this.writerTask.post(taskConfig); } /** * Note: this method runs once per Task. Best practice: release the Task's resources here, usually together with the Task's post() method. */ @Override public void destroy() { this.writerTask.destroy(taskConfig); } } } ================================================ FILE: 
obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/LocalStrings.properties ================================================ databasewriterbuffer.1=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684table\u4e0d\u5b58\u5728, \u7b97\u51fa\u7684tableName={0},db={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/LocalStrings_en_US.properties ================================================ databasewriterbuffer.1=The [table] calculated based on the rules does not exist. The calculated [tableName]={0}, [db]={1}. Please check the rules you configured. ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/LocalStrings_ja_JP.properties ================================================ databasewriterbuffer.1=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684table\u4e0d\u5b58\u5728, \u7b97\u51fa\u7684tableName={0},db={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/LocalStrings_zh_CN.properties ================================================ databasewriterbuffer.1=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684table\u4e0d\u5b58\u5728, \u7b97\u51fa\u7684tableName={0},db={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. 
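The rowkeyColumn rules enforced by ConfigValidator.validateRowkeyColumn and the "table$family" name derived in ObHTableInfo can be illustrated with the minimal, dependency-free sketch below. It is hypothetical: ObHbaseConfigSketch, RowkeyElement, checkRowkey and fullHbaseTableName are not part of the plugin. It returns a reason string instead of throwing DataXException, and it skips the column-type check that the real validator performs.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not part of obhbasewriter) of two rules from the plugin:
 * the rowkeyColumn checks in ConfigValidator and the table$family naming in ObHTableInfo.
 */
public class ObHbaseConfigSketch {

    /** One rowkeyColumn entry: a source column index, or -1 plus a constant value. */
    static final class RowkeyElement {
        final Integer index;
        final String value;

        RowkeyElement(Integer index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    /** Returns null if the entries are valid, otherwise the reason they are rejected. */
    static String checkRowkey(List<RowkeyElement> elements) {
        if (elements == null || elements.isEmpty()) {
            return "rowkeyColumn must not be empty";
        }
        for (RowkeyElement e : elements) {
            if (e.index == null) {
                return "every rowkeyColumn entry needs an index";
            }
            // A rowkey made only of a single constant (index -1) entry is rejected.
            if (elements.size() == 1 && e.index == -1) {
                return "rowkeyColumn cannot be a single index -1 entry";
            }
            // A constant (index -1) entry must carry a non-blank value.
            if (e.index == -1 && (e.value == null || e.value.trim().isEmpty())) {
                return "an index -1 entry needs a value";
            }
        }
        return null;
    }

    /** Appends "$" plus the family of the first column unless the name already contains '$'. */
    static String fullHbaseTableName(String tableName, List<String> columnNames) {
        if (tableName.contains("$")) {
            return tableName;
        }
        // Column names use the "family:qualifier" form checked by validateColumn.
        String family = columnNames.get(0).split(":")[0];
        return tableName + "$" + family;
    }

    public static void main(String[] args) {
        List<RowkeyElement> valid = Arrays.asList(new RowkeyElement(0, null), new RowkeyElement(-1, "_"));
        List<RowkeyElement> constantOnly = Arrays.asList(new RowkeyElement(-1, "_"));
        System.out.println(checkRowkey(valid));        // null
        System.out.println(checkRowkey(constantOnly)); // rowkeyColumn cannot be a single index -1 entry
        System.out.println(fullHbaseTableName("user_info", Arrays.asList("cf:name", "cf:age"))); // user_info$cf
    }
}
```

Checking in a single pass mirrors the order of the real validator (missing index first, then the constant-only case, then the missing value), so the first failing rule is the one reported.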
================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/LocalStrings_zh_HK.properties ================================================ databasewriterbuffer.1=\u901a\u904e\u898f\u5247\u8a08\u7b97\u51fa\u4f86\u7684table\u4e0d\u5b58\u5728, \u7b97\u51fa\u7684tableName={0},db={1}, \u8acb\u6aa2\u67e5\u60a8\u914d\u7f6e\u7684\u898f\u5247. ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/LocalStrings_zh_TW.properties ================================================ databasewriterbuffer.1=\u901a\u904e\u898f\u5247\u8a08\u7b97\u51fa\u4f86\u7684table\u4e0d\u5b58\u5728, \u7b97\u51fa\u7684tableName={0},db={1}, \u8acb\u6aa2\u67e5\u60a8\u914d\u7f6e\u7684\u898f\u5247. ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/ObDataSourceErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter.ext; import com.alibaba.datax.common.spi.ErrorCode; public enum ObDataSourceErrorCode implements ErrorCode { DESC("ObDataSourceError code", "connect error"); private final String code; private final String describe; private ObDataSourceErrorCode(String code, String describe) { this.code = code; this.describe = describe; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.describe; } @Override public String toString() { return String.format("Code:[%s], Describe:[%s].
", this.code, this.describe); } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/ObHbaseTableHolder.java ================================================ /* * Copyright (c) 2021 OceanBase ob-loader-dumper is licensed under Mulan PSL v2. You can use this software according to * the terms and conditions of the Mulan PSL v2. You may obtain a copy of Mulan PSL v2 at: * * http://license.coscl.org.cn/MulanPSL2 * * THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING * BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE. See the Mulan PSL v2 for more * details. */ package com.alibaba.datax.plugin.writer.obhbasewriter.ext; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.writer.obhbasewriter.Hbase094xWriterErrorCode; import com.alipay.oceanbase.hbase.OHTable; import org.apache.hadoop.conf.Configuration; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** * @author cjyyz * @date 2023/03/16 * @since */ public class ObHbaseTableHolder { private static final Logger LOG = LoggerFactory.getLogger(ObHbaseTableHolder.class); private Configuration configuration; private String hbaseTableName; private OHTable ohTable; public ObHbaseTableHolder(Configuration configuration, String hbaseTableName) { this.configuration = configuration; this.hbaseTableName = hbaseTableName; } public OHTable getOhTable() { try { if (ohTable == null) { ohTable = new OHTable(configuration, hbaseTableName); } return ohTable; } catch (Exception e) { LOG.error("build obHTable: {} failed. 
reason: {}", hbaseTableName, e.getMessage()); throw DataXException.asDataXException(Hbase094xWriterErrorCode.GET_HBASE_TABLE_ERROR, Hbase094xWriterErrorCode.GET_HBASE_TABLE_ERROR.getDescription(), e); } } public void destroy() { try { if (ohTable != null) { ohTable.close(); } } catch (Exception e) { LOG.warn("error in closing htable: {}. Reason: {}", hbaseTableName, e.getMessage()); } } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/ext/ServerConnectInfo.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter.ext; import com.google.common.base.Preconditions; import java.util.regex.Matcher; import java.util.regex.Pattern; import static org.apache.commons.lang3.StringUtils.EMPTY; public class ServerConnectInfo { public String clusterName; public String tenantName; // userName doesn't contain tenantName or clusterName public String userName; public String password; public String databaseName; public String ipPort; public String jdbcUrl; public String host; public String port; public boolean publicCloud; public int rpcPort; public String sysUser; public String sysPass; /** * * @param jdbcUrl format is jdbc:oceanbase://ip:port/dbName?params (parseJdbcUrl requires the '?' after the database name) * @param username format is cluster:tenant:username or username@tenant#cluster or user@tenant or user * @param password password of the user */ public ServerConnectInfo(final String jdbcUrl, final String username, final String password) { this(jdbcUrl, username, password, null, null); } public ServerConnectInfo(final String jdbcUrl, final String username, final String password, final String sysUser, final String sysPass) { if (jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); Preconditions.checkArgument(ss.length == 3, "jdbc 
url format is not correct:" + jdbcUrl); this.userName = username; 
this.clusterName = ss[1].trim().split(":")[0]; this.tenantName = ss[1].trim().split(":")[1]; this.jdbcUrl = ss[2]; } else { this.jdbcUrl = jdbcUrl; } this.password = password; this.sysUser = sysUser; this.sysPass = sysPass; parseJdbcUrl(jdbcUrl); parseFullUserName(username); } private void parseJdbcUrl(final String jdbcUrl) { Pattern pattern = Pattern.compile("//([\\w\\.\\-]+:\\d+)/([\\w-]+)\\?"); Matcher matcher = pattern.matcher(jdbcUrl); if (matcher.find()) { String ipPort = matcher.group(1); String dbName = matcher.group(2); this.ipPort = ipPort; String[] hostPort = ipPort.split(":"); this.host = hostPort[0]; this.port = hostPort[1]; this.databaseName = dbName; this.publicCloud = host.endsWith("aliyuncs.com"); } else { throw new RuntimeException("Invalid argument:" + jdbcUrl); } } private void parseFullUserName(final String fullUserName) { int tenantIndex = fullUserName.indexOf("@"); int clusterIndex = fullUserName.indexOf("#"); // Used when the jdbcUrl starts with ||_dsc_ob10_dsc_ if (fullUserName.contains(":") && tenantIndex < 0) { String[] names = fullUserName.split(":"); if (names.length != 3) { throw new RuntimeException("invalid argument: " + fullUserName); } else { this.clusterName = names[0]; this.tenantName = names[1]; this.userName = names[2]; } } else if (tenantIndex < 0) { // Used for a short jdbcUrl whose username has no tenant name (mainly the public-cloud case, where partitions are not calculated) this.userName = fullUserName; this.clusterName = EMPTY; this.tenantName = EMPTY; } else { // Used for a short jdbcUrl whose username includes the tenant name this.userName = fullUserName.substring(0, tenantIndex); if (clusterIndex < 0) { this.clusterName = EMPTY; this.tenantName = fullUserName.substring(tenantIndex + 1); } else { this.clusterName = fullUserName.substring(clusterIndex + 1); this.tenantName = fullUserName.substring(tenantIndex + 1, clusterIndex); } } } @Override public String toString() { return "ServerConnectInfo{" + "clusterName='" + 
clusterName + '\'' + ", tenantName='" + tenantName + '\'' + ", userName='" + userName + '\'' + ", password='" + password + '\'' 
+ ", databaseName='" + databaseName + '\'' + ", ipPort='" + ipPort + '\'' + ", jdbcUrl='" + jdbcUrl + '\'' + ", publicCloud=" + publicCloud + ", rpcPort=" + rpcPort + '}'; } public String getFullUserName() { StringBuilder builder = new StringBuilder(); builder.append(userName); if (publicCloud || (rpcPort != 0 && EMPTY.equals(clusterName))) { return builder.toString(); } if (!EMPTY.equals(tenantName)) { builder.append("@").append(tenantName); } if (!EMPTY.equals(clusterName)) { builder.append("#").append(clusterName); } if (EMPTY.equals(this.clusterName) && EMPTY.equals(this.tenantName)) { return this.userName; } return builder.toString(); } public void setRpcPort(int rpcPort) { this.rpcPort = rpcPort; } public void setSysUser(String sysUser) { this.sysUser = sysUser; } public void setSysPass(String sysPass) { this.sysPass = sysPass; } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/LocalStrings.properties ================================================ multitablewritertask.1=\u914d\u7f6e\u7684tableList\u4e3a\u591a\u8868\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.2=\u914d\u7f6e\u7684\u591a\u5e93\u4e2d\u7684\u8868\u540d\u6709\u91cd\u590d\u7684\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\u548c\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.3=\u914d\u7f6e\u7684\u6240\u6709\u8868\u540d\u90fd\u76f8\u540c\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.4=\u914d\u7f6e\u7684table\u548cdb\u540d\u79f0\u90fd\u76f8\u540c\uff0c\u6b64\u79cd\u56de\u6d41\u65b9\u5f0f\u4e0d\u652f\u6301 multitablewritertask.5=\u5217\u914d\u7f6e\u4fe1\u606f\u6709\u9519\u8bef. 
\u56e0\u4e3a\u60a8\u914d\u7f6e\u7684\u4efb\u52a1\u4e2d\uff0c\u6e90\u5934\u8bfb\u53d6\u5b57\u6bb5\u6570:{0} \u4e0e \u76ee\u7684\u8868\u8981\u5199\u5165\u7684\u5b57\u6bb5\u6570:{1} \u4e0d\u76f8\u7b49. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. multitablewritertask.6=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684tableName\u67e5\u627e\u5bf9\u5e94\u7684db\u4e0d\u5b58\u5728\uff0ctableName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.7=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u548ctable\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0},tableName={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.8=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.9=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684dbName[{0}], \u5b58\u5728\u591a\u5f20\u5206\u8868\uff0c\u8bf7\u914d\u7f6e\u60a8\u7684\u5206\u8868\u89c4\u5219. 
multitablewritertask.10=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0},ErrorCode:{1} multitablewritertask.11=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0},ErrorCode:{1} multitablewritertask.12=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0},ErrorCode:{1} multitablewritertask.13=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0},ErrorCode:{1} multitablewritertask.14=\u5199\u5165\u8868[{0}]\u5931\u8d25,\u4f11\u7720[{1}]\u6beb\u79d2,\u6570\u636e:{2} multitablewritertask.15=\u5199\u5165\u8868[{0}]\u5b58\u5728\u810f\u6570\u636e,record={1}, \u5199\u5165\u5f02\u5e38\u4e3a: singletablewritertask.1=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0},ErrorCode:{1} singletablewritertask.2=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0},ErrorCode:{1} singletablewritertask.3=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0},ErrorCode:{1} singletablewritertask.4=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0},ErrorCode:{1} ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/LocalStrings_en_US.properties ================================================ multitablewritertask.1=The configured [tableList] contains multiple tables but no table splitting rules have been configured. Please check your configuration. 
multitablewritertask.2=Duplicate table names exist across the configured databases, but no database or table splitting rules have been configured. Please check your configuration.
multitablewritertask.3=All configured tables have the same name, but no database splitting rules have been configured. Please check your configuration.
multitablewritertask.4=The configured table and database names are identical. This write-back mode is not supported.
multitablewritertask.5=Invalid column configuration: the number of fields read from the source ({0}) does not match the number of fields to be written to the target table ({1}). Please check and correct your configuration.
multitablewritertask.6=No database was found for the [tableName] calculated from the rules. [tableName]={0}. Please check the rules you configured.
multitablewritertask.7=The database and table calculated from the rules do not exist. Calculated [dbName]={0}, [tableName]={1}. Please check the rules you configured.
multitablewritertask.8=The database calculated from the rules does not exist. Calculated [dbName]={0}. Please check the rules you configured.
multitablewritertask.9=The [dbName] [{0}] calculated from the rules contains multiple sub-tables. Please configure your table splitting rules.
multitablewritertask.10=Fatal OB exception. Rolling back this write and sleeping for 5 minutes. SQLState: {0}. ErrorCode: {1}
multitablewritertask.11=Recoverable OB exception. Rolling back this write and sleeping for 1 minute. SQLState: {0}. ErrorCode: {1}
multitablewritertask.12=OB exception. Rolling back this write, sleeping for 1 second, then writing and committing records one by one. SQLState: {0}. ErrorCode: {1}
multitablewritertask.13=OB exception. Rolling back this write, then writing and committing records one by one. SQLState: {0}.
ErrorCode: {1}
multitablewritertask.14=Failed to write to table [{0}]. Sleeping for [{1}] milliseconds. Data: {2}
multitablewritertask.15=Dirty data found while writing table [{0}]. Record={1}. Write exception:
singletablewritertask.1=Fatal OB exception. Rolling back this write and sleeping for 5 minutes. SQLState: {0}. ErrorCode: {1}
singletablewritertask.2=Recoverable OB exception. Rolling back this write and sleeping for 1 minute. SQLState: {0}. ErrorCode: {1}
singletablewritertask.3=OB exception. Rolling back this write, sleeping for 1 second, then writing and committing records one by one. SQLState: {0}. ErrorCode: {1}
singletablewritertask.4=OB exception. Rolling back this write, then writing and committing records one by one. SQLState: {0}. ErrorCode: {1}
================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/LocalStrings_ja_JP.properties ================================================ multitablewritertask.1=\u914d\u7f6e\u7684tableList\u4e3a\u591a\u8868\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.2=\u914d\u7f6e\u7684\u591a\u5e93\u4e2d\u7684\u8868\u540d\u6709\u91cd\u590d\u7684\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\u548c\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.3=\u914d\u7f6e\u7684\u6240\u6709\u8868\u540d\u90fd\u76f8\u540c\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.4=\u914d\u7f6e\u7684table\u548cdb\u540d\u79f0\u90fd\u76f8\u540c\uff0c\u6b64\u79cd\u56de\u6d41\u65b9\u5f0f\u4e0d\u652f\u6301 multitablewritertask.5=\u5217\u914d\u7f6e\u4fe1\u606f\u6709\u9519\u8bef. \u56e0\u4e3a\u60a8\u914d\u7f6e\u7684\u4efb\u52a1\u4e2d\uff0c\u6e90\u5934\u8bfb\u53d6\u5b57\u6bb5\u6570:{0} \u4e0e \u76ee\u7684\u8868\u8981\u5199\u5165\u7684\u5b57\u6bb5\u6570:{1} \u4e0d\u76f8\u7b49.
\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. multitablewritertask.6=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684tableName\u67e5\u627e\u5bf9\u5e94\u7684db\u4e0d\u5b58\u5728\uff0ctableName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.7=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u548ctable\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0},tableName={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.8=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.9=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684dbName[{0}], \u5b58\u5728\u591a\u5f20\u5206\u8868\uff0c\u8bf7\u914d\u7f6e\u60a8\u7684\u5206\u8868\u89c4\u5219. multitablewritertask.10=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} multitablewritertask.11=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0} multitablewritertask.12=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.13=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.14=\u5199\u5165\u8868[{0}]\u5931\u8d25,\u4f11\u7720[{1}]\u6beb\u79d2,\u6570\u636e:{2} multitablewritertask.15=\u5199\u5165\u8868[{0}]\u5b58\u5728\u810f\u6570\u636e,record={1}, \u5199\u5165\u5f02\u5e38\u4e3a: singletablewritertask.1=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} singletablewritertask.2=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 
1\u5206\u949f,SQLState:{0} singletablewritertask.3=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} singletablewritertask.4=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/LocalStrings_zh_CN.properties ================================================ multitablewritertask.1=\u914d\u7f6e\u7684tableList\u4e3a\u591a\u8868\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.2=\u914d\u7f6e\u7684\u591a\u5e93\u4e2d\u7684\u8868\u540d\u6709\u91cd\u590d\u7684\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\u548c\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.3=\u914d\u7f6e\u7684\u6240\u6709\u8868\u540d\u90fd\u76f8\u540c\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.4=\u914d\u7f6e\u7684table\u548cdb\u540d\u79f0\u90fd\u76f8\u540c\uff0c\u6b64\u79cd\u56de\u6d41\u65b9\u5f0f\u4e0d\u652f\u6301 multitablewritertask.5=\u5217\u914d\u7f6e\u4fe1\u606f\u6709\u9519\u8bef. \u56e0\u4e3a\u60a8\u914d\u7f6e\u7684\u4efb\u52a1\u4e2d\uff0c\u6e90\u5934\u8bfb\u53d6\u5b57\u6bb5\u6570:{0} \u4e0e \u76ee\u7684\u8868\u8981\u5199\u5165\u7684\u5b57\u6bb5\u6570:{1} \u4e0d\u76f8\u7b49. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. multitablewritertask.6=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684tableName\u67e5\u627e\u5bf9\u5e94\u7684db\u4e0d\u5b58\u5728\uff0ctableName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. 
multitablewritertask.7=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u548ctable\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0},tableName={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.8=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.9=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684dbName[{0}], \u5b58\u5728\u591a\u5f20\u5206\u8868\uff0c\u8bf7\u914d\u7f6e\u60a8\u7684\u5206\u8868\u89c4\u5219. multitablewritertask.10=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} multitablewritertask.11=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0} multitablewritertask.12=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.13=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.14=\u5199\u5165\u8868[{0}]\u5931\u8d25,\u4f11\u7720[{1}]\u6beb\u79d2,\u6570\u636e:{2} multitablewritertask.15=\u5199\u5165\u8868[{0}]\u5b58\u5728\u810f\u6570\u636e,record={1}, \u5199\u5165\u5f02\u5e38\u4e3a: singletablewritertask.1=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} singletablewritertask.2=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0} singletablewritertask.3=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} singletablewritertask.4=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, 
\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/LocalStrings_zh_HK.properties ================================================ multitablewritertask.1=\u914d\u7f6e\u7684tableList\u4e3a\u591a\u8868\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.2=\u914d\u7f6e\u7684\u591a\u5e93\u4e2d\u7684\u8868\u540d\u6709\u91cd\u590d\u7684\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\u548c\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.3=\u914d\u7f6e\u7684\u6240\u6709\u8868\u540d\u90fd\u76f8\u540c\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.4=\u914d\u7f6e\u7684table\u548cdb\u540d\u79f0\u90fd\u76f8\u540c\uff0c\u6b64\u79cd\u56de\u6d41\u65b9\u5f0f\u4e0d\u652f\u6301 multitablewritertask.5=\u5217\u914d\u7f6e\u4fe1\u606f\u6709\u9519\u8bef. \u56e0\u4e3a\u60a8\u914d\u7f6e\u7684\u4efb\u52a1\u4e2d\uff0c\u6e90\u5934\u8bfb\u53d6\u5b57\u6bb5\u6570:{0} \u4e0e \u76ee\u7684\u8868\u8981\u5199\u5165\u7684\u5b57\u6bb5\u6570:{1} \u4e0d\u76f8\u7b49. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. multitablewritertask.6=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684tableName\u67e5\u627e\u5bf9\u5e94\u7684db\u4e0d\u5b58\u5728\uff0ctableName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.7=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u548ctable\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0},tableName={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.8=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. 
multitablewritertask.9=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684dbName[{0}], \u5b58\u5728\u591a\u5f20\u5206\u8868\uff0c\u8bf7\u914d\u7f6e\u60a8\u7684\u5206\u8868\u89c4\u5219. multitablewritertask.10=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} multitablewritertask.11=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0} multitablewritertask.12=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.13=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.14=\u5199\u5165\u8868[{0}]\u5931\u8d25,\u4f11\u7720[{1}]\u6beb\u79d2,\u6570\u636e:{2} multitablewritertask.15=\u5199\u5165\u8868[{0}]\u5b58\u5728\u810f\u6570\u636e,record={1}, \u5199\u5165\u5f02\u5e38\u4e3a: singletablewritertask.1=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} singletablewritertask.2=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0} singletablewritertask.3=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} singletablewritertask.4=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0}multitablewritertask.1=配置的tableList為多表,但未配置分表規則,請檢查您的配置 multitablewritertask.2=配置的多庫中的表名有重複的,但未配置分庫規則和分表規則,請檢查您的配置 multitablewritertask.3=配置的所有表名都相同,但未配置分庫規則,請檢查您的配置 multitablewritertask.4=配置的table和db名稱都相同,此種回流方式不支援 multitablewritertask.5=列配置資訊有錯誤. 因為您配置的任務中,源頭讀取欄位數:{0}與 目的表要寫入的欄位數:{1}不相等. 請檢查您的配置並作出修改. 
multitablewritertask.6=通過規則計算出來的tableName查找對應的db不存在,tableName={0}, 請檢查您配置的規則. multitablewritertask.7=通過規則計算出來的db和table不存在,算出的dbName={0},tableName={1}, 請檢查您配置的規則. multitablewritertask.8=通過規則計算出來的db不存在,算出的dbName={0}, 請檢查您配置的規則. multitablewritertask.9=通過規則計算出來的dbName[{0}], 存在多張分表,請配置您的分表規則. multitablewritertask.10=遇到OB致命異常,回滾此次寫入, 休眠 5分鐘,SQLState:{0} multitablewritertask.11=遇到OB可恢復異常,回滾此次寫入, 休眠 1分鐘,SQLState:{0} multitablewritertask.12=遇到OB異常,回滾此次寫入, 休眠 1秒,採用逐條寫入提交,SQLState:{0} multitablewritertask.13=遇到OB異常,回滾此次寫入, 採用逐條寫入提交,SQLState:{0} multitablewritertask.14=寫入表[{0}]失敗,休眠[{1}]毫秒,數據:{2} multitablewritertask.15=寫入表[{0}]存在髒數據,record={1}, 寫入異常為: singletablewritertask.1=遇到OB致命異常,回滾此次寫入, 休眠 5分鐘,SQLState:{0} singletablewritertask.2=遇到OB可恢復異常,回滾此次寫入, 休眠 1分鐘,SQLState:{0} singletablewritertask.3=遇到OB異常,回滾此次寫入, 休眠 1秒,採用逐條寫入提交,SQLState:{0} singletablewritertask.4=遇到OB異常,回滾此次寫入, 採用逐條寫入提交,SQLState:{0} ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/LocalStrings_zh_TW.properties ================================================ multitablewritertask.1=\u914d\u7f6e\u7684tableList\u4e3a\u591a\u8868\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.2=\u914d\u7f6e\u7684\u591a\u5e93\u4e2d\u7684\u8868\u540d\u6709\u91cd\u590d\u7684\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\u548c\u5206\u8868\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.3=\u914d\u7f6e\u7684\u6240\u6709\u8868\u540d\u90fd\u76f8\u540c\uff0c\u4f46\u672a\u914d\u7f6e\u5206\u5e93\u89c4\u5219\uff0c\u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e multitablewritertask.4=\u914d\u7f6e\u7684table\u548cdb\u540d\u79f0\u90fd\u76f8\u540c\uff0c\u6b64\u79cd\u56de\u6d41\u65b9\u5f0f\u4e0d\u652f\u6301 multitablewritertask.5=\u5217\u914d\u7f6e\u4fe1\u606f\u6709\u9519\u8bef. 
\u56e0\u4e3a\u60a8\u914d\u7f6e\u7684\u4efb\u52a1\u4e2d\uff0c\u6e90\u5934\u8bfb\u53d6\u5b57\u6bb5\u6570:{0} \u4e0e \u76ee\u7684\u8868\u8981\u5199\u5165\u7684\u5b57\u6bb5\u6570:{1} \u4e0d\u76f8\u7b49. \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4f5c\u51fa\u4fee\u6539. multitablewritertask.6=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684tableName\u67e5\u627e\u5bf9\u5e94\u7684db\u4e0d\u5b58\u5728\uff0ctableName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.7=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u548ctable\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0},tableName={1}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.8=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684db\u4e0d\u5b58\u5728\uff0c\u7b97\u51fa\u7684dbName={0}, \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684\u89c4\u5219. multitablewritertask.9=\u901a\u8fc7\u89c4\u5219\u8ba1\u7b97\u51fa\u6765\u7684dbName[{0}], \u5b58\u5728\u591a\u5f20\u5206\u8868\uff0c\u8bf7\u914d\u7f6e\u60a8\u7684\u5206\u8868\u89c4\u5219. 
multitablewritertask.10=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} multitablewritertask.11=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0} multitablewritertask.12=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.13=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} multitablewritertask.14=\u5199\u5165\u8868[{0}]\u5931\u8d25,\u4f11\u7720[{1}]\u6beb\u79d2,\u6570\u636e:{2} multitablewritertask.15=\u5199\u5165\u8868[{0}]\u5b58\u5728\u810f\u6570\u636e,record={1}, \u5199\u5165\u5f02\u5e38\u4e3a: singletablewritertask.1=\u9047\u5230OB\u81f4\u547d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 5\u5206\u949f,SQLState:{0} singletablewritertask.2=\u9047\u5230OB\u53ef\u6062\u590d\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u5206\u949f,SQLState:{0} singletablewritertask.3=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u4f11\u7720 1\u79d2,\u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0} singletablewritertask.4=\u9047\u5230OB\u5f02\u5e38,\u56de\u6eda\u6b64\u6b21\u5199\u5165, \u91c7\u7528\u9010\u6761\u5199\u5165\u63d0\u4ea4,SQLState:{0}multitablewritertask.1=配置的tableList為多表,但未配置分表規則,請檢查您的配置 multitablewritertask.2=配置的多庫中的表名有重複的,但未配置分庫規則和分表規則,請檢查您的配置 multitablewritertask.3=配置的所有表名都相同,但未配置分庫規則,請檢查您的配置 multitablewritertask.4=配置的table和db名稱都相同,此種回流方式不支援 multitablewritertask.5=列配置資訊有錯誤. 因為您配置的任務中,源頭讀取欄位數:{0}與 目的表要寫入的欄位數:{1}不相等. 請檢查您的配置並作出修改. multitablewritertask.6=通過規則計算出來的tableName查找對應的db不存在,tableName={0}, 請檢查您配置的規則. multitablewritertask.7=通過規則計算出來的db和table不存在,算出的dbName={0},tableName={1}, 請檢查您配置的規則. multitablewritertask.8=通過規則計算出來的db不存在,算出的dbName={0}, 請檢查您配置的規則. 
multitablewritertask.9=通過規則計算出來的dbName[{0}], 存在多張分表,請配置您的分表規則. multitablewritertask.10=遇到OB致命異常,回滾此次寫入, 休眠 5分鐘,SQLState:{0} multitablewritertask.11=遇到OB可恢復異常,回滾此次寫入, 休眠 1分鐘,SQLState:{0} multitablewritertask.12=遇到OB異常,回滾此次寫入, 休眠 1秒,採用逐條寫入提交,SQLState:{0} multitablewritertask.13=遇到OB異常,回滾此次寫入, 採用逐條寫入提交,SQLState:{0} multitablewritertask.14=寫入表[{0}]失敗,休眠[{1}]毫秒,數據:{2} multitablewritertask.15=寫入表[{0}]存在髒數據,record={1}, 寫入異常為: singletablewritertask.1=遇到OB致命異常,回滾此次寫入, 休眠 5分鐘,SQLState:{0} singletablewritertask.2=遇到OB可恢復異常,回滾此次寫入, 休眠 1分鐘,SQLState:{0} singletablewritertask.3=遇到OB異常,回滾此次寫入, 休眠 1秒,採用逐條寫入提交,SQLState:{0} singletablewritertask.4=遇到OB異常,回滾此次寫入, 採用逐條寫入提交,SQLState:{0} ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/MultiVersionWriteTask.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter.task; import com.alibaba.datax.common.util.Configuration; /** * TODO(yuez): to be completed after the HBase API is upgraded; not used for now */ public class MultiVersionWriteTask extends ObHBaseWriteTask{ public MultiVersionWriteTask(Configuration configuration) throws Exception { super(configuration); } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/NormalWriteTask.java ================================================ package com.alibaba.datax.plugin.writer.obhbasewriter.task; import com.alibaba.datax.common.util.Configuration; /** * TODO(yuez): to be completed after the HBase API is upgraded; not used for now */ public class NormalWriteTask extends ObHBaseWriteTask{ public NormalWriteTask(Configuration configuration) throws Exception { super(configuration); } } ================================================ FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/ObHBaseWriteTask.java 
================================================ package com.alibaba.datax.plugin.writer.obhbasewriter.task; import 
com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.writer.obhbasewriter.Config; import com.alibaba.datax.plugin.writer.obhbasewriter.ConfigKey; import com.alibaba.datax.plugin.writer.obhbasewriter.Constant; import com.alibaba.datax.plugin.writer.obhbasewriter.NullModeType; import com.alibaba.datax.plugin.writer.obhbasewriter.ObHTableInfo; import com.alibaba.datax.plugin.writer.obhbasewriter.ext.ServerConnectInfo; import com.google.common.collect.Lists; import java.util.ArrayList; import java.util.List; import java.util.concurrent.BlockingQueue; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.LinkedBlockingQueue; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.Lock; import java.util.concurrent.locks.ReentrantLock; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class ObHBaseWriteTask extends CommonRdbmsWriter.Task { private final static MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(ObHBaseWriteTask.class); private final static Logger LOG = LoggerFactory.getLogger(ObHBaseWriteTask.class); public NullModeType nullMode = null; private int maxRetryCount; public List columns; public List rowkeyColumn; public Configuration versionColumn; public String hbaseTableName; public String encoding; public Boolean walFlag; String configUrl; String 
dbName; String ip; String port; String fullUserName; boolean usdOdpMode; String sysUsername; String sysPassword; private ObHTableInfo obHTableInfo; private ConcurrentTableWriter concurrentWriter; private boolean allTaskInQueue = false; private long startTime = 0; private String threadName = Thread.currentThread().getName(); private Lock lock = new ReentrantLock(); private Condition condition = lock.newCondition(); public ObHBaseWriteTask(Configuration configuration) { super(DataBaseType.MySql); init(configuration); } @Override public void init(com.alibaba.datax.common.util.Configuration configuration) { this.obHTableInfo = new ObHTableInfo(configuration); this.hbaseTableName = configuration.getString(ConfigKey.TABLE); this.columns = configuration.getListConfiguration(ConfigKey.COLUMN); this.rowkeyColumn = configuration.getListConfiguration(ConfigKey.ROWKEY_COLUMN); this.versionColumn = configuration.getConfiguration(ConfigKey.VERSION_COLUMN); this.encoding = configuration.getString(ConfigKey.ENCODING, Constant.DEFAULT_ENCODING); this.nullMode = NullModeType.getByTypeName(configuration.getString(ConfigKey.NULL_MODE, Constant.DEFAULT_NULL_MODE)); // this.memstoreThreshold = configuration.getDouble(Config.MEMSTORE_THRESHOLD, Config.DEFAULT_MEMSTORE_THRESHOLD); this.walFlag = configuration.getBool(ConfigKey.WAL_FLAG, true); this.maxRetryCount = configuration.getInt(ConfigKey.MAX_RETRY_COUNT, 3); // default 1000 rows are committed together this.batchSize = com.alibaba.datax.plugin.rdbms.writer.Constant.DEFAULT_BATCH_SIZE; this.batchByteSize = com.alibaba.datax.plugin.rdbms.writer.Constant.DEFAULT_BATCH_BYTE_SIZE; this.configUrl = configuration.getString(ConfigKey.OBCONFIG_URL); this.jdbcUrl = configuration.getString(ConfigKey.JDBC_URL); this.username = configuration.getString(Key.USERNAME); this.password = configuration.getString(Key.PASSWORD); this.dbName = configuration.getString(Key.DBNAME); this.usdOdpMode = configuration.getBool(ConfigKey.USE_ODP_MODE); 
ServerConnectInfo connectInfo = new ServerConnectInfo(jdbcUrl, username, password); String clusterName = connectInfo.clusterName; this.fullUserName = connectInfo.getFullUserName(); final String[] ipPort = connectInfo.ipPort.split(":"); if (usdOdpMode) { this.ip = ipPort[0]; this.port = ipPort[1]; } else { this.sysUsername = configuration.getString(ConfigKey.OB_SYS_USER); this.sysPassword = configuration.getString(ConfigKey.OB_SYS_PASSWORD); connectInfo.setSysUser(sysUsername); connectInfo.setSysPass(sysPassword); if (!configUrl.contains("ObRegion")) { if (configUrl.contains("?")) { configUrl += "&ObRegion=" + clusterName; } else { configUrl += "?ObRegion=" + clusterName; } } if (!configUrl.contains("database")) { configUrl += "&database=" + dbName; } } if (null == concurrentWriter) { concurrentWriter = new ConcurrentTableWriter(configuration, connectInfo); allTaskInQueue = false; } } @Override public void prepare(Configuration configuration) { concurrentWriter.start(); } @Override public void startWrite(RecordReceiver recordReceiver, Configuration configuration, TaskPluginCollector taskPluginCollector) { this.taskPluginCollector = taskPluginCollector; int recordCount = 0; int bufferBytes = 0; List records = new ArrayList<>(); try { Record record; while ((record = recordReceiver.getFromReader()) != null) { recordCount++; bufferBytes += record.getMemorySize(); records.add(record); // 按照指定的批大小进行批量写入 if (records.size() >= batchSize || bufferBytes >= batchByteSize) { concurrentWriter.addBatchRecords(Lists.newArrayList(records)); records.clear(); bufferBytes = 0; } } if (!records.isEmpty()) { concurrentWriter.addBatchRecords(records); } } catch (Throwable e) { LOG.warn("startWrite error unexpected ", e); throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } LOG.info(recordCount + " rows received."); waitTaskFinish(); } public void waitTaskFinish() { this.allTaskInQueue = true; LOG.info("ConcurrentTableWriter has put all task in queue, queueSize = 
{}, total = {}, finished = {}", concurrentWriter.getTaskQueueSize(), concurrentWriter.getTotalTaskCount(), concurrentWriter.getFinishTaskCount()); lock.lock(); try { while (!concurrentWriter.checkFinish()) { condition.await(50, TimeUnit.MILLISECONDS); // print statistic LOG.debug("Statistic total task {}, finished {}, queue Size {}", concurrentWriter.getTotalTaskCount(), concurrentWriter.getFinishTaskCount(), concurrentWriter.getTaskQueueSize()); concurrentWriter.printStatistics(); } } catch (InterruptedException e) { LOG.warn("Concurrent table writer wait task finish interrupt"); } finally { lock.unlock(); } LOG.debug("wait all InsertTask finished ..."); } public boolean isFinished() { return allTaskInQueue && concurrentWriter.checkFinish(); } public void singalTaskFinish() { lock.lock(); try { condition.signal(); } finally { lock.unlock(); } } public void collectDirtyRecord(Record record, Throwable throwable) { this.taskPluginCollector.collectDirtyRecord(record, throwable); } @Override public void post(Configuration configuration) { } @Override public void destroy(Configuration configuration) { if (concurrentWriter != null) { concurrentWriter.destory(); } super.destroy(configuration); } public class ConcurrentTableWriter { private BlockingQueue> queue; private List putTasks; private Configuration config; private AtomicLong totalTaskCount; private AtomicLong finishTaskCount; private ServerConnectInfo connectInfo; private ExecutorService executorService; private final int threadCount; public ConcurrentTableWriter(Configuration config, ServerConnectInfo connectInfo) { this.threadCount = config.getInt(Config.WRITER_THREAD_COUNT, Config.DEFAULT_WRITER_THREAD_COUNT); this.queue = new LinkedBlockingQueue>(threadCount << 1); this.putTasks = new ArrayList(threadCount); this.config = config; this.totalTaskCount = new AtomicLong(0); this.finishTaskCount = new AtomicLong(0); this.executorService = Executors.newFixedThreadPool(threadCount); this.connectInfo = connectInfo; } 
        public long getTotalTaskCount() {
            return totalTaskCount.get();
        }

        public long getFinishTaskCount() {
            return finishTaskCount.get();
        }

        public int getTaskQueueSize() {
            return queue.size();
        }

        public void increFinishCount() {
            finishTaskCount.incrementAndGet();
        }

        // should check after put all the task in the queue
        public boolean checkFinish() {
            long finishCount = finishTaskCount.get();
            long totalCount = totalTaskCount.get();
            return finishCount == totalCount;
        }

        public synchronized void start() {
            for (int i = 0; i < threadCount; ++i) {
                LOG.info("start {} insert task.", (i + 1));
                PutTask putTask = new PutTask(threadName, queue, config, connectInfo, obHTableInfo, ObHBaseWriteTask.this);
                putTask.setWriter(this);
                putTasks.add(putTask);
            }
            for (PutTask task : putTasks) {
                executorService.execute(task);
            }
        }

        public void printStatistics() {
            long insertTotalCost = 0;
            long insertTotalCount = 0;
            for (PutTask task : putTasks) {
                insertTotalCost += task.getTotalCost();
                insertTotalCount += task.getPutCount();
            }
            long avgCost = 0;
            if (insertTotalCount != 0) {
                avgCost = insertTotalCost / insertTotalCount;
            }
            ObHBaseWriteTask.LOG.debug("Put {} times, totalCost {} ms, average {} ms", insertTotalCount, insertTotalCost, avgCost);
        }

        public void addBatchRecords(final List<Record> records) throws InterruptedException {
            boolean isSucc = false;
            while (!isSucc) {
                isSucc = queue.offer(records, 5, TimeUnit.MILLISECONDS);
            }
            totalTaskCount.incrementAndGet();
        }

        public synchronized void destory() {
            if (putTasks != null) {
                for (PutTask task : putTasks) {
                    task.setStop();
                    task.destroy();
                }
            }
            destroyExecutor();
        }

        private void destroyExecutor() {
            if (executorService != null && !executorService.isShutdown()) {
                executorService.shutdown();
                try {
                    executorService.awaitTermination(0L, TimeUnit.SECONDS);
                } catch (InterruptedException var2) {
                }
            }
        }
    }
}
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/task/PutTask.java
================================================
package com.alibaba.datax.plugin.writer.obhbasewriter.task;

import com.alibaba.datax.common.element.DoubleColumn;
import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.MessageSource;
import com.alibaba.datax.plugin.writer.obhbasewriter.ColumnType;
import com.alibaba.datax.plugin.writer.obhbasewriter.Config;
import com.alibaba.datax.plugin.writer.obhbasewriter.ConfigKey;
import com.alibaba.datax.plugin.writer.obhbasewriter.Hbase094xWriterErrorCode;
import com.alibaba.datax.plugin.writer.obhbasewriter.ObHTableInfo;
import com.alibaba.datax.plugin.writer.obhbasewriter.ext.ObHbaseTableHolder;
import com.alibaba.datax.plugin.writer.obhbasewriter.ext.ServerConnectInfo;
import com.alipay.oceanbase.hbase.constants.OHConstants;
import com.alipay.oceanbase.rpc.property.Property;
import com.google.common.base.Stopwatch;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import org.apache.commons.lang3.tuple.Triple;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static com.alibaba.datax.plugin.writer.obhbasewriter.ConfigKey.OBHBASE_HTABLE_CLIENT_WRITE_BUFFER;
import static com.alibaba.datax.plugin.writer.obhbasewriter.ConfigKey.OBHBASE_HTABLE_PUT_WRITE_BUFFER_CHECK;
import static com.alibaba.datax.plugin.writer.obhbasewriter.ConfigKey.TABLE_CLIENT_RPC_EXECUTE_TIMEOUT;
import static com.alibaba.datax.plugin.writer.obhbasewriter.ConfigKey.WRITE_BUFFER_HIGH_MARK;
import static com.alibaba.datax.plugin.writer.obhbasewriter.ConfigKey.WRITE_BUFFER_LOW_MARK;
import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.DEFAULT_HBASE_HTABLE_CLIENT_WRITE_BUFFER;
import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.DEFAULT_HBASE_HTABLE_PUT_WRITE_BUFFER_CHECK;
import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.DEFAULT_NETTY_BUFFER_HIGH_WATERMARK;
import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.DEFAULT_NETTY_BUFFER_LOW_WATERMARK;
import static com.alibaba.datax.plugin.writer.obhbasewriter.Constant.DEFAULT_RPC_EXECUTE_TIMEOUT;
import static com.alibaba.datax.plugin.writer.obhbasewriter.util.ObHbaseWriterUtils.getColumnByte;
import static com.alibaba.datax.plugin.writer.obhbasewriter.util.ObHbaseWriterUtils.getRowkey;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_HTABLE_CLIENT_WRITE_BUFFER;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_HTABLE_PUT_WRITE_BUFFER_CHECK;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_DATABASE;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_FULL_USER_NAME;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_PARAM_URL;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_PASSWORD;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_SYS_USER_NAME;
import static com.alipay.oceanbase.hbase.constants.OHConstants.HBASE_OCEANBASE_SYS_PASSWORD;

public class PutTask implements Runnable {
    private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(PutTask.class);
    private static final Logger LOG = LoggerFactory.getLogger(PutTask.class);
    private ObHBaseWriteTask writerTask;
    private ObHBaseWriteTask.ConcurrentTableWriter writer;
    private long totalCost = 0;
    private long putCount = 0;
    // volatile: set by ConcurrentTableWriter.destory() on another thread and polled in run()
    private volatile boolean isStop;
    private ObHTableInfo obHTableInfo;
    private final Configuration versionColumn;
    // Number of attempts for a record that fails to be written
    private final int failTryCount;
    private String parentThreadName;
    private Queue<List<Record>> queue;
    private Configuration config;
    private ServerConnectInfo connInfo;
    private ObHbaseTableHolder tableHolder;
    private final SimpleDateFormat df_second = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    private final SimpleDateFormat df_ms = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS");

    public PutTask(String parentThreadName, Queue<List<Record>> recordsQueue, Configuration config, ServerConnectInfo connectInfo, ObHTableInfo obHTableInfo, ObHBaseWriteTask writerTask) {
        this.parentThreadName = parentThreadName;
        this.queue = recordsQueue;
        this.config = config;
        this.connInfo = connectInfo;
        this.obHTableInfo = obHTableInfo;
        this.writerTask = writerTask;
        this.versionColumn = config.getConfiguration(ConfigKey.VERSION_COLUMN);
        this.failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT);
        this.isStop = false;
        initTableHolder();
    }

    private void initTableHolder() {
        try {
            org.apache.hadoop.conf.Configuration c = new org.apache.hadoop.conf.Configuration();
            c.set(HBASE_OCEANBASE_FULL_USER_NAME, writerTask.fullUserName);
            c.set(HBASE_OCEANBASE_PASSWORD, this.connInfo.password);
            c.set(HBASE_OCEANBASE_DATABASE, writerTask.dbName);
            // The settings below are required by obkv-table-client
            if (writerTask.usdOdpMode) {
                c.setBoolean(OHConstants.HBASE_OCEANBASE_ODP_MODE, true);
                c.set(OHConstants.HBASE_OCEANBASE_ODP_ADDR, connInfo.host);
                c.set(OHConstants.HBASE_OCEANBASE_ODP_PORT, connInfo.port);
                LOG.info("sysUser and sysPassword is empty, build HTABLE in odp mode.");
            } else {
                c.set(HBASE_OCEANBASE_PARAM_URL, writerTask.configUrl);
                c.set(HBASE_OCEANBASE_SYS_USER_NAME, this.connInfo.sysUser);
                c.set(HBASE_OCEANBASE_SYS_PASSWORD, this.connInfo.sysPass);
                LOG.info("sysUser and sysPassword is not empty, build HTABLE in sys mode.");
            }
            c.set(HBASE_HTABLE_PUT_WRITE_BUFFER_CHECK, config.getString(OBHBASE_HTABLE_PUT_WRITE_BUFFER_CHECK, DEFAULT_HBASE_HTABLE_PUT_WRITE_BUFFER_CHECK));
            c.set(HBASE_HTABLE_CLIENT_WRITE_BUFFER, config.getString(OBHBASE_HTABLE_CLIENT_WRITE_BUFFER, DEFAULT_HBASE_HTABLE_CLIENT_WRITE_BUFFER));
            c.set(Property.RS_LIST_ACQUIRE_CONNECT_TIMEOUT.getKey(), "500");
            c.set(Property.RS_LIST_ACQUIRE_READ_TIMEOUT.getKey(), "5000");
            c.set(Property.RPC_EXECUTE_TIMEOUT.getKey(), config.getString(TABLE_CLIENT_RPC_EXECUTE_TIMEOUT, DEFAULT_RPC_EXECUTE_TIMEOUT));
            c.set(Property.NETTY_BUFFER_LOW_WATERMARK.getKey(), config.getString(WRITE_BUFFER_LOW_MARK, DEFAULT_NETTY_BUFFER_LOW_WATERMARK));
            c.set(Property.NETTY_BUFFER_HIGH_WATERMARK.getKey(), config.getString(WRITE_BUFFER_HIGH_MARK, DEFAULT_NETTY_BUFFER_HIGH_WATERMARK));
            this.tableHolder = new ObHbaseTableHolder(c, obHTableInfo.getTableName());
        } catch (Exception e) {
            LOG.error("init table holder failed, reason: {}", e.getMessage());
            throw new IllegalStateException(e);
        }
    }

    private void batchWrite(final List<Record> buffer) {
        HTableInterface ohTable = null;
        Stopwatch stopwatch = Stopwatch.createStarted();
        try {
            ohTable = this.tableHolder.getOhTable();
            List<Put> puts = buildBatchPutList(buffer);
            ohTable.put(puts);
        } catch (Exception e) {
            if (Objects.isNull(ohTable)) {
                LOG.error("build obHTable: {} failed. reason: {}", obHTableInfo.getTableName(), e.getMessage());
                throw DataXException.asDataXException(Hbase094xWriterErrorCode.GET_HBASE_TABLE_ERROR, Hbase094xWriterErrorCode.GET_HBASE_TABLE_ERROR.getDescription());
            }
            // LOG.error("hbase batch error: " + e);
            // The batch failed: retry its records one at a time
            for (Record record : buffer) {
                writeOneRecord(ohTable, record);
            }
        } finally {
            this.writer.increFinishCount();
            putCount++;
            totalCost += stopwatch.elapsed(TimeUnit.MILLISECONDS);
            try {
                if (!Objects.isNull(ohTable)) {
                    ohTable.close();
                }
            } catch (Exception e) {
                LOG.warn("error in closing htable: {}. Reason: {}", obHTableInfo.getFullHbaseTableName(), e.getMessage());
            }
        }
    }

    private void writeOneRecord(HTableInterface ohTable, Record record) {
        int retryCount = 0;
        while (retryCount < this.failTryCount) {
            try {
                byte[] rowkey = getRowkey(record, obHTableInfo);
                Put put = new Put(rowkey); // row key
                boolean hasValidValue = buildPut(put, record);
                if (hasValidValue) {
                    ohTable.put(put);
                }
                break;
            } catch (Exception e) {
                retryCount++;
                LOG.error("error in writing: " + e.getMessage() + ", retry count: " + retryCount);
                if (retryCount == this.failTryCount) {
                    LOG.warn("ERROR : record {}", record);
                    this.writerTask.collectDirtyRecord(record, e);
                }
            }
        }
    }

    private List<Put> buildBatchPutList(List<Record> buffer) {
        List<Put> puts = new ArrayList<>();
        for (Record record : buffer) {
            byte[] rowkey = getRowkey(record, obHTableInfo);
            Put put = new org.apache.hadoop.hbase.client.Put(rowkey); // row key
            boolean hasValidValue = buildPut(put, record);
            if (hasValidValue) {
                puts.add(put);
            }
        }
        return puts;
    }

    private boolean buildPut(Put put, Record record) {
        boolean hasValidValue = false;
        long timestamp = buildTimestamp(record);
        for (Map.Entry<Integer, Triple<String, String, ColumnType>> columnInfo : obHTableInfo.getIndexColumnInfoMap().entrySet()) {
            Integer index = columnInfo.getKey();
            if (index >= record.getColumnNumber()) {
                throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("normaltask.2", record.getColumnNumber(), index));
            }
            ColumnType columnType = columnInfo.getValue().getRight();
            String familyName = columnInfo.getValue().getLeft();
            String columnName = columnInfo.getValue().getMiddle();
            byte[] value = getColumnByte(columnType, record.getColumn(index), obHTableInfo);
            if (value != null) {
                hasValidValue = true;
                if (timestamp == -1) {
                    put.add(familyName.getBytes(), // family
                            columnName.getBytes(), // Q
                            value); // V
                } else {
                    put.add(familyName.getBytes(), // family
                            columnName.getBytes(), // Q
                            timestamp, // timestamp/version
                            value); // V
                }
            }
        }
        return hasValidValue;
    }

    private long buildTimestamp(Record record) {
        if (versionColumn == null) {
            return -1;
        }
        int index = versionColumn.getInt(ConfigKey.INDEX);
        long timestamp;
        if (index == -1) {
            // user specified the constant as timestamp
            timestamp = versionColumn.getLong(ConfigKey.VALUE);
            if (timestamp < 0) {
                throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, MESSAGE_SOURCE.message("normaltask.4"));
            }
        } else {
            // A record column supplies the version: LongColumn/DoubleColumn values are used via asLong();
            // other types are parsed as "yyyy-MM-dd HH:mm:ss SSS" first, then as "yyyy-MM-dd HH:mm:ss"
            if (index >= record.getColumnNumber()) {
                throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, MESSAGE_SOURCE.message("normaltask.5", record.getColumnNumber(), index));
            }
            if (record.getColumn(index).getRawData() == null) {
                throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, MESSAGE_SOURCE.message("normaltask.6"));
            }
            if (record.getColumn(index) instanceof LongColumn || record.getColumn(index) instanceof DoubleColumn) {
                timestamp = record.getColumn(index).asLong();
            } else {
                Date date;
                try {
                    date = df_ms.parse(record.getColumn(index).asString());
                } catch (ParseException e) {
                    try {
                        date = df_second.parse(record.getColumn(index).asString());
                    } catch (ParseException e1) {
                        LOG.info(MESSAGE_SOURCE.message("normaltask.7", index));
                        throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_VERSION_ERROR, e1);
                    }
                }
                timestamp = date.getTime();
            }
        }
        return timestamp;
    }

    public void setStop() {
        isStop = true;
    }

    public long getTotalCost() {
        return totalCost;
    }

    public long getPutCount() {
        return putCount;
    }

    public void destroy() {
        tableHolder.destroy();
    }

    void setWriterTask(ObHBaseWriteTask writerTask) {
        this.writerTask = writerTask;
    }

    void setWriter(ObHBaseWriteTask.ConcurrentTableWriter writer) {
        this.writer = writer;
    }

    @Override
    public void run() {
        String currentThreadName = String.format("%s-putTask-%d", parentThreadName, Thread.currentThread().getId());
        Thread.currentThread().setName(currentThreadName);
        LOG.debug("Task {} start to execute...", currentThreadName);
        int sleepTimes = 0;
        while (!isStop) {
            try {
                List<Record> records = queue.poll();
                if (null != records) {
                    batchWrite(records);
                } else if (writerTask.isFinished()) {
                    writerTask.singalTaskFinish();
                    LOG.debug("not more task, thread exist ...");
                    break;
                } else {
                    TimeUnit.MILLISECONDS.sleep(5);
                    sleepTimes++;
                }
            } catch (InterruptedException e) {
                LOG.debug("TableWriter is interrupt");
            } catch (Exception e) {
                LOG.warn("ERROR UNEXPECTED {}", e);
            }
        }
        LOG.debug("Thread exist...");
        LOG.debug("sleep {} times, total sleep time: {}", sleepTimes, sleepTimes * 5);
    }
}
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/util/LocalStrings.properties
================================================
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/util/LocalStrings_en_US.properties
================================================
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/util/LocalStrings_ja_JP.properties
================================================
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/util/LocalStrings_zh_CN.properties
================================================
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/util/LocalStrings_zh_HK.properties
================================================
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/util/LocalStrings_zh_TW.properties
================================================
================================================
FILE: obhbasewriter/src/main/java/com/alibaba/datax/plugin/writer/obhbasewriter/util/ObHbaseWriterUtils.java
================================================
/*
 * Copyright (c) 2021 OceanBase ob-loader-dumper is licensed under Mulan PSL v2. You can use this software according to
 * the terms and conditions of the Mulan PSL v2. You may obtain a copy of Mulan PSL v2 at:
 *
 * http://license.coscl.org.cn/MulanPSL2
 *
 * THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING
 * BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE. See the Mulan PSL v2 for more
 * details.
 */
package com.alibaba.datax.plugin.writer.obhbasewriter.util;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.MessageSource;
import com.alibaba.datax.plugin.writer.obhbasewriter.ColumnType;
import com.alibaba.datax.plugin.writer.obhbasewriter.Hbase094xWriterErrorCode;
import com.alibaba.datax.plugin.writer.obhbasewriter.ObHTableInfo;
import com.alibaba.datax.plugin.writer.obhbasewriter.task.PutTask;
import java.nio.charset.Charset;
import org.apache.commons.lang3.tuple.Triple;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * @author cjyyz
 * @date 2023/03/23
 * @since
 */
public class ObHbaseWriterUtils {
    private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(PutTask.class);

    public static byte[] getRowkey(Record record, ObHTableInfo obHTableInfo) {
        byte[] rowkeyBuffer = {};
        for (Triple<Integer, String, ColumnType> rowKeyElement : obHTableInfo.getRowKeyElementList()) {
            Integer index = rowKeyElement.getLeft();
            ColumnType columnType = rowKeyElement.getRight();
            if (index == -1) {
                String value = rowKeyElement.getMiddle();
                rowkeyBuffer = Bytes.add(rowkeyBuffer, getValueByte(columnType, value, obHTableInfo.getEncoding()));
            } else {
                if (index >= record.getColumnNumber()) {
                    throw DataXException.asDataXException(Hbase094xWriterErrorCode.CONSTRUCT_ROWKEY_ERROR, MESSAGE_SOURCE.message("normaltask.3", record.getColumnNumber(), index));
                }
                byte[] value = getColumnByte(columnType, record.getColumn(index), obHTableInfo);
                rowkeyBuffer = Bytes.add(rowkeyBuffer, value);
            }
        }
        return rowkeyBuffer;
    }

    public static byte[] getColumnByte(ColumnType columnType, Column column, ObHTableInfo obHTableInfo) {
        byte[] bytes;
        if (column.getRawData() != null && !(columnType == ColumnType.STRING && column.asString().equals("null"))) {
            switch (columnType) {
                case INT:
                    bytes = Bytes.toBytes(column.asLong().intValue());
                    break;
                case LONG:
                    bytes = Bytes.toBytes(column.asLong());
                    break;
                case DOUBLE:
                    bytes = Bytes.toBytes(column.asDouble());
                    break;
                case FLOAT:
                    bytes = Bytes.toBytes(column.asDouble().floatValue());
                    break;
                case SHORT:
                    bytes = Bytes.toBytes(column.asLong().shortValue());
                    break;
                case BOOLEAN:
                    bytes = Bytes.toBytes(column.asBoolean());
                    break;
                case STRING:
                    bytes = getValueByte(columnType, column.asString(), obHTableInfo.getEncoding());
                    break;
                case BINARY:
                    bytes = Bytes.toBytesBinary(column.asString());
                    break;
                default:
                    throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbaseabstracttask.2", columnType));
            }
        } else {
            switch (obHTableInfo.getNullModeType()) {
                case Skip:
                    bytes = null;
                    break;
                case Empty:
                    bytes = HConstants.EMPTY_BYTE_ARRAY;
                    break;
                default:
                    throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbaseabstracttask.3"));
            }
        }
        return bytes;
    }

    /**
     * @param columnType
     * @param value
     * @param encoding
     * @return byte[]
     */
    private static byte[] getValueByte(ColumnType columnType, String value, String encoding) {
        byte[] bytes;
        if (value != null) {
            switch (columnType) {
                case INT:
                    bytes = Bytes.toBytes(Integer.parseInt(value));
                    break;
                case LONG:
                    bytes = Bytes.toBytes(Long.parseLong(value));
                    break;
                case DOUBLE:
                    bytes = Bytes.toBytes(Double.parseDouble(value));
                    break;
                case FLOAT:
                    bytes = Bytes.toBytes(Float.parseFloat(value));
                    break;
                case SHORT:
                    bytes = Bytes.toBytes(Short.parseShort(value));
                    break;
                case BOOLEAN:
                    bytes = Bytes.toBytes(Boolean.parseBoolean(value));
                    break;
                case STRING:
                    bytes = value.getBytes(Charset.forName(encoding));
                    break;
                default:
                    throw DataXException.asDataXException(Hbase094xWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("hbaseabstracttask.4", columnType));
            }
        } else {
            bytes = HConstants.EMPTY_BYTE_ARRAY;
        }
        return bytes;
    }
}
================================================
FILE: obhbasewriter/src/main/resources/plugin.json
================================================
{
    "name": "obhbasewriter",
    "class": "com.alibaba.datax.plugin.writer.obhbasewriter.ObHbaseWriter",
    "description": "Applicable to: production environments. Principle: TODO",
    "developer": "alibaba"
}
================================================
FILE: oceanbasev10reader/doc/oceanbasev10reader.md
================================================
## 1 Quick Introduction

The OceanbaseV10Reader plugin reads data from Oceanbase V1.0. Under the hood, it connects to a remote Oceanbase 1.0 database through a Java client (JDBC) and runs the corresponding SQL statements to SELECT the data out of the database.

Note that oceanbasev10reader only works as a reader for OB 1.0 and later versions.

## 2 How It Works

In short, Oceanbasev10Reader connects to a remote Oceanbase database through a Java client connector, generates a SELECT statement from the user's configuration, and sends it to the remote Oceanbase (v1.0 or later) database. It then assembles the returned result set into an abstract dataset using DataX's own data types and passes it to the downstream Writer.
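The following sketch gives a rough illustration of the SQL generation described above. It builds a SELECT from `column`, `table` and `where`, and returns `querySql` unchanged when that is set. The two format strings are copied from `WEAK_READ_QUERY_SQL_TEMPLATE_WITHOUT_WHERE` and `WEAK_READ_QUERY_SQL_TEMPLATE` in this plugin's `ext/Constant.java`. The class `QuerySqlSketch` and its method `buildQuerySql` are hypothetical. The real reader also escapes reserved keywords and may split the query by `splitPk` or by partition, and none of that is shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DataX: shows how column/table/where can map to a SELECT.
public class QuerySqlSketch {
    // Format strings copied from oceanbasev10reader ext/Constant.java
    static final String TEMPLATE_WITHOUT_WHERE =
            "select /*+read_consistency(weak)*/ %s from %s ";
    static final String TEMPLATE_WITH_WHERE =
            "select /*+read_consistency(weak)*/ %s from %s where (%s)";

    public static String buildQuerySql(String querySql, List<String> columns, String table, String where) {
        // querySql takes precedence: table, column and where are ignored
        if (querySql != null && !querySql.trim().isEmpty()) {
            return querySql;
        }
        String columnList = String.join(",", columns);
        // A missing or blank where means the whole table is read
        if (where == null || where.trim().isEmpty()) {
            return String.format(TEMPLATE_WITHOUT_WHERE, columnList, table);
        }
        return String.format(TEMPLATE_WITH_WHERE, columnList, table, where);
    }

    public static void main(String[] args) {
        // prints: select /*+read_consistency(weak)*/ id,name from table where (id > 100)
        System.out.println(buildQuerySql(null, Arrays.asList("id", "name"), "table", "id > 100"));
    }
}
```

The `where` text is wrapped in parentheses, as in the original template. This way a condition containing `OR` stays grouped if more predicates are appended later.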
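For user-configured Table, Column, and Where settings, OceanbaseV10Reader assembles them into an SQL statement and sends it to the Oceanbase database. A user-configured querySql is sent to the Oceanbase database as-is.

## 3 Features

### 3.1 Sample Configurations

- A job that extracts data from an Oceanbase database and prints it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed limit in byte/s. DataX tries to reach this speed without exceeding it.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Maximum number of error records; the job fails once this is exceeded.
                "record": 0,
                // Maximum ratio of error records: 1.0 means 100%, 0.02 means 2%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oceanbasev10reader",
                    "parameter": {
                        "where": "",
                        "timeout": 5,
                        "readBatchSize": 50000,
                        "column": [
                            "id","name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": ["||_dsc_ob10_dsc_||clusterName:tenantName||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/dbName"],
                                "table": [
                                    "table"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // writer type
                    "name": "streamwriter",
                    // whether to print the records
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oceanbasev10reader",
                    "parameter": {
                        "where": "",
                        "timeout": 5,
                        "fetchSize": 500,
                        "column": [
                            "id",
                            "name"
                        ],
                        "splitPk": "pk",
                        "connection": [
                            {
                                "jdbcUrl": ["||_dsc_ob10_dsc_||clusterName:tenantName||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/dbName"],
                                "table": [
                                    "table"
                                ]
                            }
                        ],
                        "username":"xxx",
                        "password":"xxx"
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

- A job that syncs the result of a custom SQL query and prints it locally:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oceanbasev10reader",
                    "parameter": {
                        "timeout": 5,
                        "fetchSize": 500,
                        "splitPk": "pk",
                        "connection": [
                            {
                                "jdbcUrl": ["||_dsc_ob10_dsc_||clusterName:tenantName||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/dbName"],
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10;"
                                ]
                            }
                        ],
                        "username":"xxx",
                        "password":"xxx"
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

- **jdbcUrl**
  - Description: the JDBC URL used to connect to OB. Two formats are supported:
    - `||_dsc_ob10_dsc_||clusterName:tenantName||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/db`
      - With this format, `username` contains only the user name itself. The three-part form is not needed.
    - `jdbc:mysql://ip:port/db`
      - With this format, `username` must be written in the three-part form.
  - Required: yes
  - Default: none
- **table**
  - Description: the tables to sync, described as a JSON array, so several tables can be extracted at once. When several tables are configured, the user must make sure they share the same schema. OceanbaseReader does not check whether they belong to the same logical table. Note that `table` must be placed inside the `connection` block.
  - Required: yes
  - Default: none
- **column**
  - Description: the names of the columns to sync from the configured table, described as a JSON array.
    - Column pruning is supported: you can export only some of the columns.
    - Column reordering is supported: columns do not have to follow the table schema order. The wildcard `*` is also supported. Check the column list carefully before using it.
  - Required: yes
  - Default: none
- **where**
  - Description: the filter condition. OceanbaseReader assembles an SQL statement from the specified column, table, and where settings and extracts data with it. In practice you often sync only the current day's data, in which case you can set where to `gmt_create > $bizdate`. Here gmt_create cannot be an indexed column, nor the first column of a composite index. A where condition makes incremental business syncs efficient. If where is omitted (either the key or its value is missing), DataX syncs the full data set.
  - Required: no
  - Default: none
- **splitPk**
  - Description: if splitPk is specified, OBReader shards the data by the field splitPk names, and DataX launches concurrent tasks to sync it. This can greatly improve sync throughput.
    - Using the table's primary key as splitPk is recommended. Primary keys are usually evenly distributed, so the resulting shards are less prone to data hot spots.
    - Currently splitPk only supports splitting on int data; `other types are not supported`. Specifying an unsupported type results in an error.
    - If splitPk is left empty, the single table is not split, and OBReader syncs the full data set over a single channel.
  - Required: no
  - Default: empty
- **querySql**
  - Description: in some business scenarios the where option cannot express the required filter. In that case you can use this option to define a custom filtering SQL. When this option is set, DataX ignores the table and column options and filters data directly with this SQL.
    - `When querySql is configured, OceanbaseReader ignores the table, column, and where settings`: querySql takes precedence over table, column, and where.
  - Required: no
  - Default: none
- **timeout**
  - Description: SQL execution timeout, in minutes
  - Required: no
  - Default: 5
- **username**
  - Description: user name for accessing Oceanbase
  - Required: yes
  - Default: none
- **password**
  - Description: password for accessing Oceanbase
  - Required: yes
  - Default: none
- **readByPartition**
  - Description: whether to split tasks by partition for partitioned tables. It takes effect only in table mode (not with querySql), and only when splitPk is not set.
  - Required: no
  - Default: false
- **readBatchSize**
  - Description: number of rows read per batch. If you run out of memory, lower this value.
  - Required: no
  - Default: 10000

### 3.3 Type Conversion

The following table lists how OceanbaseReader converts Oceanbase types:

| DataX internal type | Oceanbase data type |
| --- | --- |
| Long | int |
| Double | numeric |
| String | varchar |
| Date | timestamp |
| Boolean | bool |

## 4 Performance Test

### 4.1 Test Report

Speed depends mainly on the number of channels. The usable channel count is bounded by the number of sharded tables, or by the number of data shards of a single table.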
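The shard count caps useful parallelism, and a sketch of range-based sharding on an integer `splitPk` helps show why. The closed range [min, max] is cut into at most `n` contiguous ranges, and each range becomes the WHERE condition of one task. With fewer distinct key values than requested channels, fewer ranges come out. This is a simplified illustration, not DataX's actual split code. The class `SplitPkRangeSketch` is hypothetical. It assumes `max - min + 1` fits in a `long`, and it ignores rows whose `splitPk` is NULL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitPk range sharding; not DataX's real implementation.
public class SplitPkRangeSketch {
    /**
     * Splits the closed range [min, max] of an integer splitPk into at most n
     * contiguous, non-overlapping ranges and renders each as a WHERE condition.
     */
    public static List<String> splitRange(String splitPk, long min, long max, int n) {
        List<String> conditions = new ArrayList<>();
        if (n <= 0 || min > max) {
            return conditions;
        }
        long span = max - min + 1; // number of distinct key values
        if (span <= 0) {
            throw new IllegalArgumentException("range too large for this sketch");
        }
        int parts = (int) Math.min(n, span); // never produce empty ranges
        long step = span / parts;
        long remainder = span % parts;
        long lower = min;
        for (int i = 0; i < parts; i++) {
            // The first `remainder` ranges get one extra value so the sizes differ by at most 1
            long size = step + (i < remainder ? 1 : 0);
            long upper = lower + size - 1;
            conditions.add(String.format("(%s >= %d AND %s <= %d)", splitPk, lower, splitPk, upper));
            lower = upper + 1;
        }
        return conditions;
    }
}
```

For example, `splitRange("id", 1, 10, 3)` yields `(id >= 1 AND id <= 4)`, `(id >= 5 AND id <= 7)` and `(id >= 8 AND id <= 10)`. `splitRange("id", 1, 2, 5)` yields only two ranges, so the extra channels would have nothing to do.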
To check the shard count for a single-table export, run the following in idb: `select/*+query_timeout(150000000)*/ s.tablet_count from __all_table t,__table_stat s where t.table_id = s.table_id and t.table_name = 'table_name'`

| Channels | DataX speed (Rec/s) | DataX throughput (MB/s) |
| --- | --- | --- |
| 1 | 15001 | 4.7 |
| 2 | 28169 | 11.66 |
| 3 | 37076 | 14.77 |
| 4 | 55862 | 17.60 |
| 5 | 70860 | 22.31 |

## 5 FAQ

### 5.1 "Invalid fetch size" error in Oracle mode

```
Caused by: java.sql.SQLSyntaxErrorException: (conn=2498) invalid fetch size. in Oracle mode, extendOracleResultSetClass is ineffective if useOraclePrepareExecute is set to true or usePieceData is set to true
	at com.oceanbase.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:110)
	at com.oceanbase.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:211)
	at com.oceanbase.jdbc.OceanBaseStatement.setFetchSize(OceanBaseStatement.java:1599)
	at com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ReaderTask.doRead(ReaderTask.java:270)
	... 5 more
```

This error usually appears after switching to a newer oceanbase-client.jar driver. To improve efficiency, newer drivers added Oracle prepared-statement behavior, and this mechanism conflicts with setFetchSize.

#### Solution

Set extendOracleResultSetClass=true in the jdbcUrl to resolve this conflict.
================================================
FILE: oceanbasev10reader/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 oceanbasev10reader com.alibaba.datax 0.0.1-SNAPSHOT jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java 8.0.28 log4j log4j 1.2.16 junit junit 4.11 test src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single
================================================
FILE: oceanbasev10reader/src/main/assembly/package.xml
================================================
dir false
src/main/resources plugin.json plugin_job_template.json plugin/reader/oceanbasev10reader target/ oceanbasev10reader-0.0.1-SNAPSHOT.jar plugin/reader/oceanbasev10reader src/main/libs/ *.jar plugin/reader/oceanbasev10reader/libs false plugin/reader/oceanbasev10reader/libs runtime
================================================
FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/Config.java
================================================
package com.alibaba.datax.plugin.reader.oceanbasev10reader;

public interface Config {
    // queryTimeoutSecond
    String QUERY_TIMEOUT_SECOND = "memstoreCheckIntervalSecond";
    int DEFAULT_QUERY_TIMEOUT_SECOND = 60 * 60 * 48; // 2 days

    // readBatchSize
    String READ_BATCH_SIZE = "readBatchSize";
    int DEFAULT_READ_BATCH_SIZE = 100000; // 100,000

    String RETRY_LIMIT = "retryLimit";
    int DEFAULT_RETRY_LIMIT = 10;
}
================================================
FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/OceanBaseReader.java
================================================
package com.alibaba.datax.plugin.reader.oceanbasev10reader;

import java.sql.Connection;
import java.util.List;

import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ObReaderKey;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.Constant;
import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ReaderJob;
import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ReaderTask;
import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils;

public class OceanBaseReader extends Reader {

    public static class Job extends Reader.Job {
        private Configuration originalConfig = null;
        private ReaderJob readerJob;
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();

            Integer userConfigedFetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE);
            if (userConfigedFetchSize != null) {
                LOG.warn("The [fetchSize] is not recognized, please use readBatchSize instead.");
            }

            this.originalConfig.set(Constant.FETCH_SIZE, Integer.MIN_VALUE);

            setDatabaseType(originalConfig);

            this.readerJob = new ReaderJob();
            this.readerJob.init(this.originalConfig);
        }

        @Override
        public void prepare() {
            // ObReaderUtils.DATABASE_TYPE gives the syntax (compatibility) mode of the current database
        }

        @Override
        public void preCheck() {
            init();
            this.readerJob.preCheck(this.originalConfig, ObReaderUtils.databaseType);
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            String splitPk = originalConfig.getString(Key.SPLIT_PK);
            List<String> quotedColumns = originalConfig.getList(Key.COLUMN_LIST, String.class);
            if (splitPk != null && splitPk.length() > 0 && quotedColumns != null) {
                String escapeChar = ObReaderUtils.isOracleMode(originalConfig.getString(ObReaderKey.OB_COMPATIBILITY_MODE))
                        ? "\"" : "`";
                if (!splitPk.startsWith(escapeChar) && !splitPk.endsWith(escapeChar)) {
                    splitPk = escapeChar + splitPk + escapeChar;
                }
                for (String column : quotedColumns) {
                    if (column.equals(splitPk)) {
                        LOG.info("splitPk is an ob reserved keyword, set to {}", splitPk);
                        originalConfig.set(Key.SPLIT_PK, splitPk);
                    }
                }
            }
            return this.readerJob.split(this.originalConfig, adviceNumber);
        }

        @Override
        public void post() {
            this.readerJob.post(this.originalConfig);
        }

        @Override
        public void destroy() {
            this.readerJob.destroy(this.originalConfig);
        }

        private void setDatabaseType(Configuration config) {
            String username = config.getString(Key.USERNAME);
            String password = config.getString(Key.PASSWORD);
            List<Object> conns = originalConfig.getList(Constant.CONN_MARK, Object.class);
            Configuration connConf = Configuration.from(conns.get(0).toString());
            List<String> jdbcUrls = connConf.getList(Key.JDBC_URL, String.class);
            String jdbcUrl = jdbcUrls.get(0);
            if (jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) {
                String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN);
                if (ss.length != 3) {
                    LOG.warn("unrecognized jdbc url: " + jdbcUrl);
                    return;
                }
                username = ss[1].trim() + ":" + username;
                jdbcUrl = ss[2];
            }
            // Use ob-client to get compatible mode.
            String obJdbcUrl = jdbcUrl.replace("jdbc:mysql:", "jdbc:oceanbase:");
            // try-with-resources closes the probe connection once the compatibility mode has been read
            try (Connection conn = DBUtil.getConnection(DataBaseType.OceanBase, obJdbcUrl, username, password)) {
                String compatibleMode = ObReaderUtils.getCompatibleMode(conn);
                config.set(ObReaderKey.OB_COMPATIBILITY_MODE, compatibleMode);
                if (ObReaderUtils.isOracleMode(compatibleMode)) {
                    ObReaderUtils.compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_ORACLE;
                }
            } catch (Exception e) {
                LOG.warn("error in get compatible mode, using mysql as default: " + e.getMessage());
            }
        }
    }

    public static class Task extends Reader.Task {
        private Configuration readerSliceConfig;
        private ReaderTask commonRdbmsReaderTask;
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        @Override
        public void init() {
            this.readerSliceConfig = super.getPluginJobConf();
            this.commonRdbmsReaderTask = new ReaderTask(super.getTaskGroupId(), super.getTaskId());
            this.commonRdbmsReaderTask.init(this.readerSliceConfig);
        }

        @Override
        public void startRead(RecordSender recordSender) {
            int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE);
            this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize);
        }

        @Override
        public void post() {
            this.commonRdbmsReaderTask.post(this.readerSliceConfig);
        }

        @Override
        public void destroy() {
            this.commonRdbmsReaderTask.destroy(this.readerSliceConfig);
        }
    }
}
================================================
FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/Constant.java
================================================
package com.alibaba.datax.plugin.reader.oceanbasev10reader.ext;

/**
 * @author johnrobbet
 */
public class Constant {

    public static String WEAK_READ_QUERY_SQL_TEMPLATE_WITHOUT_WHERE = "select /*+read_consistency(weak)*/ %s from %s ";

    public static String WEAK_READ_QUERY_SQL_TEMPLATE = "select /*+read_consistency(weak)*/ %s from %s where (%s)";
}
================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ObReaderKey.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.ext; /** * @author johnrobbet */ public class ObReaderKey { public final static String READ_BY_PARTITION = "readByPartition"; public final static String PARTITION_NAME = "partitionName"; public final static String PARTITION_TYPE = "partitionType"; public final static String OB_COMPATIBILITY_MODE = "obCompatibilityMode"; } ================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderJob.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.ext; import java.util.List; import com.alibaba.datax.common.constant.CommonConstant; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.PartitionSplitUtil; import com.alibaba.fastjson2.JSONObject; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class ReaderJob extends CommonRdbmsReader.Job { private static final Logger LOG = LoggerFactory.getLogger(ReaderJob.class); public ReaderJob() { super(ObReaderUtils.databaseType); } @Override public void init(Configuration originalConfig) { // escape database keywords that appear in the configured columns and tables List<String> columns = originalConfig.getList(Key.COLUMN, String.class); ObReaderUtils.escapeDatabaseKeyword(columns); originalConfig.set(Key.COLUMN, columns); List<JSONObject> conns = originalConfig.getList(Constant.CONN_MARK, 
JSONObject.class); for (int i = 0; i < conns.size(); i++) { JSONObject conn = conns.get(i); Configuration connConfig = Configuration.from(conn.toString()); List<String> tables = connConfig.getList(Key.TABLE, String.class); // tables will be null when querySql is configured if (tables != null) { ObReaderUtils.escapeDatabaseKeyword(tables); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.TABLE), tables); } } super.init(originalConfig); } @Override public List<Configuration> split(Configuration originalConfig, int adviceNumber) { List<Configuration> list; // readByPartition has lower priority than splitPk, // and readByPartition only works in table mode. if (!isSplitPkValid(originalConfig) && originalConfig.getBool(Constant.IS_TABLE_MODE) && originalConfig.getBool(ObReaderKey.READ_BY_PARTITION, false)) { LOG.info("try to split reader job by partition."); list = PartitionSplitUtil.splitByPartition(originalConfig); } else { LOG.info("try to split reader job by splitPk."); list = super.split(originalConfig, adviceNumber); } for (Configuration config : list) { String jdbcUrl = config.getString(Key.JDBC_URL); String obRegionName = getObRegionName(jdbcUrl); config.set(CommonConstant.LOAD_BALANCE_RESOURCE_MARK, obRegionName); } return list; } private boolean isSplitPkValid(Configuration originalConfig) { String splitPk = originalConfig.getString(Key.SPLIT_PK); return splitPk != null && splitPk.trim().length() > 0; } private String getObRegionName(String jdbcUrl) { final String obJdbcDelimiter = com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING; if (jdbcUrl.startsWith(obJdbcDelimiter)) { String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); int elementCount = 2; if (ss.length >= elementCount) { String tenant = ss[1].trim(); String[] sss = tenant.split(":"); return sss[0]; } } return null; } } ================================================ FILE: 
oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/ext/ReaderTask.java 
================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.ext; import com.alibaba.datax.common.element.BoolColumn; import com.alibaba.datax.common.element.BytesColumn; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.DateColumn; import com.alibaba.datax.common.element.DoubleColumn; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.statistics.PerfRecord; import com.alibaba.datax.common.statistics.PerfTrace; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.datax.plugin.reader.oceanbasev10reader.Config; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.ObReaderUtils; import com.alibaba.datax.plugin.reader.oceanbasev10reader.util.TaskContext; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.*; import java.util.ArrayList; import java.util.List; public class ReaderTask extends CommonRdbmsReader.Task { private static final Logger LOG = LoggerFactory.getLogger(ReaderTask.class); private int taskGroupId = -1; private int taskId = -1; private String username; private String password; private String jdbcUrl; private String mandatoryEncoding; private int queryTimeoutSeconds; // query timeout, 48 hours by default private int readBatchSize; private int retryLimit = 0; private 
String compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_MYSQL; private static final boolean IS_DEBUG = LOG.isDebugEnabled(); private boolean reuseConn = false; public ReaderTask(int taskGroupId, int taskId) { super(ObReaderUtils.databaseType, taskGroupId, taskId); this.taskGroupId = taskGroupId; this.taskId = taskId; } @Override public void init(Configuration readerSliceConfig) { /* for database connection */ username = readerSliceConfig.getString(Key.USERNAME); password = readerSliceConfig.getString(Key.PASSWORD); jdbcUrl = readerSliceConfig.getString(Key.JDBC_URL); queryTimeoutSeconds = readerSliceConfig.getInt(Config.QUERY_TIMEOUT_SECOND, Config.DEFAULT_QUERY_TIMEOUT_SECOND); // handle the OB 1.0 jdbc url format if (jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); if (ss.length == 3) { LOG.info("this is ob1_0 jdbc url."); username = ss[1].trim() + ":" + username; jdbcUrl = ss[2]; } } jdbcUrl = jdbcUrl.replace("jdbc:mysql:", "jdbc:oceanbase:"); jdbcUrl = jdbcUrl + (jdbcUrl.contains("?") ? "&" : "?") + "socketTimeout=1800000&connectTimeout=60000"; // socketTimeout: 30 minutes if (ObReaderUtils.compatibleMode.equals(ObReaderUtils.OB_COMPATIBLE_MODE_ORACLE)) { compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_ORACLE; } LOG.info("reader jdbc url. user=" + username + " :url=" + jdbcUrl); mandatoryEncoding = readerSliceConfig.getString(Key.MANDATORY_ENCODING, ""); retryLimit = readerSliceConfig.getInt(Config.RETRY_LIMIT, Config.DEFAULT_RETRY_LIMIT); LOG.info("retryLimit: " + retryLimit); } private void buildSavePoint(TaskContext context) { if (!ObReaderUtils.isUserSavePointValid(context)) { LOG.info("user save point is not valid, set to null."); context.setUserSavePoint(null); } } /** * If isTableMode && the table has a PK, *

* resumable reading is supported (if a PK column is not among the configured columns, it is appended to the end but not passed downstream) *

* Otherwise, the old mode is used. */ @Override public void startRead(Configuration readerSliceConfig, RecordSender recordSender, TaskPluginCollector taskPluginCollector, int fetchSize) { String querySql = readerSliceConfig.getString(Key.QUERY_SQL); String table = readerSliceConfig.getString(Key.TABLE); PerfTrace.getInstance().addTaskDetails(taskId, table + "," + jdbcUrl); List<String> columns = readerSliceConfig.getList(Key.COLUMN_LIST, String.class); String where = readerSliceConfig.getString(Key.WHERE); boolean weakRead = readerSliceConfig.getBool(Key.WEAK_READ, true); // default true, using weak read String userSavePoint = readerSliceConfig.getString(Key.SAVE_POINT, null); reuseConn = readerSliceConfig.getBool(Key.REUSE_CONN, false); String partitionName = readerSliceConfig.getString(Key.PARTITION_NAME, null); // read readBatchSize from the config, falling back to the default readBatchSize = readerSliceConfig.getInt(Config.READ_BATCH_SIZE, Config.DEFAULT_READ_BATCH_SIZE); // must not be less than 10000 if (readBatchSize < 10000) { readBatchSize = 10000; } TaskContext context = new TaskContext(table, columns, where, fetchSize); context.setQuerySql(querySql); context.setWeakRead(weakRead); context.setCompatibleMode(compatibleMode); if (partitionName != null) { context.setPartitionName(partitionName); } // Add the user save point into the context context.setUserSavePoint(userSavePoint); PerfRecord allPerf = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.RESULT_NEXT_ALL); allPerf.start(); boolean isTableMode = readerSliceConfig.getBool(Constant.IS_TABLE_MODE); try { startRead0(isTableMode, context, recordSender, taskPluginCollector); } finally { ObReaderUtils.close(null, null, context.getConn()); } allPerf.end(context.getCost()); // the monitoring dashboard currently relies on this log line; the "finished read record" time has always covered both the SQL query and all result next() calls LOG.info("finished read record by Sql: [{}\n] {}.", context.getQuerySql(), jdbcUrl); } private void startRead0(boolean isTableMode, TaskContext context, RecordSender recordSender, TaskPluginCollector taskPluginCollector) { // not table mode: use the original approach directly if 
(!isTableMode) { doRead(recordSender, taskPluginCollector, context); return; } // check primary key index Connection conn = DBUtil.getConnection(ObReaderUtils.databaseType, jdbcUrl, username, password); ObReaderUtils.initConn4Reader(conn, queryTimeoutSeconds); context.setConn(conn); try { ObReaderUtils.initIndex(conn, context); ObReaderUtils.matchPkIndexs(conn, context); } catch (Throwable e) { LOG.warn("failed to fetch PkIndexs, table=" + context.getTable(), e); } // table mode but no primary key found: still use the original approach if (context.getPkIndexs() == null) { doRead(recordSender, taskPluginCollector, context); return; } // setup the user defined save point buildSavePoint(context); // resumable reading starts here: // while(true) { // normal read (requires order by pk asc) // on failure there are two cases: // a) some records were already read: switch to incremental reading // b) no records were read yet: repeat the normal read (still order by pk asc) // on normal completion, break // } context.setReadBatchSize(readBatchSize); String getFirstQuerySql = ObReaderUtils.buildFirstQuerySql(context); String appendQuerySql = ObReaderUtils.buildAppendQuerySql(conn, context); LOG.warn("start table scan key : {}", context.getIndexName() == null ? 
"primary" : context.getIndexName()); context.setQuerySql(getFirstQuerySql); boolean firstQuery = true; // The first query was originally meant to use limit 1 to reduce cost, // but a comparison showed this to be unnecessary, because: // 1. if the gmt_modified secondary index is used, it is a direct index scan and needs no top-N order by; // 2. if no secondary index is used but a PK table scan, shrinking the sort brings no benefit, because the next query has to sort again. // Dropping this redundant optimization keeps the code easier to read. int retryCount = 0; while (true) { try { boolean finish = doRead(recordSender, taskPluginCollector, context); if (finish) { break; } } catch (Throwable e) { if (retryLimit == ++retryCount) { throw RdbmsException.asQueryException(ObReaderUtils.databaseType, new Exception(e), context.getQuerySql(), context.getTable(), username); } LOG.error("read failed, retry count " + retryCount + ", sleep 60 seconds, save point:" + context.getSavePoint() + ", error: " + e.getMessage()); ObReaderUtils.sleep(60000); // sleep 60s } // if the previous query returned rows, switch to the incremental query if (firstQuery && context.getPkIndexs() != null && context.getSavePoint() != null) { context.setQuerySql(appendQuerySql); firstQuery = false; } } DBUtil.closeDBResources(null, context.getConn()); } private boolean isConnectionAlive(Connection conn) { if (conn == null) { return false; } Statement stmt = null; ResultSet rs = null; String sql = "select 1" + (ObReaderUtils.OB_COMPATIBLE_MODE_ORACLE.equals(compatibleMode) ? 
" from dual" : ""); try { stmt = conn.createStatement(); rs = stmt.executeQuery(sql); rs.next(); } catch (Exception ex) { LOG.info("connection is not alive: " + ex.getMessage()); return false; } finally { DBUtil.closeDBResources(rs, stmt, null); } return true; } private boolean doRead(RecordSender recordSender, TaskPluginCollector taskPluginCollector, TaskContext context) { LOG.info("execute sql: {}", context.getQuerySql()); Connection conn = context.getConn(); if (reuseConn && isConnectionAlive(conn)) { LOG.info("connection is alive, will reuse this connection."); } else { LOG.info("Create new connection for reader."); conn = DBUtil.getConnection(ObReaderUtils.databaseType, jdbcUrl, username, password); ObReaderUtils.initConn4Reader(conn, queryTimeoutSeconds); context.setConn(conn); } PreparedStatement ps = null; ResultSet rs = null; PerfRecord perfRecord = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.SQL_QUERY); perfRecord.start(); try { ps = conn.prepareStatement(context.getQuerySql(), ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY); if (context.getPkIndexs() != null && context.getSavePoint() != null) { Record savePoint = context.getSavePoint(); List<Column> point = ObReaderUtils.buildPoint(savePoint, context.getPkIndexs()); ObReaderUtils.binding(ps, point); if (LOG.isWarnEnabled()) { List<String> pointForLog = new ArrayList<String>(); for (Column c : point) { pointForLog.add(c.asString()); } LOG.warn("{} save point : {}", context.getTable(), StringUtils.join(pointForLog, ',')); } } // enable streaming reads ps.setFetchSize(context.getFetchSize()); rs = ps.executeQuery(); ResultSetMetaData metaData = rs.getMetaData(); int columnNumber = metaData.getColumnCount(); long lastTime = System.nanoTime(); int count = 0; for (; rs.next(); count++) { context.addCost(System.nanoTime() - lastTime); Record row = buildRecord(recordSender, rs, metaData, columnNumber, mandatoryEncoding, taskPluginCollector); // // if the first record duplicates the save point, there is no 
need to send it // if (count == 0 && // 
ObReaderUtils.isPkEquals(context.getSavePoint(), row, // context.getPkIndexs())) { // continue; // } // querySql mode (transfer column number is -1) or no extra PK columns appended: send the row as is if (context.getTransferColumnNumber() == -1 || row.getColumnNumber() == context.getTransferColumnNumber()) { recordSender.sendToWriter(row); } else { // drop the PK columns that were appended only for ordering Record newRow = recordSender.createRecord(); for (int i = 0; i < context.getTransferColumnNumber(); i++) { newRow.addColumn(row.getColumn(i)); } recordSender.sendToWriter(newRow); } context.setSavePoint(row); lastTime = System.nanoTime(); } LOG.info("end of sql: {}, {} rows are read.", context.getQuerySql(), count); return context.getReadBatchSize() <= 0 || count < readBatchSize; } catch (Exception e) { ObReaderUtils.close(null, null, context.getConn()); context.setConn(null); LOG.error("failed to read data", e); throw RdbmsException.asQueryException(ObReaderUtils.databaseType, e, context.getQuerySql(), context.getTable(), username); } finally { perfRecord.end(); if (reuseConn) { ObReaderUtils.close(rs, ps, null); } else { ObReaderUtils.close(rs, ps, conn); } } } // overridden to support array types protected Record buildRecord(RecordSender recordSender, ResultSet rs, ResultSetMetaData metaData, int columnNumber, String mandatoryEncoding, TaskPluginCollector taskPluginCollector) { Record record = recordSender.createRecord(); try { for (int i = 1; i <= columnNumber; i++) { switch (metaData.getColumnType(i)) { case Types.CHAR: case Types.NCHAR: case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: String rawData; if (StringUtils.isBlank(mandatoryEncoding)) { rawData = rs.getString(i); } else { rawData = new String((rs.getBytes(i) == null ? 
EMPTY_CHAR_ARRAY : rs.getBytes(i)), mandatoryEncoding); } record.addColumn(new StringColumn(rawData)); break; case Types.CLOB: case Types.NCLOB: record.addColumn(new StringColumn(rs.getString(i))); break; case Types.SMALLINT: case Types.TINYINT: case Types.INTEGER: case Types.BIGINT: record.addColumn(new LongColumn(rs.getString(i))); break; case Types.NUMERIC: case Types.DECIMAL: case Types.FLOAT: case Types.REAL: case Types.DOUBLE: record.addColumn(new DoubleColumn(rs.getString(i))); break; case Types.TIME: record.addColumn(new DateColumn(rs.getTime(i))); break; // for mysql bug, see http://bugs.mysql.com/bug.php?id=35115 case Types.DATE: if (metaData.getColumnTypeName(i).equalsIgnoreCase("year")) { record.addColumn(new LongColumn(rs.getInt(i))); } else { record.addColumn(new DateColumn(rs.getDate(i))); } break; case Types.TIMESTAMP: record.addColumn(new DateColumn(rs.getTimestamp(i))); break; case Types.VARBINARY: case Types.BLOB: case Types.LONGVARBINARY: record.addColumn(new BytesColumn(rs.getBytes(i))); break; case Types.BINARY: // values that look like arrays ("[...]") are kept as strings, anything else (including NULL) is read as bytes String binaryText = rs.getString(i); if (binaryText != null && binaryText.startsWith("[") && binaryText.endsWith("]")) { record.addColumn(new StringColumn(binaryText)); } else { record.addColumn(new BytesColumn(rs.getBytes(i))); } break; // note: bit(1) -> Types.BIT, BoolColumn can be used // note: bit(>1) -> Types.VARBINARY, BytesColumn can be used case Types.BOOLEAN: case Types.BIT: record.addColumn(new BoolColumn(rs.getBoolean(i))); break; case Types.NULL: String stringData = null; if (rs.getObject(i) != null) { stringData = rs.getObject(i).toString(); } record.addColumn(new StringColumn(stringData)); break; default: throw DataXException .asDataXException( DBUtilErrorCode.UNSUPPORTED_TYPE, String.format( "The column configuration in your job is invalid: DataX does not support reading this column type from the database. 
Column name: [%s], column type: [%s], Java type: [%s]. Please convert it to a type DataX supports with a database function, or do not sync this column.", metaData.getColumnName(i), metaData.getColumnType(i), metaData.getColumnClassName(i))); } } } catch (Exception e) { if (IS_DEBUG) { LOG.debug("exception occurred while reading data " + record.toString() + ":", e); } // TODO: is it reliable to treat this as dirty data? taskPluginCollector.collectDirtyRecord(record, e); if (e instanceof DataXException) { throw (DataXException) e; } } return record; } } ================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ExecutorTemplate.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import java.util.ArrayList; import java.util.Collections; import java.util.List; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.Callable; import java.util.concurrent.ExecutorCompletionService; import java.util.concurrent.ExecutorService; import java.util.concurrent.Future; import java.util.concurrent.ThreadFactory; import java.util.concurrent.ThreadPoolExecutor; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; public class ExecutorTemplate<T> { /** * The default thread pool size; defaults to the number of available processors. */ public static int DEFAULT_POOL_SIZE = Runtime.getRuntime().availableProcessors(); /** * Indicates whether the executor is shut down automatically after waitForResult(). */ private final boolean autoClose; /** * Futures of all submitted tasks. */ private final List<Future<T>> futures; /** * The executor that runs the submitted tasks. */ private final ExecutorService internalExecutor; private final ExecutorCompletionService<T> completionService; /** * Set pool size for ExecutorTemplate. 
*/ public static void setPoolSize(int size) { DEFAULT_POOL_SIZE = size; } /** * Queue capacity: 100000, pool size: DEFAULT_POOL_SIZE, autoClose: true * * @param poolName */ public ExecutorTemplate(String poolName) { this(defaultExecutor(poolName), true); } /** * Queue capacity: 100000, autoClose: true * * @param poolName * @param poolSize */ public ExecutorTemplate(String poolName, int poolSize) { this(defaultExecutor(poolName, poolSize), true); } /** * Queue capacity: 100000 * * @param poolName * @param poolSize * @param autoClose */ public ExecutorTemplate(String poolName, int poolSize, boolean autoClose) { this(defaultExecutor(poolName, poolSize), autoClose); } /** * Queue capacity: 100000, pool size: DEFAULT_POOL_SIZE * * @param poolName * @param autoClose */ public ExecutorTemplate(String poolName, boolean autoClose) { this(defaultExecutor(poolName), autoClose); } /** * autoClose: true * * @param executor */ public ExecutorTemplate(ExecutorService executor) { this(executor, true); } /** * @param executor * @param autoClose */ public ExecutorTemplate(ExecutorService executor, boolean autoClose) { this.autoClose = autoClose; this.internalExecutor = executor; this.completionService = new ExecutorCompletionService<>(executor); this.futures = Collections.synchronizedList(new ArrayList<>()); } /** * @param poolName * @return ExecutorService */ public static ExecutorService defaultExecutor(String poolName) { return defaultExecutor(100000, poolName, DEFAULT_POOL_SIZE); } /** * @param poolName * @param poolSize * @return ExecutorService */ public static ExecutorService defaultExecutor(String poolName, int poolSize) { return defaultExecutor(100000, poolName, poolSize); } /** * @param capacity capacity of the bounded work queue * @param poolName * @param poolSize * @return ExecutorService */ public static ExecutorService defaultExecutor(int capacity, String poolName, int poolSize) { return new ThreadPoolExecutor(poolSize, poolSize, 30, TimeUnit.SECONDS, new ArrayBlockingQueue<>(capacity), new NamedThreadFactory(poolName)); } /** * Submit a callable task * * @param task */ public void submit(Callable task) { Future f = this.completionService.submit(task); futures.add(f); check(f); } /** * Submit a runnable task * * 
@param task */ public void submit(Runnable task) { Future f = this.completionService.submit(task, null); futures.add(f); check(f); } /** * Waits for all submitted tasks to finish and collects their results. * * @return List */ public List waitForResult() { try { int index = 0; Throwable ex = null; List result = new ArrayList(); while (index < futures.size()) { try { Future f = this.completionService.take(); result.add(f.get()); } catch (Throwable e) { ex = getRootCause(e); break; } index++; } if (ex != null) { cancelAll(); throw new RuntimeException(ex); } else { return result; } } finally { clearFutures(); if (autoClose) { destroyExecutor(); } } } /** * Cancels every task that has not completed yet. */ public void cancelAll() { for (Future f : futures) { if (!f.isDone() && !f.isCancelled()) { f.cancel(false); } } } /** * Clears the recorded futures. */ public void clearFutures() { this.futures.clear(); } /** * Shuts down the internal executor; already submitted tasks still run, but this method does not wait for them. */ public void destroyExecutor() { if (internalExecutor != null && !internalExecutor.isShutdown()) { this.internalExecutor.shutdown(); try { this.internalExecutor.awaitTermination(0, TimeUnit.SECONDS); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } } } /** * Fails fast if the future has already completed with an exception. * * @param f */ private void check(Future f) { if (f != null && f.isDone()) { try { f.get(); } catch (Throwable e) { cancelAll(); throw new RuntimeException(e); } } } /** * Returns the innermost cause of the given throwable, or the throwable itself if it has no cause. * * @param throwable * @return Throwable */ private Throwable getRootCause(Throwable throwable) { final Throwable holder = throwable; final List<Throwable> list = new ArrayList<>(); while (throwable != null && !list.contains(throwable)) { list.add(throwable); throwable = throwable.getCause(); } return list.size() < 2 ? 
holder : list.get(list.size() - 1); } /** * An internal named thread factory */ static class NamedThreadFactory implements ThreadFactory { /** * */ private final boolean daemon; /** * */ private final String name; /** * */ private final AtomicInteger seq = new AtomicInteger(0); /** * @param name */ public NamedThreadFactory(String name) { this(name, false); } /** * @param name * @param daemon */ public NamedThreadFactory(String name, boolean daemon) { this.name = name; this.daemon = daemon; } @Override public Thread newThread(Runnable r) { Thread t = new Thread(r); t.setDaemon(daemon); t.setPriority(Thread.NORM_PRIORITY); t.setName((name + seq.incrementAndGet())); return t; } } } ================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtils.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion; import com.alibaba.datax.plugin.rdbms.reader.util.SingleTableSplitUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.Constant; import com.alibaba.druid.sql.SQLUtils; import com.alibaba.druid.sql.ast.SQLExpr; import com.alibaba.druid.sql.ast.expr.SQLBinaryOpExpr; import com.alibaba.druid.sql.ast.expr.SQLBinaryOperator; import org.apache.commons.lang3.ArrayUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.*; import java.util.*; import java.util.Map.Entry; import java.util.regex.Matcher; import java.util.regex.Pattern; /** * @author johnrobbet */ public class ObReaderUtils { private static final Logger LOG = LoggerFactory.getLogger(ObReaderUtils.class); private static final 
String MYSQL_KEYWORDS = "ACCESSIBLE,ACCOUNT,ACTION,ADD,AFTER,AGAINST,AGGREGATE,ALGORITHM,ALL,ALTER,ALWAYS,ANALYSE,AND,ANY,AS,ASC,ASCII,ASENSITIVE,AT,AUTO_INCREMENT,AUTOEXTEND_SIZE,AVG,AVG_ROW_LENGTH,BACKUP,BEFORE,BEGIN,BETWEEN,BIGINT,BINARY,BINLOG,BIT,BLOB,BLOCK,BOOL,BOOLEAN,BOTH,BTREE,BY,BYTE,CACHE,CALL,CASCADE,CASCADED,CASE,CATALOG_NAME,CHAIN,CHANGE,CHANGED,CHANNEL,CHAR,CHARACTER,CHARSET,CHECK,CHECKSUM,CIPHER,CLASS_ORIGIN,CLIENT,CLOSE,COALESCE,CODE,COLLATE,COLLATION,COLUMN,COLUMN_FORMAT,COLUMN_NAME,COLUMNS,COMMENT,COMMIT,COMMITTED,COMPACT,COMPLETION,COMPRESSED,COMPRESSION,CONCURRENT,CONDITION,CONNECTION,CONSISTENT,CONSTRAINT,CONSTRAINT_CATALOG,CONSTRAINT_NAME,CONSTRAINT_SCHEMA,CONTAINS,CONTEXT,CONTINUE,CONVERT,CPU,CREATE,CROSS,CUBE,CURRENT,CURRENT_DATE,CURRENT_TIME,CURRENT_TIMESTAMP,CURRENT_USER,CURSOR,CURSOR_NAME,DATA,DATABASE,DATABASES,DATAFILE,DATE,DATETIME,DAY,DAY_HOUR,DAY_MICROSECOND,DAY_MINUTE,DAY_SECOND,DEALLOCATE,DEC,DECIMAL,DECLARE,DEFAULT,DEFAULT_AUTH,DEFINER,DELAY_KEY_WRITE," + "DELAYED,DELETE,DES_KEY_FILE,DESC,DESCRIBE,DETERMINISTIC,DIAGNOSTICS,DIRECTORY,DISABLE,DISCARD,DISK,DISTINCT,DISTINCTROW,DIV,DO,DOUBLE,DROP,DUAL,DUMPFILE,DUPLICATE,DYNAMIC,EACH,ELSE,ELSEIF,ENABLE,ENCLOSED,ENCRYPTION,END,ENDS,ENGINE,ENGINES,ENUM,ERROR,ERRORS,ESCAPE,ESCAPED,EVENT,EVENTS,EVERY,EXCHANGE,EXECUTE,EXISTS,EXIT,EXPANSION,EXPIRE,EXPLAIN,EXPORT,EXTENDED,EXTENT_SIZE,FAST,FAULTS,FETCH,FIELDS,FILE,FILE_BLOCK_SIZE,FILTER,FIRST,FIXED,FLOAT,FLOAT4,FLOAT8,FLUSH,FOLLOWS,FOR,FORCE,FOREIGN,FORMAT,FOUND,FROM,FULL,FULLTEXT,FUNCTION,GENERAL,GENERATED,GEOMETRY,GEOMETRYCOLLECTION,GET,GET_FORMAT,GLOBAL,GRANT,GRANTS,GROUP,GROUP_REPLICATION,HANDLER,HASH,HAVING,HELP,HIGH_PRIORITY,HOST,HOSTS,HOUR,HOUR_MICROSECOND,HOUR_MINUTE,HOUR_SECOND,IDENTIFIED,IF,IGNORE,IGNORE_SERVER_IDS,IMPORT,IN,INDEX,INDEXES,INFILE,INITIAL_SIZE,INNER,INOUT,INSENSITIVE,INSERT,INSERT_METHOD,INSTALL,INSTANCE,INT,INT1,INT2,INT3,INT4,INT8,INTEGER,INTERVAL,INTO,INVOKER,IO,IO_AFTER_GTIDS,IO_BEFORE_GTIDS,IO_THREAD," + 
"IPC,IS,ISOLATION,ISSUER,ITERATE,JOIN,JSON,KEY,KEY_BLOCK_SIZE,KEYS,KILL,LANGUAGE,LAST,LEADING,LEAVE,LEAVES,LEFT,LESS,LEVEL,LIKE,LIMIT,LINEAR,LINES,LINESTRING,LIST,LOAD,LOCAL,LOCALTIME,LOCALTIMESTAMP,LOCK,LOCKS,LOGFILE,LOGS,LONG,LONGBLOB,LONGTEXT,LOOP,LOW_PRIORITY,MASTER,MASTER_AUTO_POSITION,MASTER_BIND,MASTER_CONNECT_RETRY,MASTER_DELAY,MASTER_HEARTBEAT_PERIOD,MASTER_HOST,MASTER_LOG_FILE,MASTER_LOG_POS,MASTER_PASSWORD,MASTER_PORT,MASTER_RETRY_COUNT,MASTER_SERVER_ID,MASTER_SSL,MASTER_SSL_CA,MASTER_SSL_CAPATH,MASTER_SSL_CERT,MASTER_SSL_CIPHER,MASTER_SSL_CRL,MASTER_SSL_CRLPATH,MASTER_SSL_KEY,MASTER_SSL_VERIFY_SERVER_CERT,MASTER_TLS_VERSION,MASTER_USER,MATCH,MAX_CONNECTIONS_PER_HOUR,MAX_QUERIES_PER_HOUR,MAX_ROWS,MAX_SIZE,MAX_STATEMENT_TIME,MAX_UPDATES_PER_HOUR,MAX_USER_CONNECTIONS,MAXVALUE,MEDIUM,MEDIUMBLOB,MEDIUMINT,MEDIUMTEXT,MEMORY,MERGE,MESSAGE_TEXT,MICROSECOND,MIDDLEINT,MIGRATE,MIN_ROWS,MINUTE,MINUTE_MICROSECOND,MINUTE_SECOND,MOD,MODE,MODIFIES,MODIFY,MONTH," + "MULTILINESTRING,MULTIPOINT,MULTIPOLYGON,MUTEX,MYSQL_ERRNO,NAME,NAMES,NATIONAL,NATURAL,NCHAR,NDB,NDBCLUSTER,NEVER,NEW,NEXT,NO,NO_WAIT,NO_WRITE_TO_BINLOG,NODEGROUP,NONBLOCKING,NONE,NOT,NULL,NUMBER,NUMERIC,NVARCHAR,OFFSET,OLD_PASSWORD,ON,ONE,ONLY,OPEN,OPTIMIZE,OPTIMIZER_COSTS,OPTION,OPTIONALLY,OPTIONS,OR,ORDER,OUT,OUTER,OUTFILE,OWNER,PACK_KEYS,PAGE,PARSE_GCOL_EXPR,PARSER,PARTIAL,PARTITION,PARTITIONING,PARTITIONS,PASSWORD,PHASE,PLUGIN,PLUGIN_DIR,PLUGINS,POINT,POLYGON,PORT,PRECEDES,PRECISION,PREPARE,PRESERVE,PREV,PRIMARY,PRIVILEGES,PROCEDURE,PROCESSLIST,PROFILE,PROFILES,PROXY,PURGE,QUARTER,QUERY,QUICK,RANGE,READ,READ_ONLY,READ_WRITE,READS,REAL,REBUILD,RECOVER,REDO_BUFFER_SIZE,REDOFILE,REDUNDANT,REFERENCES,REGEXP,RELAY,RELAY_LOG_FILE,RELAY_LOG_POS,RELAY_THREAD,RELAYLOG,RELEASE,RELOAD,REMOVE,RENAME,REORGANIZE,REPAIR,REPEAT,REPEATABLE,REPLACE,REPLICATE_DO_DB,REPLICATE_DO_TABLE,REPLICATE_IGNORE_DB,REPLICATE_IGNORE_TABLE,REPLICATE_REWRITE_DB,REPLICATE_WILD_DO_TABLE," + 
"REPLICATE_WILD_IGNORE_TABLE,REPLICATION,REQUIRE,RESET,RESIGNAL,RESTORE,RESTRICT,RESUME,RETURN,RETURNED_SQLSTATE,RETURNS,REVERSE,REVOKE,RIGHT,RLIKE,ROLLBACK,ROLLUP,ROTATE,ROUTINE,ROW,ROW_COUNT,ROW_FORMAT,ROWS,RTREE,SAVEPOINT,SCHEDULE,SCHEMA,SCHEMA_NAME,SCHEMAS,SECOND,SECOND_MICROSECOND,SECURITY,SELECT,SENSITIVE,SEPARATOR,SERIAL,SERIALIZABLE,SERVER,SESSION,SET,SHARE,SHOW,SHUTDOWN,SIGNAL,SIGNED,SIMPLE,SLAVE,SLOW,SMALLINT,SNAPSHOT,SOCKET,SOME,SONAME,SOUNDS,SOURCE,SPATIAL,SPECIFIC,SQL,SQL_AFTER_GTIDS,SQL_AFTER_MTS_GAPS,SQL_BEFORE_GTIDS,SQL_BIG_RESULT,SQL_BUFFER_RESULT,SQL_CACHE,SQL_CALC_FOUND_ROWS,SQL_NO_CACHE,SQL_SMALL_RESULT,SQL_THREAD,SQL_TSI_DAY,SQL_TSI_HOUR,SQL_TSI_MINUTE,SQL_TSI_MONTH,SQL_TSI_QUARTER,SQL_TSI_SECOND,SQL_TSI_WEEK,SQL_TSI_YEAR,SQLEXCEPTION,SQLSTATE,SQLWARNING,SSL,STACKED,START,STARTING,STARTS,STATS_AUTO_RECALC,STATS_PERSISTENT,STATS_SAMPLE_PAGES,STATUS,STOP,STORAGE,STORED,STRAIGHT_JOIN,STRING,SUBCLASS_ORIGIN,SUBJECT,SUBPARTITION,SUBPARTITIONS,SUPER," + "SUSPEND,SWAPS,SWITCHES,TABLE,TABLE_CHECKSUM,TABLE_NAME,TABLES,TABLESPACE,TEMPORARY,TEMPTABLE,TERMINATED,TEXT,THAN,THEN,TIME,TIMESTAMP,TIMESTAMPADD,TIMESTAMPDIFF,TINYBLOB,TINYINT,TINYTEXT,TO,TRAILING,TRANSACTION,TRIGGER,TRIGGERS,TRUNCATE,TYPE,TYPES,UNCOMMITTED,UNDEFINED,UNDO,UNDO_BUFFER_SIZE,UNDOFILE,UNICODE,UNINSTALL,UNION,UNIQUE,UNKNOWN,UNLOCK,UNSIGNED,UNTIL,UPDATE,UPGRADE,USAGE,USE,USE_FRM,USER,USER_RESOURCES,USING,UTC_DATE,UTC_TIME,UTC_TIMESTAMP,VALIDATION,VALUE,VALUES,VARBINARY,VARCHAR,VARCHARACTER,VARIABLES,VARYING,VIEW,VIRTUAL,WAIT,WARNINGS,WEEK,WEIGHT_STRING,WHEN,WHERE,WHILE,WITH,WITHOUT,WORK,WRAPPER,WRITE,X509,XA,XID,XML,XOR,YEAR,YEAR_MONTH,ZEROFILL,FALSE,TRUE"; private static final String ORACLE_KEYWORDS = 
"ACCESS,ADD,ALL,ALTER,AND,ANY,ARRAYLEN,AS,ASC,AUDIT,BETWEEN,BY,CHAR,CHECK,CLUSTER,COLUMN,COMMENT,COMPRESS,CONNECT,CREATE,CURRENT,DATE,DECIMAL,DEFAULT,DELETE,DESC,DISTINCT,DROP,ELSE,EXCLUSIVE,EXISTS,FILE,FLOAT,FOR,FROM,GRANT,GROUP,HAVING,IDENTIFIED,IMMEDIATE,IN,INCREMENT,INDEX,INITIAL,INSERT,INTEGER,INTERSECT,INTO,IS,LEVEL,LIKE,LOCK,LONG,MAXEXTENTS,MINUS,MODE,MODIFY,NOAUDIT,NOCOMPRESS,NOT,NOTFOUND,NOWAIT,NUMBER,OF,OFFLINE,ON,ONLINE,OPTION,OR,ORDER,PCTFREE,PRIOR,PRIVILEGES,PUBLIC,RAW,RENAME,RESOURCE,REVOKE,ROW,ROWID,ROWLABEL,ROWNUM,ROWS,SELECT,SESSION,SET,SHARE,SIZE,SMALLINT,SQLBUF,START,SUCCESSFUL,SYNONYM,TABLE,THEN,TO,TRIGGER,UID,UNION,UNIQUE,UPDATE,USER,VALIDATE,VALUES,VARCHAR,VARCHAR2,VIEW,WHENEVER,WHERE,WITH,KEY,NAME,VALUE,TYPE"; private static Set databaseKeywords; final static public String OB_COMPATIBLE_MODE = "obCompatibilityMode"; final static public String OB_COMPATIBLE_MODE_ORACLE = "ORACLE"; final static public String OB_COMPATIBLE_MODE_MYSQL = "MYSQL"; public static String compatibleMode = OB_COMPATIBLE_MODE_MYSQL; public static final DataBaseType databaseType = DataBaseType.OceanBase; private static final String TABLE_SCHEMA_DELIMITER = "."; private static final Pattern JDBC_PATTERN = Pattern.compile("jdbc:(oceanbase|mysql)://([\\w\\.-]+:\\d+)/([\\w\\.-]+)"); private static Set keywordsFromString2HashSet(final String keywords) { return new HashSet(Arrays.asList(keywords.split(","))); } public static String escapeDatabaseKeyword(String keyword) { if (databaseKeywords == null) { if (isOracleMode(compatibleMode)) { databaseKeywords = keywordsFromString2HashSet(ORACLE_KEYWORDS); } else { databaseKeywords = keywordsFromString2HashSet(MYSQL_KEYWORDS); } } char escapeChar = isOracleMode(compatibleMode) ? 
'"' : '`'; if (databaseKeywords.contains(keyword.toUpperCase())) { keyword = escapeChar + keyword + escapeChar; } return keyword; } public static void escapeDatabaseKeyword(List<String> ids) { if (ids != null && ids.size() > 0) { for (int i = 0; i < ids.size(); i++) { ids.set(i, escapeDatabaseKeyword(ids.get(i))); } } } public static Boolean isEscapeMode(String keyword) { if (isOracleMode(compatibleMode)) { return keyword.startsWith("\"") && keyword.endsWith("\""); } else { return keyword.startsWith("`") && keyword.endsWith("`"); } } public static void initConn4Reader(Connection conn, long queryTimeoutSeconds) { String setQueryTimeout = "set ob_query_timeout=" + (queryTimeoutSeconds * 1000 * 1000L); String setTrxTimeout = "set ob_trx_timeout=" + ((queryTimeoutSeconds + 5) * 1000 * 1000L); Statement stmt = null; try { conn.setAutoCommit(true); stmt = conn.createStatement(); stmt.execute(setQueryTimeout); stmt.execute(setTrxTimeout); LOG.warn("setAutoCommit=true;" + setQueryTimeout + ";" + setTrxTimeout + ";"); } catch (Throwable e) { LOG.warn("initConn4Reader fail", e); } finally { DBUtil.closeDBResources(stmt, null); } } public static void sleep(int ms) { try { Thread.sleep(ms); } catch (InterruptedException e) { } } /** * Resolves the ordering key (the chosen secondary index columns followed by the primary key columns) and records the position of each key column in the selected column list, appending any key column that is not selected. * * @param conn * @param context */ public static void matchPkIndexs(Connection conn, TaskContext context) { String[] pkColumns = getPkColumns(conn, context); if (ArrayUtils.isEmpty(pkColumns)) { LOG.warn("table=" + context.getTable() + " has no primary key"); return; } List<String> columns = context.getColumns(); // the index columns that are finally used for ordering context.setPkColumns(pkColumns); final String escapeChar = isOracleMode(context.getCompatibleMode()) ? 
"\"" : "`"; int[] pkIndexs = new int[pkColumns.length]; for (int i = 0, n = pkColumns.length; i < n; i++) { String pkc = pkColumns[i]; String escapedPkc = String.format("%s%s%s", escapeChar, pkc, escapeChar); int j = 0; for (int k = columns.size(); j < k; j++) { // even if the user-defined columns are quoted with ``, this is harmless; // at worst a few extra PK columns are added to the select list if (StringUtils.equalsIgnoreCase(pkc, columns.get(j)) || StringUtils.equalsIgnoreCase(escapedPkc, columns.get(j))) { pkIndexs[i] = j; pkColumns[i] = columns.get(j); break; } } // reaching here means the key column is not in columns, so append it to the end if (j == columns.size()) { columns.add(pkc); pkIndexs[i] = columns.size() - 1; } } context.setPkIndexs(pkIndexs); } private static String[] getPkColumns(Connection conn, TaskContext context) { String tableName = context.getTable(); String sql = "show index from " + tableName + " where Key_name='PRIMARY'"; if (isOracleMode(context.getCompatibleMode())) { tableName = tableName.toUpperCase(); String schema; if (tableName.contains(TABLE_SCHEMA_DELIMITER)) { schema = String.format("'%s'", tableName.substring(0, tableName.indexOf("."))); tableName = tableName.substring(tableName.indexOf(".") + 1); } else { schema = "(select sys_context('USERENV','current_schema') from dual)"; } // In OceanBase Oracle mode, order by position to get the correct column order of a composite primary key sql = String.format( "SELECT cols.column_name Column_name " + "FROM all_constraints cons, all_cons_columns cols " + "WHERE cols.table_name = '%s' AND cons.constraint_type = 'P' " + "AND cons.constraint_name = cols.constraint_name " + "AND cons.owner = cols.owner and cons.OWNER = %s " + "order by cols.position ", tableName, schema); } LOG.info("get primary key by sql: " + sql); Statement ps = null; ResultSet rs = null; List<String> realIndex = new ArrayList<String>(); realIndex.addAll(context.getSecondaryIndexColumns()); try { ps = conn.createStatement(); rs = ps.executeQuery(sql); boolean hasPk = false; while (rs.next()) { hasPk = true; 
String columnName = rs.getString("Column_name"); columnName = escapeDatabaseKeyword(columnName); if 
(!realIndex.contains(columnName)) { realIndex.add(columnName); } } if (hasPk) { String[] pks = new String[realIndex.size()]; realIndex.toArray(pks); return pks; } } catch (Throwable e) { LOG.error("show index from table fail :" + sql, e); } finally { close(rs, ps, null); } return null; } /** * Builds the SQL for the first query. * * @param context * @return the first query SQL */ public static String buildFirstQuerySql(TaskContext context) { String userSavePoint = context.getUserSavePoint(); String indexName = context.getIndexName(); String sql = "select "; boolean weakRead = context.getWeakRead(); if (StringUtils.isNotEmpty(indexName)) { String weakReadHint = weakRead ? "+READ_CONSISTENCY(WEAK)," : "+"; sql += " /*" + weakReadHint + "index(" + context.getTable() + " " + indexName + ")*/ "; } else if (weakRead) { sql += " /*+READ_CONSISTENCY(WEAK)*/ "; } sql += StringUtils.join(context.getColumns(), ','); sql += " from " + context.getTable(); if (context.getPartitionName() != null) { sql += String.format(" partition(%s) ", context.getPartitionName()); } if (StringUtils.isNotEmpty(context.getWhere())) { sql += " where " + context.getWhere(); } if (userSavePoint != null && userSavePoint.length() != 0) { userSavePoint = userSavePoint.replace("=", ">"); sql += (StringUtils.isNotEmpty(context.getWhere()) ? " and " : " where ") + userSavePoint; } if (context.getPkColumns() != null && context.getPkColumns().length > 0) { // the table has a primary key sql += " order by " + StringUtils.join(context.getPkColumns(), ',') + " asc"; } return sql; } /** * Builds the SQL for the follow-up incremental queries that resume after the current save point. * * @param conn * @param context * @return sql */ public static String buildAppendQuerySql(Connection conn, TaskContext context) { String indexName = context.getIndexName(); boolean weakRead = context.getWeakRead(); String sql = "select "; if (StringUtils.isNotEmpty(indexName)) { String weakReadHint = weakRead ? 
"+READ_CONSISTENCY(WEAK)," : "+"; sql += " /*" + weakReadHint + "index(" + context.getTable() + " " + indexName + ")*/ "; } else if (weakRead) { sql += " /*+READ_CONSISTENCY(WEAK)*/ "; } sql += StringUtils.join(context.getColumns(), ',') + " from " + context.getTable(); if (context.getPartitionName() != null) { sql += String.format(" partition(%s) ", context.getPartitionName()); } String[] pkColumns = context.getPkColumns(); StringBuilder whereClause = new StringBuilder(); if (pkColumns != null && pkColumns.length > 0) { whereClause.append(" ("); for (int i = 0; i < pkColumns.length; i++) { if (i == 0) { whereClause.append(pkColumns[i]).append(" > ?"); } else { whereClause.append(" OR ("); for (int j = 0; j <= i; j++) { if (j > 0) { whereClause.append(" AND "); } if (j == i) { whereClause.append(pkColumns[j]).append(" > ? "); } else { whereClause.append(pkColumns[j]).append(" = ? "); } } whereClause.append(")"); } } whereClause.append(")"); // if there is an additional WHERE condition, prepend it if (StringUtils.isNotEmpty(context.getWhere())) { whereClause.insert(0, "(" + context.getWhere() + ") AND "); } sql += " where " + whereClause; // add the ORDER BY clause sql += " order by " + StringUtils.join(pkColumns, ",") + " asc"; } else { // no primary key if (StringUtils.isNotEmpty(context.getWhere())) { sql += " where " + context.getWhere(); } } return sql; } /** * check if the userSavePoint is valid * * @param context * @return true - valid, false - invalid */ public static boolean isUserSavePointValid(TaskContext context) { String userSavePoint = context.getUserSavePoint(); if (userSavePoint == null || userSavePoint.length() == 0) { LOG.info("user save point is empty!"); return false; } LOG.info("validating user save point: " + userSavePoint); final String patternString = "(.+)=(.+)"; Pattern pattern = Pattern.compile(patternString); Matcher matcher = pattern.matcher(userSavePoint); if (!matcher.find()) { LOG.error("user save point format is not correct: " + userSavePoint); return false; } List<String> 
columnsInUserSavePoint = getColumnsFromUserSavePoint(userSavePoint); List<String> valuesInUserSavePoint = getValuesFromUserSavePoint(userSavePoint); if (columnsInUserSavePoint.size() == 0 || valuesInUserSavePoint.size() == 0 || columnsInUserSavePoint.size() != valuesInUserSavePoint.size()) { LOG.error("number of columns and values in user save point are different:" + userSavePoint); return false; } String where = context.getWhere(); if (StringUtils.isNotEmpty(where)) { for (String column : columnsInUserSavePoint) { if (where.contains(column)) { LOG.error("column " + column + " conflicts with where: " + where); return false; } } } // The columns in userSavePoint must be exactly the columns of the selected index. String[] pkColumns = context.getPkColumns(); if (pkColumns == null || pkColumns.length != columnsInUserSavePoint.size()) { LOG.error("user save point is not on the selected index."); return false; } for (String column : columnsInUserSavePoint) { boolean found = false; for (String pkCol : pkColumns) { if (pkCol.equals(column)) { found = true; break; } } if (!found) { LOG.error("column " + column + " is not on the selected index."); return false; } } return true; } private static String removeBracket(String str) { final char leftBracket = '('; final char rightBracket = ')'; if (str != null && str.contains(String.valueOf(leftBracket)) && str.contains(String.valueOf(rightBracket)) && str.indexOf(leftBracket) < str.indexOf(rightBracket)) { return str.substring(str.indexOf(leftBracket) + 1, str.indexOf(rightBracket)); } return str; } private static List<String> getColumnsFromUserSavePoint(String userSavePoint) { return Arrays.asList(removeBracket(userSavePoint.split("=")[0]).split(",")); } private static List<String> getValuesFromUserSavePoint(String userSavePoint) { return Arrays.asList(removeBracket(userSavePoint.split("=")[1]).split(",")); } /** * First parses the where clause, *

* then checks whether a matching index exists. * * @param conn * @param context */ public static void initIndex(Connection conn, TaskContext context) { if (StringUtils.isEmpty(context.getWhere())) { return; } SQLExpr expr = SQLUtils.toSQLExpr(context.getWhere(), "mysql"); List<String> allColumnsInTab = getAllColumnFromTab(conn, context.getTable()); List<String> allColNames = getColNames(allColumnsInTab, expr); if (allColNames == null) { return; } // Remove the duplicated column names Set<String> colNames = new TreeSet<String>(); for (String colName : allColNames) { if (!colNames.contains(colName)) { colNames.add(colName); } } List<String> indexNames = getIndexName(conn, context.getTable(), colNames, context.getCompatibleMode()); findBestIndex(conn, indexNames, context.getTable(), context); } private static List<String> getAllColumnFromTab(Connection conn, String tableName) { String sql = "show columns from " + tableName; Statement stmt = null; ResultSet rs = null; List<String> allColumns = new ArrayList<String>(); try { stmt = conn.createStatement(); rs = stmt.executeQuery(sql); while (rs.next()) { allColumns.add(rs.getString("Field").toUpperCase()); } } catch (Exception e) { LOG.warn("fail to get all columns from table " + tableName, e); } finally { close(rs, stmt, null); } LOG.info("all columns in tab: " + String.join(",", allColumns)); return allColumns; } /** * Finds the column names used in the where condition. Currently the conditions must all be joined by AND, and each operator must be greater than, greater than or equal to, equal to, less than, less than or equal to, or not equal to. *

* test coverage: - c6 = 20180710 OR c4 = 320: no index selected - 20180710 = c6: correct index selected - 20180710 = c6 and c4 = 320 or c2 < 100: no index selected * * @param allColInTab upper-case names of all columns in the table * @param expr the parsed where condition * @return the matched column names, or null if the condition uses an unsupported operator */ private static List<String> getColNames(List<String> allColInTab, SQLExpr expr) { List<String> colNames = new ArrayList<String>(); if (expr instanceof SQLBinaryOpExpr) { SQLBinaryOpExpr exp = (SQLBinaryOpExpr) expr; if (exp.getOperator() == SQLBinaryOperator.BooleanAnd) { List<String> leftColumns = getColNames(allColInTab, exp.getLeft()); List<String> rightColumns = getColNames(allColInTab, exp.getRight()); if (leftColumns == null || rightColumns == null) { return null; } colNames.addAll(leftColumns); colNames.addAll(rightColumns); } else if (exp.getOperator() == SQLBinaryOperator.GreaterThan || exp.getOperator() == SQLBinaryOperator.GreaterThanOrEqual || exp.getOperator() == SQLBinaryOperator.Equality || exp.getOperator() == SQLBinaryOperator.LessThan || exp.getOperator() == SQLBinaryOperator.LessThanOrEqual || exp.getOperator() == SQLBinaryOperator.NotEqual) { // only support simple comparison operators String left = SQLUtils.toMySqlString(exp.getLeft()).toUpperCase(); String right = SQLUtils.toMySqlString(exp.getRight()).toUpperCase(); LOG.debug("left: " + left + ", right: " + right); if (allColInTab.contains(left)) { colNames.add(left); } if (allColInTab.contains(right)) { colNames.add(right); } } else { // unsupported operators return null; } } return colNames; } private static Map<String, List<String>> getAllIndex(Connection conn, String tableName, String compatibleMode) { Map<String, List<String>> allIndex = new HashMap<String, List<String>>(); String sql = "show index from " + tableName; if (isOracleMode(compatibleMode)) { String schema; tableName = tableName.toUpperCase(); if (tableName.contains(TABLE_SCHEMA_DELIMITER)) { schema = 
String.format("'%s'", tableName.substring(0, tableName.indexOf("."))); tableName = tableName.substring(tableName.indexOf(".") + 1); } else { schema = "(select sys_context('USERENV','current_schema') from dual)"; } sql = String.format( "SELECT 
INDEX_NAME Key_name, COLUMN_NAME Column_name " + "from all_ind_columns " + "where TABLE_NAME = '%s' and TABLE_OWNER = %s " + " union all " + "SELECT DISTINCT " + "CASE " + "WHEN cons.CONSTRAINT_TYPE = 'P' THEN 'PRIMARY' " + "WHEN cons.CONSTRAINT_TYPE = 'U' THEN cons.CONSTRAINT_NAME " + "ELSE '' " + "END AS Key_name, " + "cols.column_name Column_name " + "FROM all_constraints cons, all_cons_columns cols " + "WHERE cols.table_name = '%s' AND cons.constraint_type in('P', 'U') " + "AND cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner " + "AND cons.owner = %s", tableName, schema, tableName, schema); } Statement stmt = null; ResultSet rs = null; try { LOG.info("running sql to get index: " + sql); stmt = conn.createStatement(); rs = stmt.executeQuery(sql); while (rs.next()) { String keyName = rs.getString("Key_name"); String colName = rs.getString("Column_name").toUpperCase(); if (allIndex.containsKey(keyName)) { allIndex.get(keyName).add(colName); } else { List<String> allColumns = new ArrayList<String>(); allColumns.add(colName); allIndex.put(keyName, allColumns); } } // append the primary key columns to every other index if (allIndex.containsKey("PRIMARY")) { List<String> colsInPrimary = allIndex.get("PRIMARY"); Iterator<Map.Entry<String, List<String>>> iterator = allIndex.entrySet().iterator(); while (iterator.hasNext()) { Map.Entry<String, List<String>> entry = iterator.next(); if ("PRIMARY".equals(entry.getKey())) { continue; } // remove any index that is identical to the primary key List<String> indexColumns = entry.getValue(); if (colsInPrimary.equals(indexColumns)) { iterator.remove(); } else { // append each primary key column that the index does not already contain colsInPrimary.forEach( c -> { if (!indexColumns.contains(c)) { indexColumns.add(c); } }); } } } } catch (Exception e) { LOG.error("fail to get all keys from table by sql: " + sql, e); } finally { close(rs, stmt, null); } LOG.info("all index: " + allIndex.toString()); return allIndex; } /** * Finds the indexes whose 
leading columns are exactly the columns used in the where conditions. * * @param conn * @param table * @param 
colNamesInCondition * @return the names of the matching indexes */ private static List<String> getIndexName(Connection conn, String table, Set<String> colNamesInCondition, String compatibleMode) { List<String> indexNames = new ArrayList<String>(); if (colNamesInCondition == null || colNamesInCondition.size() == 0) { LOG.info("there are no qualified conditions in the where clause, skip index selection."); return indexNames; } LOG.info("columnNamesInConditions: " + String.join(",", colNamesInCondition)); Map<String, List<String>> allIndex = getAllIndex(conn, table, compatibleMode); for (String keyName : allIndex.keySet()) { boolean indexNotMatch = false; // An index that does not contain all the columns in the where conditions // cannot be chosen: // the selected index must start with the columns in the where conditions if (allIndex.get(keyName).size() < colNamesInCondition.size()) { indexNotMatch = true; } else { // check only the first num columns of this index int num = colNamesInCondition.size(); for (String colName : allIndex.get(keyName)) { if (!colNamesInCondition.contains(colName)) { indexNotMatch = true; break; } if (--num == 0) { break; } } } if (indexNotMatch) { continue; } else { indexNames.add(keyName); } } return indexNames; } /** * Several indexes may start with the condition columns, and some of them may contain additional columns; *

* so the index with the fewest columns is chosen. * * @param indexNames * @param context */ private static void findBestIndex(Connection conn, List<String> indexNames, String table, TaskContext context) { if (indexNames.size() == 0) { LOG.warn("no index matches the columns in the where conditions."); return; } Map<String, Map<Integer, String>> allIndexs = new HashMap<String, Map<Integer, String>>(); String sql = "show index from " + table + " where key_name in (" + buildPlaceHolder(indexNames.size()) + ")"; if (isOracleMode(context.getCompatibleMode())) { Map<String, List<String>> allIndexInTab = getAllIndex(conn, table, context.getCompatibleMode()); for (String indexName : indexNames) { if (allIndexInTab.containsKey(indexName)) { Map<Integer, String> index = new TreeMap<Integer, String>(); List<String> columnList = allIndexInTab.get(indexName); for (int i = 1; i <= columnList.size(); i++) { index.put(i, columnList.get(i - 1)); } allIndexs.put(indexName, index); } else { LOG.error("index does not exist: " + indexName); } } } else { PreparedStatement ps = null; ResultSet rs = null; try { ps = conn.prepareStatement(sql); for (int i = 0, n = indexNames.size(); i < n; i++) { ps.setString(i + 1, indexNames.get(i)); } rs = ps.executeQuery(); while (rs.next()) { String keyName = rs.getString("Key_name"); Map<Integer, String> index = allIndexs.get(keyName); if (index == null) { index = new TreeMap<Integer, String>(); allIndexs.put(keyName, index); } int keyInIndex = rs.getInt("Seq_in_index"); String column = rs.getString("Column_name"); index.put(keyInIndex, column); } } catch (Throwable e) { LOG.error("show index from table fail :" + sql, e); } finally { close(rs, ps, null); } } LOG.info("possible index:" + allIndexs + ",where:" + context.getWhere()); Entry<String, Map<Integer, String>> chooseIndex = null; int columnCount = Integer.MAX_VALUE; for (Entry<String, Map<Integer, String>> entry : allIndexs.entrySet()) { if (entry.getValue().size() < columnCount) { columnCount = entry.getValue().size(); chooseIndex = entry; } } if (chooseIndex != 
null) { LOG.info("choose index name:" + chooseIndex.getKey() + ",columns:" + chooseIndex.getValue()); context.setIndexName(chooseIndex.getKey()); context.setSecondaryIndexColumns(new 
ArrayList<String>(chooseIndex.getValue().values())); } } /** * Because of a bug in ObProxy, calling close() on conn gets no response when the transaction has timed out or been killed. * * @param rs * @param stmt * @param conn */ public static void close(final ResultSet rs, final Statement stmt, final Connection conn) { DBUtil.closeDBResources(rs, stmt, conn); } /** * Checks whether row duplicates the save point, i.e. whether their key column values are equal. * * @param savePoint * @param row * @param pkIndexs * @return */ public static boolean isPkEquals(Record savePoint, Record row, int[] pkIndexs) { if (savePoint == null || row == null) { return false; } try { for (int index : pkIndexs) { Object left = savePoint.getColumn(index).getRawData(); Object right = row.getColumn(index).getRawData(); if (!left.equals(right)) { return false; } } } catch (Throwable e) { return false; } return true; } public static String buildPlaceHolder(int n) { if (n <= 0) { return ""; } StringBuilder str = new StringBuilder(2 * n); str.append('?'); for (int i = 1; i < n; i++) { str.append(",?"); } return str.toString(); } public static void binding(PreparedStatement ps, List<Column> list) throws SQLException { if (list.isEmpty()) { return; } List<Column> columns = buildFullParams(list); for (int i = 0; i < columns.size(); i++) { Column c = columns.get(i); if (c instanceof BoolColumn) { ps.setLong(i + 1, ((BoolColumn) c).asLong()); } else if (c instanceof BytesColumn) { ps.setBytes(i + 1, ((BytesColumn) c).asBytes()); } else if (c instanceof DateColumn) { ps.setTimestamp(i + 1, new Timestamp(((DateColumn) c).asDate().getTime())); } else if (c instanceof DoubleColumn) { // use BigDecimal directly: asDouble converts to BigDecimal first and then to Double, which loses precision ps.setBigDecimal(i + 1, ((DoubleColumn) c).asBigDecimal()); } else if (c instanceof LongColumn) { ps.setLong(i + 1, ((LongColumn) c).asLong()); } else if (c instanceof StringColumn) { ps.setString(i + 1, ((StringColumn) c).asString()); } else { ps.setObject(i + 1, c.getRawData()); } } } // 
Expand the save-point columns to match the WHERE clause built upstream, which binds the key prefixes A, AB, ABC, ABCD, ...; the number of placeholders is n(n+1)/2, where n is the number of primary key columns public static List<Column> buildFullParams(List<Column> savePointColumns) { if (savePointColumns == null || 
savePointColumns.isEmpty()) { return new ArrayList<>(); } int n = savePointColumns.size(); List<Column> fullParams = new ArrayList<>(); for (int i = 0; i < n; i++) { for (int j = 0; j <= i; j++) { fullParams.add(savePointColumns.get(j)); } } return fullParams; } public static List<Column> buildPoint(Record savePoint, int[] pkIndexs) { List<Column> result = new ArrayList<Column>(pkIndexs.length); for (int i = 0, n = pkIndexs.length; i < n; i++) { result.add(savePoint.getColumn(pkIndexs[i])); } return result; } /** * Reads ob_compatibility_mode, falling back to MYSQL on error. Note that the given connection is closed before this method returns. */ public static String getCompatibleMode(Connection conn) { String compatibleMode = OB_COMPATIBLE_MODE_MYSQL; String getCompatibleModeSql = "SHOW VARIABLES LIKE 'ob_compatibility_mode'"; Statement stmt = null; ResultSet rs = null; try { stmt = conn.createStatement(); rs = stmt.executeQuery(getCompatibleModeSql); if (rs.next()) { compatibleMode = rs.getString("VALUE"); } } catch (Exception e) { LOG.error("fail to get ob compatible mode, using mysql as default: " + e.getMessage()); } finally { DBUtil.closeDBResources(rs, stmt, conn); } LOG.info("ob compatible mode is " + compatibleMode); return compatibleMode; } public static boolean isOracleMode(String mode) { return (mode != null && OB_COMPATIBLE_MODE_ORACLE.equalsIgnoreCase(mode)); } public static String getDbNameFromJdbcUrl(String jdbcUrl) { Matcher matcher = JDBC_PATTERN.matcher(jdbcUrl); if (matcher.find()) { return matcher.group(3); } else { LOG.error("jdbc url {} is not valid.", jdbcUrl); } return null; } public static String buildQuerySql(boolean weakRead, String column, String table, String where) { if (weakRead) { return buildWeakReadQuerySql(column, table, where); } else { return SingleTableSplitUtil.buildQuerySql(column, table, where); } } public static String buildWeakReadQuerySql(String column, String table, String where) { String querySql; if (StringUtils.isBlank(where)) { querySql = String.format(Constant.WEAK_READ_QUERY_SQL_TEMPLATE_WITHOUT_WHERE, 
column, table); } else { querySql = 
String.format(Constant.WEAK_READ_QUERY_SQL_TEMPLATE, column, table, where); } return querySql; } /** * compare two ob versions * * @param version1 * @param version2 * @return 0 when the two versions are the same * -1 when version1 is smaller (earlier) than version2 * 1 when version1 is bigger (later) than version2 */ public static int compareObVersion(String version1, String version2) { if (version1 == null || version2 == null) { throw new RuntimeException("can not compare null version"); } ObVersion v1 = new ObVersion(version1); ObVersion v2 = new ObVersion(version2); return v1.compareTo(v2); } /** * Runs sql and returns the first column of every result row as a string; an empty list is returned on error. * * @param conn * @param sql * @return the first column of each result row */ public static List<String> getResultsFromSql(Connection conn, String sql) { List<String> list = new ArrayList<String>(); Statement stmt = null; ResultSet rs = null; LOG.info("executing sql: " + sql); try { stmt = conn.createStatement(); rs = stmt.executeQuery(sql); while (rs.next()) { list.add(rs.getString(1)); } } catch (Exception e) { LOG.error("error when executing sql: " + e.getMessage()); } finally { DBUtil.closeDBResources(rs, stmt, null); } return list; } /** * Gets the OceanBase version: tries ob_version() first, and falls back to version() if that returns nothing. * * @param conn * @return */ public static ObVersion getObVersion(Connection conn) { List<String> results = getResultsFromSql(conn, "select ob_version()"); if (results.size() == 0) { results = getResultsFromSql(conn, "select version()"); } ObVersion obVersion = new ObVersion(results.get(0)); LOG.info("obVersion: " + obVersion); return obVersion; } } ================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartInfo.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import java.util.ArrayList; import java.util.List; /** * @author johnrobbet */ public class PartInfo 
{ private PartType partType; List<String> partList; public PartInfo(PartType partType) { this.partType = partType; 
this.partList = new ArrayList<String>(); } public String getPartType() { return partType.getTypeString(); } public void addPart(List<String> partList) { this.partList.addAll(partList); } public List<String> getPartList() { return partList; } public boolean isPartitionTable() { return partType != PartType.NONPARTITION && partList.size() > 0; } } ================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartType.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; /** * @author johnrobbet */ public enum PartType { // Non partitioned table NONPARTITION("NONPARTITION"), // Partitioned table PARTITION("PARTITION"), // Subpartitioned table SUBPARTITION("SUBPARTITION"); private String typeString; PartType(String typeString) { this.typeString = typeString; } public String getTypeString() { return typeString; } } ================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/PartitionSplitUtil.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.reader.oceanbasev10reader.ext.ObReaderKey; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.List; /** * @author johnrobbet */ public class PartitionSplitUtil { private static final Logger LOG = LoggerFactory.getLogger(PartitionSplitUtil.class); private static final String ORACLE_GET_SUBPART_TEMPLATE = "select subpartition_name " + 
"from dba_tab_subpartitions " + "where table_name = '%s' and table_owner = '%s'"; private static final String ORACLE_GET_PART_TEMPLATE = "select partition_name " + "from dba_tab_partitions " + "where table_name = '%s' and table_owner = '%s'"; private static final String MYSQL_GET_PART_TEMPLATE = "select p.part_name " + "from oceanbase.__all_part p, oceanbase.%s t, oceanbase.__all_database d " + "where p.table_id = t.table_id " + "and d.database_id = t.database_id " + "and d.database_name = '%s' " + "and t.table_name = '%s'"; private static final String MYSQL_GET_SUBPART_TEMPLATE = "select p.sub_part_name " + "from oceanbase.__all_sub_part p, oceanbase.%s t, oceanbase.__all_database d " + "where p.table_id = t.table_id " + "and d.database_id = t.database_id " + "and d.database_name = '%s' " + "and t.table_name = '%s'"; /** * get partition info from data dictionary in ob oracle mode * @param config * @param tableName * @return */ public static PartInfo getObOraclePartInfoBySQL(Configuration config, String tableName) { PartInfo partInfo; DataBaseType dbType = ObReaderUtils.databaseType; String jdbcUrl = config.getString(Key.JDBC_URL); String username = config.getString(Key.USERNAME); String password = config.getString(Key.PASSWORD); String dbname = ObReaderUtils.getDbNameFromJdbcUrl(jdbcUrl).toUpperCase(); Connection conn = DBUtil.getConnection(dbType, jdbcUrl, username, password); tableName = tableName.toUpperCase(); // check if the table has subpartitions or not String getSubPartSql = String.format(ORACLE_GET_SUBPART_TEMPLATE, tableName, dbname); List partList = ObReaderUtils.getResultsFromSql(conn, getSubPartSql); if (partList != null && partList.size() > 0) { partInfo = new PartInfo(PartType.SUBPARTITION); partInfo.addPart(partList); return partInfo; } String getPartSql = String.format(ORACLE_GET_PART_TEMPLATE, tableName, dbname); partList = ObReaderUtils.getResultsFromSql(conn, getPartSql); if (partList != null && partList.size() > 0) { partInfo = new 
PartInfo(PartType.PARTITION); partInfo.addPart(partList); return partInfo; } // table is not partitioned partInfo = new PartInfo(PartType.NONPARTITION); return partInfo; } public static List<Configuration> splitByPartition(Configuration configuration) { List<Configuration> allSlices = new ArrayList<>(); List<Object> connections = configuration.getList(Constant.CONN_MARK, Object.class); for (int i = 0, len = connections.size(); i < len; i++) { Configuration sliceConfig = configuration.clone(); Configuration connConf = Configuration.from(connections.get(i).toString()); String jdbcUrl = connConf.getString(Key.JDBC_URL); sliceConfig.set(Key.JDBC_URL, jdbcUrl); sliceConfig.remove(Constant.CONN_MARK); List<String> tables = connConf.getList(Key.TABLE, String.class); for (String table : tables) { Configuration tempSlice = sliceConfig.clone(); tempSlice.set(Key.TABLE, table); allSlices.addAll(splitSinglePartitionTable(tempSlice)); } } return allSlices; } private static List<Configuration> splitSinglePartitionTable(Configuration configuration) { String table = configuration.getString(Key.TABLE); String where = configuration.getString(Key.WHERE, null); String column = configuration.getString(Key.COLUMN); final boolean weakRead = configuration.getBool(Key.WEAK_READ, true); List<Configuration> slices = new ArrayList<Configuration>(); PartInfo partInfo = getObPartInfoBySQL(configuration, table); if (partInfo != null && partInfo.isPartitionTable()) { String partitionType = partInfo.getPartType(); for (String partitionName : partInfo.getPartList()) { LOG.info(String.format("add %s %s for table %s", partitionType, partitionName, table)); Configuration slice = configuration.clone(); slice.set(ObReaderKey.PARTITION_NAME, partitionName); slice.set(ObReaderKey.PARTITION_TYPE, partitionType); slice.set(Key.QUERY_SQL, ObReaderUtils.buildQuerySql(weakRead, column, String.format("%s partition(%s)", table, partitionName), where)); slices.add(slice); } } else { LOG.info("table is not partitioned."); Configuration slice = configuration.clone(); slice.set(Key.QUERY_SQL, 
ObReaderUtils.buildQuerySql(weakRead, column, table, where)); slices.add(slice); } return slices; } public static PartInfo getObPartInfoBySQL(Configuration config, String table) { boolean isOracleMode = "ORACLE".equals(config.getString(ObReaderKey.OB_COMPATIBILITY_MODE)); if (isOracleMode) { return getObOraclePartInfoBySQL(config, table); } else { return getObMySQLPartInfoBySQL(config, table); } } public static PartInfo getObMySQLPartInfoBySQL(Configuration config, String table) { PartInfo partInfo = new PartInfo(PartType.NONPARTITION); List<String> partList; Connection conn = null; try { String jdbcUrl = config.getString(Key.JDBC_URL); String username = config.getString(Key.USERNAME); String password = config.getString(Key.PASSWORD); String dbname = ObReaderUtils.getDbNameFromJdbcUrl(jdbcUrl); String allTable = "__all_table"; conn = DBUtil.getConnection(DataBaseType.OceanBase, jdbcUrl, username, password); ObVersion obVersion = ObReaderUtils.getObVersion(conn); if (obVersion.compareTo(ObVersion.V2276) >= 0 && obVersion.compareTo(ObVersion.V4000) < 0) { allTable = "__all_table_v2"; } String querySubPart = String.format(MYSQL_GET_SUBPART_TEMPLATE, allTable, dbname, table); PartType partType = PartType.SUBPARTITION; // try subpartition first partList = ObReaderUtils.getResultsFromSql(conn, querySubPart); // if the table is not sub-partitioned, then try partition if (partList.isEmpty()) { String queryPart = String.format(MYSQL_GET_PART_TEMPLATE, allTable, dbname, table); partList = ObReaderUtils.getResultsFromSql(conn, queryPart); partType = PartType.PARTITION; } if (!partList.isEmpty()) { partInfo = new PartInfo(partType); partInfo.addPart(partList); } } catch (Exception ex) { LOG.error("error when getting partition list: " + ex.getMessage()); } finally { DBUtil.closeDBResources(null, conn); } return partInfo; } } ================================================ FILE: oceanbasev10reader/src/main/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/TaskContext.java 
================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import java.sql.Connection; import java.util.Collections; import java.util.List; import com.alibaba.datax.common.element.Record; public class TaskContext { private Connection conn; private final String table; private String indexName; // columns of the secondary index private List<String> secondaryIndexColumns = Collections.emptyList(); private String querySql; private final String where; private final int fetchSize; private long readBatchSize = -1; private boolean weakRead = true; private String userSavePoint; private String compatibleMode = ObReaderUtils.OB_COMPATIBLE_MODE_MYSQL; public String getPartitionName() { return partitionName; } public void setPartitionName(String partitionName) { this.partitionName = partitionName; } private String partitionName; // save point used to resume reading after an interruption private volatile Record savePoint; // positions of the pk columns in columns, used to read values from savePoint when binding parameters // if this is null, the read is not resuming from a save point private int[] pkIndexs; private final List<String> columns; private String[] pkColumns; private long cost; private final int transferColumnNumber; public TaskContext(String table, List<String> columns, String where, int fetchSize) { super(); this.table = table; this.columns = columns; // for the case where only querySql is configured this.transferColumnNumber = columns == null ? 
-1 : columns.size(); this.where = where; this.fetchSize = fetchSize; } public Connection getConn() { return conn; } public void setConn(Connection conn) { this.conn = conn; } public String getIndexName() { return indexName; } public void setIndexName(String indexName) { this.indexName = indexName; } public List getSecondaryIndexColumns() { return secondaryIndexColumns; } public void setSecondaryIndexColumns(List secondaryIndexColumns) { this.secondaryIndexColumns = secondaryIndexColumns; } public String getQuerySql() { if (readBatchSize == -1) { return querySql; } else if (ObReaderUtils.isOracleMode(compatibleMode)) { return String.format("select * from (%s) where rownum <= %d", querySql, readBatchSize); } else { return querySql + " limit " + readBatchSize; } } public void setQuerySql(String querySql) { this.querySql = querySql; } public String getWhere() { return where; } public Record getSavePoint() { return savePoint; } public void setSavePoint(Record savePoint) { this.savePoint = savePoint; } public int[] getPkIndexs() { return pkIndexs; } public void setPkIndexs(int[] pkIndexs) { this.pkIndexs = pkIndexs; } public List getColumns() { return columns; } public String[] getPkColumns() { return pkColumns; } public void setPkColumns(String[] pkColumns) { this.pkColumns = pkColumns; } public String getTable() { return table; } public int getFetchSize() { return fetchSize; } public long getCost() { return cost; } public void addCost(long cost) { this.cost += cost; } public int getTransferColumnNumber() { return transferColumnNumber; } public long getReadBatchSize() { return readBatchSize; } public void setReadBatchSize(long readBatchSize) { this.readBatchSize = readBatchSize; } public boolean getWeakRead() { return weakRead; } public void setWeakRead(boolean weakRead) { this.weakRead = weakRead; } public String getUserSavePoint() { return userSavePoint; } public void setUserSavePoint(String userSavePoint) { this.userSavePoint = userSavePoint; } public String 
getCompatibleMode() { return compatibleMode; } public void setCompatibleMode(String compatibleMode) { this.compatibleMode = compatibleMode; } } ================================================ FILE: oceanbasev10reader/src/main/resources/plugin.json ================================================ { "name": "oceanbasev10reader", "class": "com.alibaba.datax.plugin.reader.oceanbasev10reader.OceanBaseReader", "description": "read data from oceanbase with SQL interface", "developer": "oceanbase" } ================================================ FILE: oceanbasev10reader/src/test/java/com/alibaba/datax/plugin/reader/oceanbasev10reader/util/ObReaderUtilsTest.java ================================================ package com.alibaba.datax.plugin.reader.oceanbasev10reader.util; import org.junit.Test; public class ObReaderUtilsTest { @Test public void getDbTest() { assert ObReaderUtils.getDbNameFromJdbcUrl("jdbc:mysql://127.0.0.1:3306/testdb").equalsIgnoreCase("testdb"); assert ObReaderUtils.getDbNameFromJdbcUrl("jdbc:oceanbase://127.0.0.1:2883/testdb").equalsIgnoreCase("testdb"); assert ObReaderUtils.getDbNameFromJdbcUrl("||_dsc_ob10_dsc_||obcluster:mysql||_dsc_ob10_dsc_||jdbc:mysql://127.0.0.1:3306/testdb").equalsIgnoreCase("testdb"); assert ObReaderUtils.getDbNameFromJdbcUrl("||_dsc_ob10_dsc_||obcluster:oracle||_dsc_ob10_dsc_||jdbc:oceanbase://127.0.0.1:3306/testdb").equalsIgnoreCase("testdb"); } @Test public void compareObVersionTest() { assert ObReaderUtils.compareObVersion("2.2.70", "3.2.2") == -1; assert ObReaderUtils.compareObVersion("2.2.70", "2.2.50") == 1; assert ObReaderUtils.compareObVersion("2.2.70", "3.1.2") == -1; assert ObReaderUtils.compareObVersion("3.1.2", "3.1.2") == 0; assert ObReaderUtils.compareObVersion("3.2.3.0", "3.2.3.0") == 0; assert ObReaderUtils.compareObVersion("3.2.3.0-CE", "3.2.3.0") == 0; } } ================================================ FILE: oceanbasev10writer/doc/oceanbasev10writer.md ================================================ ## 
1 快速介绍 OceanBaseV10Writer 插件实现了写入数据到 OceanBase V1.0以及更高版本数据库的目的表的功能。在底层实现上, OceanbaseV10Writer 通过 java客户端(底层MySQL JDBC或oceanbase client) 连接obproxy远程 OceanBase 数据库,并执行相应的 insert .. on duplicate key update这条sql 语句将数据写入 OceanBase ,内部会分批次提交入库。 Oceanbasev10Writer 面向ETL开发工程师,他们使用 Oceanbasev10Writer 从数仓导入数据到 Oceanbase。同时 Oceanbasev10Writer 亦可以作为数据迁移工具为DBA等用户提供服务。 注意,oceanbasewriter是ob 0.5的writer,oceanbasev10writer是ob 1.0及以后版本的writer。 ## 2 实现原理 Oceanbasev10Writer 通过 DataX 框架获取 Reader 生成的协议数据,生成insert ... on duplicate key update语句,在主键或唯一键冲突时,更新表中的所有字段。目前只有这一种行为,写入模式(只写入不更新)和更新指定字段目前暂未支持。 出于性能考虑,写入采用batch方式批量写,当行数累计到预定阈值时,才发起写入请求。 插件连接ob使用Mysql/Oceanbase JDBC driver通过obproxy连接ob; ## 3 功能说明 ### 3.1 配置样例 - 这里使用一份从内存产生到 Oceanbase 导入的数据。 ``` { "job": { "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column" : [ { "value": "DataX", "type": "string" }, { "value": 19880808, "type": "long" }, { "value": "1988-08-08 08:08:08", "type": "date" }, { "value": true, "type": "bool" }, { "value": "test", "type": "bytes" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "oceanbasev10writer", "parameter": { "obWriteMode": "update", "column": [ "id", "name" ], "preSql": [ "delete from test" ], "connection": [ { "jdbcUrl": "||_dsc_ob10_dsc_||集群名:租户名||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/dbName", "table": [ "test" ] } ], "username": "xxx", "password":"xxx", "batchSize": 256, "memstoreThreshold": "0.9" } } } ] } } ``` - 这里使用一份从内存产生到 Oceanbase 旁路导入的数据。 ``` { "job": { "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column" : [ { "value": "DataX", "type": "string" }, { "value": 19880808, "type": "long" }, { "value": "1988-08-08 08:08:08", "type": "date" }, { "value": true, "type": "bool" }, { "value": "test", "type": "bytes" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "oceanbasev10writer", 
"parameter": { "obWriteMode": "update", "column": [ "id", "name" ], "preSql": [ "delete from test" ], "connection": [ { "jdbcUrl": "||_dsc_ob10_dsc_||集群名:租户名||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/dbName", "table": [ "test" ] } ], "username": "xxx", "password":"xxx", "batchSize": 256, "directPath": true, "rpcPort": 2882, "parallel": 8, "heartBeatInterval": 1000, "heartBeatTimeout": 6000, "bufferSize": 1048576, "memstoreThreshold": "0.9" } } } ] } } ``` ### 3.2 参数说明 - **jdbcUrl** - 描述:连接ob使用的jdbc url,支持两种格式: - ||_dsc_ob10_dsc_||集群名:租户名||_dsc_ob10_dsc_||jdbc:mysql://obproxyIp:obproxyPort/db - 此格式下username仅填写用户名本身,无需三段式写法 - jdbc:mysql://ip:port/db - 此格式下username需要三段式写法 - 必选:是 - 默认值:无 - **table** - 描述:目的表的表名称。开源版obwriter插件仅支持写入一个表。表名中一般不含库名; - 必选:是 - 默认值:无 - **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。 ``` **column配置项必须指定,不能留空!** 注意:1、我们强烈不推荐你这样配置,因为当你目的表字段个数、类型等有改动时,你的任务可能运行不正确或者失败 2、 column 不能配置任何常量值 ``` - 必选:是 - 默认值:否 - **preSql** - 描述:写入数据到目的表前,会先执行这里的标准语句。如果 Sql 中有你需要操作到的表名称,请使用 `@table` 表示,这样在实际执行 Sql 语句时,会对变量按照实际表名称进行替换。比如你的任务是要写入到目的端的100个同构分表(表名称为:datax_00,datax01, ... datax_98,datax_99),并且你希望导入数据前,先对表中数据进行删除操作,那么你可以这样配置:`"preSql":["delete from @table"]`,效果是:在执行到每个表写入数据前,会先执行对应的 delete from 对应表名称.只支持delete语句 - 必选:否 - 默认值:无 - **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少DataX与Oceanbase的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成DataX运行进程OOM情况。 - 必选:否 - 默认值:1000 - **memstoreThreshold** - 描述:OB租户的memstore使用率,当达到这个阀值的时候暂停导入,等释放内存后继续导入. 
防止租户内存溢出 - 必选:否 - 默认值:0.9 - **username** - 描述:访问oceanbase的用户名。注意当jdbcUrl配置为||_dsc_ob10_dsc_||集群名:租户名||_dsc_ob10_dsc_||这样的格式时,此处不配置ob的集群名和租户名。否则需要配置为三段式形式。 - 必选:是 - 默认值:无 - **** password**** - 描述:访问oceanbase的密码 - 必选:是 - 默认值:无 - writerThreadCount - 描述:每个通道(channel)中写入使用的线程数 - 必选:否 - 默认值:1 - directPath - 描述:开启旁路导入 - 必选:否 - 默认值:false - rpcPort - 描述:oceanbase的rpc端口 - 必选:否 - 默认值:无 - parallel - 描述:旁路导入的启用线程数 - 必选:否 - 默认值:1 - bufferSize - 描述:旁路导入的切分数据块大小 - 必选:否 - 默认值:1048576 - heartBeatInterval - 描述:旁路导入的心跳间隔 - 必选:否 - 默认值:1000 - heartBeatTimeout - 描述:旁路导入的心跳超时时间 - 必选:否 - 默认值:6000 ``` **开启了旁路导入,即directPath:true时** 注意:1、此时rpcPort为必填项。 2、设置parallel时,parallel和oceanbase的负载有关。 3、设置heartBeatTimeout最低不能低于6000,heartBeatTimeout的值最低不能低于1000, 当heartBeatTimeout和heartBeatTimeout同时设置时,heartBeatTimeout-heartBeatTimeout的差值不能低于4000。 4、bufferSize的单位为字节数,默认为1M,即1048576。 ``` ## 4 常见问题 ### 4.1 连接断开导致写入失败 Data X写入ob的任务失败,在log中可以发现在写入ob时,连接被断开: ``` 2018-12-14 05:40:48.586 [18705170-3-17-writer] WARN CommonRdbmsWriter$Task - 遇到OB异常,回滚此次写入, 休眠 1秒,采用逐条写入提交,SQLState:S1000 java.sql.SQLException: Could not retrieve transation read-only status server at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:877) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:873) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3603) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3572) ~[mysql-connector-java-5.1.40.jar:5.1.40] at 
com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1225) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:958) ~[mysql-connector-java-5.1.40.jar:5.1.40] at com.alibaba.datax.plugin.writer.oceanbasev10writer.task.MultiTableWriterTask.write(MultiTableWriterTask.java:357) [oceanbasev10writer-0.0.1-SNAPSHOT.jar:na] at com.alibaba.datax.plugin.writer.oceanbasev10writer.task.MultiTableWriterTask.calcRuleAndDoBatchInsert(MultiTableWriterTask.java:338) [oceanbasev10writer-0.0.1-SNAPSHOT.jar:na] at com.alibaba.datax.plugin.writer.oceanbasev10writer.task.MultiTableWriterTask.startWrite(MultiTableWriterTask.java:227) [oceanbasev10writer-0.0.1-SNAPSHOT.jar:na] at com.alibaba.datax.plugin.writer.oceanbasev10writer.OceanBaseV10Writer$Task.startWrite(OceanBaseV10Writer.java:360) [oceanbasev10writer-0.0.1-SNAPSHOT.jar:na] at com.alibaba.datax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:62) [datax-core-0.0.1-SNAPSHOT.jar:na] at java.lang.Thread.run(Thread.java:834) [na:1.8.0_112] Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 5 milliseconds ago. The last packet sent successfully to the server was 4 milliseconds ago. 
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_112]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_112]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_112]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_112]
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3556) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3897) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2545) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2503) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1369) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3597) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	... 9 common frames omitted
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3008) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3466) ~[mysql-connector-java-5.1.40.jar:5.1.40]
	... 17 common frames omitted
```

Keywords: `Could not retrieve transation read-only status server`, `communication link failure`

Checking the machine that ran the DataX job showed that obproxy restarted several times while the job was running:

![](https://cdn.nlark.com/lark/0/2018/png/97504/1544760936504-948a2699-e21b-4970-ad76-25b6ac1cd89d.png#height=156&id=wutJw&originHeight=156&originWidth=507&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=&width=507)

The log of the first obproxy exit shows the reason:

```
[2018-12-14 05:40:47.611683] ERROR [PROXY] do_monitor_mem (ob_proxy_main.cpp:889) [7262][Y0-7F4480213880] [AL=47391-47390-29] obproxy's memroy is out of limit, will be going to commit suicide(mem_limited=838860800, OTHER_MEMORY_SIZE=73400320, is_out_of_mem_limit=true, cur_pos=9) BACKTRACE:0x49db91 0x47fdc9 0x43b115 0x43ee5d 0xa6e623 0xe401b2 0xe3f497 0x4f674c 0x7f4487ace77d 0x7f44865ed9ad
[2018-12-14 05:40:47.612334] ERROR [PROXY] do_monitor_mem (ob_proxy_main.cpp:891) [7262][Y0-7F4480213880] [AL=47392-47391-651] history memory size, history_mem_size[0]=765460480 BACKTRACE:0x49db91 0x47fdc9 0x48717a 0x43f121 0xa6e623 0xe401b2 0xe3f497 0x4f674c 0x7f4487ace77d 0x7f44865ed9ad
[2018-12-14 05:40:47.612934] ERROR [PROXY] do_monitor_mem (ob_proxy_main.cpp:891) [7262][Y0-7F4480213880] [AL=47393-47392-600] history memory size, history_mem_size[1]=765460480 BACKTRACE:0x49db91 0x47fdc9 0x48717a 0x43f121 0xa6e623 0xe401b2 0xe3f497 0x4f674c 0x7f4487ace77d 0x7f44865ed9ad
[2018-12-14 05:40:47.613530] ERROR [PROXY] do_monitor_mem (ob_proxy_main.cpp:891) [7262][Y0-7F4480213880] [AL=47394-47393-596] history memory size, history_mem_size[2]=765460480 BACKTRACE:0x49db91 0x47fdc9 0x48717a 0x43f121 0xa6e623 0xe401b2 0xe3f497 0x4f674c 0x7f4487ace77d 0x7f44865ed9ad
[2018-12-14 05:40:47.614121] ERROR [PROXY] do_monitor_mem (ob_proxy_main.cpp:891) [7262][Y0-7F4480213880] [AL=47395-47394-591] history memory size, history_mem_size[3]=765460480 BACKTRACE:0x49db91 0x47fdc9 0x48717a 0x43f121 0xa6e623 0xe401b2 0xe3f497 0x4f674c 0x7f4487ace77d 0x7f44865ed9ad
[2018-12-14 05:40:47.614717] ERROR [PROXY] do_monitor_mem (ob_proxy_main.cpp:891) [7262][Y0-7F4480213880] [AL=47396-47395-596] history memory size, history_mem_size[4]=765460480 BACKTRACE:0x49db91 0x47fdc9 0x48717a 0x43f121 0xa6e623 0xe401b2 0xe3f497 0x4f674c 0x7f4487ace77d 0x7f44865ed9ad
[2018-12-14 05:40:47.615307] ERROR [PROXY] do_monitor_mem (ob_proxy_main.cpp:891) [7262][Y0-7F4480213880] [AL=47397-47396-590] history memory size, history_mem_size[5]=765460480 BACKTRACE:0x49db91 0x47fdc9 0x48717a 0x43f121 0xa6e623 0xe401b2 0xe3f497 0x4f674c 0x7f4487ace77d 0x7f44865ed9ad
```

Keywords: `obproxy's memroy is out of limit, will be going to commit suicide`

This shows that obproxy exited because it ran out of memory.

#### Solution

When obproxy starts, you can specify its memory limit, which defaults to 800 MB. In some situations, for example with a large number of connections (the failed job wrote to 100 sharded tables with a concurrency of 32, i.e. 3200 connections), obproxy may run out of memory. To fix this, either lower the job's concurrency or raise obproxy's memory limit, for example to 2 GB.

### 4.2 Session interrupted

The following error occurred while using the OB 1.0 writer to write data into a single table:

```
2019-01-03 19:37:27.197 [0-insertTask-73] WARN InsertTask - Insert fatal error SqlState =HY000, errorCode = 5066, java.sql.SQLException: Session interrupted, server ip:port[11.145.28.93:2881]
```

Keywords: `fatal`, `Session interrupted`, `server ip:port`

The job log also contains the following entry:

```
2019-08-09 11:56:56.758 [2-insertTask-82] ERROR StdoutPluginCollector -
java.sql.SQLException: Session interrupted, server ip:port[11.232.58.16:2881]
	at com.alipay.oceanbase.obproxy.connection.ObGroupConnection.checkAndThrowException(ObGroupConnection.java:431) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.statement.ObStatement.doExecute(ObStatement.java:598) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.statement.ObStatement.execute(ObStatement.java:456) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.statement.ObPreparedStatement.execute(ObPreparedStatement.java:148) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doOneInsert(CommonRdbmsWriter.java:430) ~[plugin-rdbms-util-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.plugin.writer.oceanbasev10writer.task.InsertTask.doMultiInsert(InsertTask.java:196) [oceanbasev10writer-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.plugin.writer.oceanbasev10writer.task.InsertTask.run(InsertTask.java:85) [oceanbasev10writer-0.0.1-SNAPSHOT.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) [na:1.8.0_112]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) [na:1.8.0_112]
	at java.lang.Thread.run(Thread.java:834) [na:1.8.0_112]
Caused by: com.alipay.oceanbase.obproxy.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: INSERT command denied to user 'dwexp'@'%' for table 'mobile_product_version_info'
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_112]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_112]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_112]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_112]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.Util.handleNewInstance(Util.java:409) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.Util.getInstance(Util.java:384) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4403) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4275) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2706) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2867) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2843) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2085) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1310) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:493) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.statement.ObPreparedStatement.executeOnConnection(ObPreparedStatement.java:121) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.statement.ObStatement.doExecuteOnConnection(ObStatement.java:677) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	at com.alipay.oceanbase.obproxy.statement.ObStatement.doExecute(ObStatement.java:558) ~[oceanbase-connector-java-2.0.8.20180730.jar:na]
	... 8 common frames omitted
```

The exception is caused by a missing INSERT privilege (`INSERT command denied to user 'dwexp'@'%' for table`).

Keywords: `INSERT command denied to user 'dwexp'@'%'`

Because the error comes from missing write privileges, neither the observer log nor the obproxy log contains any related information.

#### Solution

Grant the required privileges to the user in OB and retry the job; it then succeeds. Reference grant statements (the second grant lets the writer read `gv$memstore` for the memstore usage check behind `memstoreThreshold`):

```sql
grant select, insert, update on dbName.tableName to dwexp;
grant select on oceanbase.gv$memstore to dwexp;
```

================================================
FILE: oceanbasev10writer/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 oceanbasev10writer com.alibaba.datax 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax plugin-rdbms-util ${datax-project-version} guava com.google.guava com.alibaba druid org.slf4j slf4j-api ch.qos.logback logback-classic org.springframework spring-test 4.0.4.RELEASE test com.alibaba druid 1.2.18 com.alipay.oceanbase oceanbase-connector-java 3.2.0 system ${basedir}/src/main/libs/oceanbase-connector-java-3.2.0.jar com.alipay.oceanbase oceanbase-client com.oceanbase oceanbase-client 2.4.11 com.google.guava guava com.oceanbase shade-ob-partition-calculator 1.0-SNAPSHOT system ${pom.basedir}/src/main/libs/shade-ob-partition-calculator-1.0-SNAPSHOT.jar com.google.guava guava 27.0-jre log4j log4j 1.2.16 org.json json 20160810 junit junit 4.11 test com.oceanbase obkv-table-client 1.4.0 com.oceanbase obkv-hbase-client 2.1.0 com.alibaba fastjson org.slf4j slf4j-api com.oceanbase oceanbase-client com.google.guava guava commons-lang commons-lang com.alipay.sofa.common sofa-common-tools io.netty netty-codec-dns io.netty netty-codec-http io.netty netty-codec-http2 io.netty netty-codec-haproxy io.netty netty-codec-mqtt io.netty netty-codec-memcache io.netty netty-codec-redis io.netty netty-codec-smtp io.netty netty-codec-socks io.netty netty-codec-stomp io.netty netty-codec-xml io.netty netty-handler-proxy io.netty netty-handler-ssl-ocsp io.netty netty-resolver-dns io.netty
netty-resolver-dns-classes-macos io.netty netty-resolver-dns-native-macos io.netty netty-transport-rxtx io.netty netty-transport-udt io.netty netty-transport-sctp com.alipay.sofa.common sofa-common-tools 1.3.11 org.slf4j slf4j-api com.google.guava guava com.alibaba fastjson 1.2.83 commons-lang commons-lang 2.6 mysql mysql-connector-java ${mysql.driver.version} src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single

================================================
FILE: oceanbasev10writer/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/oceanbasev10writer target/ oceanbasev10writer-0.0.1-SNAPSHOT.jar plugin/writer/oceanbasev10writer src/main/libs *.jar plugin/writer/oceanbasev10writer/libs false plugin/writer/oceanbasev10writer/libs runtime

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/Config.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer;

public interface Config {

    String MEMSTORE_THRESHOLD = "memstoreThreshold";
    double DEFAULT_MEMSTORE_THRESHOLD = 0.9d;
    double DEFAULT_SLOW_MEMSTORE_THRESHOLD = 0.75d;
    String MEMSTORE_CHECK_INTERVAL_SECOND = "memstoreCheckIntervalSecond";
    long DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND = 30;

    int DEFAULT_BATCH_SIZE = 100;
    int MAX_BATCH_SIZE = 4096;

    String FAIL_TRY_COUNT = "failTryCount";
    int DEFAULT_FAIL_TRY_COUNT = 10000;

    String WRITER_THREAD_COUNT = "writerThreadCount";
    int DEFAULT_WRITER_THREAD_COUNT = 1;

    String CONCURRENT_WRITE = "concurrentWrite";
    boolean DEFAULT_CONCURRENT_WRITE = true;

    String OB_VERSION = "obVersion";
    String TIMEOUT = "timeout";

    String PRINT_COST = "printCost";
    boolean DEFAULT_PRINT_COST = false;

    String COST_BOUND = "costBound";
    long DEFAULT_COST_BOUND = 20;

    String MAX_ACTIVE_CONNECTION = "maxActiveConnection";
    int DEFAULT_MAX_ACTIVE_CONNECTION = 2000;

    String WRITER_SUB_TASK_COUNT = "writerSubTaskCount";
    int DEFAULT_WRITER_SUB_TASK_COUNT = 1;
    int MAX_WRITER_SUB_TASK_COUNT = 4096;

    String OB_WRITE_MODE = "obWriteMode";
    String OB_COMPATIBLE_MODE = "obCompatibilityMode";
    String OB_COMPATIBLE_MODE_ORACLE = "ORACLE";
    String OB_COMPATIBLE_MODE_MYSQL = "MYSQL";

    String OCJ_GET_CONNECT_TIMEOUT = "ocjGetConnectTimeout";
    int DEFAULT_OCJ_GET_CONNECT_TIMEOUT = 5000; // 5s

    String OCJ_PROXY_CONNECT_TIMEOUT = "ocjProxyConnectTimeout";
    int DEFAULT_OCJ_PROXY_CONNECT_TIMEOUT = 5000; // 5s

    String OCJ_CREATE_RESOURCE_TIMEOUT = "ocjCreateResourceTimeout";
    int DEFAULT_OCJ_CREATE_RESOURCE_TIMEOUT = 60000; // 60s

    String OB_UPDATE_COLUMNS = "obUpdateColumns";

    String USE_PART_CALCULATOR = "usePartCalculator";
    boolean DEFAULT_USE_PART_CALCULATOR = false;

    String BLOCKS_COUNT = "blocksCount";
    String DIRECT_PATH = "directPath";
    String RPC_PORT = "rpcPort";

    // Unlike recordLimit, this parameter applies per table: one table exceeding its maximum
    // error count does not affect the other tables. Used only for direct-path import.
    String MAX_ERRORS = "maxErrors";
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/OceanBaseV10Writer.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer;

import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.rdbms.writer.Constant;
import com.alibaba.datax.plugin.rdbms.writer.Key;
import com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.task.ConcurrentTableWriterTask;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.DbUtils;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils;
import com.alibaba.fastjson2.JSONObject;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

/**
 * 2016-04-07
 * <p>
 * Writer dedicated to OceanBase 1.0.
 *
 * @author biliang.wbl
 *
 */
public class OceanBaseV10Writer extends Writer {
    private static DataBaseType DATABASE_TYPE = DataBaseType.OceanBase;

    /**
     * Methods in Job are executed only once; methods in Task are run in parallel by multiple Task threads
     * started by the framework.
     * <p/>
     * The overall Writer execution flow is:
     *
     * <pre>
     * Job class: init-->prepare-->split
     *
     *                          Task class: init-->prepare-->startWrite-->post-->destroy
     *                          Task class: init-->prepare-->startWrite-->post-->destroy
     *
     *                                                                            Job class: post-->destroy
     * </pre>
*/ public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonJob; private static final Logger LOG = LoggerFactory.getLogger(Job.class); /** * 注意:此方法仅执行一次。 最佳实践:通常在这里对用户的配置进行校验:是否缺失必填项?有无错误值?有没有无关配置项?... * 并给出清晰的报错/警告提示。校验通常建议采用静态工具类进行,以保证本类结构清晰。 */ @Override public void init() { this.originalConfig = super.getPluginJobConf(); checkCompatibleMode(originalConfig); //将config中的column和table中的关键字进行转义 List columns = originalConfig.getList(Key.COLUMN, String.class); ObWriterUtils.escapeDatabaseKeyword(columns); originalConfig.set(Key.COLUMN, columns); List conns = originalConfig.getList(Constant.CONN_MARK, JSONObject.class); for (int i = 0; i < conns.size(); i++) { JSONObject conn = conns.get(i); Configuration connConfig = Configuration.from(conn.toString()); List tables = connConfig.getList(Key.TABLE, String.class); ObWriterUtils.escapeDatabaseKeyword(tables); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.TABLE), tables); } this.commonJob = new CommonRdbmsWriter.Job(DATABASE_TYPE); this.commonJob.init(this.originalConfig); } /** * 注意:此方法仅执行一次。 最佳实践:如果 Job 中有需要进行数据同步之前的处理,可以在此处完成,如果没有必要则可以直接去掉。 */ // 一般来说,是需要推迟到 task 中进行pre 的执行(单表情况例外) @Override public void prepare() { int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); if (tableNumber == 1) { this.commonJob.prepare(this.originalConfig); final String version = fetchServerVersion(originalConfig); ObWriterUtils.setObVersion(version); originalConfig.set(Config.OB_VERSION, version); } String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); // 获取presql配置,并执行 List preSqls = originalConfig.getList(Key.PRE_SQL, String.class); if (preSqls == null || preSqls.size() == 0) { return; } List conns = originalConfig.getList(Constant.CONN_MARK, Object.class); for (Object connConfObject : conns) { Configuration connConf = Configuration.from(connConfObject.toString()); // 
这里的 jdbcUrl 已经 append 了合适后缀参数 String jdbcUrl = connConf.getString(Key.JDBC_URL); List tableList = connConf.getList(Key.TABLE, String.class); for (String table : tableList) { List renderedPreSqls = WriterUtil.renderPreOrPostSqls(preSqls, table); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { Connection conn = DBUtil.getConnection(DATABASE_TYPE, jdbcUrl, username, password); LOG.info("Begin to execute preSqls:[{}]. context info:{}.", StringUtils.join(renderedPreSqls, ";"), jdbcUrl); WriterUtil.executeSqls(conn, renderedPreSqls, jdbcUrl, DATABASE_TYPE); ObWriterUtils.asyncClose(null, null, conn); } } } if (LOG.isDebugEnabled()) { LOG.debug("After job prepare(), originalConfig now is:[\n{}\n]", originalConfig.toJSON()); } } /** * 注意:此方法仅执行一次。 最佳实践:通常采用工具静态类完成把 Job 配置切分成多个 Task 配置的工作。 这里的 * mandatoryNumber 是强制必须切分的份数。 */ @Override public List split(int mandatoryNumber) { int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); if (tableNumber == 1) { return this.commonJob.split(this.originalConfig, mandatoryNumber); } Configuration simplifiedConf = this.originalConfig; List splitResultConfigs = new ArrayList(); for (int j = 0; j < mandatoryNumber; j++) { splitResultConfigs.add(simplifiedConf.clone()); } return splitResultConfigs; } /** * 注意:此方法仅执行一次。 最佳实践:如果 Job 中有需要进行数据同步之后的后续处理,可以在此处完成。 */ @Override public void post() { int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); if (tableNumber == 1) { commonJob.post(this.originalConfig); return; } String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); List conns = originalConfig.getList(Constant.CONN_MARK, Object.class); List postSqls = originalConfig.getList(Key.POST_SQL, String.class); if (postSqls == null || postSqls.size() == 0) { return; } for (Object connConfObject : conns) { Configuration connConf = Configuration.from(connConfObject.toString()); String jdbcUrl = connConf.getString(Key.JDBC_URL); List tableList = 
connConf.getList(Key.TABLE, String.class); for (String table : tableList) { List renderedPostSqls = WriterUtil.renderPreOrPostSqls(postSqls, table); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { // 说明有 postSql 配置,则此处删除掉 Connection conn = DBUtil.getConnection(DATABASE_TYPE, jdbcUrl, username, password); LOG.info("Begin to execute postSqls:[{}]. context info:{}.", StringUtils.join(renderedPostSqls, ";"), jdbcUrl); WriterUtil.executeSqls(conn, renderedPostSqls, jdbcUrl, DATABASE_TYPE); ObWriterUtils.asyncClose(null, null, conn); } } } originalConfig.remove(Key.POST_SQL); } /** * 注意:此方法仅执行一次。 最佳实践:通常配合 Job 中的 post() 方法一起完成 Job 的资源释放。 */ @Override public void destroy() { this.commonJob.destroy(this.originalConfig); } private String fetchServerVersion(Configuration config) { final String fetchVersionSql = "show variables like 'version_comment'"; String versionComment = DbUtils.fetchSingleValueWithRetry(config, fetchVersionSql); return versionComment.split(" ")[1]; } private void checkCompatibleMode(Configuration configure) { final String fetchCompatibleModeSql = "SHOW VARIABLES LIKE 'ob_compatibility_mode'"; String compatibleMode = DbUtils.fetchSingleValueWithRetry(configure, fetchCompatibleModeSql); ObWriterUtils.setCompatibleMode(compatibleMode); configure.set(Config.OB_COMPATIBLE_MODE, compatibleMode); } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration writerSliceConfig; private CommonRdbmsWriter.Task writerTask; /** * 注意:此方法每个 Task 都会执行一次。 最佳实践:此处通过对 taskConfig 配置的读取,进而初始化一些资源为 * startWrite()做准备。 */ @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); int tableNumber = writerSliceConfig.getInt(Constant.TABLE_NUMBER_MARK); if (tableNumber == 1) { // always use concurrentTableWriter this.writerTask = new ConcurrentTableWriterTask(DATABASE_TYPE); } else { throw new RuntimeException("writing to multi-tables is not supported."); } 
            LOG.info("tableNumber:" + tableNumber + ",writerTask Class:" + writerTask.getClass().getName());
            this.writerTask.init(this.writerSliceConfig);
        }

        /**
         * Note: this method is executed once per Task. Best practice: if the Task needs any processing before
         * data synchronization, do it here; if none is needed, this method can be removed.
         */
        @Override
        public void prepare() {
            this.writerTask.prepare(this.writerSliceConfig);
        }

        /**
         * Note: this method is executed once per Task. Best practice: keep the data-writing logic here concise
         * and clearly encapsulated.
         */
        @Override
        public void startWrite(RecordReceiver recordReceiver) {
            this.writerTask.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector());
        }

        /**
         * Note: this method is executed once per Task. Best practice: if the Task needs any follow-up
         * processing after data synchronization, do it here.
         */
        @Override
        public void post() {
            this.writerTask.post(this.writerSliceConfig);
        }

        /**
         * Note: this method is executed once per Task. Best practice: usually paired with the Task's post()
         * method to release the Task's resources.
         */
        @Override
        public void destroy() {
            this.writerTask.destroy(this.writerSliceConfig);
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/common/Table.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.common;

import java.util.Objects;

public class Table {
    private String tableName;
    private String dbName;
    private Throwable error;
    private Status status;

    public Table(String dbName, String tableName) {
        this.dbName = dbName;
        this.tableName = tableName;
        this.status = Status.INITIAL;
    }

    public Throwable getError() {
        return error;
    }

    public void setError(Throwable error) {
        this.error = error;
    }

    public Status getStatus() {
        return status;
    }

    public void setStatus(Status status) {
        this.status = status;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        Table table = (Table) o;
        return tableName.equals(table.tableName) && dbName.equals(table.dbName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tableName, dbName);
    }

    public enum Status {
        /**
         *
         */
        INITIAL(0),
        /**
         *
         */
        RUNNING(1),
        /**
         *
         */
        FAILURE(2),
        /**
         *
         */
        SUCCESS(3);

        private int code;

        /**
         * @param code
         */
        private Status(int code) {
            this.code = code;
        }

        public int getCode() {
            return code;
        }

        public void setCode(int code) {
            this.code = code;
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/common/TableCache.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.common;

import java.util.concurrent.ConcurrentHashMap;

public class TableCache {
    private static final TableCache INSTANCE = new TableCache();
    private final ConcurrentHashMap<String, Table> TABLE_CACHE;

    private TableCache() {
        TABLE_CACHE = new ConcurrentHashMap<>();
    }

    public static TableCache getInstance() {
        return INSTANCE;
    }

    public Table getTable(String dbName, String tableName) {
        String fullTableName = String.join("-", dbName, tableName);
        return TABLE_CACHE.computeIfAbsent(fullTableName, (k) -> new Table(dbName, tableName));
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/directPath/AbstractRestrictedConnection.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath;

import java.sql.Array;
import java.sql.Blob;
import java.sql.CallableStatement;
import java.sql.Clob;
import java.sql.DatabaseMetaData;
import java.sql.NClob;
import java.sql.PreparedStatement;
import java.sql.SQLClientInfoException;
import java.sql.SQLException;
import java.sql.SQLWarning;
import java.sql.SQLXML;
import java.sql.Savepoint;
import java.sql.Statement;
import java.sql.Struct;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.Executor;

public abstract class AbstractRestrictedConnection implements java.sql.Connection {

    @Override
    public CallableStatement prepareCall(String sql) throws SQLException {
        throw new UnsupportedOperationException("prepareCall(String) is unsupported");
    }

    @Override
    public String nativeSQL(String sql) throws SQLException {
        throw new UnsupportedOperationException("nativeSQL(String) is unsupported");
    }

    @Override
    public void setAutoCommit(boolean autoCommit) throws SQLException {
        throw new UnsupportedOperationException("setAutoCommit(boolean) is unsupported");
    }

    @Override
    public boolean getAutoCommit() throws SQLException {
        throw new UnsupportedOperationException("getAutoCommit is unsupported");
    }

    @Override
    public void abort(Executor executor) throws SQLException {
        throw new UnsupportedOperationException("abort(Executor) is unsupported");
    }

    @Override
    public void setNetworkTimeout(Executor executor, int milliseconds) throws SQLException {
        throw new UnsupportedOperationException("setNetworkTimeout(Executor, int) is unsupported");
    }

    @Override
    public int getNetworkTimeout() throws SQLException {
        throw new UnsupportedOperationException("getNetworkTimeout is unsupported");
    }

    @Override
    public DatabaseMetaData getMetaData() throws SQLException {
        throw new UnsupportedOperationException("getMetaData is unsupported");
    }

    @Override
    public void setReadOnly(boolean readOnly) throws SQLException {
        throw new UnsupportedOperationException("setReadOnly(boolean) is unsupported");
    }

    @Override
    public boolean isReadOnly() throws SQLException {
        throw new UnsupportedOperationException("isReadOnly is unsupported");
    }

    @Override
    public void setCatalog(String catalog) throws SQLException {
        throw new UnsupportedOperationException("setCatalog(String) is unsupported");
    }

    @Override
    public String getCatalog() throws SQLException {
        throw new UnsupportedOperationException("getCatalog is unsupported");
    }

    @Override
    public void setTransactionIsolation(int level) throws SQLException {
        throw new UnsupportedOperationException("setTransactionIsolation(int) is unsupported");
    }

    @Override
    public int getTransactionIsolation() throws SQLException {
        throw new UnsupportedOperationException("getTransactionIsolation is unsupported");
    }

    @Override
    public SQLWarning getWarnings() throws SQLException {
        throw new UnsupportedOperationException("getWarnings is unsupported");
    }

    @Override
    public void clearWarnings() throws SQLException {
        throw new UnsupportedOperationException("clearWarnings is unsupported");
    }

    @Override
    public Statement createStatement(int resultSetType, int resultSetConcurrency) throws SQLException {
        throw new UnsupportedOperationException("createStatement(int, int) is unsupported");
    }

    @Override
    public PreparedStatement prepareStatement(String sql, int resultSetType, int resultSetConcurrency)
            throws SQLException {
        throw new UnsupportedOperationException("prepareStatement(String, int, int) is unsupported");
    }

    @Override
    public CallableStatement prepareCall(String sql, int resultSetType, int resultSetConcurrency)
            throws SQLException {
        throw new UnsupportedOperationException("prepareCall(String, int, int) is unsupported");
    }

    @Override
    public Map<String, Class<?>> getTypeMap() throws SQLException {
        throw new UnsupportedOperationException("getTypeMap is unsupported");
    }

    @Override
    public void setTypeMap(Map<String, Class<?>> map) throws SQLException {
        throw new UnsupportedOperationException("setTypeMap(Map<String, Class<?>>) is unsupported");
    }

    @Override
    public void setHoldability(int holdability) throws SQLException {
        throw new UnsupportedOperationException("setHoldability is unsupported");
    }

    @Override
    public int getHoldability() throws SQLException {
        throw new UnsupportedOperationException("getHoldability is unsupported");
    }

    @Override
    public Savepoint setSavepoint() throws SQLException {
        throw new UnsupportedOperationException("setSavepoint is unsupported");
    }

    @Override
    public Savepoint setSavepoint(String name) throws SQLException {
        throw new UnsupportedOperationException("setSavepoint(String) is unsupported");
    }

    @Override
    public void rollback(Savepoint savepoint) throws SQLException {
        throw new UnsupportedOperationException("rollback(Savepoint) is unsupported");
    }

    @Override
    public void releaseSavepoint(Savepoint savepoint) throws SQLException {
        throw new UnsupportedOperationException("releaseSavepoint(Savepoint) is unsupported");
    }

    @Override
    public Statement createStatement(int resultSetType, int resultSetConcurrency, int resultSetHoldability)
            throws SQLException {
        throw new UnsupportedOperationException("createStatement(int, int, int) is unsupported");
    }

    @Override
    public PreparedStatement prepareStatement(String sql, int resultSetType, int resultSetConcurrency,
            int resultSetHoldability) throws SQLException {
        throw new UnsupportedOperationException("prepareStatement(String, int, int, int) is unsupported");
    }

    @Override
    public CallableStatement prepareCall(String sql, int resultSetType, int resultSetConcurrency,
            int resultSetHoldability) throws SQLException {
        throw new UnsupportedOperationException("prepareCall(String, int, int, int) is unsupported");
    }

    @Override
    public PreparedStatement prepareStatement(String sql, int autoGeneratedKeys) throws SQLException {
        throw new UnsupportedOperationException("prepareStatement(String, int) is unsupported");
    }

    @Override
    public PreparedStatement prepareStatement(String sql, int[] columnIndexes) throws SQLException {
        throw new UnsupportedOperationException("prepareStatement(String, int[]) is unsupported");
    }

    @Override
    public PreparedStatement prepareStatement(String sql, String[] columnNames) throws SQLException {
        throw new UnsupportedOperationException("prepareStatement(String, String[]) is unsupported");
    }

    @Override
    public Clob createClob() throws SQLException {
        throw new UnsupportedOperationException("createClob is unsupported");
    }

    @Override
    public Blob createBlob() throws SQLException {
        throw new UnsupportedOperationException("createBlob is unsupported");
    }

    @Override
    public NClob createNClob() throws SQLException {
        throw new UnsupportedOperationException("createNClob is unsupported");
    }

    @Override
    public SQLXML createSQLXML() throws SQLException {
        throw new UnsupportedOperationException("createSQLXML is unsupported");
    }

    @Override
    public boolean isValid(int timeout) throws SQLException {
        throw new UnsupportedOperationException("isValid(int) is unsupported");
    }

    @Override
    public void setClientInfo(String name, String value) throws SQLClientInfoException {
        throw new UnsupportedOperationException("setClientInfo(String, String) is unsupported");
    }

    @Override
    public void setClientInfo(Properties properties) throws SQLClientInfoException {
        throw new UnsupportedOperationException("setClientInfo(Properties) is unsupported");
    }

    @Override
    public String getClientInfo(String name) throws SQLException {
        throw new UnsupportedOperationException("getClientInfo(String) is unsupported");
    }

    @Override
    public Properties getClientInfo() throws SQLException {
        throw new UnsupportedOperationException("getClientInfo is unsupported");
    }

    @Override
    public Array createArrayOf(String typeName, Object[] elements) throws SQLException {
        throw new UnsupportedOperationException("createArrayOf(String, Object[]) is unsupported");
    }

    @Override
    public Struct createStruct(String typeName, Object[] attributes) throws SQLException {
        throw new UnsupportedOperationException("createStruct(String, Object[]) is unsupported");
    }

    @Override
    public void setSchema(String schema) throws SQLException {
        throw new UnsupportedOperationException("setSchema(String) is unsupported");
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        throw new UnsupportedOperationException("unwrap(Class) is unsupported");
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) throws SQLException {
        throw new UnsupportedOperationException("isWrapperFor(Class) is unsupported");
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/directPath/AbstractRestrictedPreparedStatement.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath;

import java.io.InputStream;
import java.io.Reader;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.net.URL;
import java.nio.charset.Charset;
import java.sql.Array;
import java.sql.Blob;
import java.sql.Clob;
import java.sql.Date;
import java.sql.NClob;
import java.sql.ParameterMetaData;
import java.sql.Ref;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.RowId;
import java.sql.SQLException;
import java.sql.SQLWarning;
import java.sql.SQLXML;
import java.sql.Time;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.OffsetTime;
import java.time.ZonedDateTime;
import java.util.Calendar;
import java.util.List;

import com.alipay.oceanbase.rpc.protocol.payload.impl.ObObj;
import com.alipay.oceanbase.rpc.protocol.payload.impl.ObObjType;
import com.alipay.oceanbase.rpc.util.ObVString;
import org.apache.commons.io.IOUtils;

public abstract class AbstractRestrictedPreparedStatement implements java.sql.PreparedStatement {

    private boolean closed;

    @Override
    public void setNull(int parameterIndex, int sqlType) throws SQLException {
        this.setParameter(parameterIndex, createObObj(null));
    }

    @Override
    public void setNull(int parameterIndex, int sqlType, String typeName) throws SQLException {
        throw new UnsupportedOperationException("setNull(int, int, String) is unsupported");
    }

    @Override
    public void setBoolean(int parameterIndex, boolean x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setByte(int parameterIndex, byte x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setShort(int parameterIndex, short x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setInt(int parameterIndex, int x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setLong(int parameterIndex, long x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setFloat(int parameterIndex, float x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setDouble(int parameterIndex, double x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setBigDecimal(int parameterIndex, BigDecimal x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setString(int parameterIndex, String x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setBytes(int parameterIndex, byte[] x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setDate(int parameterIndex, Date x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setDate(int parameterIndex, Date x, Calendar cal) throws SQLException {
        throw new UnsupportedOperationException("setDate(int, Date, Calendar) is unsupported");
    }

    @Override
    public void setTime(int parameterIndex, Time x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setTime(int parameterIndex, Time x, Calendar cal) throws SQLException {
        throw new UnsupportedOperationException("setTime(int, Time, Calendar) is unsupported");
    }

    @Override
    public void setTimestamp(int parameterIndex, Timestamp x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setTimestamp(int parameterIndex, Timestamp x, Calendar cal) throws SQLException {
        throw new UnsupportedOperationException("setTimestamp(int, Timestamp, Calendar) is unsupported");
    }

    @Override
    public void setObject(int parameterIndex, Object x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setObject(int parameterIndex, Object x, int targetSqlType) throws SQLException {
        throw new UnsupportedOperationException("setObject(int, Object, int) is unsupported");
    }

    @Override
    public void setObject(int parameterIndex, Object x, int targetSqlType, int scaleOrLength) throws SQLException {
        throw new UnsupportedOperationException("setObject(int, Object, int, int) is unsupported");
    }

    @Override
    public void setRef(int parameterIndex, Ref x) throws SQLException {
        throw new UnsupportedOperationException("setRef(int, Ref) is unsupported");
    }

    @Override
    public void setArray(int parameterIndex, Array x) throws SQLException {
        throw new UnsupportedOperationException("setArray(int, Array) is unsupported");
    }

    @Override
    public void setSQLXML(int parameterIndex, SQLXML xmlObject) throws SQLException {
        throw new UnsupportedOperationException("setSQLXML(int, SQLXML) is unsupported");
    }

    @Override
    public void setURL(int parameterIndex, URL x) throws SQLException {
        // if (x == null) {
        //     this.setParameter(parameterIndex, createObObj(x));
        // } else {
        //     // TODO: handle backslash escapes and character encoding if needed?
        //     this.setParameter(parameterIndex, createObObj(x.toString()));
        // }
        throw new UnsupportedOperationException("setURL(int, URL) is unsupported");
    }

    @Override
    public void setRowId(int parameterIndex, RowId x) throws SQLException {
        throw new UnsupportedOperationException("setRowId(int, RowId) is unsupported");
    }

    @Override
    public void setNString(int parameterIndex, String value) throws SQLException {
        this.setParameter(parameterIndex, createObObj(value));
    }

    @Override
    public void setBlob(int parameterIndex, Blob x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setBlob(int parameterIndex, InputStream x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setBlob(int parameterIndex, InputStream x, long length) throws SQLException {
        throw new UnsupportedOperationException("setBlob(int, InputStream, length) is unsupported");
    }

    @Override
    public void setClob(int parameterIndex, Clob x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setClob(int parameterIndex, Reader x) throws SQLException {
        this.setCharacterStream(parameterIndex, x);
    }

    @Override
    public void setClob(int parameterIndex, Reader x, long length) throws SQLException {
        throw new UnsupportedOperationException("setClob(int, Reader, length) is unsupported");
    }

    @Override
    public void setNClob(int parameterIndex, NClob x) throws SQLException {
        this.setClob(parameterIndex, (Clob) (x));
    }

    @Override
    public void setNClob(int parameterIndex, Reader x) throws SQLException {
        this.setClob(parameterIndex, x);
    }

    @Override
    public void setNClob(int parameterIndex, Reader x, long length) throws SQLException {
        throw new UnsupportedOperationException("setNClob(int, Reader, length) is unsupported");
    }

    @Override
    public void setAsciiStream(int parameterIndex, InputStream x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Deprecated
    @Override
    public void setUnicodeStream(int parameterIndex, InputStream x, int length) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setAsciiStream(int parameterIndex, InputStream x, int length) throws SQLException {
        throw new UnsupportedOperationException("setAsciiStream(int, InputStream, length) is unsupported");
    }

    @Override
    public void setAsciiStream(int parameterIndex, InputStream x, long length) throws SQLException {
        throw new UnsupportedOperationException("setAsciiStream(int, InputStream, length) is unsupported");
    }

    @Override
    public void setBinaryStream(int parameterIndex, InputStream x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setBinaryStream(int parameterIndex, InputStream x, int length) throws SQLException {
        throw new UnsupportedOperationException("setBinaryStream(int, InputStream, length) is unsupported");
    }

    @Override
    public void setBinaryStream(int parameterIndex, InputStream x, long length) throws SQLException {
        throw new UnsupportedOperationException("setBinaryStream(int, InputStream, length) is unsupported");
    }

    @Override
    public void setCharacterStream(int parameterIndex, Reader x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setCharacterStream(int parameterIndex, Reader x, int length) throws SQLException {
        throw new UnsupportedOperationException("setCharacterStream(int, Reader, length) is unsupported");
    }

    @Override
    public void setCharacterStream(int parameterIndex, Reader x, long length) throws SQLException {
        throw new UnsupportedOperationException("setCharacterStream(int, Reader, length) is unsupported");
    }

    @Override
    public void setNCharacterStream(int parameterIndex, Reader x) throws SQLException {
        this.setParameter(parameterIndex, createObObj(x));
    }

    @Override
    public void setNCharacterStream(int parameterIndex, Reader x, long length) throws SQLException {
        throw new UnsupportedOperationException("setNCharacterStream(int, Reader, length) is unsupported");
    }

    /**
     * @return boolean
     */
    protected abstract boolean isOracleMode();

    /**
     * Sets the parameter at the target position.
     *
     * @param parameterIndex
     * @param obObj
     * @throws SQLException
     */
    protected abstract void setParameter(int parameterIndex, ObObj obObj) throws SQLException;

    /**
     * Closes the current prepared statement.
     *
     * @throws SQLException
     */
    @Override
    public void close() throws SQLException {
        this.closed = true;
    }

    /**
     * Returns whether the current prepared statement is closed.
     *
     * @return boolean
     * @throws SQLException
     */
    @Override
    public boolean isClosed() throws SQLException {
        return this.closed;
    }

    /**
     * Creates an {@link ObObj} array from the input values.
     *
     * @param values Original row value
     * @return ObObj[]
     */
    public ObObj[] createObObjArray(Object[] values) {
        if (values == null) {
            return null;
        }
        ObObj[] array = new ObObj[values.length];
        for (int i = 0; i < values.length; i++) {
            array[i] = createObObj(values[i]);
        }
        return array;
    }

    /**
     * Creates an {@link ObObj} array from the input values.
     *
     * @param values Original row value
     * @return ObObj[]
     */
    public ObObj[] createObObjArray(List<Object> values) {
        if (values == null) {
            return null;
        }
        ObObj[] array = new ObObj[values.size()];
        for (int i = 0; i < values.size(); i++) {
            array[i] = createObObj(values.get(i));
        }
        return array;
    }

    /**
     * Creates an {@link ObObj} instance.
     *
     * @param value Original column value
     * @return ObObj
     */
    public ObObj createObObj(Object value) {
        try {
            // Only used for strongly typed declared variables
            Object convertedValue = value == null ? null : convertValue(value);
            return new ObObj(ObObjType.defaultObjMeta(convertedValue), convertedValue);
        } catch (Exception ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    /**
     * Some value types are not supported by ObObjType#valueOfType, so the input value is
     * converted to a supported type first.
     *
     * @param value
     * @return Object
     * @throws Exception
     */
    public static Object convertValue(Object value) throws Exception {
        if (value instanceof BigDecimal) {
            return value.toString();
        } else if (value instanceof BigInteger) {
            return value.toString();
        } else if (value instanceof Instant) {
            return Timestamp.from(((Instant) value));
        } else if (value instanceof LocalDate) {
            // Note: Date.valueOf(LocalDate) interprets the date in the JVM's default time zone.
            return Date.valueOf(((LocalDate) value));
        } else if (value instanceof LocalTime) {
            // Note: Time.valueOf(LocalTime) uses the JVM's default time zone and drops the nanosecond field.
            Time t = Time.valueOf((LocalTime) value);
            return new Timestamp(t.getTime());
        } else if (value instanceof LocalDateTime) {
            return Timestamp.valueOf(((LocalDateTime) value));
        } else if (value instanceof OffsetDateTime) {
            return Timestamp.from(((OffsetDateTime) value).toInstant());
        } else if (value instanceof Time) {
            return new Timestamp(((Time) value).getTime());
        } else if (value instanceof ZonedDateTime) {
            // Note: be careful with time zones; only the instant is kept, the zone itself is discarded.
            return Timestamp.from(((ZonedDateTime) value).toInstant());
        } else if (value instanceof OffsetTime) {
            // Note: toLocalTime() drops the offset, and Time.valueOf(LocalTime) uses the JVM's default time zone.
            LocalTime lt = ((OffsetTime) value).toLocalTime();
            return new Timestamp(Time.valueOf(lt).getTime());
        } else if (value instanceof InputStream) {
            try (InputStream is = ((InputStream) value)) {
                // Note: be careful with the character set; the JVM default charset is used.
                return new ObVString(IOUtils.toString(is, Charset.defaultCharset()));
            }
        } else if (value instanceof Blob) {
            Blob b = (Blob) value;
            try (InputStream is = b.getBinaryStream()) {
                if (is == null) {
                    return null;
                }
                // Note: be careful with the character set; the JVM default charset is used.
                return new ObVString(IOUtils.toString(is, Charset.defaultCharset()));
            } finally {
                b.free();
            }
        } else if (value instanceof Reader) {
            try (Reader r = ((Reader) value)) {
                return IOUtils.toString(r);
            }
        } else if (value instanceof Clob) {
            Clob c = (Clob) value;
            try (Reader r = c.getCharacterStream()) {
                return r == null ? null : IOUtils.toString(r);
            } finally {
                c.free();
            }
        } else {
            return value;
        }
    }

    // *********************************************************************************** //

    @Override
    public boolean getMoreResults(int current) throws SQLException {
        throw new UnsupportedOperationException("getMoreResults(int) is unsupported");
    }

    @Override
    public ResultSet getGeneratedKeys() throws SQLException {
        throw new UnsupportedOperationException("getGeneratedKeys is unsupported");
    }

    @Override
    public int executeUpdate(String sql, int autoGeneratedKeys) throws SQLException {
        throw new UnsupportedOperationException("executeUpdate(String, int) is unsupported");
    }

    @Override
    public int executeUpdate(String sql, int[] columnIndexes) throws SQLException {
        throw new UnsupportedOperationException("executeUpdate(String, int[]) is unsupported");
    }

    @Override
    public int executeUpdate(String sql, String[] columnNames) throws SQLException {
        throw new UnsupportedOperationException("executeUpdate(String, String[]) is unsupported");
    }

    @Override
    public boolean execute(String sql, int autoGeneratedKeys) throws SQLException {
        throw new UnsupportedOperationException("execute(String, int) is unsupported");
    }

    @Override
    public boolean execute(String sql, int[] columnIndexes) throws SQLException {
        throw new UnsupportedOperationException("execute(String, int[]) is unsupported");
    }

    @Override
    public boolean execute(String sql, String[] columnNames) throws SQLException {
        throw new UnsupportedOperationException("execute(String, String[]) is unsupported");
    }

    @Override
    public int getResultSetHoldability() throws SQLException {
        throw new UnsupportedOperationException("getResultSetHoldability is unsupported");
    }

    @Override
    public void setPoolable(boolean poolable) throws SQLException {
        throw new UnsupportedOperationException("setPoolable(boolean) is unsupported");
    }

    @Override
    public boolean isPoolable() throws SQLException {
        throw new UnsupportedOperationException("isPoolable is unsupported");
    }

    @Override
    public void closeOnCompletion() throws SQLException {
        throw new UnsupportedOperationException("closeOnCompletion is unsupported");
    }

    @Override
    public boolean isCloseOnCompletion() throws SQLException {
        throw new UnsupportedOperationException("isCloseOnCompletion is unsupported");
    }

    @Override
    public ResultSet executeQuery(String sql) throws SQLException {
        throw new UnsupportedOperationException("executeQuery(String) is unsupported");
    }

    @Override
    public int executeUpdate(String sql) throws SQLException {
        throw new UnsupportedOperationException("executeUpdate(String) is unsupported");
    }

    @Override
    public int getMaxFieldSize() throws SQLException {
        throw new UnsupportedOperationException("getMaxFieldSize is unsupported");
    }

    @Override
    public void setMaxFieldSize(int max) throws SQLException {
        throw new UnsupportedOperationException("setMaxFieldSize(int) is unsupported");
    }

    @Override
    public int getMaxRows() throws SQLException {
        throw new UnsupportedOperationException("getMaxRows is unsupported");
    }

    @Override
    public void setMaxRows(int max) throws SQLException {
        throw new UnsupportedOperationException("setMaxRows(int) is unsupported");
    }

    @Override
    public void setEscapeProcessing(boolean enable) throws SQLException {
        throw new UnsupportedOperationException("setEscapeProcessing(boolean) is unsupported");
    }

    @Override
    public int getQueryTimeout() throws SQLException {
        throw new UnsupportedOperationException("getQueryTimeout is unsupported");
    }

    @Override
    public void setQueryTimeout(int seconds) throws SQLException {
        throw new UnsupportedOperationException("setQueryTimeout(int) is unsupported");
    }

    @Override
    public void cancel() throws SQLException {
        throw new UnsupportedOperationException("cancel is unsupported");
    }

    @Override
    public SQLWarning getWarnings() throws SQLException {
        throw new UnsupportedOperationException("getWarnings is unsupported");
    }

    @Override
    public void clearWarnings() throws SQLException {
        throw new UnsupportedOperationException("clearWarnings is unsupported");
    }

    @Override
    public void setCursorName(String name) throws SQLException {
        throw new UnsupportedOperationException("setCursorName(String) is unsupported");
    }

    @Override
    public boolean execute(String sql) throws SQLException {
        throw new UnsupportedOperationException("execute(String) is unsupported");
    }

    @Override
    public ResultSet getResultSet() throws SQLException {
        throw new UnsupportedOperationException("getResultSet is unsupported");
    }

    @Override
    public int getUpdateCount() throws SQLException {
        throw new UnsupportedOperationException("getUpdateCount is unsupported");
    }

    @Override
    public boolean getMoreResults() throws SQLException {
        throw new UnsupportedOperationException("getMoreResults is unsupported");
    }

    @Override
    public void setFetchDirection(int direction) throws SQLException {
        throw new UnsupportedOperationException("setFetchDirection(int) is unsupported");
    }

    @Override
    public int getFetchDirection() throws SQLException {
        throw new UnsupportedOperationException("getFetchDirection is unsupported");
    }

    @Override
    public void setFetchSize(int rows) throws SQLException {
        throw new UnsupportedOperationException("setFetchSize(int) is unsupported");
    }

    @Override
    public int getFetchSize() throws SQLException {
        throw new UnsupportedOperationException("getFetchSize is unsupported");
    }

    @Override
    public int getResultSetConcurrency() throws SQLException {
        throw new UnsupportedOperationException("getResultSetConcurrency is unsupported");
    }

    @Override
    public int getResultSetType() throws SQLException {
        throw new UnsupportedOperationException("getResultSetType is unsupported");
    }

    @Override
    public void addBatch(String sql) throws SQLException {
        throw new UnsupportedOperationException("addBatch(String) is unsupported");
    }

    @Override
    public ResultSet executeQuery() throws SQLException {
        throw new UnsupportedOperationException("executeQuery is unsupported");
    }

    @Override
    public int executeUpdate() throws SQLException {
        throw new UnsupportedOperationException("executeUpdate is unsupported");
    }

    @Override
    public boolean execute() throws SQLException {
        throw new UnsupportedOperationException("execute is unsupported");
    }

    @Override
    public ParameterMetaData getParameterMetaData() throws SQLException {
        throw new UnsupportedOperationException("getParameterMetaData is unsupported");
    }

    @Override
    public ResultSetMetaData getMetaData() throws SQLException {
        throw new UnsupportedOperationException("getMetaData is unsupported");
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        throw new UnsupportedOperationException("unwrap(Class) is unsupported");
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) throws SQLException {
        throw new UnsupportedOperationException("isWrapperFor(Class) is unsupported");
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/directPath/DirectLoaderBuilder.java
================================================
/*
 * Copyright 2024 OceanBase.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath;

import java.io.Serializable;

import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadConnection;
import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadManager;
import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadStatement;
import com.alipay.oceanbase.rpc.direct_load.exception.ObDirectLoadException;
import com.alipay.oceanbase.rpc.exception.ObTableException;
import com.alipay.oceanbase.rpc.protocol.payload.impl.ObLoadDupActionType;
import org.apache.commons.lang.ObjectUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * The builder for {@link ObTableDirectLoad}.
 */
public class DirectLoaderBuilder implements Serializable {

    private static final Logger log = LoggerFactory.getLogger(DirectLoaderBuilder.class);

    private String host;
    private int port;

    private String user;
    private String tenant;
    private String password;

    private String schema;
    private String table;

    /**
     * Server-side parallelism.
     */
    private int parallel;

    private long maxErrorCount;

    private ObLoadDupActionType duplicateKeyAction;

    /**
     * The overall timeout of the direct load task.
     */
    private Long timeout;

    private Long heartBeatTimeout;

    private Long heartBeatInterval;

    public DirectLoaderBuilder host(String host) {
        this.host = host;
        return this;
    }

    public DirectLoaderBuilder port(int port) {
        this.port = port;
        return this;
    }

    public DirectLoaderBuilder user(String user) {
        // The obkv 1.4.0 client needs only the user name; tenant and cluster info must not be
        // included, so everything from the first '@' onward is dropped.
        int indexOf = user.indexOf("@");
        this.user = user;
        if (indexOf > 0) {
            this.user = user.substring(0, indexOf);
        }
        return this;
    }

    public DirectLoaderBuilder tenant(String tenant) {
        this.tenant = tenant;
        return this;
    }

    public DirectLoaderBuilder password(String password) {
        this.password = password;
        return this;
    }

    public DirectLoaderBuilder schema(String schema) {
        this.schema = schema;
        return this;
    }

    public DirectLoaderBuilder table(String table) {
        this.table = table;
        return this;
    }

    public DirectLoaderBuilder parallel(int parallel) {
        this.parallel = parallel;
        return this;
    }

    public DirectLoaderBuilder maxErrorCount(long maxErrorCount) {
        this.maxErrorCount = maxErrorCount;
        return this;
    }

    public DirectLoaderBuilder duplicateKeyAction(ObLoadDupActionType duplicateKeyAction) {
        this.duplicateKeyAction = duplicateKeyAction;
        return this;
    }

    public DirectLoaderBuilder timeout(long timeout) {
        this.timeout = timeout;
        return this;
    }

    public DirectLoaderBuilder heartBeatTimeout(Long heartBeatTimeout) {
        this.heartBeatTimeout = heartBeatTimeout;
        return this;
    }

    public DirectLoaderBuilder heartBeatInterval(Long heartBeatInterval) {
        this.heartBeatInterval = heartBeatInterval;
        return this;
    }

    public ObTableDirectLoad build() {
        try {
            ObDirectLoadConnection obDirectLoadConnection = buildConnection(parallel);
            ObDirectLoadStatement obDirectLoadStatement = buildStatement(obDirectLoadConnection);
            return new ObTableDirectLoad(schema, table, obDirectLoadStatement, obDirectLoadConnection);
        } catch (ObDirectLoadException e) {
            throw new ObTableException(e.getMessage(), e);
        }
    }

    private ObDirectLoadConnection buildConnection(int writeThreadNum) throws ObDirectLoadException {
        if (heartBeatTimeout == null || heartBeatInterval == null) {
            throw new IllegalArgumentException("heartBeatTimeout and heartBeatInterval must not be null");
        }
        ObDirectLoadConnection build = ObDirectLoadManager.getConnectionBuilder()
                .setServerInfo(host, port)
                .setLoginInfo(tenant, user, password, schema)
                .setHeartBeatInfo(heartBeatTimeout, heartBeatInterval)
                .enableParallelWrite(writeThreadNum)
                .build();
        log.info("ObDirectLoadConnection value is:{}", ObjectUtils.toString(build));
        return build;
    }

    private ObDirectLoadStatement buildStatement(ObDirectLoadConnection connection) throws ObDirectLoadException {
        ObDirectLoadStatement build = connection.getStatementBuilder()
                .setTableName(table)
                .setParallel(parallel)
                .setQueryTimeout(timeout)
                .setDupAction(duplicateKeyAction)
                .setMaxErrorRowCount(maxErrorCount)
                .build();
        log.info("ObDirectLoadStatement value is:{}", ObjectUtils.toString(build));
        return build;
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/directPath/DirectPathConnection.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath;

import java.sql.SQLException;
import java.util.Arrays;

import com.alibaba.datax.common.util.Configuration;
import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadBucket;
import com.alipay.oceanbase.rpc.protocol.payload.impl.ObLoadDupActionType;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static com.google.common.base.Preconditions.checkArgument;

public class DirectPathConnection extends AbstractRestrictedConnection {

    private static final int OB_DIRECT_PATH_DEFAULT_BLOCKS = 1;
    private static final long OB_DIRECT_PATH_HEART_BEAT_TIMEOUT = 60000;
    private static final long OB_DIRECT_PATH_HEART_BEAT_INTERVAL = 10000;
    private static final int DEFAULT_BUFFERSIZE = 1048576;
    private final Configuration configuration;

    private State state;
    private int commiters;
    private final int blocks;
    private final ObTableDirectLoad load;
    private final Object lock = new Object();

    private static final Logger log = LoggerFactory.getLogger(DirectPathConnection.class);

    /**
     * Construct a new instance.
     *
     * @param load          the wrapped direct-load task
     * @param blocks        number of {@link #commit()} calls required before the load is actually committed
     * @param configuration writer configuration (read for {@code bufferSize})
     */
    private DirectPathConnection(ObTableDirectLoad load, int blocks, Configuration configuration) {
        this.configuration = configuration;
        this.load = load;
        this.blocks = blocks;
    }

    /**
     * Begin a new {@link DirectPathConnection }
     *
     * @return DirectPathConnection
     * @throws SQLException
     */
    public DirectPathConnection begin() throws SQLException {
        synchronized (lock) {
            if (state == null || state == State.CLOSED) {
                try {
                    this.load.begin();
                    this.state = State.BEGIN;
                } catch (Exception ex) {
                    throw new SQLException(ex);
                }
            } else {
                throw new IllegalStateException("Begin transaction failed as connection state is already BEGIN");
            }
        }
        return this;
    }

    /**
     * Commit buffered data with MAXIMUM timeout.
     * <p>
     * Each call counts one finished block; the underlying load is committed only when the
     * {@code blocks}-th call arrives.
     *
     * @throws SQLException
     */
    @Override
    public void commit() throws SQLException {
        synchronized (lock) {
            if (state == State.BEGIN) {
                this.commiters++;
                if (commiters == blocks) {
                    try {
                        this.load.commit();
                        state = State.FINISHED;
                    } catch (Exception ex) {
                        throw new SQLException(ex);
                    }
                } else if (commiters > blocks) {
                    throw new IllegalStateException("Commit count has exceeded the limit. (" + commiters + ">" + blocks + ")");
                }
            } else {
                throw new IllegalStateException("Commit transaction failed as connection state is not BEGIN");
            }
        }
    }

    /**
     * Roll back the load if an error occurred.
     *
     * @throws SQLException
     */
    @Override
    public void rollback() throws SQLException {
        synchronized (lock) {
            if (state == State.BEGIN) {
                try {
                    // In obkv-table-client 2.1.0, close() also contains the rollback logic.
                    this.load.close();
                } catch (Exception ex) {
                    throw new SQLException(ex);
                }
            } else {
                throw new IllegalStateException("Rollback transaction failed as connection state is not BEGIN");
            }
        }
    }

    /**
     * Close this connection.
     */
    @Override
    public void close() {
        synchronized (lock) {
            // The underlying load is closed regardless of the current state.
            this.load.close();
            this.state = State.CLOSED;
        }
    }

    /**
     * @return DirectPathPreparedStatement
     */
    @Override
    public DirectPathPreparedStatement createStatement() throws SQLException {
        return this.prepareStatement(null);
    }

    /**
     * Each new batch needs a new {@link DirectPathPreparedStatement }.
     * A {@link DirectPathPreparedStatement } must not be reused, otherwise it may cause duplicate records.
     *
     * @return DirectPathPreparedStatement
     */
    @Override
    public DirectPathPreparedStatement prepareStatement(String sql) throws SQLException {
        if (state == State.BEGIN) {
            Integer bufferSize = configuration.getInt(DirectPathConstants.BUFFERSIZE, DEFAULT_BUFFERSIZE);
            log.info("The current bufferSize is {}", bufferSize);
            return new DirectPathPreparedStatement(this, bufferSize);
        } else {
            throw new IllegalStateException("Create statement failed as connection state is not BEGIN");
        }
    }

    /**
     * Return the schema name of this connection instance.
     *
     * @return String
     */
    @Override
    public String getSchema() {
        if (state == State.BEGIN) {
            return this.load.getTable().getDatabase();
        } else {
            throw new IllegalStateException("Get schema failed as connection state is not BEGIN");
        }
    }

    /**
     * Return the table name of this connection instance.
     *
     * @return String
     */
    public String getTableName() {
        if (state == State.BEGIN) {
            return this.load.getTableName();
        } else {
            throw new IllegalStateException("Get table failed as connection state is not BEGIN");
        }
    }

    /**
     * Return whether this connection is closed.
     *
     * @return boolean
     */
    @Override
    public boolean isClosed() {
        synchronized (lock) {
            return this.state == State.CLOSED;
        }
    }

    public boolean isFinished() {
        // Identity comparison is null-safe if begin() was never called.
        return this.state == State.FINISHED;
    }

    /**
     * Write the bucket's rows through the underlying direct load.
     *
     * @param bucket
     * @return int[]
     * @throws SQLException
     */
    int[] insert(ObDirectLoadBucket bucket) throws SQLException {
        try {
            this.load.write(bucket);
            int[] result = new int[bucket.getRowNum()];
            Arrays.fill(result, 1);
            return result;
        } catch (Exception ex) {
            throw new SQLException(ex);
        }
    }

    /**
     * Indicates the state of {@link DirectPathConnection }
     */
    enum State {

        /**
         * Begin transaction
         */
        BEGIN,

        /**
         * Transaction is finished, ready to close.
         */
        FINISHED,

        /**
         * Transaction is closed.
         */
        CLOSED;
    }

    /**
     * This builder is used to build a new {@link DirectPathConnection }
     */
    public static class Builder {

        private String host;
        private int port;

        private String user;
        private String tenant;
        private String password;

        private String schema;
        private String table;

        /**
         * Client job count.
         */
        private int blocks = OB_DIRECT_PATH_DEFAULT_BLOCKS;

        /**
         * Server threads used to sort.
         */
        private int parallel;

        private long maxErrorCount;

        private ObLoadDupActionType duplicateKeyAction;

        // Used for load data
        private long serverTimeout;

        private Configuration configuration;

        public Builder host(String host) {
            this.host = host;
            return this;
        }

        public Builder port(int port) {
            this.port = port;
            return this;
        }

        public Builder user(String user) {
            this.user = user;
            return this;
        }

        public Builder tenant(String tenant) {
            this.tenant = tenant;
            return this;
        }

        public Builder password(String password) {
            this.password = password;
            return this;
        }

        public Builder schema(String schema) {
            this.schema = schema;
            return this;
        }

        public Builder table(String table) {
            this.table = table;
            return this;
        }

        public Builder blocks(int blocks) {
            this.blocks = blocks;
            return this;
        }

        public Builder parallel(int parallel) {
            this.parallel = parallel;
            return this;
        }

        public Builder maxErrorCount(long maxErrorCount) {
            this.maxErrorCount = maxErrorCount;
            return this;
        }

        public Builder duplicateKeyAction(ObLoadDupActionType duplicateKeyAction) {
            this.duplicateKeyAction = duplicateKeyAction;
            return this;
        }

        public Builder serverTimeout(long serverTimeout) {
            this.serverTimeout = serverTimeout;
            return this;
        }

        public Builder configuration(Configuration configuration) {
            this.configuration = configuration;
            return this;
        }

        /**
         * Build a new {@link DirectPathConnection }
         *
         * @return DirectPathConnection
         */
        public DirectPathConnection build() throws Exception {
            return createConnection(host, port, user, tenant, password, schema, table, //
                    blocks, parallel, maxErrorCount, duplicateKeyAction, serverTimeout, duplicateKeyAction).begin();
        }

        /**
         * Create a new {@link DirectPathConnection }
         *
         * @param host
         * @param port
         * @param user
         * @param tenant
         * @param password
         * @param schema
         * @param table
         * @param blocks
         * @param parallel
         * @param maxErrorCount
         * @param action
         * @param serverTimeout
         * @param obLoadDupActionType
         * @return DirectPathConnection
         * @throws Exception
         */
        DirectPathConnection createConnection(String host, int port, String user, String tenant, String password, String schema, String table, //
                int blocks, int parallel, long maxErrorCount, ObLoadDupActionType action, long serverTimeout, ObLoadDupActionType obLoadDupActionType) throws Exception {

            checkArgument(StringUtils.isNotBlank(host), "Host is null.(host=%s)", host);
            // 65535 is a valid port, so the upper bound is inclusive.
            checkArgument((port > 0 && port <= 65535), "Port is invalid.(port=%s)", port);
            checkArgument(StringUtils.isNotBlank(user), "User Name is null.(user=%s)", user);
            checkArgument(StringUtils.isNotBlank(tenant), "Tenant Name is null.(tenant=%s)", tenant);
            checkArgument(StringUtils.isNotBlank(schema), "Schema Name is null.(schema=%s)", schema);
            checkArgument(StringUtils.isNotBlank(table), "Table Name is null.(table=%s)", table);

            checkArgument(blocks > 0, "Client Blocks is invalid.(blocks=%s)", blocks);
            checkArgument(parallel > 0, "Server Parallel is invalid.(parallel=%s)", parallel);
            checkArgument(maxErrorCount > -1, "MaxErrorCount is invalid.(maxErrorCount=%s)", maxErrorCount);
            checkArgument(action != null, "ObLoadDupActionType is null.(obLoadDupActionType=%s)", action);
            checkArgument(serverTimeout > 0, "Server timeout is invalid.(timeout=%s)", serverTimeout);
            Long heartBeatTimeout = 0L;
            Long heartBeatInterval = 0L;
            if (configuration != null) {
                heartBeatTimeout = configuration.getLong(DirectPathConstants.HEART_BEAT_TIMEOUT, OB_DIRECT_PATH_HEART_BEAT_TIMEOUT);
                heartBeatInterval = configuration.getLong(DirectPathConstants.HEART_BEAT_INTERVAL, OB_DIRECT_PATH_HEART_BEAT_INTERVAL);
                parallel = configuration.getInt(DirectPathConstants.PARALLEL, parallel);
            }
            DirectLoaderBuilder builder = new DirectLoaderBuilder()
                    .host(host).port(port)
                    .user(user)
                    .tenant(tenant)
                    .password(password)
                    .schema(schema)
                    .table(table)
                    .parallel(parallel)
                    .maxErrorCount(maxErrorCount)
                    .timeout(serverTimeout)
                    .duplicateKeyAction(obLoadDupActionType)
                    .heartBeatTimeout(heartBeatTimeout)
                    .heartBeatInterval(heartBeatInterval);
            ObTableDirectLoad directLoad = builder.build();
            return new DirectPathConnection(directLoad, blocks, configuration);
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/directPath/DirectPathConstants.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath;

public class DirectPathConstants {
    // The following constants are configuration keys used by DirectPathConnection.
    public static final String HEART_BEAT_TIMEOUT = "heartBeatTimeout";

    public static final String HEART_BEAT_INTERVAL = "heartBeatInterval";

    public static final String PARALLEL = "parallel";

    public static final String BUFFERSIZE = "bufferSize";
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/directPath/DirectPathPreparedStatement.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath;

import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadBucket;
import com.alipay.oceanbase.rpc.direct_load.exception.ObDirectLoadException;
import com.alipay.oceanbase.rpc.protocol.payload.impl.ObObj;

import static com.google.common.base.Preconditions.checkArgument;

public class DirectPathPreparedStatement extends AbstractRestrictedPreparedStatement {

    private ObDirectLoadBucket bucket;
    private final DirectPathConnection conn;
    private final Map<Integer, ObObj> parameters;
    private final Integer bufferSize;
    private static final int DEFAULT_BUFFERSIZE = 1048576;
    public static final int[] EMPTY_ARRAY = new int[0];

    /**
     * Construct a new {@link DirectPathPreparedStatement } instance.
     *
     * @param conn
     */
    public DirectPathPreparedStatement(DirectPathConnection conn) {
        this.conn = conn;
        this.parameters = new HashMap<>();
        this.bufferSize = DEFAULT_BUFFERSIZE;
        this.bucket = new ObDirectLoadBucket();
    }

    public DirectPathPreparedStatement(DirectPathConnection conn, Integer bufferSize) {
        this.conn = conn;
        this.parameters = new HashMap<>();
        this.bufferSize = bufferSize;
        this.bucket = new ObDirectLoadBucket(bufferSize);
    }

    /**
     * Return current direct path connection.
     *
     * @return DirectPathConnection
     * @throws SQLException
     */
    @Override
    public DirectPathConnection getConnection() throws SQLException {
        return this.conn;
    }

    /**
     * Copy the bound parameters into a new row, so that later setXxx calls cannot overwrite it.
     *
     * @throws SQLException
     */
    @Override
    public void addBatch() throws SQLException {
        checkRange();
        ObObj[] objObjArray = new ObObj[parameters.size()];
        for (Map.Entry<Integer, ObObj> entry : parameters.entrySet()) {
            objObjArray[entry.getKey() - 1] = entry.getValue();
        }
        this.addBatch(objObjArray);
    }

    /**
     * Add a new row to the buffer from a list of original values.
     *
     * @param values One row of original values.
     */
    public void addBatch(List<Object> values) {
        this.addBatch(createObObjArray(values));
    }

    /**
     * Add a new row to the buffer from an array of original values.
     *
     * @param values One row of original values.
     */
    public void addBatch(Object[] values) {
        this.addBatch(createObObjArray(values));
    }

    /**
     * Add a new row to the buffer from an ObObj array.
     *
     * @param arr One row of data described as ObObj.
     */
    private void addBatch(ObObj[] arr) {
        checkArgument(arr != null && arr.length > 0, "Input values must not be null or empty");
        try {
            this.bucket.addRow(arr);
        } catch (ObDirectLoadException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Write the rows buffered in the current bucket through the direct path connection.
     * You must invoke {@link #clearBatch()} after executeBatch.
     *
     * @return int[]
     * @throws SQLException
     */
    @Override
    public int[] executeBatch() throws SQLException {
        return this.bucket.isEmpty() ? EMPTY_ARRAY : this.conn.insert(bucket);
    }

    /**
     * Clearing the batch always creates a new {@link ObDirectLoadBucket}.
     */
    @Override
    public void clearBatch() {
        this.parameters.clear();
        this.bucket = new ObDirectLoadBucket(bufferSize);
    }

    /**
     * Clear the held parameters.
     *
     * @throws SQLException
     */
    @Override
    public void clearParameters() throws SQLException {
        this.parameters.clear();
    }

    /**
     * @return boolean
     */
    @Override
    public boolean isOracleMode() {
        return false;
    }

    /**
     * Set parameter to the target position.
     *
     * @param parameterIndex Start From 1
     * @param obObj          Convert original value to {@link ObObj }
     * @throws SQLException
     */
    @Override
    protected void setParameter(int parameterIndex, ObObj obObj) throws SQLException {
        checkArgument(parameterIndex > 0, "Parameter index should start from 1");
        this.parameters.put(parameterIndex, obObj);
    }

    /**
     * Avoid range exception:
     * <pre>
     * Map.put(1, "abc");
     * Map.put(5, "def"); // Error: parameter index is 5, but only 2 values exist.
     * </pre>
     */
    private void checkRange() {
        OptionalInt optionalInt = parameters.keySet().stream().mapToInt(e -> e).max();
        int parameterIndex = optionalInt.orElseThrow(() -> new IllegalArgumentException("No parameter index found"));
        checkArgument(parameterIndex == parameters.size(), "Parameter index(%s) is unmatched with value list(%s)", parameterIndex, parameters.size());
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/directPath/ObTableDirectLoad.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath;

import java.sql.SQLException;
import java.util.Objects;

import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadBucket;
import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadConnection;
import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadStatement;
import com.alipay.oceanbase.rpc.direct_load.ObDirectLoadTraceId;
import com.alipay.oceanbase.rpc.direct_load.exception.ObDirectLoadException;
import com.alipay.oceanbase.rpc.direct_load.protocol.payload.ObTableLoadClientStatus;
import com.alipay.oceanbase.rpc.table.ObTable;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Wrapper of the direct-load API for OceanBase.
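 * <p>
 * Typical lifecycle, as a sketch only: {@code builder} (a configured {@link DirectLoaderBuilder})
 * and {@code bucket} (a filled {@link ObDirectLoadBucket}) are assumed to exist. Call
 * {@link #begin()} once, {@link #write(ObDirectLoadBucket)} per batch, {@link #commit()} on
 * success, and always {@link #close()}. According to the note in
 * {@code DirectPathConnection#rollback()}, {@code close()} in obkv-table-client 2.1.0 also
 * rolls back a load that was never committed.
 * <pre>{@code
 * try (ObTableDirectLoad load = builder.build()) {
 *     load.begin();        // may throw ObDirectLoadException
 *     load.write(bucket);  // may throw SQLException
 *     load.commit();       // may throw SQLException
 * }
 * }</pre>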
 */
public class ObTableDirectLoad implements AutoCloseable {
    private static final Logger LOG = LoggerFactory.getLogger(ObTableDirectLoad.class);

    private final String tableName;
    private final String schemaTableName;
    private final ObDirectLoadStatement statement;
    private final ObDirectLoadConnection connection;

    public ObTableDirectLoad(String schemaName, String tableName, ObDirectLoadStatement statement, ObDirectLoadConnection connection) {
        Objects.requireNonNull(schemaName, "schemaName must not be null");
        Objects.requireNonNull(tableName, "tableName must not be null");
        Objects.requireNonNull(statement, "statement must not be null");
        Objects.requireNonNull(connection, "connection must not be null");
        this.tableName = tableName;
        this.schemaTableName = String.format("%s.%s", schemaName, tableName);
        this.statement = statement;
        this.connection = connection;
    }

    /**
     * Begin the direct load operation.
     *
     * @throws ObDirectLoadException if an error occurs during the operation.
     */
    public void begin() throws ObDirectLoadException {
        statement.begin();
    }

    /**
     * Write data into the direct load operation.
     *
     * @param bucket The data bucket to write.
     * @throws SQLException if writing fails.
     */
    public void write(ObDirectLoadBucket bucket) throws SQLException {
        try {
            if (bucket == null || bucket.isEmpty()) {
                throw new IllegalArgumentException("Bucket must not be null or empty.");
            }
            LOG.info("Writing {} rows to table: {}", bucket.getRowNum(), schemaTableName);
            statement.write(bucket);
            LOG.info("Successfully wrote bucket data to table: {}", schemaTableName);
        } catch (ObDirectLoadException e) {
            LOG.error("Failed to write to table: {}", schemaTableName, e);
            throw new SQLException(String.format("Failed to write to table: %s", schemaTableName), e);
        }
    }

    /**
     * Commit the current direct load operation.
     *
     * @throws SQLException if commit fails.
     */
    public void commit() throws SQLException {
        try {
            LOG.info("Committing direct load for table: {}", schemaTableName);
            statement.commit();
            LOG.info("Successfully committed direct load for table: {}", schemaTableName);
        } catch (ObDirectLoadException e) {
            LOG.error("Failed to commit for table: {}", schemaTableName, e);
            throw new SQLException(String.format("Failed to commit for table: %s", schemaTableName), e);
        }
    }

    /**
     * Close the direct load statement and its underlying connection.
     */
    public void close() {
        LOG.info("Closing direct load for table: {}", schemaTableName);
        statement.close();
        connection.close();
        LOG.info("Direct load closed for table: {}", schemaTableName);
    }

    /**
     * Gets the status from the current connection based on the traceId.
     */
    public ObTableLoadClientStatus getStatus() throws SQLException {
        ObDirectLoadTraceId traceId = statement.getTraceId();
        // Check if traceId is null and throw an exception with a clear message
        if (traceId == null) {
            throw new SQLException("traceId is null.");
        }

        // Retrieve the status using the traceId
        ObTableLoadClientStatus status = statement.getConnection().getProtocol().getHeartBeatRpc(traceId).getStatus();
        if (status == null) {
            LOG.info("Direct load heartbeat RPC returned a null status for table: {}", schemaTableName);
            throw new SQLException("status is null.");
        }
        // The status is guaranteed to be non-null here
        return status;
    }

    /**
     * Gets the control {@link ObTable} from the statement's table pool.
     */
    public ObTable getTable() {
        try {
            return this.statement.getObTablePool().getControlObTable();
        } catch (ObDirectLoadException e) {
            throw new RuntimeException(e);
        }
    }

    public String getTableName() {
        if (StringUtils.isBlank(tableName)) {
            throw new IllegalArgumentException("tableName is blank.");
        }
        return tableName;
    }

    /**
     * Inserts data into the direct load operation.
     *
     * @param bucket The data bucket containing rows to insert.
     * @throws SQLException if an error occurs during the insert operation.
     */
    public void insert(ObDirectLoadBucket bucket) throws SQLException {
        // Validate before logging so a null bucket fails with a clear message instead of an NPE.
        if (bucket == null || bucket.isEmpty()) {
            LOG.warn("Parameter 'bucket' must not be null or empty.");
            throw new IllegalArgumentException("Parameter 'bucket' must not be null or empty.");
        }
        LOG.info("Inserting {} rows to table: {}", bucket.getRowNum(), schemaTableName);

        try {
            // Perform the insertion into the load operation
            statement.write(bucket);
            LOG.info("Successfully inserted data into table: {}", schemaTableName);
        } catch (Exception ex) {
            LOG.error("Unexpected error during insert operation for table: {}", schemaTableName, ex);
            throw new SQLException("Unexpected error during insert operation.", ex);
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/AbstractConnHolder.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;

public abstract class AbstractConnHolder {
    private static final Logger LOG = LoggerFactory.getLogger(AbstractConnHolder.class);
    protected final Configuration config;
    protected Connection conn;

    protected String jdbcUrl;
    protected String userName;
    protected String password;

    protected AbstractConnHolder(Configuration config, String jdbcUrl, String userName, String password) {
        this.config = config;
        this.jdbcUrl = jdbcUrl;
        this.userName = userName;
        this.password = password;
    }

    public AbstractConnHolder(Configuration config) {
        this.config = config;
    }

    public abstract Connection initConnection();

    public Configuration getConfig() {
        return config;
    }

    public Connection getConn() {
        try {
            if (conn != null && !conn.isClosed()) {
                return conn;
            }
        } catch (Exception e) {
            LOG.warn("Failed to check whether the connection is closed; trying to reconnect.", e);
        }
        return reconnect();
    }

    public Connection reconnect() {
        DBUtil.closeDBResources(null, conn);
        return initConnection();
    }

    public abstract String getJdbcUrl();

    public abstract String getUserName();

    public abstract void destroy();

    public abstract void doCommit();
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ConnHolder.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;

import java.sql.Connection;

public abstract class ConnHolder {
    protected final Configuration config;
    protected Connection conn;

    public ConnHolder(Configuration config) {
        this.config = config;
    }

    public abstract Connection initConnection();

    public Configuration getConfig() {
        return config;
    }

    public Connection getConn() {
        return conn;
    }

    public Connection reconnect() {
        DBUtil.closeDBResources(null, conn);
        return initConnection();
    }

    public abstract String getJdbcUrl();

    public abstract String getUserName();

    public abstract void destroy();
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DataBaseWriterBuffer.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import java.sql.Connection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 *
 * @author oceanbase
 *
 */
public class DataBaseWriterBuffer {
    private static final Logger LOG = LoggerFactory.getLogger(DataBaseWriterBuffer.class);

    private final AbstractConnHolder connHolder;
    private final String dbName;
    private Map<String, LinkedList<Record>> tableBuffer = new HashMap<String, LinkedList<Record>>();
    private long lastCheckMemstoreTime;

    public DataBaseWriterBuffer(Configuration config, String jdbcUrl, String userName, String password, String dbName) {
        this.connHolder = new ObClientConnHolder(config, jdbcUrl, userName, password);
        this.dbName = dbName;
    }

    public AbstractConnHolder getConnHolder() {
        return connHolder;
    }

    public void initTableBuffer(List<String> tableList) {
        for (String table : tableList) {
            tableBuffer.put(table, new LinkedList<Record>());
        }
    }

    public List<String> getTableList() {
        return new ArrayList<String>(tableBuffer.keySet());
    }

    public void addRecord(Record record, String tableName) {
        LinkedList<Record> recordList = tableBuffer.get(tableName);
        if (recordList == null) {
            throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR,
                    String.format("The [table] calculated based on the rules does not exist. The calculated [tableName]=%s, [db]=%s. Please check the rules you configured.",
                            tableName, connHolder.getJdbcUrl()));
        }
        recordList.add(record);
    }

    public Map<String, LinkedList<Record>> getTableBuffer() {
        return tableBuffer;
    }

    public String getDbName() {
        return dbName;
    }

    public long getLastCheckMemstoreTime() {
        return lastCheckMemstoreTime;
    }

    public void setLastCheckMemstoreTime(long lastCheckMemstoreTime) {
        this.lastCheckMemstoreTime = lastCheckMemstoreTime;
    }

    /**
     * Check the memstore usage of the current DB.
     * <p>
     * If usage exceeds the threshold, sleep (in 60-second steps) until it drops below it.
     *
     * @param memstoreCheckIntervalSecond
     * @param memstoreThreshold
     */
    public synchronized void checkMemstore(long memstoreCheckIntervalSecond, double memstoreThreshold) {
        long now = System.currentTimeMillis();
        if (now - getLastCheckMemstoreTime() < 1000 * memstoreCheckIntervalSecond) {
            return;
        }

        LOG.debug(String.format("checking memstore usage: lastCheckTime=%d, now=%d, check interval=%d, threshold=%f",
                getLastCheckMemstoreTime(), now, memstoreCheckIntervalSecond, memstoreThreshold));

        Connection conn = getConnHolder().getConn();
        while (ObWriterUtils.isMemstoreFull(conn, memstoreThreshold)) {
            LOG.warn("OB memstore is full, sleeping 60 seconds, jdbc=" + getConnHolder().getJdbcUrl()
                    + ", threshold=" + memstoreThreshold);
            ObWriterUtils.sleep(60000);
        }
        setLastCheckMemstoreTime(now);
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DirectPathAbstractConnHolder.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import java.sql.Connection;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class DirectPathAbstractConnHolder {
    private static final Logger LOG = LoggerFactory.getLogger(DirectPathAbstractConnHolder.class);

    protected Configuration config;
    protected String jdbcUrl;
    protected String userName;
    protected String password;

    protected Connection conn;

    protected DirectPathAbstractConnHolder(Configuration config, String jdbcUrl, String userName, String password) {
        this.config = config;
        this.jdbcUrl = jdbcUrl;
        this.userName = userName;
        this.password = password;
    }

    public Connection reconnect() {
        DBUtil.closeDBResources(null, conn);
        return initConnection();
    }

    public Connection getConn() {
        if (conn == null) {
            return initConnection();
        } else {
            try {
                if (conn.isClosed()) {
                    return reconnect();
                }
                return conn;
            } catch (Exception e) {
                LOG.debug("Cannot determine whether the held connection is closed; reusing it.");
                return conn;
            }
        }
    }

    public String getJdbcUrl() {
        return jdbcUrl;
    }

    public Configuration getConfig() {
        return config;
    }

    public void doCommit() {}

    public abstract void destroy();

    public abstract Connection initConnection();
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/DirectPathConnHolder.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.common.Table;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath.DirectPathConnection;
import com.alipay.oceanbase.rpc.protocol.payload.impl.ObLoadDupActionType;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DirectPathConnHolder extends AbstractConnHolder {
    private static final Logger LOG = LoggerFactory.getLogger(DirectPathConnHolder.class);

    /**
     * The server side timeout.
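     * <p>
     * {@code 24L * 60 * 60 * 1000 * 1000} evaluates to 86,400,000,000. Assuming the direct-load
     * client reads this value in microseconds (the unit is not stated in this file), it amounts to
     * 24 hours; interpreted as milliseconds it would instead be 1,000 days.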
     */
    private static final long SERVER_TIMEOUT = 24L * 60 * 60 * 1000 * 1000;

    private static final ConcurrentHashMap<Table, DirectPathConnection> cache = new ConcurrentHashMap<>();

    private String tableName;
    private String host;
    private int rpcPort;
    private String tenantName;
    private String databaseName;
    private int blocks;
    private int threads;
    private int maxErrors;
    private ObLoadDupActionType duplicateKeyAction;

    public DirectPathConnHolder(Configuration config, ServerConnectInfo connectInfo, String tableName, int threadsPerChannel) {
        super(config, connectInfo.jdbcUrl, connectInfo.userName, connectInfo.password);
        // direct path:
        // ● publicCloud & odp - single or full
        // ● publicCloud & observer - not supported
        // ● !publicCloud & odp - full
        // ● !publicCloud & observer - single
        this.userName = connectInfo.getFullUserName();
        this.host = connectInfo.host;
        this.rpcPort = connectInfo.rpcPort;
        this.tenantName = connectInfo.tenantName;
        if (!connectInfo.publicCloud && StringUtils.isEmpty(tenantName)) {
            throw new IllegalStateException("tenant name is needed when using direct path load in private cloud.");
        }
        this.databaseName = connectInfo.databaseName;
        this.tableName = tableName;
        this.blocks = config.getInt(Config.BLOCKS_COUNT, 1);
        this.threads = threadsPerChannel * Math.min(blocks, 32);
        this.maxErrors = config.getInt(Config.MAX_ERRORS, 0);
        this.duplicateKeyAction = "insert".equalsIgnoreCase(config.getString(Config.OB_WRITE_MODE)) ?
                ObLoadDupActionType.IGNORE : ObLoadDupActionType.REPLACE;
    }

    @Override
    public Connection initConnection() {
        synchronized (cache) {
            conn = cache.computeIfAbsent(new Table(databaseName, tableName), e -> {
                try {
                    return new DirectPathConnection.Builder().host(host) //
                            .port(rpcPort) //
                            .tenant(tenantName) //
                            .user(userName) //
                            .password(Optional.ofNullable(password).orElse("")) //
                            .schema(databaseName) //
                            .table(tableName) //
                            .blocks(blocks) //
                            .parallel(threads) //
                            .maxErrorCount(maxErrors) //
                            .duplicateKeyAction(duplicateKeyAction) //
                            .serverTimeout(SERVER_TIMEOUT) //
                            .configuration(config)
                            .build();
                } catch (Exception ex) {
                    throw DataXException.asDataXException(DBUtilErrorCode.CONN_DB_ERROR, ex);
                }
            });
        }
        return conn;
    }

    public String getJdbcUrl() {
        return "";
    }

    public String getUserName() {
        return "";
    }

    @Override
    public void destroy() {
        if (conn != null && ((DirectPathConnection) conn).isFinished()) {
            DBUtil.closeDBResources(null, conn);
        }
    }

    @Override
    public void doCommit() {
        try {
            if (conn != null) {
                conn.commit();
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OBDataSourceV10.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config;
import com.alipay.oceanbase.obproxy.datasource.ObGroupDataSource;
import com.alipay.oceanbase.obproxy.exception.ConnectionPropertiesNotSupportedException;
import com.alipay.oceanbase.obproxy.util.StringParser.IllegalFormatException;
import com.google.common.collect.Maps;

public class OBDataSourceV10 {
    private static final Logger LOG = LoggerFactory.getLogger(OBDataSourceV10.class);

    private static final Map<String, DataSourceHolder> dataSources = Maps.newHashMap();

    private static int ocjGetConnectionTimeout = 0;
    private static int ocjGlobalProxyroGetConnectionTimeout = 0;
    private static int ocjMaxWaitOfCreateClusterResourceMs = 0;

    private static Configuration taskConfig;

    public static String genKey(String fullUserName, String dbName) {
        // username@tenantName#clusterName/dbName
        return fullUserName + "/" + dbName;
    }

    public static synchronized void init(Configuration configuration,
                                         final String fullUsername,
                                         final String password,
                                         final String dbName) {
        taskConfig = configuration;
        final String rsUrl = "";
        final String dataSourceKey = genKey(fullUsername, dbName);
        final int maxActiveConnection = configuration.getInt(Config.MAX_ACTIVE_CONNECTION, Config.DEFAULT_MAX_ACTIVE_CONNECTION);
        if (dataSources.containsKey(dataSourceKey)) {
            dataSources.get(dataSourceKey).increseRefercnce();
        } else {
            long timeout = configuration.getInt(Config.TIMEOUT, 30);
            if (timeout < 30) {
                timeout = 30;
            }
            if (ocjGetConnectionTimeout == 0) {
                ocjGetConnectionTimeout = configuration.getInt(Config.OCJ_GET_CONNECT_TIMEOUT,
                        Config.DEFAULT_OCJ_GET_CONNECT_TIMEOUT);
                ocjGlobalProxyroGetConnectionTimeout = configuration.getInt(Config.OCJ_PROXY_CONNECT_TIMEOUT,
                        Config.DEFAULT_OCJ_PROXY_CONNECT_TIMEOUT);
                ocjMaxWaitOfCreateClusterResourceMs = configuration.getInt(Config.OCJ_CREATE_RESOURCE_TIMEOUT,
                        Config.DEFAULT_OCJ_CREATE_RESOURCE_TIMEOUT);

                LOG.info(String.format("initializing OCJ with ocjGetConnectionTimeout=%d, " +
                                "ocjGlobalProxyroGetConnectionTimeout=%d, ocjMaxWaitOfCreateClusterResourceMs=%d",
                        ocjGetConnectionTimeout, ocjGlobalProxyroGetConnectionTimeout, ocjMaxWaitOfCreateClusterResourceMs));
            }
            DataSourceHolder holder = null;
            try {
                holder = new DataSourceHolder(rsUrl, fullUsername, password, dbName, maxActiveConnection, timeout);
                dataSources.put(dataSourceKey, holder);
            } catch (ConnectionPropertiesNotSupportedException e) {
                e.printStackTrace();
                throw new DataXException(ObDataSourceErrorCode.DESC, "connect error");
            } catch (IllegalArgumentException e) {
                e.printStackTrace();
                throw new DataXException(ObDataSourceErrorCode.DESC, "connect error");
            } catch (IllegalFormatException e) {
                e.printStackTrace();
                throw new DataXException(ObDataSourceErrorCode.DESC, "connect error");
            } catch (SQLException e) {
                e.printStackTrace();
                throw new DataXException(ObDataSourceErrorCode.DESC, "connect error");
            }
        }
    }

    public static synchronized void destory(final String dataSourceKey) {
        DataSourceHolder holder = dataSources.get(dataSourceKey);
        holder.decreaseReference();
        if (holder.canClose()) {
            dataSources.remove(dataSourceKey);
            holder.close();
            LOG.info(String.format("close datasource success [%s]", dataSourceKey));
        }
    }

    public static Connection getConnection(final String url) {
        Connection conn = null;
        try {
            conn = dataSources.get(url).getconnection();
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return conn;
    }

    private static Map<String, String> buildJdbcProperty() {
        Map<String, String> property = new HashMap<String, String>();
        property.put("useServerPrepStmts", "false");
        property.put("characterEncoding", "UTF-8");
        property.put("useLocalSessionState", "false");
        property.put("rewriteBatchedStatements", "true");
        property.put("socketTimeout", "25000");
        return property;
    }

    private static class DataSourceHolder {
        private volatile int reference;
        private final ObGroupDataSource groupDataSource;
        public static final Map<String, String> jdbcProperty = buildJdbcProperty();

        public DataSourceHolder(final String rsUrl,
                                final String fullUsername,
                                final String password,
                                final String dbName,
                                final int maxActive,
                                final long timeout) throws ConnectionPropertiesNotSupportedException, IllegalFormatException, IllegalArgumentException, SQLException {
            this.reference = 1;
            this.groupDataSource = new ObGroupDataSource();
            this.groupDataSource.setUrl(rsUrl);
            this.groupDataSource.setFullUsername(fullUsername);
            this.groupDataSource.setPassword(password);
            this.groupDataSource.setDatabase(dbName);
            this.groupDataSource.setConnectionProperties(jdbcProperty);
            this.groupDataSource.setGetConnectionTimeout(ocjGetConnectionTimeout);
            this.groupDataSource.setGlobalProxyroGetConnectionTimeout(ocjGlobalProxyroGetConnectionTimeout);
            this.groupDataSource.setMaxWaitOfCreateClusterResourceMs(ocjMaxWaitOfCreateClusterResourceMs);
            this.groupDataSource.setMaxActive(maxActive);
            this.groupDataSource.setGlobalSlowQueryThresholdUs(3000000); // 3s, SQL taking longer than 3s is logged
            this.groupDataSource.setGlobalCleanLogFileEnabled(true); // enable log cleanup
            this.groupDataSource.setGlobalLogFileSizeThreshold(17179869184L); // 16G, total log file size
            this.groupDataSource.setGlobalCleanLogFileInterval(10000); // 10s, check interval
            this.groupDataSource.setInitialSize(1);

            List<String> initSqls = new ArrayList<String>();
            if (taskConfig != null) {
                List<String> sessionConfig = taskConfig.getList(Key.SESSION, new ArrayList<String>(), String.class);
                // add the configured session statements only when there are any
                if (sessionConfig != null && sessionConfig.size() > 0) {
                    initSqls.addAll(sessionConfig);
                }
            }
            // set up for writing timestamp columns
            if (ObWriterUtils.isOracleMode()) {
                initSqls.add("ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';");
                initSqls.add("ALTER SESSION SET NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF';");
                initSqls.add("ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT='YYYY-MM-DD HH24:MI:SS.FF TZR TZD';");
            }

            this.groupDataSource.setConnectionInitSqls(initSqls);

            this.groupDataSource.init();
            LOG.info("Create GroupDataSource rsUrl=[{}], fullUserName=[{}], dbName=[{}], getConnectionTimeout={}ms, maxActive={}",
                    rsUrl, fullUsername, dbName, ocjGetConnectionTimeout, maxActive);
        }

        public Connection getconnection() throws SQLException {
            return groupDataSource.getConnection();
        }

        public synchronized void increseRefercnce() {
            this.reference++;
        }

        public synchronized void decreaseReference() {
            this.reference--;
        }

        public synchronized boolean canClose() {
            return reference == 0;
        }

        public synchronized void close() {
            if (this.canClose()) {
                groupDataSource.destroy();
            }
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/OCJConnHolder.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import java.sql.Connection;
import java.sql.SQLException;

import com.alibaba.datax.common.util.Configuration;

/**
 * Wraps the OceanBase Java client.
 *
 * @author oceanbase
 */
public class OCJConnHolder extends AbstractConnHolder {
    private ServerConnectInfo connectInfo;
    private String dataSourceKey;

    public OCJConnHolder(Configuration config, ServerConnectInfo connInfo) {
        super(config);
        this.connectInfo = connInfo;
        this.dataSourceKey = OBDataSourceV10.genKey(connectInfo.getFullUserName(), connectInfo.databaseName);
        OBDataSourceV10.init(config, connectInfo.getFullUserName(), connectInfo.password, connectInfo.databaseName);
    }

    @Override
    public Connection initConnection() {
        conn = OBDataSourceV10.getConnection(dataSourceKey);
        return conn;
    }

    @Override
    public String getJdbcUrl() {
        return connectInfo.jdbcUrl;
    }

    @Override
    public String getUserName() {
        return connectInfo.userName;
    }

    public void destroy() {
        OBDataSourceV10.destory(this.dataSourceKey);
    }

    public void doCommit() {
        try {
            if (conn != null) {
                conn.commit();
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObClientConnHolder.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils;

/**
 * Database connection holder: responsible for creating connections and reconnecting.
 *
 * @author oceanbase
 */
public class ObClientConnHolder extends AbstractConnHolder {
    private final String jdbcUrl;
    private final String userName;
    private final String password;

    public ObClientConnHolder(Configuration config, String jdbcUrl, String userName, String password) {
        super(config);
        this.jdbcUrl = jdbcUrl;
        this.userName = userName;
        this.password = password;
    }

    // Connect to ob with obclient and obproxy
    @Override
    public Connection initConnection() {
        String BASIC_MESSAGE = String.format("jdbcUrl:[%s]", this.jdbcUrl);
        DataBaseType dbType = DataBaseType.OceanBase;
        if (ObWriterUtils.isOracleMode()) {
            // set up for writing timestamp columns
            List<String> sessionConfig = config.getList(Key.SESSION, new ArrayList<String>(), String.class);
            sessionConfig.add("ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'");
            sessionConfig.add("ALTER SESSION SET NLS_TIMESTAMP_FORMAT='YYYY-MM-DD HH24:MI:SS.FF'");
            sessionConfig.add("ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT='YYYY-MM-DD HH24:MI:SS.FF TZR TZD'");
            config.set(Key.SESSION, sessionConfig);
        }
        conn = DBUtil.getConnection(dbType, jdbcUrl, userName, password);
        DBUtil.dealWithSessionConfig(conn, config, dbType, BASIC_MESSAGE);
        return conn;
    }

    @Override
    public String getJdbcUrl() {
        return jdbcUrl;
    }

    @Override
    public String getUserName() {
        return userName;
    }

    @Override
    public void destroy() {
        DBUtil.closeDBResources(null, conn);
    }

    @Override
    public void doCommit() {
        try {
            if (conn != null) {
                conn.commit();
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ObDataSourceErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import com.alibaba.datax.common.spi.ErrorCode;

public enum ObDataSourceErrorCode implements ErrorCode {
    DESC("ObDataSourceError code", "connect error");

    private final String code;
    private final String describe;

    private ObDataSourceErrorCode(String code, String describe) {
        this.code = code;
        this.describe = describe;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.describe;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Describe:[%s]. ", this.code, this.describe);
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/ext/ServerConnectInfo.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.ext;

import static org.apache.commons.lang3.StringUtils.EMPTY;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.alibaba.datax.common.util.Configuration;

public class ServerConnectInfo {

    public String clusterName;
    public String tenantName;
    // userName doesn't contain tenantName or clusterName
    public String userName;
    public String password;
    public String databaseName;
    public String ipPort;
    public String jdbcUrl;
    public String host;
    public String port;
    public boolean publicCloud;
    public int rpcPort;
    public Configuration config;

    public ServerConnectInfo(final String jdbcUrl, final String username, final String password, Configuration config) {
        this.jdbcUrl = jdbcUrl;
        this.password = password;
        this.config = config;
        parseJdbcUrl(jdbcUrl);
        parseFullUserName(username);
    }

    private void parseJdbcUrl(final String jdbcUrl) {
        Pattern pattern = Pattern.compile("//([\\w\\.\\-]+:\\d+)/([^\\\\?]*)");
        Matcher matcher = pattern.matcher(jdbcUrl);
        if (matcher.find()) {
            String ipPort = matcher.group(1);
            String dbName = matcher.group(2);
            this.ipPort = ipPort;
            this.host = ipPort.split(":")[0];
            this.port = ipPort.split(":")[1];
            this.databaseName = dbName;
            this.publicCloud = host.endsWith("aliyuncs.com");
        } else {
            throw new RuntimeException("Invalid argument: " + jdbcUrl);
        }
    }

    protected void parseFullUserName(final String fullUserName) {
        int tenantIndex = fullUserName.indexOf("@");
        int clusterIndex = fullUserName.indexOf("#");
        // Applies when the jdbcUrl starts with ||_dsc_ob10_dsc_
        if (fullUserName.contains(":") && tenantIndex < 0) {
            String[] names = fullUserName.split(":");
            if (names.length != 3) {
                throw new RuntimeException("invalid argument: " + fullUserName);
            } else {
                this.clusterName = names[0];
                this.tenantName = names[1];
                this.userName = names[2];
            }
        } else if (tenantIndex < 0) {
            // Applies to a short jdbcUrl whose username contains no tenant name
            // (mainly the public-cloud case; partitions are not calculated in this case)
            this.userName = fullUserName;
            this.clusterName = EMPTY;
            this.tenantName = EMPTY;
        } else {
            // Applies to a short jdbcUrl whose username contains a tenant name
            this.userName = fullUserName.substring(0, tenantIndex);
            if (clusterIndex < 0) {
                this.clusterName = EMPTY;
                this.tenantName = fullUserName.substring(tenantIndex + 1);
            } else {
                this.clusterName = fullUserName.substring(clusterIndex + 1);
                this.tenantName = fullUserName.substring(tenantIndex + 1, clusterIndex);
            }
        }
    }

    @Override
    public String toString() {
        return "ServerConnectInfo{" +
                "clusterName='" + clusterName + '\'' +
                ", tenantName='" + tenantName + '\'' +
                ", userName='" + userName + '\'' +
                ", password='" + password + '\'' +
                ", databaseName='" + databaseName + '\'' +
                ", ipPort='" + ipPort + '\'' +
                ", jdbcUrl='" + jdbcUrl + '\'' +
                ", host='" + host + '\'' +
                ", publicCloud=" + publicCloud +
                ", rpcPort=" + rpcPort +
                '}';
    }

    public String getFullUserName() {
        StringBuilder builder = new StringBuilder();
        builder.append(userName);
        if (!EMPTY.equals(tenantName)) {
            builder.append("@").append(tenantName);
        }

        if (!EMPTY.equals(clusterName)) {
            builder.append("#").append(clusterName);
        }
        if (EMPTY.equals(this.clusterName) && EMPTY.equals(this.tenantName)) {
            return this.userName;
        }
        return builder.toString();
    }

    public void setRpcPort(int rpcPort) {
        this.rpcPort = rpcPort;
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/IObPartCalculator.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.part;

import com.alibaba.datax.common.element.Record;

/**
 * @author cjyyz
 * @date 2023/02/07
 * @since
 */
public interface IObPartCalculator {

    /**
     * Calculates the partition ID.
     *
     * @param record
     * @return Long
     */
    Long calculate(Record record);
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV1.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.part;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo;
import com.alipay.oceanbase.obproxy.data.TableEntryKey;
import com.alipay.oceanbase.obproxy.util.ObPartitionIdCalculator;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Partition calculation for OceanBase 1.x and 2.x.
 *
 * @author cjyyz
 * @date 2023/02/07
 * @since
 */
public class ObPartitionCalculatorV1 implements IObPartCalculator {

    private static final Logger LOG = LoggerFactory.getLogger(ObPartitionCalculatorV1.class);

    /**
     * Positions of the partition key columns.
     */
    private List<Integer> partIndexes;

    /**
     * All column names of the table.
     */
    private List<String> columnNames;

    /**
     * ocj partition calculator
     */
    private ObPartitionIdCalculator calculator;

    /**
     * @param connectInfo
     * @param table
     * @param columns
     */
    public ObPartitionCalculatorV1(ServerConnectInfo connectInfo, String table, List<String> columns) {

        initCalculator(connectInfo, table);

        if (Objects.isNull(calculator)) {
            LOG.warn("partCalculator is null");
            return;
        }

        this.partIndexes = new ArrayList<>(columns.size());
        this.columnNames = new ArrayList<>(columns);

        for (int i = 0; i < columns.size(); ++i) {
            String columnName = columns.get(i);
            if (calculator.isPartitionKeyColumn(columnName)) {
                LOG.info(columnName + " is partition key.");
                partIndexes.add(i);
            }
        }
    }

    /**
     * @param record
     * @return Long
     */
    @Override
    public Long calculate(Record record) {
        if (Objects.isNull(calculator)) {
            return null;
        }

        for (Integer i : partIndexes) {
            calculator.addColumn(columnNames.get(i), record.getColumn(i).asString());
        }
        return calculator.calculate();
    }

    /**
     * @param connectInfo
     * @param table
     */
    private void initCalculator(ServerConnectInfo connectInfo, String table) {

        LOG.info(String.format("create tableEntryKey with clusterName %s, tenantName %s, databaseName %s, tableName %s",
                connectInfo.clusterName, connectInfo.tenantName, connectInfo.databaseName, table));
        TableEntryKey tableEntryKey = new TableEntryKey(connectInfo.clusterName, connectInfo.tenantName,
                connectInfo.databaseName, table);

        int retry = 0;

        do {
            try {
                if (retry > 0) {
                    TimeUnit.SECONDS.sleep(1);
                    LOG.info("retry create new part calculator {} times", retry);
                }
                LOG.info("create partCalculator with address: " + connectInfo.ipPort);
                calculator = new ObPartitionIdCalculator(connectInfo.ipPort, tableEntryKey);
            } catch (Exception ex) {
                ++retry;
                LOG.warn("create new part calculator failed, retry: {}", ex.getMessage());
            }
        } while (calculator == null && retry < 3);
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/part/ObPartitionCalculatorV2.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.part;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.DbUtils;
import com.oceanbase.partition.calculator.ObPartIdCalculator;
import com.oceanbase.partition.calculator.enums.ObPartLevel;
import com.oceanbase.partition.calculator.enums.ObServerMode;
import com.oceanbase.partition.calculator.helper.TableEntryExtractor;
import com.oceanbase.partition.calculator.model.TableEntry;
import com.oceanbase.partition.calculator.model.TableEntryKey;
import com.oceanbase.partition.calculator.model.Version;
import com.oceanbase.partition.metadata.desc.ObPartColumn;
import com.oceanbase.partition.metadata.desc.ObTablePart;
import java.sql.Connection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Partition calculation for OceanBase 3.x and 4.x.
 *
 * @author cjyyz
 * @date 2023/02/07
 * @since
 */
public class ObPartitionCalculatorV2 implements IObPartCalculator {

    private static final Logger LOG = LoggerFactory.getLogger(ObPartitionCalculatorV2.class);

    /**
     * OB compatibility mode and version information.
     */
    private ObServerMode mode;

    /**
     * ob-partition-calculator partition calculation component.
     */
    private ObPartIdCalculator calculator;

    /**
     * Maps each column name in {@code columns} to its position in the record.
     * When a partition key of the target table is a generated column, the calculator needs this
     * map to obtain the values of the columns that the generated column depends on,
     * e.g.
     * create table t1 (
     *     c1 varchar(20),
     *     c2 varchar(20) generated always as (substr(`c1`,1,8))
     * ) partition by key(c2) partitions 5
     *
     * Here columnNameIndexMap contains the entry c1:0, and the value of c1 must be looked up
     * through columnNameIndexMap and added to
     * {@link com.oceanbase.partition.calculator.ObPartIdCalculator#getRefColumnValues()}.
     */
    private Map<String, Integer> columnNameIndexMap;

    /**
     * @param connectInfo
     * @param table
     * @param mode
     */
    public ObPartitionCalculatorV2(ServerConnectInfo connectInfo, String table, ObServerMode mode, List<String> columns) {
        this.mode = mode;
        this.columnNameIndexMap = new HashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            columnNameIndexMap.put(columns.get(i).toLowerCase(), i);
        }
        initCalculator(connectInfo, table);
    }

    /**
     * @param record
     * @return Long
     */
    @Override
    public Long calculate(Record record) {
        if (Objects.isNull(calculator)) {
            return null;
        }
        if (!calculator.getTableEntry().isPartitionTable()) {
            return 0L;
        }
        return calculator.calculatePartId(filterNullableColumns(record));
    }

    /**
     * Initializes the partition calculator.
     *
     * @param connectInfo
     * @param table
     */
    private void initCalculator(ServerConnectInfo connectInfo, String table) {
        TableEntryKey tableEntryKey = new TableEntryKey(connectInfo.clusterName, connectInfo.tenantName,
                connectInfo.databaseName, table, mode);
        boolean subsequentFromV4 = !mode.getVersion().isOlderThan(new Version("4.0.0.0"));
        try {
            TableEntry tableEntry;
            try (Connection conn = getConnection(connectInfo, subsequentFromV4)) {
                TableEntryExtractor extractor = new TableEntryExtractor();
                tableEntry = extractor.queryTableEntry(conn, tableEntryKey, subsequentFromV4);
            }
            this.calculator = new ObPartIdCalculator(false, tableEntry, subsequentFromV4);
        } catch (Exception e) {
            LOG.warn("create new part calculator failed. reason: {}", e.getMessage());
        }
    }

    private Connection getConnection(ServerConnectInfo connectInfo, boolean subsequentFromV4) throws Exception {
        // OceanBase 4.0.0.0 and later: calculate partitions over a connection to the business tenant
        if (subsequentFromV4) {
            return DBUtil.getConnection(DataBaseType.OceanBase, connectInfo.jdbcUrl, connectInfo.getFullUserName(), connectInfo.password);
        }
        // Versions before OceanBase 4.0.0.0: calculate partitions over a sys-tenant connection
        return DbUtils.buildSysConn(connectInfo.jdbcUrl, connectInfo.clusterName);
    }

    /**
     * Passes only the partition column values to the partition calculator.
     *
     * @param record
     * @return Object[]
     */
    private Object[] filterNullableColumns(Record record) {
        final ObTablePart tablePart = calculator.getTableEntry().getTablePart();

        final Object[] filteredRecords = new Object[record.getColumnNumber()];

        if (tablePart.getLevel().getIndex() > ObPartLevel.LEVEL_ZERO.getIndex()) {
            // Copy first-level partition values of non-generated columns from the record into filteredRecords
            for (ObPartColumn partColumn : tablePart.getPartColumns()) {
                if (partColumn.getColumnExpr() == null) {
                    int metaIndex = partColumn.getColumnIndex();
                    String columnName = partColumn.getColumnName().toLowerCase();
                    int idxInRecord = columnNameIndexMap.get(columnName);
                    filteredRecords[metaIndex] = record.getColumn(idxInRecord).asString();
                }

            }
            // Register first-level partition values of generated columns with the calculator via addRefColumn;
            // column names returned by ObTablePart.getRefPartColumns are all lowercase
            for (ObPartColumn partColumn : tablePart.getRefPartColumns()) {
                String columnName = partColumn.getColumnName();
                int index = columnNameIndexMap.get(columnName);
                calculator.addRefColumn(columnName, record.getColumn(index).asString());
            }
        }

        if (tablePart.getLevel().getIndex() >= ObPartLevel.LEVEL_TWO.getIndex()) {
            // Copy second-level partition values of non-generated columns from the record into filteredRecords
            for (ObPartColumn partColumn : tablePart.getSubPartColumns()) {
                if (partColumn.getColumnExpr() == null) {
                    int metaIndex = partColumn.getColumnIndex();
                    String columnName = partColumn.getColumnName().toLowerCase();
                    int idxInRecord = columnNameIndexMap.get(columnName);
                    filteredRecords[metaIndex] = record.getColumn(idxInRecord).asString();
                }

            }
            // Register second-level partition values of generated columns with the calculator via addRefColumn;
            // column names returned by ObTablePart.getRefSubPartColumns are all lowercase
            for (ObPartColumn partColumn : tablePart.getRefSubPartColumns()) {
                String columnName = partColumn.getColumnName();
                int index = columnNameIndexMap.get(columnName);
                calculator.addRefColumn(columnName, record.getColumn(index).asString());
            }
        }
        return filteredRecords;
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/AbstractInsertTask.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.task;

import java.util.List;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.AbstractConnHolder;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class AbstractInsertTask implements Runnable {
    private static final Logger LOG = LoggerFactory.getLogger(AbstractInsertTask.class);
    protected final long taskId;
    protected ConcurrentTableWriterTask writerTask;
    protected ConcurrentTableWriterTask.ConcurrentTableWriter writer;
    protected Queue<List<Record>> queue;
    // written by setStop() from another thread, so it must be volatile
    protected volatile boolean isStop;
    protected Configuration config;
    protected ServerConnectInfo connInfo;
    protected AbstractConnHolder connHolder;
    protected long totalCost = 0;
    protected long insertCount = 0;
    private boolean printCost = Config.DEFAULT_PRINT_COST;
    private long costBound = Config.DEFAULT_COST_BOUND;

    public AbstractInsertTask(final long taskId, Queue<List<Record>> recordsQueue, Configuration config, ServerConnectInfo connectInfo, ConcurrentTableWriterTask task, ConcurrentTableWriterTask.ConcurrentTableWriter writer) {
        this.taskId = taskId;
        this.queue = recordsQueue;
        this.config = config;
        this.connInfo = connectInfo;
        this.isStop = false;
        this.printCost = config.getBool(Config.PRINT_COST, Config.DEFAULT_PRINT_COST);
        this.costBound = config.getLong(Config.COST_BOUND, Config.DEFAULT_COST_BOUND);
        this.writer = writer;
        this.writerTask = task;
        initConnHolder();
    }

    public AbstractInsertTask(final long taskId, Queue<List<Record>> recordsQueue, Configuration config, ServerConnectInfo connectInfo) {
        this.taskId = taskId;
        this.queue = recordsQueue;
        this.config = config;
        this.connInfo = connectInfo;
        this.isStop = false;
        this.printCost = config.getBool(Config.PRINT_COST, Config.DEFAULT_PRINT_COST);
        this.costBound = config.getLong(Config.COST_BOUND, Config.DEFAULT_COST_BOUND);
        initConnHolder();
    }

    protected abstract void initConnHolder();

    public void setWriterTask(ConcurrentTableWriterTask writerTask) {
        this.writerTask = writerTask;
    }

    public void setWriter(ConcurrentTableWriterTask.ConcurrentTableWriter writer) {
        this.writer = writer;
    }

    private boolean isStop() {
        return isStop;
    }

    public void setStop() {
        isStop = true;
    }

    public AbstractConnHolder getConnHolder() {
        return connHolder;
    }

    public void calStatistic(final long cost) {
        writer.increFinishCount();
        insertCount++;
        totalCost += cost;
        if (this.printCost && cost > this.costBound) {
            LOG.info("slow multi insert cost {}ms", cost);
        }
    }

    @Override
    public void run() {
        Thread.currentThread().setName(String.format("%d-insertTask-%d", taskId, Thread.currentThread().getId()));
        LOG.debug("Task {} start to execute...", taskId);
        while (!isStop()) {
            try {
                List<Record> records = queue.poll();
                if (null != records) {
                    write(records);
                } else if (writerTask.isFinished()) {
                    writerTask.singalTaskFinish();
                    LOG.debug("no more tasks, thread exiting ...");
                    break;
                } else {
                    TimeUnit.MILLISECONDS.sleep(5);
                }
            } catch (InterruptedException e) {
                LOG.debug("TableWriter is interrupted");
            } catch (Exception e) {
                LOG.warn("ERROR UNEXPECTED ", e);
                break;
            }
        }
        LOG.debug("Thread exiting...");
    }

    protected abstract void write(List<Record> records);

    public long getTotalCost() {
        return totalCost;
    }

    public long getInsertCount() {
        return insertCount;
    }

    public void destroy() {
        if (connHolder != null) {
            connHolder.destroy();
        }
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ColumnMetaCache.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.task;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.Triple;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.plugin.rdbms.util.DBUtil;

public class ColumnMetaCache {
    private static final Logger LOG = LoggerFactory.getLogger(ColumnMetaCache.class);

    private static String tableName;
    // volatile is required for the double-checked locking in init() to be safe
    private static volatile Triple<List<String>, List<Integer>, List<String>> columnMeta = null;

    public ColumnMetaCache() {
    }

    public static void init(Connection connection, final String tableName, final List<String> columns) throws SQLException {
        if (columnMeta == null) {
            synchronized (ColumnMetaCache.class) {
                ColumnMetaCache.tableName = tableName;
                if (columnMeta == null) {
                    columnMeta = DBUtil.getColumnMetaData(connection, tableName, StringUtils.join(columns, ","));
                    LOG.info("fetch columnMeta of table {} success", tableName);
                }
            }
        }
    }

    public static Triple<List<String>, List<Integer>, List<String>> getColumnMeta() {
        return columnMeta;
    }
}

================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/ConcurrentTableWriterTask.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.task;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.AbstractConnHolder;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.part.IObPartCalculator;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.part.ObPartitionCalculatorV1;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.part.ObPartitionCalculatorV2;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils;
import com.oceanbase.partition.calculator.enums.ObServerMode;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static com.alibaba.datax.plugin.writer.oceanbasev10writer.Config.DEFAULT_SLOW_MEMSTORE_THRESHOLD;
import static com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils.LoadMode.FAST;
import static com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils.LoadMode.PAUSE;
import static com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils.LoadMode.SLOW;
public class ConcurrentTableWriterTask extends CommonRdbmsWriter.Task { private static final Logger LOG = LoggerFactory.getLogger(ConcurrentTableWriterTask.class); // Threshold for the memstore_total / memstore_limit ratio; once it is exceeded, writing is paused private double memstoreThreshold = Config.DEFAULT_MEMSTORE_THRESHOLD; // Interval between memstore checks private long memstoreCheckIntervalSecond = Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND; // Time of the last memstore check private long lastCheckMemstoreTime; private volatile ObWriterUtils.LoadMode loadMode = FAST; private static AtomicLong totalTask = new AtomicLong(0); private long taskId = -1; private AtomicBoolean isMemStoreFull = new AtomicBoolean(false); private HashMap<Long, List<Record>> groupInsertValues; private IObPartCalculator obPartCalculator; private ConcurrentTableWriter concurrentWriter = null; private AbstractConnHolder connHolder; // Written by the main thread and read by the insert threads via isFinished() private volatile boolean allTaskInQueue = false; private Lock lock = new ReentrantLock(); private Condition condition = lock.newCondition(); private long startTime; private String obWriteMode = "update"; private boolean isOracleCompatibleMode = false; private String obUpdateColumns = null; private String dbName; private int calPartFailedCount = 0; private boolean directPath; public ConcurrentTableWriterTask(DataBaseType dataBaseType) { super(dataBaseType); taskId = totalTask.getAndIncrement(); } @Override public void init(Configuration config) { super.init(config); this.directPath = config.getBool(Config.DIRECT_PATH, false); // In OceanBase, every write uses INSERT INTO ... 
ON DUPLICATE KEY UPDATE mode // writeMode should be defined as an enum this.writeMode = "update"; obWriteMode = config.getString(Config.OB_WRITE_MODE, "update"); ServerConnectInfo connectInfo = new ServerConnectInfo(jdbcUrl, username, password, config); connectInfo.setRpcPort(config.getInt(Config.RPC_PORT, 0)); dbName = connectInfo.databaseName; //init check memstore this.memstoreThreshold = config.getDouble(Config.MEMSTORE_THRESHOLD, Config.DEFAULT_MEMSTORE_THRESHOLD); this.memstoreCheckIntervalSecond =
config.getLong(Config.MEMSTORE_CHECK_INTERVAL_SECOND, Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND); this.connHolder = new ObClientConnHolder(config, connectInfo.jdbcUrl, connectInfo.getFullUserName(), connectInfo.password); this.isOracleCompatibleMode = ObWriterUtils.isOracleMode(); if (isOracleCompatibleMode) { connectInfo.databaseName = connectInfo.databaseName.toUpperCase(); // Do not upper-case the table name when it is quoted if (!(table.startsWith("\"") && table.endsWith("\""))) { table = table.toUpperCase(); } LOG.info(String.format("this is oracle compatible mode, change database to %s, table to %s", connectInfo.databaseName, table)); } if (config.getBool(Config.USE_PART_CALCULATOR, Config.DEFAULT_USE_PART_CALCULATOR)) { this.obPartCalculator = createPartitionCalculator(connectInfo, ObServerMode.from(config.getString(Config.OB_COMPATIBLE_MODE), config.getString(Config.OB_VERSION))); } else { LOG.info("Disable partition calculation feature."); } obUpdateColumns = config.getString(Config.OB_UPDATE_COLUMNS, null); groupInsertValues = new HashMap<Long, List<Record>>(); rewriteSql(); if (null == concurrentWriter) { concurrentWriter = new ConcurrentTableWriter(config, connectInfo, writeRecordSql); allTaskInQueue = false; } } /** * Creates the partition calculator appropriate for the server version. * * @param connectInfo * @param obServerMode * @return */ private IObPartCalculator createPartitionCalculator(ServerConnectInfo connectInfo, ObServerMode obServerMode) { if (obServerMode.isSubsequentFrom("3.0.0.0")) { LOG.info("oceanbase version is {}, use ob-partition-calculator to calculate partition Id.", obServerMode.getVersion()); return new ObPartitionCalculatorV2(connectInfo, table, obServerMode, columns); } LOG.info("oceanbase version is {}, use ocj to calculate partition Id.", obServerMode.getVersion()); return new ObPartitionCalculatorV1(connectInfo, table, columns); } public boolean isFinished() { return allTaskInQueue && concurrentWriter.checkFinish(); } public 
boolean allTaskInQueue() { return allTaskInQueue; } public void setPutAllTaskInQueue() { this.allTaskInQueue = true;
LOG.info("ConcurrentTableWriter has put all task in queue, queueSize = {}, total = {}, finished = {}", concurrentWriter.getTaskQueueSize(), concurrentWriter.getTotalTaskCount(), concurrentWriter.getFinishTaskCount()); } private void rewriteSql() { Connection conn = connHolder.initConnection(); if (isOracleCompatibleMode && obWriteMode.equalsIgnoreCase("update")) { // change obWriteMode to insert so the insert statement will be generated. obWriteMode = "insert"; } this.writeRecordSql = ObWriterUtils.buildWriteSql(table, columns, conn, obWriteMode, obUpdateColumns); LOG.info("writeRecordSql :{}", this.writeRecordSql); } @Override public void prepare(Configuration writerSliceConfig) { super.prepare(writerSliceConfig); concurrentWriter.start(); } @Override public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector, Connection connection) { this.taskPluginCollector = taskPluginCollector; // Fetch the destination table's column metadata, used to convert values to the target column types when writing int retryTimes = 0; boolean needRetry = false; do { try { if (retryTimes > 0) { TimeUnit.SECONDS.sleep((1 << retryTimes)); DBUtil.closeDBResources(null, connection); connection = DBUtil.getConnection(dataBaseType, jdbcUrl, username, password); LOG.warn("getColumnMetaData of table {} failed, retry the {} times ...", this.table, retryTimes); } ColumnMetaCache.init(connection, this.table, this.columns); this.resultSetMetaData = ColumnMetaCache.getColumnMeta(); needRetry = false; } catch (SQLException e) { needRetry = true; ++retryTimes; e.printStackTrace(); LOG.warn("fetch column meta of [{}] failed..., retry {} times", this.table, retryTimes); } catch (InterruptedException e) { LOG.warn("startWriteWithConnection interrupt, ignored"); } finally { } } while (needRetry && retryTimes < 100); try { Record record; startTime = System.currentTimeMillis(); while ((record = recordReceiver.getFromReader()) != null) { if 
(record.getColumnNumber() != this.columnNumber) { // The number of columns read from the source differs from the number of columns configured for the destination table; fail immediately LOG.error("column not
equal {} != {}, record = {}", this.columnNumber, record.getColumnNumber(), record.toString()); throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format("Column count mismatch: the record has %d columns, but %d columns are configured for the destination table.", record.getColumnNumber(), this.columnNumber)); } addRecordToCache(record); } addLeftRecords(); waitTaskFinish(); } catch (Exception e) { throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { DBUtil.closeDBResources(null, null, connection); } } public PreparedStatement fillStatement(PreparedStatement preparedStatement, Record record) throws SQLException { return fillPreparedStatement(preparedStatement, record); } private void addLeftRecords() { // No need to refresh the cache: this is the last batch of data for (List<Record> groupValues : groupInsertValues.values()) { if (groupValues.size() > 0) { addRecordsToWriteQueue(groupValues); } } } private void addRecordToCache(final Record record) { Long partId = null; try { partId = obPartCalculator == null ?
Long.MAX_VALUE : obPartCalculator.calculate(record); } catch (Exception e1) { if (calPartFailedCount++ < 10) { LOG.warn("fail to get partition id: " + e1.getMessage() + ", record: " + record); } } if (partId == null) { LOG.debug("fail to calculate partition id, just put into the default buffer."); partId = Long.MAX_VALUE; } List<Record> groupValues = groupInsertValues.computeIfAbsent(partId, k -> new ArrayList<Record>(batchSize)); groupValues.add(record); if (groupValues.size() >= batchSize) { groupValues = addRecordsToWriteQueue(groupValues); groupInsertValues.put(partId, groupValues); } } /** * @param records * @return a new cache for storing the subsequent records */ private List<Record> addRecordsToWriteQueue(List<Record> records) { int i = 0; while (true) { if (i > 0) { LOG.info("retry add batch record the {} times", i); } try { concurrentWriter.addBatchRecords(records); break; } catch (InterruptedException e) { i++; LOG.info("Concurrent table writer is interrupted"); } } return new ArrayList<Record>(batchSize); } private void checkMemStore() { Connection checkConn = connHolder.getConn(); try { if (checkConn == null || checkConn.isClosed()) { checkConn = connHolder.reconnect(); } } catch (Exception e) { LOG.warn("Check connection is unusable"); } long now = System.currentTimeMillis(); if (now - lastCheckMemstoreTime < 1000 * memstoreCheckIntervalSecond) { return; } double memUsedRatio = ObWriterUtils.queryMemUsedRatio(checkConn); if (memUsedRatio >= DEFAULT_SLOW_MEMSTORE_THRESHOLD) { this.loadMode = memUsedRatio >= memstoreThreshold ? PAUSE : SLOW; LOG.info("Memstore used ratio is {}.
Load data {}", memUsedRatio, loadMode.name()); } else { this.loadMode = FAST; } lastCheckMemstoreTime = now; } public boolean isMemStoreFull() { return isMemStoreFull.get(); } public boolean isShouldPause() { return this.loadMode.equals(PAUSE); } public boolean isShouldSlow() { return this.loadMode.equals(SLOW); } public void print() { if (LOG.isDebugEnabled()) { LOG.debug("Statistic total task {}, finished {}, queue Size {}", concurrentWriter.getTotalTaskCount(), concurrentWriter.getFinishTaskCount(), concurrentWriter.getTaskQueueSize()); concurrentWriter.printStatistics(); } } public void waitTaskFinish() { setPutAllTaskInQueue(); lock.lock(); try { while (!concurrentWriter.checkFinish()) { condition.await(15, TimeUnit.SECONDS); print(); checkMemStore(); } if (directPath){ concurrentWriter.doCommit(); } } catch (InterruptedException e) { LOG.warn("Concurrent table writer was interrupted while waiting for tasks to finish"); } finally { lock.unlock(); } LOG.debug("all InsertTasks finished ..."); } public void singalTaskFinish() { lock.lock(); try { condition.signal(); } finally { lock.unlock(); } } @Override public void destroy(Configuration writerSliceConfig) { if (concurrentWriter != null) { concurrentWriter.destory(); } // Close the connection held by this task DBUtil.closeDBResources(null, connHolder.getConn()); super.destroy(writerSliceConfig); } public class ConcurrentTableWriter { private BlockingQueue<List<Record>> queue; private List<AbstractInsertTask> abstractInsertTasks; private Configuration config; private ServerConnectInfo connectInfo; private String rewriteRecordSql; private AtomicLong totalTaskCount; private AtomicLong finishTaskCount; private final int threadCount; public ConcurrentTableWriter(Configuration config, ServerConnectInfo connInfo, String rewriteRecordSql) { threadCount = config.getInt(Config.WRITER_THREAD_COUNT, Config.DEFAULT_WRITER_THREAD_COUNT); queue = new LinkedBlockingQueue<List<Record>>(threadCount << 1); abstractInsertTasks = new 
ArrayList<AbstractInsertTask>(threadCount); this.config = config; this.connectInfo = connInfo; this.rewriteRecordSql =
rewriteRecordSql; this.totalTaskCount = new AtomicLong(0); this.finishTaskCount = new AtomicLong(0); } public long getTotalTaskCount() { return totalTaskCount.get(); } public long getFinishTaskCount() { return finishTaskCount.get(); } public int getTaskQueueSize() { return queue.size(); } public void increFinishCount() { finishTaskCount.incrementAndGet(); } // Only meaningful after all tasks have been put into the queue public boolean checkFinish() { long finishCount = finishTaskCount.get(); long totalCount = totalTaskCount.get(); return finishCount == totalCount; } public synchronized void start() { for (int i = 0; i < threadCount; ++i) { LOG.info("start {} insert task.", (i + 1)); AbstractInsertTask insertTask = null; if (directPath) { insertTask = new DirectPathInsertTask(taskId, queue, config, connectInfo, ConcurrentTableWriterTask.this, this); } else { insertTask = new InsertTask(taskId, queue, config, connectInfo, rewriteRecordSql); } insertTask.setWriterTask(ConcurrentTableWriterTask.this); insertTask.setWriter(this); abstractInsertTasks.add(insertTask); } WriterThreadPool.executeBatch(abstractInsertTasks); } public void doCommit() { this.abstractInsertTasks.get(0).getConnHolder().doCommit(); } public int getThreadCount() { return threadCount; } public void printStatistics() { long insertTotalCost = 0; long insertTotalCount = 0; for (AbstractInsertTask task : abstractInsertTasks) { insertTotalCost += task.getTotalCost(); insertTotalCount += task.getInsertCount(); } long avgCost = 0; if (insertTotalCount != 0) { avgCost = insertTotalCost / insertTotalCount; } ConcurrentTableWriterTask.LOG.debug("Insert {} times, totalCost {} ms, average {} ms", insertTotalCount, insertTotalCost, avgCost); } public void addBatchRecords(final List<Record> records) throws InterruptedException { boolean isSucc = false; while (!isSucc) { isSucc = queue.offer(records, 5, TimeUnit.MILLISECONDS); checkMemStore(); } totalTaskCount.incrementAndGet(); } public synchronized void destory() { if
(abstractInsertTasks != null) { for (AbstractInsertTask task : abstractInsertTasks) { task.setStop(); } for (AbstractInsertTask task : abstractInsertTasks) { task.destroy(); } } } } public String getTable() { return table; } // Uses two fields directly: columnNumber and resultSetMetaData protected PreparedStatement fillPreparedStatement(PreparedStatement preparedStatement, Record record) throws SQLException { for (int i = 0; i < this.columnNumber; i++) { int columnSqltype = this.resultSetMetaData.getMiddle().get(i); String typeName = this.resultSetMetaData.getRight().get(i); preparedStatement = fillPreparedStatementColumnType(preparedStatement, i, columnSqltype, typeName, record.getColumn(i)); } return preparedStatement; } protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException { java.util.Date utilDate; switch (columnSqltype) { case Types.CHAR: case Types.NCHAR: case Types.CLOB: case Types.NCLOB: case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: preparedStatement.setString(columnIndex + 1, column .asString()); break; case Types.SMALLINT: case Types.INTEGER: case Types.BIGINT: case Types.NUMERIC: case Types.DECIMAL: case Types.FLOAT: case Types.REAL: case Types.DOUBLE: String strValue = column.asString(); if (emptyAsNull && "".equals(strValue)) { preparedStatement.setString(columnIndex + 1, null); } else { preparedStatement.setString(columnIndex + 1, strValue); } break; // TINYINT is a little special in some databases such as MySQL {boolean->tinyint(1)} case Types.TINYINT: Long longValue = column.asLong(); if (null == longValue) { preparedStatement.setString(columnIndex + 1, null); } else { preparedStatement.setString(columnIndex + 1, longValue.toString()); } break; // for mysql bug, see http://bugs.mysql.com/bug.php?id=35115 case Types.DATE: if (typeName == null) { typeName =
this.resultSetMetaData.getRight().get(columnIndex); } if (typeName.equalsIgnoreCase("year")) { if (column.asBigInteger() == null) { preparedStatement.setString(columnIndex + 1, null); } else { preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue()); } } else { java.sql.Date sqlDate = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "DATE type conversion error: [%s]", column)); } if (null != utilDate) { sqlDate = new java.sql.Date(utilDate.getTime()); } preparedStatement.setDate(columnIndex + 1, sqlDate); } break; case Types.TIME: java.sql.Time sqlTime = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "TIME type conversion error: [%s]", column)); } if (null != utilDate) { sqlTime = new java.sql.Time(utilDate.getTime()); } preparedStatement.setTime(columnIndex + 1, sqlTime); break; case Types.TIMESTAMP: java.sql.Timestamp sqlTimestamp = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "TIMESTAMP type conversion error: [%s]", column)); } if (null != utilDate) { sqlTimestamp = new java.sql.Timestamp( utilDate.getTime()); } preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp); break; case Types.VARBINARY: case Types.BLOB: case Types.LONGVARBINARY: preparedStatement.setBytes(columnIndex + 1, column .asBytes()); break; case Types.BINARY: String isArray = column.getRawData().toString(); if (isArray.startsWith("[") && isArray.endsWith("]")) { preparedStatement.setString(columnIndex + 1, column .asString()); } else { preparedStatement.setBytes(columnIndex + 1, column .asBytes()); } break; case Types.BOOLEAN: preparedStatement.setBoolean(columnIndex + 1, column.asBoolean()); break; // warn: bit(1) -> Types.BIT, can be set with setBoolean // warn: bit(>1) -> Types.VARBINARY, can be set with setBytes case Types.BIT: if (this.dataBaseType == DataBaseType.MySql) { 
preparedStatement.setBoolean(columnIndex + 1, column.asBoolean()); } else {
preparedStatement.setString(columnIndex + 1, column.asString()); } break; default: throw DataXException .asDataXException( DBUtilErrorCode.UNSUPPORTED_TYPE, String.format( "The column configuration in your job is invalid: DataX does not support writing this column type to the database. Column name: [%s], column type: [%d], column type name: [%s]. Please change the type of this column in the table, or exclude it from synchronization.", this.resultSetMetaData.getLeft() .get(columnIndex), this.resultSetMetaData.getMiddle() .get(columnIndex), this.resultSetMetaData.getRight() .get(columnIndex))); } return preparedStatement; } } ================================================ FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/DirectPathInsertTask.java ================================================ package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; import java.text.MessageFormat; import java.util.Arrays; import java.util.List; import java.util.Queue; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Column.Type; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.oceanbasev10writer.common.Table; import com.alibaba.datax.plugin.writer.oceanbasev10writer.common.TableCache; import com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath.DirectPathConnection; import com.alibaba.datax.plugin.writer.oceanbasev10writer.directPath.DirectPathPreparedStatement; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.DirectPathConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class DirectPathInsertTask extends AbstractInsertTask { private static final Logger LOG = LoggerFactory.getLogger(DirectPathInsertTask.class); public DirectPathInsertTask(long taskId, 
Queue<List<Record>> recordsQueue, Configuration config, ServerConnectInfo connectInfo, ConcurrentTableWriterTask task, ConcurrentTableWriterTask.ConcurrentTableWriter writer) { super(taskId, recordsQueue,
config, connectInfo, task, writer); } @Override protected void initConnHolder() { this.connHolder = new DirectPathConnHolder(config, connInfo, writerTask.getTable(), writer.getThreadCount()); this.connHolder.initConnection(); } @Override protected void write(List<Record> records) { Table table = TableCache.getInstance().getTable(connInfo.databaseName, writerTask.getTable()); if (Table.Status.FAILURE.equals(table.getStatus())) { return; } DirectPathConnection conn = (DirectPathConnection) connHolder.getConn(); if (records != null && !records.isEmpty()) { long startTime = System.currentTimeMillis(); try (DirectPathPreparedStatement stmt = conn.createStatement()) { final int columnNumber = records.get(0).getColumnNumber(); Object[] values = new Object[columnNumber]; for (Record record : records) { for (int i = 0; i < columnNumber; i++) { Column column = record.getColumn(i); // DATE columns are passed as strings; all other columns use their raw value if (column.getType().equals(Type.DATE)) { values[i] = column.asString(); } else { values[i] = column.getRawData(); } } stmt.addBatch(values); } int[] result = stmt.executeBatch(); if (LOG.isDebugEnabled()) { LOG.debug("[{}] Insert {} rows success", Thread.currentThread().getName(), Arrays.stream(result).sum()); } calStatistic(System.currentTimeMillis() - startTime); stmt.clearBatch(); } catch (Throwable ex) { String msg = MessageFormat.format("Insert data into table \"{0}\" failed.
Error: {1}", writerTask.getTable(), ex.getMessage()); LOG.error(msg, ex); table.setError(ex); table.setStatus(Table.Status.FAILURE); throw new RuntimeException(msg); } } } } ================================================ FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/InsertTask.java ================================================ package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.AbstractConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ServerConnectInfo; import com.alibaba.datax.plugin.writer.oceanbasev10writer.task.ConcurrentTableWriterTask.ConcurrentTableWriter; import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.SQLException; import java.util.List; import java.util.concurrent.BlockingQueue; import java.util.concurrent.TimeUnit; public class InsertTask extends AbstractInsertTask implements Runnable { private static final Logger LOG = LoggerFactory.getLogger(InsertTask.class); private ConcurrentTableWriterTask writerTask; private ConcurrentTableWriter writer; private String writeRecordSql; private long totalCost = 0; private long insertCount = 0; private BlockingQueue<List<Record>> queue; // Set by another thread via setStop(), so it must be volatile private volatile boolean isStop; private AbstractConnHolder connHolder; private final long taskId; private ServerConnectInfo connInfo; // Number of retries on failure private int failTryCount = Config.DEFAULT_FAIL_TRY_COUNT; 
private boolean printCost = Config.DEFAULT_PRINT_COST; private long costBound =
Config.DEFAULT_COST_BOUND; public InsertTask( final long taskId, BlockingQueue<List<Record>> recordsQueue, Configuration config, ServerConnectInfo connectInfo, String writeRecordSql) { super(taskId, recordsQueue, config, connectInfo); this.taskId = taskId; this.queue = recordsQueue; this.connInfo = connectInfo; failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT); printCost = config.getBool(Config.PRINT_COST, Config.DEFAULT_PRINT_COST); costBound = config.getLong(Config.COST_BOUND, Config.DEFAULT_COST_BOUND); this.connHolder = new ObClientConnHolder(config, connInfo.jdbcUrl, connInfo.getFullUserName(), connInfo.password); this.writeRecordSql = writeRecordSql; this.isStop = false; connHolder.initConnection(); } // The connection holder is created in the constructor instead @Override protected void initConnHolder() { } public void setWriterTask(ConcurrentTableWriterTask writerTask) { this.writerTask = writerTask; } public void setWriter(ConcurrentTableWriter writer) { this.writer = writer; } private boolean isStop() { return isStop; } public void setStop() { isStop = true; } public long getTotalCost() { return totalCost; } public long getInsertCount() { return insertCount; } @Override public void run() { Thread.currentThread().setName(String.format("%d-insertTask-%d", taskId, Thread.currentThread().getId())); LOG.debug("Task {} start to execute...", taskId); while (!isStop()) { try { List<Record> records = queue.poll(5, TimeUnit.MILLISECONDS); if (null != records) { doMultiInsert(records, this.printCost, this.costBound); } else if (writerTask.isFinished()) { writerTask.singalTaskFinish(); LOG.debug("no more tasks, thread exiting ..."); break; } } catch (InterruptedException e) { LOG.debug("TableWriter is interrupted"); } catch (Exception e) { LOG.warn("ERROR UNEXPECTED ", e); } } LOG.debug("Thread exiting..."); } // Not used: records are written by doMultiInsert() in run() @Override protected void write(List<Record> records) { } public void destroy() { 
connHolder.destroy(); } public void calStatistic(final long cost) { writer.increFinishCount(); ++insertCount; totalCost += cost; if (this.printCost && cost >
this.costBound) { LOG.info("slow multi insert cost {}ms", cost); } } public void doMultiInsert(final List<Record> buffer, final boolean printCost, final long restrict) { checkMemstore(); Connection conn = connHolder.getConn(); boolean success = false; long cost = 0; long startTime = 0; try { for (int i = 0; i < failTryCount; ++i) { if (i > 0) { conn = connHolder.getConn(); LOG.info("retry {}, start do batch insert, size={}", i, buffer.size()); checkMemstore(); } startTime = System.currentTimeMillis(); PreparedStatement ps = null; try { conn.setAutoCommit(false); ps = conn.prepareStatement(writeRecordSql); for (Record record : buffer) { ps = writerTask.fillStatement(ps, record); ps.addBatch(); } ps.executeBatch(); conn.commit(); success = true; cost = System.currentTimeMillis() - startTime; calStatistic(cost); break; } catch (SQLException e) { LOG.warn("Insert fatal error SqlState ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); if (LOG.isDebugEnabled() && (i == 0 || i > 10)) { for (Record record : buffer) { LOG.warn("ERROR : record {}", record); } } // Handle the exception according to its error code // For OB system-level errors, rebuild the connection boolean fatalFail = ObWriterUtils.isFatalError(e); if (fatalFail) { ObWriterUtils.sleep(300000); connHolder.reconnect(); // For recoverable errors, retry } else if (ObWriterUtils.isRecoverableError(e)) { conn.rollback(); ObWriterUtils.sleep(60000); } else {// For any other error, stop retrying and fall back to writing records one by one conn.rollback(); ObWriterUtils.sleep(1000); break; } } catch (Exception e) { e.printStackTrace(); LOG.warn("Insert error unexpected {}", e); } finally { DBUtil.closeDBResources(ps, null); } } } catch (SQLException e) { LOG.warn("ERROR:retry failSql State ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); } if (!success) { LOG.info("do one insert"); conn = connHolder.reconnect(); writerTask.doOneInsert(conn, buffer); cost = System.currentTimeMillis() - startTime; 
calStatistic(cost); } } private void checkMemstore() { if (writerTask.isShouldSlow()) { ObWriterUtils.sleep(100); } else { while
(writerTask.isShouldPause()) { ObWriterUtils.sleep(100); } } } } ================================================ FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/SingleTableWriterTask.java ================================================ package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.SQLException; import java.util.List; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.AbstractConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.ext.ObClientConnHolder; import com.alibaba.datax.plugin.writer.oceanbasev10writer.util.ObWriterUtils; public class SingleTableWriterTask extends CommonRdbmsWriter.Task { // Threshold for the memstore_total / memstore_limit ratio; once it is exceeded, writing is paused private double memstoreThreshold = Config.DEFAULT_MEMSTORE_THRESHOLD; // Interval between memstore checks private long memstoreCheckIntervalSecond = Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND; // Time of the last memstore check private long lastCheckMemstoreTime; // Number of retries on failure private int failTryCount = Config.DEFAULT_FAIL_TRY_COUNT; private AbstractConnHolder connHolder; private String obWriteMode = "update"; private boolean isOracleCompatibleMode = false; private String obUpdateColumns = null; public SingleTableWriterTask(DataBaseType dataBaseType) { super(dataBaseType); } @Override public void init(Configuration config) { super.init(config); this.memstoreThreshold = config.getDouble(Config.MEMSTORE_THRESHOLD, 
Config.DEFAULT_MEMSTORE_THRESHOLD); this.memstoreCheckIntervalSecond =
config.getLong(Config.MEMSTORE_CHECK_INTERVAL_SECOND, Config.DEFAULT_MEMSTORE_CHECK_INTERVAL_SECOND); failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT); // In OceanBase, every write uses INSERT INTO ... ON DUPLICATE KEY UPDATE mode // writeMode should be defined as an enum this.writeMode = "update"; this.connHolder = new ObClientConnHolder(config, jdbcUrl, username, password); // In OB 1.0, the batch size is capped at 128 this.batchSize = Math.min(128, config.getInt(Key.BATCH_SIZE, 128)); LOG.info("In Write OceanBase 1.0, Real Batch Size : " + this.batchSize); isOracleCompatibleMode = ObWriterUtils.isOracleMode(); LOG.info("isOracleCompatibleMode=" + isOracleCompatibleMode); obUpdateColumns = config.getString(Config.OB_UPDATE_COLUMNS, null); obWriteMode = config.getString(Config.OB_WRITE_MODE, "update"); if (isOracleCompatibleMode) { obWriteMode = "insert"; } rewriteSql(); } private void rewriteSql() { Connection conn = connHolder.initConnection(); this.writeRecordSql = ObWriterUtils.buildWriteSql(table, columns, conn, obWriteMode, obUpdateColumns); } @Override protected void doBatchInsert(Connection conn, List<Record> buffer) throws SQLException { doBatchInsert(buffer); } private void doBatchInsert(List<Record> buffer) { Connection conn = connHolder.getConn(); // Check memstore usage before writing checkMemstore(conn); boolean success = false; try { for (int i = 0; i < failTryCount; i++) { PreparedStatement ps = null; try { conn.setAutoCommit(false); ps = conn.prepareStatement(this.writeRecordSql); for (Record record : buffer) { ps = fillPreparedStatement(ps, record); ps.addBatch(); } ps.executeBatch(); conn.commit(); // Mark the batch as successful and exit the for loop success = true; break; } catch (SQLException e) { // For OB system-level errors, rebuild the connection boolean fatalFail = ObWriterUtils.isFatalError(e); if (fatalFail) { LOG.warn("Fatal exception in OB. Roll back this write and hibernate for five minutes. SQLState: {}.
ErrorCode: {}", e.getSQLState(), e.getErrorCode(), e); ObWriterUtils.sleep(300000); DBUtil.closeDBResources(null, conn); conn = connHolder.reconnect(); // For recoverable errors, retry } else if (ObWriterUtils.isRecoverableError(e)) { LOG.warn("Recoverable exception in OB. Roll back this write and hibernate for one minute. SQLState: {}. ErrorCode: {}", e.getSQLState(), e.getErrorCode(), e); conn.rollback(); ObWriterUtils.sleep(60000); // For any other error, stop retrying and fall back to writing records one by one } else { LOG.warn("Exception in OB. Roll back this write and hibernate for one second. Write and submit the records one by one. SQLState: {}. ErrorCode: {}", e.getSQLState(), e.getErrorCode(), e); conn.rollback(); ObWriterUtils.sleep(1000); break; } } finally { DBUtil.closeDBResources(ps, null); } } } catch (SQLException e) { LOG.warn("Exception in OB. Roll back this write. Write and submit the records one by one. SQLState: {}. ErrorCode: {}", e.getSQLState(), e.getErrorCode(), e); } if (!success) { doOneInsert(conn, buffer); } } private void checkMemstore(Connection conn) { long now = System.currentTimeMillis(); if (now - lastCheckMemstoreTime < 1000 * memstoreCheckIntervalSecond) { return; } while (ObWriterUtils.isMemstoreFull(conn, memstoreThreshold)) { LOG.warn("OB memstore is full, sleep 60 seconds, threshold=" + memstoreThreshold); ObWriterUtils.sleep(60000); } lastCheckMemstoreTime = now; } @Override public void destroy(Configuration writerSliceConfig) { // Close the connection held by this task DBUtil.closeDBResources(null, connHolder.getConn()); super.destroy(writerSliceConfig); } } ================================================ FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/task/WriterThreadPool.java ================================================ package com.alibaba.datax.plugin.writer.oceanbasev10writer.task; import java.util.List; import java.util.concurrent.ExecutorService; import 
java.util.concurrent.Executors; import org.slf4j.Logger; import
org.slf4j.LoggerFactory; public class WriterThreadPool { private static final Logger LOG = LoggerFactory.getLogger(InsertTask.class); private static ExecutorService executorService = Executors.newCachedThreadPool(); public WriterThreadPool() { } public static ExecutorService getInstance() { return executorService; } public static synchronized void shutdown() { LOG.info("start shutdown executor service..."); executorService.shutdown(); LOG.info("shutdown executor service success..."); } public static synchronized void execute(InsertTask task) { executorService.execute(task); } public static synchronized void executeBatch(List tasks) { for (AbstractInsertTask task : tasks) { executorService.execute(task); } } } ================================================ FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/DbUtils.java ================================================ package com.alibaba.datax.plugin.writer.oceanbasev10writer.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; import java.util.List; import java.util.concurrent.TimeUnit; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class DbUtils { protected static final Logger LOG = LoggerFactory.getLogger(DbUtils.class); public static String fetchSingleValueWithRetry(Configuration config, String query) { final String username = config.getString(Key.USERNAME); final String password = config.getString(Key.PASSWORD); String jdbcUrl = config.getString(Key.JDBC_URL); if (jdbcUrl == null) { List conns = config.getList(Constant.CONN_MARK, Object.class); 
            Configuration connConf = Configuration.from(conns.get(0).toString());
            jdbcUrl = connConf.getString(Key.JDBC_URL);
        }

        Connection conn = null;
        PreparedStatement stmt = null;
        ResultSet result = null;
        String value = null;
        int retry = 0;
        int failTryCount = config.getInt(Config.FAIL_TRY_COUNT, Config.DEFAULT_FAIL_TRY_COUNT);
        do {
            try {
                if (retry > 0) {
                    // Back off for 2^retry seconds on the first 9 retries, then a fixed 500 seconds
                    int sleep = retry > 9 ? 500 : 1 << retry;
                    try {
                        TimeUnit.SECONDS.sleep(sleep);
                    } catch (InterruptedException e) {
                    }
                    LOG.warn("retry fetch value for {} the {} times", query, retry);
                }
                conn = DBUtil.getConnection(DataBaseType.OceanBase, jdbcUrl, username, password);
                stmt = conn.prepareStatement(query);
                result = stmt.executeQuery();
                if (result.next()) {
                    value = result.getString("Value");
                } else {
                    throw new RuntimeException("no values returned for " + query);
                }
                LOG.info("value for query [{}] is [{}]", query, value);
                break;
            } catch (SQLException e) {
                ++retry;
                LOG.warn("fetch value with {} error {}", query, e);
            } finally {
                DBUtil.closeDBResources(result, stmt, conn);
            }
        } while (retry < failTryCount);

        return value;
    }

    /**
     * Build a sys-tenant connection from an ordinary JDBC URL.
     *
     * @param jdbcUrl
     * @param clusterName
     * @return
     * @throws Exception
     */
    public static Connection buildSysConn(String jdbcUrl, String clusterName) throws Exception {
        jdbcUrl = jdbcUrl.replace("jdbc:mysql://", "jdbc:oceanbase://");
        int startIdx = jdbcUrl.indexOf('/', "jdbc:oceanbase://".length());
        int endIdx = jdbcUrl.lastIndexOf('?');
        String prefix = jdbcUrl.substring(0, startIdx + 1);
        // Keep the query string if there is one; a URL without '?' has no postfix
        final String postfix = endIdx >= 0 ? jdbcUrl.substring(endIdx) : "";
        String sysJDBCUrl = prefix + "oceanbase" + postfix;

        String tenantName = "sys";
        String[][] userConfigs = {
                {"monitor", "monitor"}
        };

        Connection conn = null;
        for (String[] userConfig : userConfigs) {
            try {
                conn = DBUtil.getConnectionWithoutRetry(DataBaseType.OceanBase, sysJDBCUrl,
                        String.format("%s@%s#%s", userConfig[0], tenantName, clusterName), userConfig[1]);
            } catch (Exception e) {
                LOG.warn("fail connecting to ob: " + e.getMessage());
            }
            if (conn == null) {
                LOG.warn("fail to get connection with user " + userConfig[0] + ", try alternative user.");
            } else {
                break;
            }
        }

        if (conn == null) {
            throw new Exception("fail to get connection with sys tenant.");
        }

        return conn;
    }
}
================================================
FILE: oceanbasev10writer/src/main/java/com/alibaba/datax/plugin/writer/oceanbasev10writer/util/ObWriterUtils.java
================================================
package com.alibaba.datax.plugin.writer.oceanbasev10writer.util;

import com.alibaba.datax.plugin.rdbms.reader.util.ObVersion;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter.Task;
import com.alibaba.datax.plugin.writer.oceanbasev10writer.Config;
import org.apache.commons.lang3.RandomUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.*;

import static com.alibaba.datax.plugin.writer.oceanbasev10writer.Config.DEFAULT_SLOW_MEMSTORE_THRESHOLD;

public class ObWriterUtils {

    private static final String MYSQL_KEYWORDS =
"ACCESSIBLE,ACCOUNT,ACTION,ADD,AFTER,AGAINST,AGGREGATE,ALGORITHM,ALL,ALTER,ALWAYS,ANALYSE,AND,ANY,AS,ASC,ASCII,ASENSITIVE,AT,AUTO_INCREMENT,AUTOEXTEND_SIZE,AVG,AVG_ROW_LENGTH,BACKUP,BEFORE,BEGIN,BETWEEN,BIGINT,BINARY,BINLOG,BIT,BLOB,BLOCK,BOOL,BOOLEAN,BOTH,BTREE,BY,BYTE,CACHE,CALL,CASCADE,CASCADED,CASE,CATALOG_NAME,CHAIN,CHANGE,CHANGED,CHANNEL,CHAR,CHARACTER,CHARSET,CHECK,CHECKSUM,CIPHER,CLASS_ORIGIN,CLIENT,CLOSE,COALESCE,CODE,COLLATE,COLLATION,COLUMN,COLUMN_FORMAT,COLUMN_NAME,COLUMNS,COMMENT,COMMIT,COMMITTED,COMPACT,COMPLETION,COMPRESSED,COMPRESSION,CONCURRENT,CONDITION,CONNECTION,CONSISTENT,CONSTRAINT,CONSTRAINT_CATALOG,CONSTRAINT_NAME,CONSTRAINT_SCHEMA,CONTAINS,CONTEXT,CONTINUE,CONVERT,CPU,CREATE,CROSS,CUBE,CURRENT,CURRENT_DATE,CURRENT_TIME,CURRENT_TIMESTAMP,CURRENT_USER,CURSOR,CURSOR_NAME,DATA,DATABASE,DATABASES,DATAFILE,DATE,DATETIME,DAY,DAY_HOUR,DAY_MICROSECOND,DAY_MINUTE,DAY_SECOND,DEALLOCATE,DEC,DECIMAL,DECLARE,DEFAULT,DEFAULT_AUTH,DEFINER,DELAY_KEY_WRITE,DELAYED,DELETE,DES_KEY_FILE,DESC,DESCRIBE,DETERMINISTIC,DIAGNOSTICS,DIRECTORY,DISABLE,DISCARD,DISK,DISTINCT,DISTINCTROW,DIV,DO,DOUBLE,DROP,DUAL,DUMPFILE,DUPLICATE,DYNAMIC,EACH,ELSE,ELSEIF,ENABLE,ENCLOSED,ENCRYPTION,END,ENDS,ENGINE,ENGINES,ENUM,ERROR,ERRORS,ESCAPE,ESCAPED,EVENT,EVENTS,EVERY,EXCHANGE,EXECUTE,EXISTS,EXIT,EXPANSION,EXPIRE,EXPLAIN,EXPORT,EXTENDED,EXTENT_SIZE,FAST,FAULTS,FETCH,FIELDS,FILE,FILE_BLOCK_SIZE,FILTER,FIRST,FIXED,FLOAT,FLOAT4,FLOAT8,FLUSH,FOLLOWS,FOR,FORCE,FOREIGN,FORMAT,FOUND,FROM,FULL,FULLTEXT,FUNCTION,GENERAL,GENERATED,GEOMETRY,GEOMETRYCOLLECTION,GET,GET_FORMAT,GLOBAL,GRANT,GRANTS,GROUP,GROUP_REPLICATION,HANDLER,HASH,HAVING,HELP,HIGH_PRIORITY,HOST,HOSTS,HOUR,HOUR_MICROSECOND,HOUR_MINUTE,HOUR_SECOND,IDENTIFIED,IF,IGNORE,IGNORE_SERVER_IDS,IMPORT,IN,INDEX,INDEXES,INFILE,INITIAL_SIZE,INNER,INOUT,INSENSITIVE,INSERT,INSERT_METHOD,INSTALL,INSTANCE,INT,INT1,INT2,INT3,INT4,INT8,INTEGER,INTERVAL,INTO,INVOKER,IO,IO_AFTER_GTIDS,IO_BEFORE_GTIDS,IO_THREAD,IPC,IS,ISOLATION,ISSUER,ITERATE,JOIN,JSON,
KEY,KEY_BLOCK_SIZE,KEYS,KILL,LANGUAGE,LAST,LEADING,LEAVE,LEAVES,LEFT,LESS,LEVEL,LIKE,LIMIT,LINEAR,LINES,LINESTRING,LIST,LOAD,LOCAL,LOCALTIME,LOCALTIMESTAMP,LOCK,LOCKS,LOGFILE,LOGS,LONG,LONGBLOB,LONGTEXT,LOOP,LOW_PRIORITY,MASTER,MASTER_AUTO_POSITION,MASTER_BIND,MASTER_CONNECT_RETRY,MASTER_DELAY,MASTER_HEARTBEAT_PERIOD,MASTER_HOST,MASTER_LOG_FILE,MASTER_LOG_POS,MASTER_PASSWORD,MASTER_PORT,MASTER_RETRY_COUNT,MASTER_SERVER_ID,MASTER_SSL,MASTER_SSL_CA,MASTER_SSL_CAPATH,MASTER_SSL_CERT,MASTER_SSL_CIPHER,MASTER_SSL_CRL,MASTER_SSL_CRLPATH,MASTER_SSL_KEY,MASTER_SSL_VERIFY_SERVER_CERT,MASTER_TLS_VERSION,MASTER_USER,MATCH,MAX_CONNECTIONS_PER_HOUR,MAX_QUERIES_PER_HOUR,MAX_ROWS,MAX_SIZE,MAX_STATEMENT_TIME,MAX_UPDATES_PER_HOUR,MAX_USER_CONNECTIONS,MAXVALUE,MEDIUM,MEDIUMBLOB,MEDIUMINT,MEDIUMTEXT,MEMORY,MERGE,MESSAGE_TEXT,MICROSECOND,MIDDLEINT,MIGRATE,MIN_ROWS,MINUTE,MINUTE_MICROSECOND,MINUTE_SECOND,MOD,MODE,MODIFIES,MODIFY,MONTH,MULTILINESTRING,MULTIPOINT,MULTIPOLYGON,MUTEX,MYSQL_ERRNO,NAME,NAMES,NATIONAL,NATURAL,NCHAR,NDB,NDBCLUSTER,NEVER,NEW,NEXT,NO,NO_WAIT,NO_WRITE_TO_BINLOG,NODEGROUP,NONBLOCKING,NONE,NOT,NULL,NUMBER,NUMERIC,NVARCHAR,OFFSET,OLD_PASSWORD,ON,ONE,ONLY,OPEN,OPTIMIZE,OPTIMIZER_COSTS,OPTION,OPTIONALLY,OPTIONS,OR,ORDER,OUT,OUTER,OUTFILE,OWNER,PACK_KEYS,PAGE,PARSE_GCOL_EXPR,PARSER,PARTIAL,PARTITION,PARTITIONING,PARTITIONS,PASSWORD,PHASE,PLUGIN,PLUGIN_DIR,PLUGINS,POINT,POLYGON,PORT,PRECEDES,PRECISION,PREPARE,PRESERVE,PREV,PRIMARY,PRIVILEGES,PROCEDURE,PROCESSLIST,PROFILE,PROFILES,PROXY,PURGE,QUARTER,QUERY,QUICK,RANGE,READ,READ_ONLY,READ_WRITE,READS,REAL,REBUILD,RECOVER,REDO_BUFFER_SIZE,REDOFILE,REDUNDANT,REFERENCES,REGEXP,RELAY,RELAY_LOG_FILE,RELAY_LOG_POS,RELAY_THREAD,RELAYLOG,RELEASE,RELOAD,REMOVE,RENAME,REORGANIZE,REPAIR,REPEAT,REPEATABLE,REPLACE,REPLICATE_DO_DB,REPLICATE_DO_TABLE,REPLICATE_IGNORE_DB,REPLICATE_IGNORE_TABLE,REPLICATE_REWRITE_DB,REPLICATE_WILD_DO_TABLE,REPLICATE_WILD_IGNORE_TABLE,REPLICATION,REQUIRE,RESET,RESIGNAL,RESTORE,RESTRICT,RESUME,RETURN,RETURNED
_SQLSTATE,RETURNS,REVERSE,REVOKE,RIGHT,RLIKE,ROLLBACK,ROLLUP,ROTATE,ROUTINE,ROW,ROW_COUNT,ROW_FORMAT,ROWS,RTREE,SAVEPOINT,SCHEDULE,SCHEMA,SCHEMA_NAME,SCHEMAS,SECOND,SECOND_MICROSECOND,SECURITY,SELECT,SENSITIVE,SEPARATOR,SERIAL,SERIALIZABLE,SERVER,SESSION,SET,SHARE,SHOW,SHUTDOWN,SIGNAL,SIGNED,SIMPLE,SLAVE,SLOW,SMALLINT,SNAPSHOT,SOCKET,SOME,SONAME,SOUNDS,SOURCE,SPATIAL,SPECIFIC,SQL,SQL_AFTER_GTIDS,SQL_AFTER_MTS_GAPS,SQL_BEFORE_GTIDS,SQL_BIG_RESULT,SQL_BUFFER_RESULT,SQL_CACHE,SQL_CALC_FOUND_ROWS,SQL_NO_CACHE,SQL_SMALL_RESULT,SQL_THREAD,SQL_TSI_DAY,SQL_TSI_HOUR,SQL_TSI_MINUTE,SQL_TSI_MONTH,SQL_TSI_QUARTER,SQL_TSI_SECOND,SQL_TSI_WEEK,SQL_TSI_YEAR,SQLEXCEPTION,SQLSTATE,SQLWARNING,SSL,STACKED,START,STARTING,STARTS,STATS_AUTO_RECALC,STATS_PERSISTENT,STATS_SAMPLE_PAGES,STATUS,STOP,STORAGE,STORED,STRAIGHT_JOIN,STRING,SUBCLASS_ORIGIN,SUBJECT,SUBPARTITION,SUBPARTITIONS,SUPER,SUSPEND,SWAPS,SWITCHES,TABLE,TABLE_CHECKSUM,TABLE_NAME,TABLES,TABLESPACE,TEMPORARY,TEMPTABLE,TERMINATED,TEXT,THAN,THEN,TIME,TIMESTAMP,TIMESTAMPADD,TIMESTAMPDIFF,TINYBLOB,TINYINT,TINYTEXT,TO,TRAILING,TRANSACTION,TRIGGER,TRIGGERS,TRUNCATE,TYPE,TYPES,UNCOMMITTED,UNDEFINED,UNDO,UNDO_BUFFER_SIZE,UNDOFILE,UNICODE,UNINSTALL,UNION,UNIQUE,UNKNOWN,UNLOCK,UNSIGNED,UNTIL,UPDATE,UPGRADE,USAGE,USE,USE_FRM,USER,USER_RESOURCES,USING,UTC_DATE,UTC_TIME,UTC_TIMESTAMP,VALIDATION,VALUE,VALUES,VARBINARY,VARCHAR,VARCHARACTER,VARIABLES,VARYING,VIEW,VIRTUAL,WAIT,WARNINGS,WEEK,WEIGHT_STRING,WHEN,WHERE,WHILE,WITH,WITHOUT,WORK,WRAPPER,WRITE,X509,XA,XID,XML,XOR,YEAR,YEAR_MONTH,ZEROFILL,FALSE,TRUE";

    private static final String ORACLE_KEYWORDS =
            
"ACCESS,ADD,ALL,ALTER,AND,ANY,ARRAYLEN,AS,ASC,AUDIT,BETWEEN,BY,CHAR,CHECK,CLUSTER,COLUMN,COMMENT,COMPRESS,CONNECT,CREATE,CURRENT,DATE,DECIMAL,DEFAULT,DELETE,DESC,DISTINCT,DROP,ELSE,EXCLUSIVE,EXISTS,FILE,FLOAT,FOR,FROM,GRANT,GROUP,HAVING,IDENTIFIED,IMMEDIATE,IN,INCREMENT,INDEX,INITIAL,INSERT,INTEGER,INTERSECT,INTO,IS,LEVEL,LIKE,LOCK,LONG,MAXEXTENTS,MINUS,MODE,MODIFY,NOAUDIT,NOCOMPRESS,NOT,NOTFOUND,NOWAIT,NULL,NUMBER,OF,OFFLINE,ON,ONLINE,OPTION,OR,ORDER,PCTFREE,PRIOR,PRIVILEGES,PUBLIC,RAW,RENAME,RESOURCE,REVOKE,ROW,ROWID,ROWLABEL,ROWNUM,ROWS,SELECT,SESSION,SET,SHARE,SIZE,SMALLINT,SQLBUF,START,SUCCESSFUL,SYNONYM,TABLE,THEN,TO,TRIGGER,UID,UNION,UNIQUE,UPDATE,USER,VALIDATE,VALUES,VARCHAR,VARCHAR2,VIEW,WHENEVER,WHERE,WITH";

    private static String CHECK_MEMSTORE = "select 1 from %s.gv$memstore t where t.total>t.mem_limit * ?";
    private static final String CHECK_MEMSTORE_4_0 = "select 1 from %s.gv$ob_memstore t where t.MEMSTORE_USED>t.MEMSTORE_LIMIT * ?";

    private static String CHECK_MEMSTORE_RATIO = "select min(t.total/t.mem_limit) from %s.gv$memstore t";
    private static final String CHECK_MEMSTORE_RATIO_4_0 = "select min(t.MEMSTORE_USED/t.MEMSTORE_LIMIT) from %s.gv$ob_memstore t";

    private static Set<String> databaseKeywords;
    private static String compatibleMode = null;
    private static String obVersion = null;
    protected static final Logger LOG = LoggerFactory.getLogger(Task.class);

    private static Set<String> keywordsFromString2HashSet(final String keywords) {
        return new HashSet<String>(Arrays.asList(keywords.split(",")));
    }

    public static String escapeDatabaseKeyword(String keyword) {
        if (databaseKeywords == null) {
            if (isOracleMode()) {
                databaseKeywords = keywordsFromString2HashSet(ORACLE_KEYWORDS);
            } else {
                databaseKeywords = keywordsFromString2HashSet(MYSQL_KEYWORDS);
            }
        }
        char escapeChar = isOracleMode() ? '"' : '`';
        if (databaseKeywords.contains(keyword.toUpperCase())) {
            keyword = escapeChar + keyword + escapeChar;
        }
        return keyword;
    }

    public static void escapeDatabaseKeyword(List<String> keywords) {
        for (int i = 0; i < keywords.size(); i++) {
            keywords.set(i, escapeDatabaseKeyword(keywords.get(i)));
        }
    }

    public static Boolean isEscapeMode(String keyword) {
        if (isOracleMode()) {
            return keyword.startsWith("\"") && keyword.endsWith("\"");
        } else {
            return keyword.startsWith("`") && keyword.endsWith("`");
        }
    }

    public static boolean isMemstoreFull(Connection conn, double memstoreThreshold) {
        PreparedStatement ps = null;
        ResultSet rs = null;
        boolean result = false;
        try {
            String sysDbName = "oceanbase";
            if (isOracleMode()) {
                sysDbName = "sys";
            }
            ps = conn.prepareStatement(String.format(getMemStoreSql(), sysDbName));
            ps.setDouble(1, memstoreThreshold);
            rs = ps.executeQuery();
            // Any returned row means the memstore of at least one server in the current tenant is nearly full
            result = rs.next();
        } catch (Throwable e) {
            LOG.error("check memstore fail: " + e.getMessage());
            result = false;
        } finally {
            // The statement does not need to be closed in OB 1.0
        }

        LOG.info("isMemstoreFull=" + result);
        return result;
    }

    public static double queryMemUsedRatio(Connection conn) {
        PreparedStatement ps = null;
        ResultSet rs = null;
        double result = 0;
        try {
            String sysDbName = "oceanbase";
            if (isOracleMode()) {
                sysDbName = "sys";
            }
            ps = conn.prepareStatement(String.format(getMemStoreRatioSql(), sysDbName));
            rs = ps.executeQuery();
            // The query returns the lowest used/limit memstore ratio across the tenant's servers
            if (rs.next()) {
                result = rs.getDouble(1);
            }
        } catch (Throwable e) {
            LOG.warn("Check memstore fail, reason: {}. Use a random value instead.", e.getMessage());
            result = RandomUtils.nextDouble(0.3D, DEFAULT_SLOW_MEMSTORE_THRESHOLD + 0.2D);
        } finally {
            // The statement does not need to be closed in OB 1.0
        }
        return result;
    }

    public static boolean isOracleMode() {
        return (compatibleMode.equals(Config.OB_COMPATIBLE_MODE_ORACLE));
    }

    private static String getMemStoreSql() {
        if (ObVersion.valueOf(obVersion).compareTo(ObVersion.V4000) >= 0) {
            return CHECK_MEMSTORE_4_0;
        } else {
            return CHECK_MEMSTORE;
        }
    }

    private static String getMemStoreRatioSql() {
        if (ObVersion.valueOf(obVersion).compareTo(ObVersion.V4000) >= 0) {
            return CHECK_MEMSTORE_RATIO_4_0;
        } else {
            return CHECK_MEMSTORE_RATIO;
        }
    }

    public static String getCompatibleMode() {
        return compatibleMode;
    }

    public static void setCompatibleMode(String mode) {
        compatibleMode = mode;
    }

    public static void setObVersion(String version) {
        obVersion = version;
    }

    private static String buildDeleteSql(String tableName, List<String> columns) {
        StringBuilder builder = new StringBuilder("DELETE FROM ");
        builder.append(tableName).append(" WHERE ");
        for (int i = 0; i < columns.size(); i++) {
            builder.append(columns.get(i)).append(" = ?");
            if (i != columns.size() - 1) {
                builder.append(" and ");
            }
        }
        return builder.toString();
    }

    private static int[] getColumnIndex(List<String> columnsInIndex, List<String> allColumns) {
        for (int i = 0; i < allColumns.size(); i++) {
            if (!ObWriterUtils.isEscapeMode(allColumns.get(i))) {
                allColumns.set(i, allColumns.get(i).toUpperCase());
            }
        }
        int[] colIdx = new int[columnsInIndex.size()];
        for (int i = 0; i < columnsInIndex.size(); i++) {
            int index = allColumns.indexOf(columnsInIndex.get(i));
            if (index < 0) {
                throw new RuntimeException(
                        String.format("column %s is in unique or primary key but not in the column list.",
                                columnsInIndex.get(i)));
            }
            colIdx[i] = index;
        }
        return colIdx;
    }

    public static List<Pair<String, int[]>> buildDeleteSql(Connection conn, String dbName, String tableName,
            List<String> columns) {
        List<Pair<String, int[]>> deleteMeta = new ArrayList<Pair<String, int[]>>();
        Map<String, List<String>> uniqueKeys = getAllUniqueIndex(conn, dbName, tableName);
        for (Map.Entry<String, List<String>> entry : uniqueKeys.entrySet()) {
            List<String> colNames = entry.getValue();
            String deleteSql = buildDeleteSql(tableName, colNames);
            int[] colIdx = getColumnIndex(colNames, columns);
            LOG.info("delete sql [{}], column index: {}", deleteSql, Arrays.toString(colIdx));
            deleteMeta.add(new ImmutablePair<String, int[]>(deleteSql, colIdx));
        }
        return deleteMeta;
    }

    // This function is only used in Oracle compatible mode
    private static Map<String, List<String>> getAllUniqueIndex(Connection conn, String dbName, String tableName) {
        Map<String, List<String>> uniqueKeys = new HashMap<String, List<String>>();
        // String.contains() takes a literal, so test for "." (split() below takes a regex, hence "\\.")
        if (tableName.contains(".")) {
            dbName = tableName.split("\\.")[0];
            tableName = tableName.split("\\.")[1];
        }
        dbName = dbName.toUpperCase();
        String sql = String.format("select cons.CONSTRAINT_NAME AS KEY_NAME, cols.COLUMN_NAME COLUMN_NAME " +
                "from all_constraints cons, all_cons_columns cols " +
                "WHERE cols.table_name = '%s' AND cons.constraint_type in('P', 'U') " +
                " AND cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner " +
                " AND cols.owner = '%s' " +
                "Order by KEY_NAME, cols.POSITION", tableName, dbName);

        LOG.info("get all unique keys by sql {}", sql);

        Statement stmt = null;
        ResultSet rs = null;
        try {
            stmt = conn.createStatement();
            rs = stmt.executeQuery(sql);
            while (rs.next()) {
                String keyName = rs.getString("Key_name");
                String columnName = rs.getString("Column_name");
                columnName = escapeDatabaseKeyword(columnName);
                if (!ObWriterUtils.isEscapeMode(columnName)) {
                    columnName = columnName.toUpperCase();
                }
                List<String> s = uniqueKeys.get(keyName);
                if (s == null) {
                    s = new ArrayList<>();
                    uniqueKeys.put(keyName, s);
                }
                s.add(columnName);
            }
        } catch (Throwable e) {
            LOG.error("show index from table fail :" + sql, e);
        } finally {
            asyncClose(rs, stmt, null);
        }

        //ObWriterUtils.escapeDatabaseKeywords(uniqueKeys);
        return uniqueKeys;
    }

    /**
     * Build the INSERT statement for the given write mode.
     *
     * @param tableName
     * @param columnHolders
     * @param conn
     * @param writeMode
     * @param obUpdateColumns
     * @return
     */
    public static String buildWriteSql(String tableName, List<String> columnHolders, Connection conn, String writeMode,
            String obUpdateColumns) {
        List<String> valueHolders = new ArrayList<String>(columnHolders.size());
        for (int i = 0; i < columnHolders.size(); i++) {
            valueHolders.add("?");
        }
        String writeDataSqlTemplate = new StringBuilder().append("INSERT INTO " + tableName + " (")
                .append(StringUtils.join(columnHolders, ",")).append(") VALUES(")
                .append(StringUtils.join(valueHolders, ",")).append(")").toString();

        LOG.info("write mode: " + writeMode);

        // update mode
        if (!writeMode.equals("insert")) {
            if (obUpdateColumns == null) {
                Set<String> skipColumns = getSkipColumns(conn, tableName);

                StringBuilder columnList = new StringBuilder();
                for (String column : skipColumns) {
                    columnList.append(column).append(",");
                }
                LOG.info("Skip columns: " + columnList.toString());
                writeDataSqlTemplate = writeDataSqlTemplate + onDuplicateKeyUpdateString(columnHolders, skipColumns);
            } else {
                LOG.info("Update columns: " + obUpdateColumns);
                writeDataSqlTemplate = writeDataSqlTemplate + onDuplicateKeyUpdateString(obUpdateColumns);
            }
        }

        return writeDataSqlTemplate;
    }

    private static Set<String> getSkipColumns(Connection conn, String tableName) {
        String sql = "show index from " + tableName;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            stmt = conn.createStatement();
            rs = stmt.executeQuery(sql);
            Map<String, Set<String>> uniqueKeys = new HashMap<String, Set<String>>();
            while (rs.next()) {
                String nonUnique = rs.getString("Non_unique");
                if (!"0".equals(nonUnique)) {
                    continue;
                }
                String keyName = rs.getString("Key_name");
                String columnName = StringUtils.upperCase(rs.getString("Column_name"));
                Set<String> s = uniqueKeys.get(keyName);
                if (s == null) {
                    s = new HashSet<>();
                    uniqueKeys.put(keyName, s);
                }
                s.add(columnName);
            }
            // If the table has only one primary/unique key, skip its columns in the update list.
            // This is safe because that key cannot change when the inserted row conflicts with an
            // existing row.
            if (uniqueKeys.size() == 1) {
                return uniqueKeys.values().iterator().next();
            } else if (uniqueKeys.size() > 1) {
                // If the table has more than one primary/unique key, skip only the columns common to
                // all of them. These columns appear in every primary/unique key, so they are unchanged
                // whenever at least one key conflicts between the new and the existing data. Keeping
                // them out of the update list is therefore safe.
                //
                // We cannot skip every column of every primary/unique key, because some of those fields
                // might not conflict with existing values. If they were left out of the update list of the
                // INSERT statement, they would not be updated, leaving a row with some new values and some
                // old values, which breaks data consistency.
                Iterator<String> keyNameIterator = uniqueKeys.keySet().iterator();
                Set<String> skipColumns = uniqueKeys.get(keyNameIterator.next());
                while (keyNameIterator.hasNext()) {
                    skipColumns.retainAll(uniqueKeys.get(keyNameIterator.next()));
                }
                return skipColumns;
            }
        } catch (Throwable e) {
            LOG.error("show index from table fail :" + sql, e);
        } finally {
            asyncClose(rs, stmt, null);
        }
        return Collections.emptySet();
    }

    /*
     * Build the ON DUPLICATE KEY UPDATE sub-clause from the user-specified updateColumns
     */
    private static String onDuplicateKeyUpdateString(String updateColumns) {
        if (updateColumns == null || updateColumns.length() < 1) {
            return "";
        }
        StringBuilder builder = new StringBuilder();
        builder.append(" ON DUPLICATE KEY UPDATE ");
        List<String> list = new ArrayList<String>();
        for (String column : updateColumns.split(",")) {
            list.add(column + "=VALUES(" + column + ")");
        }
        builder.append(StringUtils.join(list, ','));
        return builder.toString();
    }

    private static String onDuplicateKeyUpdateString(List<String> columnHolders, Set<String> skipColumns) {
        if (columnHolders == null || columnHolders.size() < 1) {
            return "";
        }
        StringBuilder builder = new StringBuilder();
        builder.append(" ON DUPLICATE KEY UPDATE ");
        List<String> list = new ArrayList<String>();
        for (String column : columnHolders) {
            // skip update columns
            if (skipColumns.contains(column.toUpperCase())) {
                continue;
            }
            list.add(column + "=VALUES(" + column + ")");
        }
        if (!list.isEmpty()) {
            builder.append(StringUtils.join(list, ','));
        } else {
            // If there are no columns besides the unique key, update the first column
            String column = columnHolders.get(0);
            builder.append(column + "=VALUES(" + column + ")");
        }
        return builder.toString();
    }

    /**
     * Sleep for the given number of milliseconds.
     *
     * @param ms
     *            milliseconds
     */
    public static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
        }
    }

    /**
     * Fatal error
     *
     * @param e
     * @return
     */
    public static boolean isFatalError(SQLException e) {
        String sqlState = e.getSQLState();
        if (StringUtils.startsWith(sqlState, "08")) {
            return true;
        }
        final int errorCode = Math.abs(e.getErrorCode());
        switch (errorCode) {
            // Communications Errors
            case 1040: // ER_CON_COUNT_ERROR
            case 1042: // ER_BAD_HOST_ERROR
            case 1043: // ER_HANDSHAKE_ERROR
            case 1047: // ER_UNKNOWN_COM_ERROR
            case 1081: // ER_IPSOCK_ERROR
            case 1129: // ER_HOST_IS_BLOCKED
            case 1130: // ER_HOST_NOT_PRIVILEGED
            // Authentication Errors
            case 1045: // ER_ACCESS_DENIED_ERROR
            // Resource errors
            case 1004: // ER_CANT_CREATE_FILE
            case 1005: // ER_CANT_CREATE_TABLE
            case 1015: // ER_CANT_LOCK
            case 1021: // ER_DISK_FULL
            case 1041: // ER_OUT_OF_RESOURCES
            case 1094: // Unknown thread id: %lu
            // Out-of-memory errors
            case 1037: // ER_OUTOFMEMORY
            case 1038: // ER_OUT_OF_SORTMEMORY
                return true;
        }

        if (StringUtils.isNotBlank(e.getMessage())) {
            final String errorText = e.getMessage().toUpperCase();

            if (errorCode == 0
                    && (errorText.indexOf("COMMUNICATIONS LINK FAILURE") > -1 || errorText.indexOf("COULD NOT CREATE CONNECTION") > -1)
                    || errorText.indexOf("NO DATASOURCE") > -1 || errorText.indexOf("NO ALIVE DATASOURCE") > -1
                    || errorText.indexOf("NO OPERATIONS ALLOWED AFTER CONNECTION CLOSED") > -1) {
                return true;
            }
        }

        return false;
    }

    /**
     * Recoverable error
     *
     * @param e
     * @return
     */
    public static boolean isRecoverableError(SQLException e) {
        int error = Math.abs(e.getErrorCode());
        // Explicitly recoverable
        if (white.contains(error)) {
            return true;
        }
        // Explicitly unrecoverable
        if (black.contains(error)) {
            return false;
        }
        // Error codes above 4000 are OceanBase-specific; those above 4020 are treated as recoverable
        return error > 4020;
    }

    private static Set<Integer> white = new HashSet<Integer>();
    static {
        int[] errList = { 1213, 1047, 1041, 1094, 4000, 4012, 4013 };
        for (int err : errList) {
            white.add(err);
        }
    }
    // Error codes below 4000 are not considered here
    private static Set<Integer> black = new HashSet<Integer>();
    static {
        int[] errList = { 4022, 4025, 4026, 4028, 4029, 4031, 4033, 4034, 4037, 4041, 4044 };
        for (int err : errList) {
            black.add(err);
        }
    }

    /**
     * Because of a bug in ObProxy, closing the connection does not respond when a transaction has
     * timed out or been killed, so resources are closed on a separate daemon thread.
     *
     * @param rs
     * @param stmt
     * @param conn
     */
    public static void asyncClose(final ResultSet rs, final Statement stmt, final Connection conn) {
        Thread t = new Thread() {
            public void run() {
                DBUtil.closeDBResources(rs, stmt, conn);
            }
        };
        t.setDaemon(true);
        t.start();
    }

    /**
     *
     */
    public static enum LoadMode {

        /**
         * Fast insert
         */
        FAST,

        /**
         * Insert slowly
         */
        SLOW,

        /**
         * Pause to insert
         */
        PAUSE
    }

}
================================================
FILE: oceanbasev10writer/src/main/resources/plugin.json
================================================
{
    "name": "oceanbasev10writer",
    "class": "com.alibaba.datax.plugin.writer.oceanbasev10writer.OceanBaseV10Writer",
    "description": "write data into oceanbase with sql interface",
    "developer": "oceanbase"
}
================================================
FILE: ocswriter/doc/ocswriter.md
================================================
# DataX OCSWriter: Writing to OCS with a Memcached Client

---

## 1 Quick Introduction

### 1.1 About OCS

Open Cache Service (OCS) is an in-memory cache service that supports high-speed access to massive amounts of small data items. OCS can greatly relieve pressure on back-end storage and improve the response speed of websites and applications. OCS supports a key-value data structure, and any client compatible with the Memcached protocol can communicate with OCS.
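
Because OCS speaks the Memcached protocol, it may help to see what a storage command looks like on the wire and how the server reads its expiration field. The following is a minimal sketch, assuming the standard Memcached text protocol (`set <key> <flags> <exptime> <bytes>`, then the data block); OCSWriter itself goes through the spymemcached client over the binary protocol, so this is not how the plugin sends data. The class `MemcachedSetSketch` and its methods are hypothetical and not part of DataX.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the Memcached text protocol; not part of DataX or spymemcached.
public class MemcachedSetSketch {

    // 60*60*24*30 seconds: exptime values larger than this are treated as absolute Unix timestamps.
    static final long THIRTY_DAYS_SECONDS = 60L * 60 * 24 * 30;

    // Builds "set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n".
    // <bytes> is the length of the data block in bytes (UTF-8 here), not in characters.
    static String buildSetCommand(String key, int flags, long exptime, String value) {
        int byteLength = value.getBytes(StandardCharsets.UTF_8).length;
        return "set " + key + " " + flags + " " + exptime + " " + byteLength + "\r\n" + value + "\r\n";
    }

    // Returns the absolute expiry (Unix seconds) a server would derive from exptime;
    // 0 means the item never expires.
    static long absoluteExpiry(long exptime, long nowUnixSeconds) {
        if (exptime == 0) {
            return 0;                        // never expires
        }
        if (exptime > THIRTY_DAYS_SECONDS) {
            return exptime;                  // already an absolute Unix time
        }
        return nowUnixSeconds + exptime;     // relative offset from now
    }

    public static void main(String[] args) {
        // Fields "28" and "lawyer" joined with the default SOH (U+0001) fieldDelimiter: 9 bytes in total
        String value = "28" + (char) 1 + "lawyer";
        String command = buildSetCommand("tom", 0, 1000, value);
        System.out.println(command.substring(0, command.indexOf("\r\n")));           // set tom 0 1000 9
        System.out.println(absoluteExpiry(1000, 1_700_000_000L));                    // 1700001000
        System.out.println(absoluteExpiry(THIRTY_DAYS_SECONDS + 1, 1_700_000_000L)); // 2592001
    }
}
```

A real client builds these commands for you; the sketch only makes concrete the 30-day rule that the `expireTime` parameter in section 2.2 depends on.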
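
OCS can be deployed quickly and used immediately. For dynamic web and app workloads, the cache service reduces load on the database and thereby improves the overall response speed of the website.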
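
Like a local Memcache, OCS is compatible with the Memcached protocol and with the user's environment, so existing Memcached clients can be used with OCS directly. Unlike a local Memcache, the hardware and data are deployed in the cloud, backed by complete infrastructure, network security, and system maintenance services. None of these services require up-front investment; you pay only for what you use.

### 1.2 About OCSWriter

OCSWriter is a DataX channel that writes data into OCS over the Memcached protocol.

## 2 Features

### 2.1 Sample Configuration

* This example imports data generated in memory into OCS.

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            { "value": "DataX", "type": "string" },
                            { "value": 19880808, "type": "long" },
                            { "value": "1988-08-08 08:08:08", "type": "date" },
                            { "value": true, "type": "bool" },
                            { "value": "test", "type": "bytes" }
                        ],
                        "sliceRecordCount": 1000
                    }
                },
                "writer": {
                    "name": "ocswriter",
                    "parameter": {
                        "proxy": "xxxx",
                        "port": "11211",
                        "username": "user",
                        "password": "******",
                        "writeMode": "set|add|replace|append|prepend",
                        "writeFormat": "text|binary",
                        "fieldDelimiter": "\u0001",
                        "expireTime": 1000,
                        "indexes": "0,2",
                        "batchSize": 1000
                    }
                }
            }
        ]
    }
}
```

### 2.2 Parameters

* **proxy**
    * Description: IP address or hostname of the OCS server.
    * Required: yes

* **port**
    * Description: OCS connection port. Defaults to 11211.
    * Required: no
    * Default: 11211

* **username**
    * Description: Access account for the OCS connection.
    * Required: yes

* **password**
    * Description: Access password for the OCS connection.
    * Required: yes

* **writeMode**
    * Description: How OCSWriter writes data:
        * set: store the data, overwriting any existing value.
        * add: store the data only if the key does not already exist.
        * replace: store the data only if the key already exists.
        * append: append the data after the existing value of the key; exptime is ignored.
        * prepend: prepend the data before the existing value of the key; exptime is ignored.
    * Required: yes

* **writeFormat**
    * Description: Format of the data OCSWriter writes. Two formats are defined:
        * text: serialize the source data as text. The first field is used as the OCS KEY; all following fields are serialized as STRING, joined with the user-specified fieldDelimiter, and the resulting string is written to OCS.
        * binary: write the source data directly as binary. This is reserved for future extension and is not supported yet; specifying binary causes an error.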
    * Required: no
    * Default: text

* **expireTime**
    * Description: Expiration time of cached values, in one of the two forms Memcached supports:
        * Unix time (seconds since 1970-01-01), meaning the data expires at that future moment.
        * Seconds relative to the current time, meaning the data expires that many seconds from now.
        * **Note: if the value is greater than 60\*60\*24\*30 seconds (30 days), the server treats it as Unix time.**
    * Unit: seconds
    * Required: no
    * Default: 0 (0 means the data never expires)

* **indexes**
    * Description: Zero-based index of the column(s) used as the OCS key; separate multiple indexes with commas, for example "0,2".
    * Required: no
    * Default: 0

* **batchSize**
    * Description: Number of records submitted per batch. A larger value greatly reduces the number of network round trips between DataX and OCS and improves overall throughput, but setting it too large may cause the DataX process to run out of memory (OOM). [The memcached version does not support batch writes yet.]
    * Required: no
    * Default: 256

* **fieldDelimiter**
    * Description: Delimiter between the fields of the key and value written to OCS, for example key=tom\u0001boston, value=28\u0001lawyer\u0001male\u0001married.
    * Required: no
    * Default: \u0001

## 3 Performance Report

### 3.1 DataX Machine Configuration

```
CPU: 16 cores, memory: 24 GB, NIC: single 1000 Mbps NIC
```

### 3.2 Task Resource Configuration

```
-Xms8g -Xmx8g -XX:+HeapDumpOnOutOfMemoryError
```

### 3.3 Test Results

| Record size | Channel concurrency | TPS | Channel throughput | Egress traffic | Notes |
| :--------: | :--------:| :--: | :--: | :--: | :--: |
| 1KB | 1 | 579 tps | 583.31KB/s | 648.63KB/s | None |
| 1KB | 10 | 6006 tps | 5.87MB/s | 6.73MB/s | None |
| 1KB | 100 | 49916 tps | 48.56MB/s | 55.55MB/s | None |
| 10KB | 1 | 438 tps | 4.62MB/s | 5.07MB/s | None |
| 10KB | 10 | 4313 tps | 45.57MB/s | 49.51MB/s | None |
| 10KB | 100 | 10713 tps | 112.80MB/s | 123.01MB/s | None |
| 100KB | 1 | 275 tps | 26.09MB/s | 144.90KB/s | None. The data is highly redundant, so the compression ratio is high. |
| 100KB | 10 | 2492 tps | 236.33MB/s | 1.30MB/s | None |
| 100KB | 100 | 3187 tps | 302.17MB/s | 1.77MB/s | None |

### 3.4 Performance Test Summary

1. For records smaller than 10 KB, 100 concurrent channels are recommended.
2. Writing records larger than 10 KB to OCS is not recommended.
================================================
FILE: ocswriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 ocswriter com.alibaba.datax datax-common ${datax-project-version} com.alibaba.datax datax-core ${datax-project-version} org.slf4j slf4j-api org.testng testng 6.8.8 test org.easymock easymock 3.3.1 test com.google.code.simple-spring-memcached spymemcached 2.8.1 com.google.guava guava 16.0.1 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} 3.2 maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single
================================================
FILE: ocswriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/ocswriter target/ ocswriter-0.0.1-SNAPSHOT.jar plugin/writer/ocswriter false plugin/writer/ocswriter/libs runtime
================================================
FILE: ocswriter/src/main/java/com/alibaba/datax/plugin/writer/ocswriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.ocswriter;

public final class Key {
    public final static String USER = "username";
    public final static String PASSWORD = "password";
    public final static String PROXY = "proxy";
    public final static String PORT = "port";
    public final static String WRITE_MODE = "writeMode";
    public final static String WRITE_FORMAT = "writeFormat";
    public final static String FIELD_DELIMITER = "fieldDelimiter";
    public final static String EXPIRE_TIME = "expireTime";
    public final static String BATCH_SIZE = "batchSize";
    public final static String INDEXES = "indexes";
}
================================================
FILE: ocswriter/src/main/java/com/alibaba/datax/plugin/writer/ocswriter/OcsWriter.java
================================================
package com.alibaba.datax.plugin.writer.ocswriter;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.RetryUtil;
import com.alibaba.datax.plugin.writer.ocswriter.utils.ConfigurationChecker;
import com.alibaba.datax.plugin.writer.ocswriter.utils.OcsWriterErrorCode;
import com.google.common.annotations.VisibleForTesting;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.auth.AuthDescriptor;
import net.spy.memcached.auth.PlainCallbackHandler;
import net.spy.memcached.internal.OperationFuture;
import org.apache.commons.lang3.StringUtils;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class OcsWriter extends Writer {

    public static class Job extends Writer.Job {
        private Configuration configuration;

        @Override
        public void init() {
            this.configuration = super.getPluginJobConf();
            // Validate the parameters
            ConfigurationChecker.check(this.configuration);
        }

        @Override
        public void prepare() {
            super.prepare();
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            ArrayList<Configuration> configList = new ArrayList<Configuration>();
            for (int i = 0; i < mandatoryNumber; i++) {
                configList.add(this.configuration.clone());
            }
            return configList;
        }

        @Override
        public void destroy() {
        }
    }

    public static class Task extends Writer.Task {

        private Configuration configuration;
        private MemcachedClient client;
        private Set<Integer> indexesFromUser = new HashSet<Integer>();
        private String delimiter;
        private int expireTime;
        //private int batchSize;
        private ConfigurationChecker.WRITE_MODE writeMode;
        private TaskPluginCollector taskPluginCollector;

        @Override
        public void init() {
            this.configuration = this.getPluginJobConf();
            this.taskPluginCollector = super.getTaskPluginCollector();
        }

        @Override
        public void prepare() {
            super.prepare();

            // Defaults to column 0 if not configured
            String indexStr = this.configuration.getString(Key.INDEXES, "0");
            for (String index : indexStr.split(",")) {
                indexesFromUser.add(Integer.parseInt(index));
            }

            // Defaults to the SOH character (U+0001) if not configured
            delimiter = this.configuration.getString(Key.FIELD_DELIMITER, "\u0001");
            expireTime = this.configuration.getInt(Key.EXPIRE_TIME, 0);
            // TODO: this version does not support batch submission; it will be supported once a new client is released for ocswriter.
            // batchSize = this.configuration.getInt(Key.BATCH_SIZE, 100);
            writeMode = ConfigurationChecker.WRITE_MODE.valueOf(this.configuration.getString(Key.WRITE_MODE));

            String proxy = this.configuration.getString(Key.PROXY);
            // The default port is 11211
            String port = this.configuration.getString(Key.PORT, "11211");
            String username = this.configuration.getString(Key.USER);
            String password = this.configuration.getString(Key.PASSWORD);
            AuthDescriptor ad = new AuthDescriptor(new String[]{"PLAIN"}, new PlainCallbackHandler(username, password));

            try {
                client = getMemcachedConn(proxy, port, ad);
            } catch (Exception e) {
                // Do not swallow the exception; rethrow it so the failure can be located
                throw DataXException.asDataXException(OcsWriterErrorCode.OCS_INIT_ERROR, "Failed to initialize the OCS client", e);
            }
        }

        /**
         * Create the OCS client connection,
         * retrying 9 times with exponentially increasing intervals.
         */
        private MemcachedClient getMemcachedConn(final String proxy, final String port, final AuthDescriptor ad) throws Exception {
            return RetryUtil.executeWithRetry(new Callable<MemcachedClient>() {
                @Override
                public MemcachedClient call() throws Exception {
                    return new MemcachedClient(
                            new ConnectionFactoryBuilder().setProtocol(ConnectionFactoryBuilder.Protocol.BINARY)
                                    .setAuthDescriptor(ad)
                                    .build(),
                            AddrUtil.getAddresses(proxy + ":" + port));
                }
            }, 9, 1000L, true);
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            Record record;
            String key;
            String value;
            while ((record = lineReceiver.getFromReader()) != null) {
                try {
                    key = buildKey(record);
                    value = buildValue(record);
                    switch (writeMode) {
                        case set:
                        case replace:
                        case add:
                            commitWithRetry(key, value);
                            break;
                        case append:
                        case prepend:
                            commit(key, value);
                            break;
                        default:
                            // No default action: parameter validation already rejects any mode other than these five
                    }
                } catch (Exception e) {
                    this.taskPluginCollector.collectDirtyRecord(record, e);
                }
            }
        }

        /**
         * Commit without retry.
         */
        private void commit(final String key, final String value) {
            OperationFuture<Boolean> future;
            switch (writeMode) {
                case set:
                    future = client.set(key, expireTime, value);
                    break;
                case add:
                    // Idempotence: the same input yields the same result no matter how many times it is called,
                    // so add and replace are idempotent.
                    future = client.add(key, expireTime, value);
                    break;
                case replace:
                    future = client.replace(key, expireTime, value);
                    break;
                // TODO [Note] append and prepend are not idempotent when a job is rerun; use them with care. They are not retried.
                case append:
                    future = client.append(0L, key, value);
                    break;
                case prepend:
                    future = client.prepend(0L, key, value);
                    break;
                default:
                    // Unreachable: parameter validation already rejects any mode other than these five
                    throw DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("Unsupported write mode: %s", writeMode.toString()));
            }
            // [Note] getStatus() may return null because get() timed out; such records are treated as dirty data,
            // although the data may in fact have been written to OCS.
            if (future == null || future.getStatus() == null || !future.getStatus().isSuccess()) {
                throw DataXException.asDataXException(OcsWriterErrorCode.COMMIT_FAILED, "Failed to commit data to OCS");
            }
        }

        /**
         * Commit data to OCS with retry.
         */
        private void commitWithRetry(final String key, final String value) throws Exception {
            RetryUtil.executeWithRetry(new Callable<Object>() {
                @Override
                public Object call() throws Exception {
                    commit(key, value);
                    return null;
                }
            }, 3, 1000L, false);
        }

        /**
         * Build the value.
         * A record with a binary field is treated as dirty data.
         * A record with a null column is treated as dirty data.
         */
        private String buildValue(Record record) {
            ArrayList<String> valueList = new ArrayList<String>();
            int colNum = record.getColumnNumber();
            for (int i = 0; i < colNum; i++) {
                Column col = record.getColumn(i);
                if (col != null) {
                    String value;
                    Column.Type type = col.getType();
                    switch (type) {
                        case STRING:
                        case BOOL:
                        case DOUBLE:
                        case LONG:
                        case DATE:
                            value = col.asString();
                            // [Note] a value that contains the delimiter is treated as dirty data
                            if (value != null && value.contains(delimiter)) {
                                throw
DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("数据中包含分隔符:%s", value)); } break; default: //目前不支持二进制,如果遇到二进制,则当做脏数据处理 throw DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("不支持的数据格式:%s", type.toString())); } valueList.add(value); } else { //如果取到的列为null,需要当做脏数据处理 throw DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("record中不存在第%s个字段", i)); } } return StringUtils.join(valueList, delimiter); } /** * 构建key * 构建数据为空时当做脏数据处理 */ private String buildKey(Record record) { ArrayList keyList = new ArrayList(); for (int index : indexesFromUser) { Column col = record.getColumn(index); if (col == null) { throw DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("不存在第%s列", index)); } Column.Type type = col.getType(); String value; switch (type) { case STRING: case BOOL: case DOUBLE: case LONG: case DATE: value = col.asString(); if (value != null && value.contains(delimiter)) { throw DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("主键中包含分隔符:%s", value)); } keyList.add(value); break; default: //目前不支持二进制,如果遇到二进制,则当做脏数据处理 throw DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("不支持的数据格式:%s", type.toString())); } } String rtn = StringUtils.join(keyList, delimiter); if (StringUtils.isBlank(rtn)) { throw DataXException.asDataXException(OcsWriterErrorCode.DIRTY_RECORD, String.format("构建主键为空,请检查indexes的配置")); } return rtn; } /** * shutdown中会有数据异步提交,需要重试。 */ @Override public void destroy() { try { RetryUtil.executeWithRetry(new Callable() { @Override public Object call() throws Exception { if (client == null || client.shutdown(10000L, TimeUnit.MILLISECONDS)) { return null; } else { throw DataXException.asDataXException(OcsWriterErrorCode.SHUTDOWN_FAILED, "关闭ocsClient失败"); } } }, 8, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OcsWriterErrorCode.SHUTDOWN_FAILED, "关闭ocsClient失败", e); 
            }
        }

        /**
         * For testing only
         */
        @VisibleForTesting
        public String buildValue_test(Record record) {
            return this.buildValue(record);
        }

        @VisibleForTesting
        public String buildKey_test(Record record) {
            return this.buildKey(record);
        }

        @VisibleForTesting
        public void setIndexesFromUser(HashSet<Integer> indexesFromUser) {
            this.indexesFromUser = indexesFromUser;
        }
    }
}
================================================
FILE: ocswriter/src/main/java/com/alibaba/datax/plugin/writer/ocswriter/utils/CommonUtils.java
================================================
package com.alibaba.datax.plugin.writer.ocswriter.utils;

public class CommonUtils {

    public static void sleepInMs(long time) {
        try {
            Thread.sleep(time);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption
            Thread.currentThread().interrupt();
        }
    }
}
================================================
FILE: ocswriter/src/main/java/com/alibaba/datax/plugin/writer/ocswriter/utils/ConfigurationChecker.java
================================================
package com.alibaba.datax.plugin.writer.ocswriter.utils;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.ocswriter.Key;
import com.google.common.annotations.VisibleForTesting;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.auth.AuthDescriptor;
import net.spy.memcached.auth.PlainCallbackHandler;
import org.apache.commons.lang3.EnumUtils;
import org.apache.commons.lang3.StringUtils;

public class ConfigurationChecker {

    public static void check(Configuration config) {
        paramCheck(config);
        hostReachableCheck(config);
    }

    public enum WRITE_MODE {
        set,
        add,
        replace,
        append,
        prepend
    }

    private enum WRITE_FORMAT {
        text
    }

    /**
     * Basic validity checks on the parameters
     */
    private static void paramCheck(Configuration config) {
        String proxy = config.getString(Key.PROXY);
        if (StringUtils.isBlank(proxy)) {
            throw DataXException.asDataXException(OcsWriterErrorCode.REQUIRED_VALUE,
                    String.format("The OCS service address %s must not be empty", Key.PROXY));
        }
        String user = config.getString(Key.USER);
        if (StringUtils.isBlank(user)) {
            throw DataXException.asDataXException(OcsWriterErrorCode.REQUIRED_VALUE,
                    String.format("The OCS user name %s must not be empty", Key.USER));
        }
        String password = config.getString(Key.PASSWORD);
        if (StringUtils.isBlank(password)) {
            throw DataXException.asDataXException(OcsWriterErrorCode.REQUIRED_VALUE,
                    String.format("The OCS password %s must not be empty", Key.PASSWORD));
        }

        String port = config.getString(Key.PORT, "11211");
        if (StringUtils.isBlank(port)) {
            throw DataXException.asDataXException(OcsWriterErrorCode.REQUIRED_VALUE,
                    String.format("The OCS port %s must not be empty", Key.PORT));
        }

        String indexes = config.getString(Key.INDEXES, "0");
        if (StringUtils.isBlank(indexes)) {
            throw DataXException.asDataXException(OcsWriterErrorCode.REQUIRED_VALUE,
                    String.format("The key column indexes %s must not be empty", Key.INDEXES));
        }
        for (String index : indexes.split(",")) {
            try {
                if (Integer.parseInt(index) < 0) {
                    throw DataXException.asDataXException(OcsWriterErrorCode.ILLEGAL_PARAM_VALUE,
                            String.format("The column indexes %s must be comma-separated non-negative integers", Key.INDEXES));
                }
            } catch (NumberFormatException e) {
                throw DataXException.asDataXException(OcsWriterErrorCode.ILLEGAL_PARAM_VALUE,
                        String.format("The column indexes %s must be comma-separated non-negative integers", Key.INDEXES));
            }
        }

        String writerMode = config.getString(Key.WRITE_MODE);
        if (StringUtils.isBlank(writerMode)) {
            throw DataXException.asDataXException(OcsWriterErrorCode.REQUIRED_VALUE,
                    String.format("The write mode %s must not be empty", Key.WRITE_MODE));
        }
        if (!EnumUtils.isValidEnum(WRITE_MODE.class, writerMode.toLowerCase())) {
            throw DataXException.asDataXException(OcsWriterErrorCode.ILLEGAL_PARAM_VALUE,
                    String.format("Unsupported write mode %s; only %s are supported", writerMode, StringUtils.join(WRITE_MODE.values(), ",")));
        }

        String writerFormat = config.getString(Key.WRITE_FORMAT, "text");
        if (StringUtils.isBlank(writerFormat)) {
            throw DataXException.asDataXException(OcsWriterErrorCode.REQUIRED_VALUE,
                    String.format("The write format %s must not be empty", Key.WRITE_FORMAT));
        }
        if (!EnumUtils.isValidEnum(WRITE_FORMAT.class, writerFormat.toLowerCase())) {
            throw DataXException.asDataXException(OcsWriterErrorCode.ILLEGAL_PARAM_VALUE,
                    String.format("Unsupported write format %s; only %s are supported", writerFormat, StringUtils.join(WRITE_FORMAT.values(), ",")));
        }

        int expireTime = config.getInt(Key.EXPIRE_TIME, 0);
        if (expireTime < 0) {
            throw DataXException.asDataXException(OcsWriterErrorCode.ILLEGAL_PARAM_VALUE,
                    String.format("The expiration time %s must not be negative", Key.EXPIRE_TIME));
        }

        int batchSize = config.getInt(Key.BATCH_SIZE, 100);
        if (batchSize <= 0) {
            throw DataXException.asDataXException(OcsWriterErrorCode.ILLEGAL_PARAM_VALUE,
                    String.format("The batch size %s must be greater than 0", Key.BATCH_SIZE));
        }

        // fieldDelimiter needs no check; it defaults to \u0001
    }

    /**
     * Checks that the OCS server is reachable over the network
     */
    private static void hostReachableCheck(Configuration config) {
        String proxy = config.getString(Key.PROXY);
        // Use the same default port as OcsWriter.Task; otherwise an unset port produces the address "proxy:null"
        String port = config.getString(Key.PORT, "11211");
        String username = config.getString(Key.USER);
        String password = config.getString(Key.PASSWORD);
        AuthDescriptor ad = new AuthDescriptor(new String[]{"PLAIN"},
                new PlainCallbackHandler(username, password));
        MemcachedClient client = null;
        try {
            client = new MemcachedClient(
                    new ConnectionFactoryBuilder()
                            .setProtocol(ConnectionFactoryBuilder.Protocol.BINARY)
                            .setAuthDescriptor(ad).build(),
                    AddrUtil.getAddresses(proxy + ":" + port));
            client.get("for_check_connectivity");
            client.getVersions();
            if (client.getAvailableServers().isEmpty()) {
                throw new RuntimeException(
                        "No available servers: getAvailableServers() -> is empty");
            }
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    OcsWriterErrorCode.HOST_UNREACHABLE,
                    String.format("OCS [%s] is unavailable", proxy), e);
        } finally {
            // Release the probe client even when the check fails
            if (client != null) {
                client.shutdown();
            }
        }
    }

    /**
     * For testing only
     */
    @VisibleForTesting
    public static void paramCheck_test(Configuration configuration) {
        paramCheck(configuration);
    }

    @VisibleForTesting
    public static void hostReachableCheck_test(Configuration configuration) {
        hostReachableCheck(configuration);
    }
}
================================================
FILE:
ocswriter/src/main/java/com/alibaba/datax/plugin/writer/ocswriter/utils/OcsWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.ocswriter.utils;

import com.alibaba.datax.common.spi.ErrorCode;

public enum OcsWriterErrorCode implements ErrorCode {

    REQUIRED_VALUE("OcsWriterErrorCode-000", "Required parameter is missing"),
    ILLEGAL_PARAM_VALUE("OcsWriterErrorCode-001", "Illegal parameter value"),
    HOST_UNREACHABLE("OcsWriterErrorCode-002", "Service unavailable"),
    OCS_INIT_ERROR("OcsWriterErrorCode-003", "Failed to initialize the OCS client"),
    DIRTY_RECORD("OcsWriterErrorCode-004", "Dirty record"),
    SHUTDOWN_FAILED("OcsWriterErrorCode-005", "Failed to shut down the OCS client"),
    COMMIT_FAILED("OcsWriterErrorCode-006", "Failed to commit data to OCS");

    private final String code;
    private final String description;

    private OcsWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }
}
================================================
FILE: ocswriter/src/main/resources/plugin.json
================================================
{
    "name": "ocswriter",
    "class": "com.alibaba.datax.plugin.writer.ocswriter.OcsWriter",
    "description": "set|add|replace|append|prepend record into ocs.",
    "developer": "alibaba"
}
================================================
FILE: ocswriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "ocswriter",
    "parameter": {
        "proxy": "",
        "port": "",
        "userName": "",
        "password": "",
        "writeMode": "",
        "writeFormat": "",
        "fieldDelimiter": "",
        "expireTime": "",
        "indexes": "",
        "batchSize": ""
    }
}
================================================
FILE: odpsreader/doc/odpsreader.md
================================================
# DataX ODPSReader

---

## 1 Quick Introduction

ODPSReader reads data from ODPS. For more about ODPS, see (https://help.aliyun.com/document_detail/27800.html?spm=5176.doc27803.6.101.NxCIgY).

Under the hood, ODPSReader reads data from ODPS through `Tunnel`, based on the source project, table, partitions, and columns you configure.
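Notes:

1. To use the ODPSReader/Writer plugins you must use JDK 1.6.32 or later, because it is required to decrypt the AccessId/AccessKey. For JDK installation, contact PE.
2. ODPSReader does not extract data through ODPS SQL (`select ... from ... where ...`).
3. Be sure to distinguish whether the table you read is in the online or the offline environment.
4. The SDK version DataX3 currently depends on is `com.aliyun.odps:odps-sdk-core-internal:0.13.2`.

## 2 How It Works

ODPSReader can read partitioned and non-partitioned tables; it cannot read virtual views. To read a partitioned table you must specify the exact partitions: for example, to read partition `pt=1,ds=hangzhou` of table t0, put that value in the configuration. To read a non-partitioned table you must not provide any partition configuration. The column list may name all columns in order, a subset of columns, a reordered list, or constant columns, but it cannot contain partition columns (partition columns are not table columns).

Note: pay particular attention to `odpsServer`, `project`, `table`, `accessId`, and `accessKey`, because they determine whether the table you want to read can be loaded at all. Many permission problems originate here.

## 3 Features

### 3.1 Sample Configuration

* The following sample reads data from ODPS and prints it to the screen.

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "odpsreader",
                    "parameter": {
                        "accessId": "accessId",
                        "accessKey": "accessKey",
                        "project": "targetProjectName",
                        "table": "tableName",
                        "partition": [
                            "pt=1,ds=hangzhou"
                        ],
                        "column": [
                            "customer_id",
                            "nickname"
                        ],
                        "packageAuthorizedProject": "yourCurrentProjectName",
                        "splitMode": "record",
                        "odpsServer": "http://xxx/api",
                        "tunnelServer": "http://dt.odps.aliyun.com"
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "fieldDelimiter": "\t",
                        "print": "true"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **accessId**
	* Description: the login ID for ODPS
	* Required: yes
	* Default: none

* **accessKey**
	* Description: the login key for ODPS
	* Required: yes
	* Default: none

* **project**
	* Description: name of the ODPS project that contains the table to read (case-insensitive)
	* Required: yes
	* Default: none

* **table**
	* Description: name of the table to read (case-insensitive)
	* Required: yes
	* Default: none

* **partition**
	* Description: the partitions to read. Linux shell wildcards are supported: `*` matches zero or more characters and `?` matches exactly one character. For example, suppose the partitioned table `test` has four partitions: `pt=1,ds=hangzhou`, `pt=1,ds=shanghai`, `pt=2,ds=hangzhou`, and `pt=2,ds=beijing`. To read partition `pt=1,ds=shanghai`, configure `"partition":["pt=1,ds=shanghai"]`; to read every partition under `pt=1`, configure `"partition":["pt=1,ds=*"]`; to read every partition of `test`, configure `"partition":["pt=*,ds=*"]`.
	* Required: required if the table is partitioned; must not be set if the table is not partitioned
	* Default: none

* **column**
	* Description: the columns to read from the source ODPS table. For example, if table `test` has the columns `id,name,age` and you want to read `id,name,age` in that order, configure `"column":["id","name","age"]` or `"column":["*"]`, where `*` reads every column in table order. Configuring `*` is not recommended: if the table's column order, types, or count change, the source and target columns can fall out of alignment, which makes the job produce incorrect results or fail outright. To read `name,id` in that order, configure `"column":["name","id"]`. To add constant columns to the extracted data (for example, to match the target table's column order) so that each row holds the value of `age`, the value of `name`, the constant date `1988-08-08 08:08:08`, and the value of `id`, configure `"column":["age","name","'1988-08-08 08:08:08'","id"]`. In other words, wrap a constant in single quotes `'`. Internally, every configured column is checked; a column that both starts and ends with `'` is treated as a constant whose actual value is the text with the quotes removed.

	  Note: ODPSReader does not extract data with an ODPS `SELECT` statement, so you cannot apply functions to columns, nor name partition columns (partition columns are not table columns).
	* Required: yes
	* Default: none

* **odpsServer**
	* Description: the server address of the ODPS system that hosts the source table
	* Required: yes
	* Default: none

* **tunnelServer**
	* Description: the Tunnel address of the ODPS system that hosts the source table
	* Required: yes
	* Default: none

* **splitMode**
	* Description: the split mode used when reading the source table. The default `record` may be omitted; it splits the data by record count according to the number of splits. If the job's target is MySQL and spans several MySQL tables, the current DataX architecture requires the source table to be partitioned, with each partition mapping in order to one of the target MySQL shard tables; in that case, configure `"splitMode":"partition"`.
	* Required: no
	* Default: record

* **accountProvider** [TBD]
	* Description: the ODPS account type used for reading. `aliyun` and `taobao` are currently supported. Defaults to `aliyun` and may be omitted.
	* Required: no
	* Default: aliyun

* **packageAuthorizedProject**
	* Description: the project authorized by the package, that is, the project the user is currently in
	* Required: no
	* Default: none

* **isCompress**
	* Description: whether to read with compression; boolean: `"true"` enables compression, `"false"` disables it
	* Required: no
	* Default: `"false"` (no compression)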
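The wildcard rules for `partition` and the quoting rule for constant columns in `column` can be illustrated with a short, self-contained Java sketch. `OdpsReaderConfigSketch` and all of its methods are hypothetical names used only for illustration; this is not ODPSReader's own logic (in the plugin source, partition expansion is handled by `expandUserConfiguredPartition`), and it assumes partitions are compared as plain strings such as `pt=1,ds=hangzhou`, with only leading and trailing whitespace of a configured pattern ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of DataX) illustrating two rules from the
 * ODPSReader parameter description: shell-style partition wildcards and
 * single-quoted constant columns.
 */
final class OdpsReaderConfigSketch {

    private OdpsReaderConfigSketch() {
    }

    /** Translates a shell wildcard ('*' = zero or more chars, '?' = exactly one char) into a regex. */
    static Pattern wildcardToPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote every other character so that '.', '$', and so on match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the table partitions matched by at least one configured pattern, keeping the table's order. */
    static List<String> expandPartitions(List<String> allPartitions, List<String> configured) {
        List<Pattern> patterns = new ArrayList<>();
        for (String p : configured) {
            patterns.add(wildcardToPattern(p.trim()));
        }
        List<String> matched = new ArrayList<>();
        for (String partition : allPartitions) {
            for (Pattern pattern : patterns) {
                if (pattern.matcher(partition).matches()) {
                    matched.add(partition);
                    break; // a partition matched by several patterns is added only once
                }
            }
        }
        return matched;
    }

    /** A column is treated as a constant when it both starts and ends with a single quote. */
    static boolean isConstantColumn(String column) {
        return column.length() >= 2 && column.startsWith("'") && column.endsWith("'");
    }

    /** Strips the surrounding quotes of a constant column; other columns are returned unchanged. */
    static String constantValue(String column) {
        return isConstantColumn(column) ? column.substring(1, column.length() - 1) : column;
    }
}
```

With the four partitions of `test` from the `partition` example, `expandPartitions(all, Arrays.asList("pt=1,ds=*"))` returns `[pt=1,ds=hangzhou, pt=1,ds=shanghai]`, and `constantValue("'1988-08-08 08:08:08'")` returns `1988-08-08 08:08:08`. Matching against the table's actual partition list, rather than generating partition names from the pattern, keeps the result limited to partitions that really exist.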
### 3.3 Type Conversion

The following table lists how ODPSReader maps ODPS types to DataX internal types:

| ODPS data type | DataX internal type |
| -------- | ----- |
| BIGINT | Long |
| DOUBLE | Double |
| STRING | String |
| DATETIME | Date |
| BOOLEAN | Bool |

## 4 Performance Report (measured in the online environment)

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table DDL:

    use cdo_datasync;
    create table datax3_odpswriter_perf_10column_1kb_00(
        s_0 string,
        bool_1 boolean,
        bi_2 bigint,
        dt_3 datetime,
        db_4 double,
        s_5 string,
        s_6 string,
        s_7 string,
        s_8 string,
        s_9 string
    )PARTITIONED by (pt string,year string);

A single record looks like:

    s_0    : 485924f6ab7f272af361cd3f7f2d23e0d764942351#$%^&fdafdasfdas%%^(*&^^&*
    bool_1 : true
    bi_2   : 1696248667889
    dt_3   : 2013-07-06 00:00:00
    db_4   : 3.141592653578
    s_5    : 100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209
    s_6    : 100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11fdsafdsfdsa209
    s_7    : 100DAFDSAFDSAHOFJDPSAWIFDISHAF;dsadsafdsahfdsajf;dsfdsa;FJDSAL;11209
    s_8    : 100dafdsafdsahofjdpsawifdishaf;DSADSAFDSAHFDSAJF;dsfdsa;fjdsal;11209
    s_9    : 12~!2345100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209

#### 4.1.2 Machine Specs

* Machine running DataX:
	1. cpu: 24-core Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, cache 15.36MB
	2. mem: 50GB
	3. net: dual gigabit NICs
	4. jvm: -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError
	5. disk: DataX does not write data to disk, so this is not measured

* Job configuration:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "odpsreader",
                    "parameter": {
                        "accessId": "******************************",
                        "accessKey": "*****************************",
                        "column": [
                            "*"
                        ],
                        "partition": [
                            "pt=20141010000000,year=2014"
                        ],
                        "odpsServer": "http://xxx/api",
                        "project": "cdo_datasync",
                        "table": "datax3_odpswriter_perf_10column_1kb_00",
                        "tunnelServer": "http://xxx"
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "column": [
                            {
                                "value": "485924f6ab7f272af361cd3f7f2d23e0d764942351#$%^&fdafdasfdas%%^(*&^^&*"
                            },
                            {
                                "value": "true",
                                "type": "bool"
                            },
                            {
                                "value": "1696248667889",
                                "type": "long"
                            },
                            {
                                "type": "date",
                                "value": "2013-07-06 00:00:00",
                                "dateFormat": "yyyy-MM-dd HH:mm:ss"
                            },
                            {
                                "value": "3.141592653578",
                                "type": "double"
                            },
                            {
                                "value": "100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209"
                            },
                            {
                                "value": "100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11fdsafdsfdsa209"
                            },
                            {
                                "value": "100DAFDSAFDSAHOFJDPSAWIFDISHAF;dsadsafdsahfdsajf;dsfdsa;FJDSAL;11209"
                            },
                            {
                                "value": "100dafdsafdsahofjdpsawifdishaf;DSADSAFDSAHFDSAJF;dsfdsa;fjdsal;11209"
                            },
                            {
                                "value": "12~!2345100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209"
                            }
                        ]
                    }
                }
            }
        ]
    }
}
```

### 4.2 Test Results

| Concurrent tasks | DataX speed (rec/s) | DataX throughput (MB/s) | NIC traffic (MB/s) | DataX load |
|--------|--------|--------|--------|--------|
| 1 | 117507 | 50.20 | 53.7 | 0.62 |
| 2 | 232976 | 99.54 | 108.1 | 0.99 |
| 4 | 387382 | 165.51 | 181.3 | 1.98 |
| 5 | 426054 | 182.03 | 202.2 | 2.35 |
| 6 | 434793 | 185.76 | 204.7 | 2.77 |
| 8 | 495904 | 211.87 | 230.2 | 2.86 |
| 16 | 501596 | 214.31 | 234.7 | 2.84 |
| 32 | 501577 | 214.30 | 234.7 | 2.99 |
| 64 | 501625 | 214.32 | 234.7 | 3.22 |

Notes:

1. The number of channels has the largest effect on OdpsReader speed. At 8 channels the NIC is already saturated; raising it further only hurts system performance.
2. When choosing the number of channels, consider how the ODPS table's files are organized; merging small files before syncing may help tuning.

## 5 Constraints

## FAQ (to be added)

***

**Q: Your question**

A: My answer.

***
================================================
FILE: odpsreader/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT odpsreader odpsreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 org.xerial sqlite-jdbc 3.34.0 com.aliyun.odps odps-sdk-core 0.38.4-public org.mockito mockito-core 1.8.5 test org.powermock powermock-api-mockito 1.4.10 test org.powermock powermock-module-junit4 1.4.10 test org.mockito mockito-core 1.8.5 test org.powermock powermock-api-mockito 1.4.10 test org.powermock powermock-module-junit4 1.4.10 test commons-codec commons-codec 1.8 src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single
================================================
FILE: odpsreader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/odpsreader target/ odpsreader-0.0.1-SNAPSHOT.jar plugin/reader/odpsreader false plugin/reader/odpsreader/libs runtime
================================================
FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ColumnType.java
================================================
package com.alibaba.datax.plugin.reader.odpsreader;

public enum ColumnType {
    PARTITION, NORMAL, CONSTANT, UNKNOWN, ;

    public static ColumnType asColumnType(String columnTypeString) {
        if ("partition".equals(columnTypeString)) {
            return PARTITION;
        } else if ("normal".equals(columnTypeString)) {
            return NORMAL;
        } else if ("constant".equals(columnTypeString)) {
            return CONSTANT;
        } else {
            return UNKNOWN;
        }
    }
}
================================================
FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Constant.java
================================================
package com.alibaba.datax.plugin.reader.odpsreader;

public class Constant {

    public final static String START_INDEX = "startIndex";

    public final static String STEP_COUNT = "stepCount";

    public final static String SESSION_ID = "sessionId";

    public final static String IS_PARTITIONED_TABLE = "isPartitionedTable";

    public static final String DEFAULT_SPLIT_MODE = "record";

    public static final String PARTITION_SPLIT_MODE = "partition";

    // A constant column is written by wrapping its value in COLUMN_CONSTANT_FLAG on both ends
    public final static String COLUMN_CONSTANT_FLAG = "'";

    public static final String PARTITION_COLUMNS = "partitionColumns";

    public static final String PARSED_COLUMNS = "parsedColumns";

    public static final String PARTITION_FILTER_HINT = "/*query*/";

}
================================================
FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/InternalColumnInfo.java
================================================
package com.alibaba.datax.plugin.reader.odpsreader;

public class InternalColumnInfo {

    private String columnName;

    private ColumnType columnType;

    public String getColumnName() {
        return columnName;
    }

    public void setColumnName(String columnName) {
        this.columnName = columnName;
    }

    public ColumnType getColumnType() {
        return columnType;
    }

    public void setColumnType(ColumnType columnType) {
        this.columnType = columnType;
    }
}
================================================
FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/Key.java
================================================
package com.alibaba.datax.plugin.reader.odpsreader;

public class Key {

    public final static String ACCESS_ID = "accessId";

    public final static String ACCESS_KEY = "accessKey";

    public final static String SECURITY_TOKEN = "securityToken";

    public static final String PROJECT = "project";

    public final static String TABLE = "table";

    public final static String PARTITION = "partition";

    public final static String ODPS_SERVER = "odpsServer";

    // Not needed in the online environment; required in the offline environment
    public final static String TUNNEL_SERVER = "tunnelServer";

    public final static String COLUMN = "column";

    // "partition": split only down to partitions; "record": if splitting by partition does not reach adviceNum, keep splitting by record
    public final static String SPLIT_MODE = "splitMode";

    public final static String PACKAGE_AUTHORIZED_PROJECT = "packageAuthorizedProject";

    public final static String IS_COMPRESS = "isCompress";

    public final static String MAX_RETRY_TIME = "maxRetryTime";

    // Behavior when the configured partition does not exist
    public final static String SUCCESS_ON_NO_PATITION = "successOnNoPartition";

    // preSql
    public final static String PRE_SQL = "preSql";

    // postSql
    public final static String POST_SQL = "postSql";

}
================================================
FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/LocalStrings.properties
================================================
description.DATAX_R_ODPS_001=\u7F3A\u5C11\u5FC5\u586B\u53C2\u6570
description.DATAX_R_ODPS_002=\u914D\u7F6E\u503C\u4E0D\u5408\u6CD5
description.DATAX_R_ODPS_003=\u521B\u5EFAODPS Session\u5931\u8D25
description.DATAX_R_ODPS_004=\u83B7\u53D6ODPS Session\u5931\u8D25
description.DATAX_R_ODPS_005=\u8BFB\u53D6ODPS\u6570\u636E\u5931\u8D25
description.DATAX_R_ODPS_006=\u83B7\u53D6AK\u5931\u8D25
description.DATAX_R_ODPS_007=\u8BFB\u53D6\u6570\u636E\u53D1\u751F\u5F02\u5E38
description.DATAX_R_ODPS_008=\u6253\u5F00RecordReader\u5931\u8D25
description.DATAX_R_ODPS_009=ODPS\u9879\u76EE\u4E0D\u5B58\u5728
description.DATAX_R_ODPS_010=\u8868\u4E0D\u5B58\u5728
description.DATAX_R_ODPS_011=AK\u4E0D\u5B58\u5728
description.DATAX_R_ODPS_012=AK\u975E\u6CD5
description.DATAX_R_ODPS_013=AK\u62D2\u7EDD\u8BBF\u95EE
description.DATAX_R_ODPS_014=splitMode\u914D\u7F6E\u9519\u8BEF
description.DATAX_R_ODPS_015=ODPS\u8D26\u53F7\u7C7B\u578B\u9519\u8BEF
description.DATAX_R_ODPS_016=\u4E0D\u652F\u6301\u89C6\u56FE
description.DATAX_R_ODPS_017=\u5206\u533A\u914D\u7F6E\u9519\u8BEF
description.DATAX_R_ODPS_018=\u5206\u533A\u4E0D\u5B58\u5728 description.DATAX_R_ODPS_019=\u6267\u884CODPS SQL\u5931\u8D25 description.DATAX_R_ODPS_020=\u6267\u884CODPS SQL\u53D1\u751F\u5F02\u5E38 solution.DATAX_R_ODPS_001=\u8BF7\u4FEE\u6539\u914D\u7F6E\u6587\u4EF6 solution.DATAX_R_ODPS_002=\u8BF7\u4FEE\u6539\u914D\u7F6E\u503C solution.DATAX_R_ODPS_003=\u8BF7\u786E\u5B9A\u914D\u7F6E\u7684AK\u6216\u8054\u7CFBODPS\u7BA1\u7406\u5458 solution.DATAX_R_ODPS_004=\u8BF7\u8054\u7CFBODPS\u7BA1\u7406\u5458 solution.DATAX_R_ODPS_005=\u8BF7\u8054\u7CFBODPS\u7BA1\u7406\u5458 solution.DATAX_R_ODPS_006=\u8BF7\u786E\u5B9A\u914D\u7F6E\u7684AK solution.DATAX_R_ODPS_007=\u8BF7\u8054\u7CFBODPS\u7BA1\u7406\u5458 solution.DATAX_R_ODPS_008=\u8BF7\u8054\u7CFBODPS\u7BA1\u7406\u5458 solution.DATAX_R_ODPS_009=\u8BF7\u786E\u5B9A\u914D\u7F6E\u7684\u9879\u76EE\u540D solution.DATAX_R_ODPS_010=\u8BF7\u786E\u5B9A\u914D\u7F6E\u7684\u8868\u540D solution.DATAX_R_ODPS_011=\u8BF7\u786E\u5B9A\u914D\u7F6E\u7684AK solution.DATAX_R_ODPS_012=\u8BF7\u4FEE\u6539AK solution.DATAX_R_ODPS_013=\u8BF7\u786E\u5B9AAK\u5728\u9879\u76EE\u4E2D\u7684\u6743\u9650 solution.DATAX_R_ODPS_014=\u8BF7\u4FEE\u6539splitMode\u503C solution.DATAX_R_ODPS_015=\u8BF7\u4FEE\u6539\u8D26\u53F7\u7C7B\u578B solution.DATAX_R_ODPS_016=\u8BF7\u4FEE\u6539\u914D\u7F6E\u6587\u4EF6 solution.DATAX_R_ODPS_017=\u8BF7\u4FEE\u6539\u5206\u533A\u503C solution.DATAX_R_ODPS_018=\u8BF7\u4FEE\u6539\u914D\u7F6E\u7684\u5206\u533A\u503C solution.DATAX_R_ODPS_019=\u8BF7\u8054\u7CFBODPS\u7BA1\u7406\u5458 solution.DATAX_R_ODPS_020=\u8BF7\u8054\u7CFBODPS\u7BA1\u7406\u5458 odpsreader.1=\u6E90\u5934\u8868:{0} \u662F\u865A\u62DF\u89C6\u56FE\uFF0CDataX \u4E0D\u652F\u6301\u8BFB\u53D6\u865A\u62DF\u89C6\u56FE. odpsreader.2=\u60A8\u6240\u914D\u7F6E\u7684 splitMode:{0} \u4E0D\u6B63\u786E. splitMode \u4EC5\u5141\u8BB8\u914D\u7F6E\u4E3A record \u6216\u8005 partition. 
odpsreader.3=\u5206\u533A\u4FE1\u606F\u6CA1\u6709\u914D\u7F6E.\u7531\u4E8E\u6E90\u5934\u8868:{0} \u4E3A\u5206\u533A\u8868, \u6240\u4EE5\u60A8\u9700\u8981\u914D\u7F6E\u5176\u62BD\u53D6\u7684\u8868\u7684\u5206\u533A\u4FE1\u606F. \u683C\u5F0F\u5F62\u5982:pt=hello,ds=hangzhou\uFF0C\u8BF7\u60A8\u53C2\u8003\u6B64\u683C\u5F0F\u4FEE\u6539\u8BE5\u914D\u7F6E\u9879. odpsreader.4=\u5206\u533A\u4FE1\u606F\u914D\u7F6E\u9519\u8BEF.\u6E90\u5934\u8868:{0} \u867D\u7136\u4E3A\u5206\u533A\u8868, \u4F46\u5176\u5B9E\u9645\u5206\u533A\u503C\u5E76\u4E0D\u5B58\u5728. \u8BF7\u786E\u8BA4\u6E90\u5934\u8868\u5DF2\u7ECF\u751F\u6210\u8BE5\u5206\u533A\uFF0C\u518D\u8FDB\u884C\u6570\u636E\u62BD\u53D6. odpsreader.5=\u5206\u533A\u914D\u7F6E\u9519\u8BEF\uFF0C\u6839\u636E\u60A8\u6240\u914D\u7F6E\u7684\u5206\u533A\u6CA1\u6709\u5339\u914D\u5230\u6E90\u5934\u8868\u4E2D\u7684\u5206\u533A. \u6E90\u5934\u8868\u6240\u6709\u5206\u533A\u662F:[\n{0}\n], \u60A8\u914D\u7F6E\u7684\u5206\u533A\u662F:[\n{1}\n]. \u8BF7\u60A8\u6839\u636E\u5B9E\u9645\u60C5\u51B5\u518D\u4F5C\u51FA\u4FEE\u6539. odpsreader.6=\u5206\u533A\u914D\u7F6E\u9519\u8BEF\uFF0C\u6E90\u5934\u8868:{0} \u4E3A\u975E\u5206\u533A\u8868, \u60A8\u4E0D\u80FD\u914D\u7F6E\u5206\u533A. \u8BF7\u60A8\u5220\u9664\u8BE5\u914D\u7F6E\u9879. odpsreader.7=\u6E90\u5934\u8868:{0} \u7684\u6240\u6709\u5206\u533A\u5217\u662F:[{1}] odpsreader.8=\u5206\u533A\u914D\u7F6E\u9519\u8BEF, \u60A8\u6240\u914D\u7F6E\u7684\u5206\u533A\u7EA7\u6570\u548C\u8BE5\u8868\u7684\u5B9E\u9645\u60C5\u51B5\u4E0D\u4E00\u81F4, \u6BD4\u5982\u5206\u533A:[{0}] \u662F {1} \u7EA7\u5206\u533A, \u800C\u5206\u533A:[{2}] \u662F {3} \u7EA7\u5206\u533A. DataX \u662F\u901A\u8FC7\u82F1\u6587\u9017\u53F7\u5224\u65AD\u60A8\u6240\u914D\u7F6E\u7684\u5206\u533A\u7EA7\u6570\u7684. \u6B63\u786E\u7684\u683C\u5F0F\u5F62\u5982\"pt=$'{bizdate'}, type=0\" \uFF0C\u8BF7\u60A8\u53C2\u8003\u793A\u4F8B\u4FEE\u6539\u8BE5\u914D\u7F6E\u9879. 
odpsreader.9=\u5206\u533A\u914D\u7F6E\u9519\u8BEF, \u60A8\u6240\u914D\u7F6E\u7684\u5206\u533A:{0} \u7684\u7EA7\u6570:{1} \u4E0E\u60A8\u8981\u8BFB\u53D6\u7684 ODPS \u6E90\u5934\u8868\u7684\u5206\u533A\u7EA7\u6570:{2} \u4E0D\u76F8\u7B49. DataX \u662F\u901A\u8FC7\u82F1\u6587\u9017\u53F7\u5224\u65AD\u60A8\u6240\u914D\u7F6E\u7684\u5206\u533A\u7EA7\u6570\u7684.\u6B63\u786E\u7684\u683C\u5F0F\u5F62\u5982\"pt=$'{bizdate'}, type=0\" \uFF0C\u8BF7\u60A8\u53C2\u8003\u793A\u4F8B\u4FEE\u6539\u8BE5\u914D\u7F6E\u9879. odpsreader.10=\u6E90\u5934\u8868:{0} \u7684\u6240\u6709\u5B57\u6BB5\u662F:[{1}] odpsreader.11=\u8FD9\u662F\u4E00\u6761\u8B66\u544A\u4FE1\u606F\uFF0C\u60A8\u914D\u7F6E\u7684 ODPS \u8BFB\u53D6\u7684\u5217\u4E3A*\uFF0C\u8FD9\u662F\u4E0D\u63A8\u8350\u7684\u884C\u4E3A\uFF0C\u56E0\u4E3A\u5F53\u60A8\u7684\u8868\u5B57\u6BB5\u4E2A\u6570\u3001\u7C7B\u578B\u6709\u53D8\u52A8\u65F6\uFF0C\u53EF\u80FD\u5F71\u54CD\u4EFB\u52A1\u6B63\u786E\u6027\u751A\u81F3\u4F1A\u8FD0\u884C\u51FA\u9519. \u5EFA\u8BAE\u60A8\u628A\u6240\u6709\u9700\u8981\u62BD\u53D6\u7684\u5217\u90FD\u914D\u7F6E\u4E0A. odpsreader.12=\u6E90\u5934\u8868:{0} \u7684\u5206\u533A:{1} \u6CA1\u6709\u5185\u5BB9\u53EF\u62BD\u53D6, \u8BF7\u60A8\u77E5\u6653. odpsreader.13=\u6E90\u5934\u8868:{0} \u7684\u5206\u533A:{1} \u8BFB\u53D6\u884C\u6570\u4E3A\u8D1F\u6570, \u8BF7\u8054\u7CFB ODPS \u7BA1\u7406\u5458\u67E5\u770B\u8868\u72B6\u6001! odpsreader.14=\u6E90\u5934\u8868:{0} \u7684\u5206\u533A:{1} \u8BFB\u53D6\u5931\u8D25, \u8BF7\u8054\u7CFB ODPS \u7BA1\u7406\u5458\u67E5\u770B\u9519\u8BEF\u8BE6\u60C5. readerproxy.1=odps-read-exception, \u91CD\u8BD5\u7B2C{0}\u6B21 readerproxy.2=\u60A8\u7684\u5206\u533A [{0}] \u89E3\u6790\u51FA\u73B0\u9519\u8BEF,\u89E3\u6790\u540E\u6B63\u786E\u7684\u914D\u7F6E\u65B9\u5F0F\u7C7B\u4F3C\u4E3A [ pt=1,dt=1 ]. readerproxy.3=\u8868\u6240\u6709\u5206\u533A\u4FE1\u606F\u4E3A: {0} \u5176\u4E2D\u627E\u4E0D\u5230 [{1}] \u5BF9\u5E94\u7684\u5206\u533A\u503C. 
readerproxy.4=\u60A8\u8BFB\u53D6\u5206\u533A [{0}] \u51FA\u73B0\u65E5\u671F\u8F6C\u6362\u5F02\u5E38, \u65E5\u671F\u7684\u5B57\u7B26\u4E32\u8868\u793A\u4E3A [{1}]. readerproxy.5=DataX \u62BD\u53D6 ODPS \u6570\u636E\u4E0D\u652F\u6301\u5B57\u6BB5\u7C7B\u578B\u4E3A:[{0}]. \u76EE\u524D\u652F\u6301\u62BD\u53D6\u7684\u5B57\u6BB5\u7C7B\u578B\u6709\uFF1Abigint, boolean, datetime, double, decimal, string. \u60A8\u53EF\u4EE5\u9009\u62E9\u4E0D\u62BD\u53D6 DataX \u4E0D\u652F\u6301\u7684\u5B57\u6BB5\u6216\u8005\u8054\u7CFB ODPS \u7BA1\u7406\u5458\u5BFB\u6C42\u5E2E\u52A9. ================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/OdpsReader.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.FilterUtil; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.reader.odpsreader.util.*; import com.alibaba.fastjson2.JSON; import com.aliyun.odps.Column; import com.aliyun.odps.Odps; import com.aliyun.odps.Table; import com.aliyun.odps.TableSchema; import com.aliyun.odps.tunnel.TableTunnel.DownloadSession; import com.aliyun.odps.type.TypeInfo; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; public class OdpsReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OdpsReaderErrorCode.class, Locale.ENGLISH, MessageSource.timeZone); private Configuration originalConfig; private boolean successOnNoPartition; private Odps odps; private Table table; @Override public void 
preCheck() { this.init(); this.prepare(); } @Override public void init() { this.originalConfig = super.getPluginJobConf(); this.successOnNoPartition = this.originalConfig.getBool(Key.SUCCESS_ON_NO_PATITION, false); // Check the required parameters OdpsUtil.checkNecessaryConfig(this.originalConfig); // Validate the retry-count setting OdpsUtil.dealMaxRetryTime(this.originalConfig); // Determine the split mode dealSplitMode(this.originalConfig); this.odps = OdpsUtil.initOdps(this.originalConfig); } private void initOdpsTableInfo() { String tableName = this.originalConfig.getString(Key.TABLE); String projectName = this.originalConfig.getString(Key.PROJECT); this.table = OdpsUtil.getTable(this.odps, projectName, tableName); this.originalConfig.set(Constant.IS_PARTITIONED_TABLE, OdpsUtil.isPartitionedTable(table)); boolean isVirtualView = this.table.isVirtualView(); if (isVirtualView) { throw DataXException.asDataXException(OdpsReaderErrorCode.VIRTUAL_VIEW_NOT_SUPPORT, MESSAGE_SOURCE.message("odpsreader.1", tableName)); } this.dealPartition(this.table); this.dealColumn(this.table); } private void dealSplitMode(Configuration originalConfig) { String splitMode = originalConfig.getString(Key.SPLIT_MODE, Constant.DEFAULT_SPLIT_MODE).trim(); if (splitMode.equalsIgnoreCase(Constant.DEFAULT_SPLIT_MODE) || splitMode.equalsIgnoreCase(Constant.PARTITION_SPLIT_MODE)) { originalConfig.set(Key.SPLIT_MODE, splitMode); } else { throw DataXException.asDataXException(OdpsReaderErrorCode.SPLIT_MODE_ERROR, MESSAGE_SOURCE.message("odpsreader.2", splitMode)); } } /** * Processes the partition configuration. The end result is that every regex-style entry is fully expanded into the concrete partitions it matches. The rules are: *

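 *
 * 1. For a partitioned table, a partition must be configured: either *, meaning the whole table is read, or an explicit list of the leaf partitions to read.
 *    TODO: support some common regex-based partition filters in the future. The partition array must not contain more than one *, because that would read the whole table several times, which is pointless.
 * 2. For a non-partitioned table, no partition value may be configured.
 *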
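 * Illustration only, assuming a hypothetical table with two partition levels, pt and type
 * (the values below are examples, not taken from a real job):
 *   "partition": ["pt=*,type=*"]        reads every partition of the table;
 *   "partition": ["pt=20170101,type=0"] reads exactly one leaf partition;
 *   "partition": ["pt=201701*,type=0"]  applies a simple regex to pt while fixing type to 0.
 * Each plain (non-hint) entry must have as many levels as the table (two here); a depth
 * mismatch is rejected with PARTITION_ERROR.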
*/ private void dealPartition(Table table) { List<String> userConfiguredPartitions = this.originalConfig.getList( Key.PARTITION, String.class); boolean isPartitionedTable = this.originalConfig.getBool(Constant.IS_PARTITIONED_TABLE); List<String> partitionColumns = new ArrayList<String>(); if (isPartitionedTable) { // Partitioned table: a partition must be configured if (null == userConfiguredPartitions || userConfiguredPartitions.isEmpty()) { throw DataXException.asDataXException(OdpsReaderErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsreader.3", table.getName())); } else { // Collect the partition column names so that users can also sync partition columns for (Column column : table.getSchema().getPartitionColumns()) { partitionColumns.add(column.getName()); } List<String> allPartitions = OdpsUtil.getTableAllPartitions(table); List<String> parsedPartitions = expandUserConfiguredPartition( table, allPartitions, userConfiguredPartitions, partitionColumns.size()); if (null == parsedPartitions || parsedPartitions.isEmpty()) { if (!this.successOnNoPartition) { // AdsWriter also relies on the PARTITION_NOT_EXISTS_ERROR code to decide that a Load Data job on an empty partition should not fail. // Do not use this error code for any other kind of error. throw DataXException.asDataXException( OdpsReaderErrorCode.PARTITION_NOT_EXISTS_ERROR, MESSAGE_SOURCE.message("odpsreader.5", StringUtils.join(allPartitions, "\n"), StringUtils.join(userConfiguredPartitions, "\n"))); } else { LOG.warn( String.format( "The configured partitions do not match any partition in the source table, " + "but successOnNoPartition is set to true, so this error is ignored. " + "All the partitions in the source table are:[\n%s\n], the partitions you configured are:[\n%s\n]. 
" + "Please adjust the configuration to match the actual partitions.", StringUtils.join(allPartitions, "\n"), StringUtils.join(userConfiguredPartitions, "\n"))); } } LOG.info(String .format("expanded user-configured partitions: %s", JSON.toJSONString(parsedPartitions))); this.originalConfig.set(Key.PARTITION, parsedPartitions); } } else { // Non-partitioned table: partitions must not be configured if (null != userConfiguredPartitions && !userConfiguredPartitions.isEmpty()) { throw DataXException.asDataXException(OdpsReaderErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsreader.6", table.getName())); } } this.originalConfig.set(Constant.PARTITION_COLUMNS, partitionColumns); if (isPartitionedTable) { LOG.info(MESSAGE_SOURCE.message("odpsreader.7", table.getName(), StringUtils.join(partitionColumns, ","))); } } /** * Matches the user-configured partitions (an exact partition such as dt=20170101, a simple regex such as dt=201701*, or a range condition such as dt>=20170101 and dt<20170130) against all partitions of the ODPS * table, and returns the set of partitions the user wants to sync. * * @param table odps table * @param allPartitions all partitions of the odps table * @param userConfiguredPartitions partitions configured by the user * @param tableOriginalPartitionDepth partition depth of the odps table (one-level, two-level, three-level partitions, etc.) * @return the matched partitions */ private List<String> expandUserConfiguredPartition(Table table, List<String> allPartitions, List<String> userConfiguredPartitions, int tableOriginalPartitionDepth) { UserConfiguredPartitionClassification userConfiguredPartitionClassification = OdpsUtil .classifyUserConfiguredPartitions(userConfiguredPartitions); if (userConfiguredPartitionClassification.isIncludeHintPartition()) { List<String> expandUserConfiguredPartitionResult = new ArrayList<String>(); // Handle the entries without a /*query*/ hint if (!userConfiguredPartitionClassification.getUserConfiguredNormalPartition().isEmpty()) { expandUserConfiguredPartitionResult.addAll(expandNoHintUserConfiguredPartition(allPartitions, userConfiguredPartitionClassification.getUserConfiguredNormalPartition(), tableOriginalPartitionDepth)); } if (!allPartitions.isEmpty()) { expandUserConfiguredPartitionResult.addAll(expandHintUserConfiguredPartition(table, allPartitions,
userConfiguredPartitionClassification.getUserConfiguredHintPartition())); } return expandUserConfiguredPartitionResult; } else { return expandNoHintUserConfiguredPartition(allPartitions, userConfiguredPartitions, tableOriginalPartitionDepth); } } /** * Filters partitions using HINT conditions * * @param table odps table * @param allPartitions all partitions of the odps table * @param userHintConfiguredPartitions partitions configured by the user * @return the matched partitions */ private List<String> expandHintUserConfiguredPartition(Table table, List<String> allPartitions, List<String> userHintConfiguredPartitions) { try { // load all partitions of the odps table into an in-memory sqlite database SqliteUtil sqliteUtil = new SqliteUtil(); sqliteUtil.loadAllPartitionsIntoSqlite(table, allPartitions); return sqliteUtil.selectUserConfiguredPartition(userHintConfiguredPartitions); } catch (Exception ex) { throw DataXException.asDataXException(OdpsReaderErrorCode.PARTITION_ERROR, String.format("Exception while expanding the user-configured partitions: %s", ex.getMessage()), ex); } } /** * Filters partitions without HINT conditions, covering simple regex matching (dt=201701*) and exact matching (dt=20170101) * * @param allPartitions all partitions of the odps table * @param userNormalConfiguredPartitions partitions configured by the user * @param tableOriginalPartitionDepth partition depth of the odps table (one-level, two-level, three-level partitions, etc.) * @return the matched partitions */ private List<String> expandNoHintUserConfiguredPartition(List<String> allPartitions, List<String> userNormalConfiguredPartitions, int tableOriginalPartitionDepth) { // Normalize special characters in all of the table's own partitions LOG.info("format partition with rules: remove all spaces; remove all single quotes; replace / with ,"); // The table may hold many partitions unrelated to this job, so they are not logged List<String> allStandardPartitions = OdpsUtil .formatPartitions(allPartitions); // Normalize special characters in all user-configured partitions List<String> allStandardUserConfiguredPartitions = OdpsUtil .formatPartitions(userNormalConfiguredPartitions); LOG.info("user configured partition: {}", JSON.toJSONString(userNormalConfiguredPartitions)); LOG.info("formatted partition: {}", JSON.toJSONString(allStandardUserConfiguredPartitions)); /** * Validate the configured partition depth: * (1) all user-configured partitions must have the same depth; * (2) that depth must equal the partition depth of the source table */ String firstPartition =
allStandardUserConfiguredPartitions.get(0); int firstPartitionDepth = firstPartition.split(",").length; String comparedPartition = null; int comparedPartitionDepth = -1; for (int i = 1, len = allStandardUserConfiguredPartitions.size(); i < len; i++) { comparedPartition = allStandardUserConfiguredPartitions.get(i); comparedPartitionDepth = comparedPartition.split(",").length; if (comparedPartitionDepth != firstPartitionDepth) { throw DataXException.asDataXException(OdpsReaderErrorCode.PARTITION_ERROR, MESSAGE_SOURCE .message("odpsreader.8", firstPartition, firstPartitionDepth, comparedPartition, comparedPartitionDepth)); } } if (firstPartitionDepth != tableOriginalPartitionDepth) { throw DataXException.asDataXException(OdpsReaderErrorCode.PARTITION_ERROR, MESSAGE_SOURCE .message("odpsreader.9", firstPartition, firstPartitionDepth, tableOriginalPartitionDepth)); } List<String> retPartitions = FilterUtil.filterByRegulars(allStandardPartitions, allStandardUserConfiguredPartitions); return retPartitions; } private void dealColumn(Table table) { // The user-configured column list has already been checked to be non-empty List<String> userConfiguredColumns = this.originalConfig.getList( Key.COLUMN, String.class); List<Column> allColumns = OdpsUtil.getTableAllColumns(table); List<String> allNormalColumns = OdpsUtil .getTableOriginalColumnNameList(allColumns); StringBuilder columnMeta = new StringBuilder(); for (Column column : allColumns) { columnMeta.append(column.getName()).append(":").append(column.getType()).append(","); } columnMeta.setLength(columnMeta.length() - 1); LOG.info(MESSAGE_SOURCE.message("odpsreader.10", table.getName(), columnMeta.toString())); if (1 == userConfiguredColumns.size() && "*".equals(userConfiguredColumns.get(0))) { LOG.warn(MESSAGE_SOURCE.message("odpsreader.11")); this.originalConfig.set(Key.COLUMN, allNormalColumns); } userConfiguredColumns = this.originalConfig.getList( Key.COLUMN, String.class); /** * warn: string constants must be kept apart from the table's native columns (tableOriginalColumnNameList), e.g. * ["id","'id'","name"], where the single-quoted 'id' is a constant */ List<String> allPartitionColumns =
this.originalConfig.getList( Constant.PARTITION_COLUMNS, String.class); List<InternalColumnInfo> parsedColumns = OdpsUtil .parseColumns(allNormalColumns, allPartitionColumns, userConfiguredColumns); this.originalConfig.set(Constant.PARSED_COLUMNS, parsedColumns); StringBuilder sb = new StringBuilder(); sb.append("[ "); for (int i = 0, len = parsedColumns.size(); i < len; i++) { InternalColumnInfo pair = parsedColumns.get(i); sb.append(String.format(" %s : %s", pair.getColumnName(), pair.getColumnType())); if (i != len - 1) { sb.append(","); } } sb.append(" ]"); LOG.info("parsed column details: {} .", sb.toString()); } @Override public void prepare() { List<String> preSqls = this.originalConfig.getList(Key.PRE_SQL, String.class); if (preSqls != null && !preSqls.isEmpty()) { LOG.info( String.format("Begin to execute preSql : %s. \n Attention: these preSqls must be idempotent!!!", JSON.toJSONString(preSqls))); long beginTime = System.currentTimeMillis(); StringBuffer preSqlBuffer = new StringBuffer(); for (String preSql : preSqls) { preSql = preSql.trim(); if (StringUtils.isNotBlank(preSql) && !preSql.endsWith(";")) { preSql = String.format("%s;", preSql); } if (StringUtils.isNotBlank(preSql)) { preSqlBuffer.append(preSql); } } if (StringUtils.isNotBlank(preSqlBuffer.toString())) { OdpsUtil.runSqlTaskWithRetry(this.odps, preSqlBuffer.toString(), "preSql"); } else { LOG.info("skip executing the preSql: {}", JSON.toJSONString(preSqls)); } long endTime = System.currentTimeMillis(); LOG.info( String.format("Executed odpsreader preSql successfully! cost time: %s ms.", (endTime - beginTime))); } this.initOdpsTableInfo(); } @Override public List<Configuration> split(int adviceNumber) { return OdpsSplitUtil.doSplit(this.originalConfig, this.odps, adviceNumber); } @Override public void post() { List<String> postSqls = this.originalConfig.getList(Key.POST_SQL, String.class); if (postSqls != null && !postSqls.isEmpty()) { LOG.info( String.format("Begin to execute postSql : %s.
\n Attention: these postSqls must be idempotent!!!", JSON.toJSONString(postSqls))); long beginTime = System.currentTimeMillis(); StringBuffer postSqlBuffer = new StringBuffer(); for (String postSql : postSqls) { postSql = postSql.trim(); if (StringUtils.isNotBlank(postSql) && !postSql.endsWith(";")) { postSql = String.format("%s;", postSql); } if (StringUtils.isNotBlank(postSql)) { postSqlBuffer.append(postSql); } } if (StringUtils.isNotBlank(postSqlBuffer.toString())) { OdpsUtil.runSqlTaskWithRetry(this.odps, postSqlBuffer.toString(), "postSql"); } else { LOG.info("skip executing the postSql: {}", JSON.toJSONString(postSqls)); } long endTime = System.currentTimeMillis(); LOG.info( String.format("Executed odpsreader postSql successfully! cost time: %s ms.", (endTime - beginTime))); } } @Override public void destroy() { } } public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OdpsReader.class); private Configuration readerSliceConf; private String tunnelServer; private Odps odps = null; private Table table = null; private String projectName = null; private String tableName = null; private boolean isPartitionedTable; private String sessionId; private boolean isCompress; private boolean successOnNoPartition; @Override public void init() { this.readerSliceConf = super.getPluginJobConf(); this.tunnelServer = this.readerSliceConf.getString( Key.TUNNEL_SERVER, null); this.odps = OdpsUtil.initOdps(this.readerSliceConf); this.projectName = this.readerSliceConf.getString(Key.PROJECT); this.tableName = this.readerSliceConf.getString(Key.TABLE); this.table = OdpsUtil.getTable(this.odps, projectName, tableName); this.isPartitionedTable = this.readerSliceConf .getBool(Constant.IS_PARTITIONED_TABLE); this.sessionId = this.readerSliceConf.getString(Constant.SESSION_ID, null); this.isCompress =
this.readerSliceConf.getBool(Key.IS_COMPRESS, false); this.successOnNoPartition = this.readerSliceConf.getBool(Key.SUCCESS_ON_NO_PATITION, false); // sessionId is blank when the split granularity stops at the partition level String partition = this.readerSliceConf.getString(Key.PARTITION); // When there is no partition to read, there is no sessionId either if (this.isPartitionedTable && StringUtils.isBlank(partition) && this.successOnNoPartition) { LOG.warn("Partition is blank and successOnNoPartition is true, so no download session needs to be created"); } else if (StringUtils.isBlank(this.sessionId)) { DownloadSession session = OdpsUtil.createMasterSessionForPartitionedTable(odps, tunnelServer, projectName, tableName, this.readerSliceConf.getString(Key.PARTITION)); this.sessionId = session.getId(); } LOG.info("sessionId:{}", this.sessionId); } @Override public void prepare() { } @Override public void startRead(RecordSender recordSender) { DownloadSession downloadSession = null; String partition = this.readerSliceConf.getString(Key.PARTITION); if (this.isPartitionedTable && StringUtils.isBlank(partition) && this.successOnNoPartition) { LOG.warn("Partition is blank, nothing needs to be read"); recordSender.flush(); return; } if (this.isPartitionedTable) { downloadSession = OdpsUtil.getSlaveSessionForPartitionedTable(this.odps, this.sessionId, this.tunnelServer, this.projectName, this.tableName, partition); } else { downloadSession = OdpsUtil.getSlaveSessionForNonPartitionedTable(this.odps, this.sessionId, this.tunnelServer, this.projectName, this.tableName); } long start = this.readerSliceConf.getLong(Constant.START_INDEX, 0); long count = this.readerSliceConf.getLong(Constant.STEP_COUNT, downloadSession.getRecordCount()); if (count > 0) { LOG.info(String.format( "Begin to read ODPS table:%s, partition:%s, startIndex:%s, count:%s.", this.tableName, partition, start, count)); } else if (count == 0) { LOG.warn(MESSAGE_SOURCE.message("odpsreader.12", this.tableName, partition)); return; } else { throw
DataXException.asDataXException(OdpsReaderErrorCode.READ_DATA_FAIL, MESSAGE_SOURCE.message("odpsreader.13", this.tableName, partition)); } TableSchema tableSchema = this.table.getSchema(); Set<Column> allColumns = new HashSet<Column>(); allColumns.addAll(tableSchema.getColumns()); allColumns.addAll(tableSchema.getPartitionColumns()); Map<String, TypeInfo> columnTypeMap = new HashMap<String, TypeInfo>(); for (Column column : allColumns) { columnTypeMap.put(column.getName(), column.getTypeInfo()); } try { List<InternalColumnInfo> parsedColumns = this.readerSliceConf.getListWithJson(Constant.PARSED_COLUMNS, InternalColumnInfo.class); ReaderProxy readerProxy = new ReaderProxy(recordSender, downloadSession, columnTypeMap, parsedColumns, partition, this.isPartitionedTable, start, count, this.isCompress, this.readerSliceConf); readerProxy.doRead(); } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.READ_DATA_FAIL, MESSAGE_SOURCE.message("odpsreader.14", this.tableName, partition), e); } } @Override public void post() { } @Override public void destroy() { } } } ================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/OdpsReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader; import com.alibaba.datax.common.spi.ErrorCode; import com.alibaba.datax.common.util.MessageSource; public enum OdpsReaderErrorCode implements ErrorCode { REQUIRED_VALUE("DATAX_R_ODPS_001", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_001"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_001")), ILLEGAL_VALUE("DATAX_R_ODPS_002", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_002"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_002")), CREATE_DOWNLOADSESSION_FAIL("DATAX_R_ODPS_003",
MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_003"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_003")), GET_DOWNLOADSESSION_FAIL("DATAX_R_ODPS_004", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_004"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_004")), READ_DATA_FAIL("DATAX_R_ODPS_005", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_005"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_005")), GET_ID_KEY_FAIL("DATAX_R_ODPS_006", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_006"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_006")), ODPS_READ_EXCEPTION("DATAX_R_ODPS_007", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_007"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_007")), OPEN_RECORD_READER_FAILED("DATAX_R_ODPS_008", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_008"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_008")), ODPS_PROJECT_NOT_FOUNT("DATAX_R_ODPS_009", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_009"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_009")), //ODPS-0420111: Project not found ODPS_TABLE_NOT_FOUNT("DATAX_R_ODPS_010", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_010"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_010")), // ODPS-0130131:Table not found ODPS_ACCESS_KEY_ID_NOT_FOUND("DATAX_R_ODPS_011", 
MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_011"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_011")), //ODPS-0410051:Invalid credentials - accessKeyId not found ODPS_ACCESS_KEY_INVALID("DATAX_R_ODPS_012", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_012"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_012")), //ODPS-0410042:Invalid signature value - User signature dose not match ODPS_ACCESS_DENY("DATAX_R_ODPS_013", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_013"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_013")), //ODPS-0420095: Access Denied - Authorization Failed [4002], You doesn't exist in project SPLIT_MODE_ERROR("DATAX_R_ODPS_014", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_014"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_014")), ACCOUNT_TYPE_ERROR("DATAX_R_ODPS_015", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_015"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_015")), VIRTUAL_VIEW_NOT_SUPPORT("DATAX_R_ODPS_016", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_016"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_016")), PARTITION_ERROR("DATAX_R_ODPS_017", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_017"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_017")), PARTITION_NOT_EXISTS_ERROR("DATAX_R_ODPS_018", 
MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_018"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_018")), RUN_SQL_FAILED("DATAX_R_ODPS_019", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_019"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_019")), RUN_SQL_ODPS_EXCEPTION("DATAX_R_ODPS_020", MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("description.DATAX_R_ODPS_020"),MessageSource.loadResourceBundle(OdpsReaderErrorCode.class).message("solution.DATAX_R_ODPS_020")), ; private final String code; private final String description; private final String solution; private OdpsReaderErrorCode(String code, String description,String solution) { this.code = code; this.description = description; this.solution = solution; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } public String getSolution() { return solution; } @Override public String toString() { return String.format("Code:%s:%s, Solution:[%s]. 
", this.code,this.description,this.solution); } } ================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/ReaderProxy.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.reader.odpsreader.util.OdpsUtil; import com.alibaba.fastjson2.JSON; import com.aliyun.odps.Column; import com.aliyun.odps.OdpsType; import com.aliyun.odps.data.*; import com.aliyun.odps.data.Record; import com.aliyun.odps.tunnel.TableTunnel; import com.aliyun.odps.type.ArrayTypeInfo; import com.aliyun.odps.type.MapTypeInfo; import com.aliyun.odps.type.TypeInfo; import org.apache.commons.codec.binary.Base64; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.*; public class ReaderProxy { private static final Logger LOG = LoggerFactory .getLogger(ReaderProxy.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(ReaderProxy.class); private static boolean IS_DEBUG = LOG.isDebugEnabled(); private RecordSender recordSender; private TableTunnel.DownloadSession downloadSession; private Map columnTypeMap; private List parsedColumns; private String partition; private boolean isPartitionTable; private long start; private long count; private boolean isCompress; private static final String NULL_INDICATOR = null; // TODO 没有支持用户可配置 // TODO 没有timezone private SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); // 读取 jvm 默认时区 private Calendar calendarForDate = null; private boolean useDateWithCalendar = true; private 
Calendar initCalendar(Configuration config) { // In practice there is no other choice; making it configurable only serves as an emergency switch String calendarType = config.getString("calendarType", "iso8601"); Boolean lenient = config.getBool("calendarLenient", true); // Default to the JVM time zone TimeZone timeZone = TimeZone.getDefault(); String timeZoneStr = config.getString("calendarTimeZone"); if (StringUtils.isNotBlank(timeZoneStr)) { // Use the time zone explicitly specified by the user timeZone = TimeZone.getTimeZone(timeZoneStr); } Calendar calendarForDate = new Calendar.Builder().setCalendarType(calendarType).setLenient(lenient) .setTimeZone(timeZone).build(); return calendarForDate; } public ReaderProxy(RecordSender recordSender, TableTunnel.DownloadSession downloadSession, Map<String, TypeInfo> columnTypeMap, List<InternalColumnInfo> parsedColumns, String partition, boolean isPartitionTable, long start, long count, boolean isCompress, Configuration taskConfig) { this.recordSender = recordSender; this.downloadSession = downloadSession; this.columnTypeMap = columnTypeMap; this.parsedColumns = parsedColumns; this.partition = partition; this.isPartitionTable = isPartitionTable; this.start = start; this.count = count; this.isCompress = isCompress; this.calendarForDate = this.initCalendar(taskConfig); this.useDateWithCalendar = taskConfig.getBool("useDateWithCalendar", true); } // warn: an odps partition column must not share its name with a normal column; column names are case-insensitive public void doRead() { try { LOG.info("start={}, count={}",start, count); List userConfigNormalColumns = OdpsUtil.getNormalColumns(this.parsedColumns, this.columnTypeMap); RecordReader recordReader = null; // fix #ODPS-52184/10332469: when the user reads at most updateColumnsSize (100) source columns, apply column pruning int updateColumnsSize = 100; if(userConfigNormalColumns.size() <= updateColumnsSize){ recordReader = OdpsUtil.getRecordReader(downloadSession, start, count, isCompress, userConfigNormalColumns); } else { recordReader = OdpsUtil.getRecordReader(downloadSession, start, count, isCompress); } Record odpsRecord; Map<String, String> partitionMap = this .parseCurrentPartitionValue(); int retryTimes = 1; while (true) { try { odpsRecord =
recordReader.read(); } catch(Exception e) { // On an odps read exception, reopen the reader and retry (at most 9 retries, since retryTimes starts at 1) LOG.warn("warn : odps read exception: {}", e.getMessage()); if(retryTimes < 10) { try { Thread.sleep(2000); } catch (InterruptedException ignored) { } recordReader = downloadSession.openRecordReader(start, count, isCompress); LOG.warn(MESSAGE_SOURCE.message("readerproxy.1", retryTimes)); retryTimes++; continue; } else { throw DataXException.asDataXException(OdpsReaderErrorCode.ODPS_READ_EXCEPTION, e); } } // Track the read position so that a reopened reader resumes from the next unread record start++; count--; if (odpsRecord != null) { com.alibaba.datax.common.element.Record dataXRecord = recordSender .createRecord(); // warn: for PARTITION||NORMAL columns, the key set (column names) of columnTypeMap is a // superset of the column names in parsedColumns, so the lookup always succeeds for (InternalColumnInfo pair : this.parsedColumns) { String columnName = pair.getColumnName(); switch (pair.getColumnType()) { case PARTITION: String partitionColumnValue = this .getPartitionColumnValue(partitionMap, columnName); this.odpsColumnToDataXField(odpsRecord, dataXRecord, this.columnTypeMap.get(columnName), partitionColumnValue, true); break; case NORMAL: this.odpsColumnToDataXField(odpsRecord, dataXRecord, this.columnTypeMap.get(columnName), columnName, false); break; case CONSTANT: dataXRecord.addColumn(new StringColumn(columnName)); break; default: break; } } recordSender.sendToWriter(dataXRecord); } else { break; } } // fixed: guard against recordReader.close() failing; confirmed with 鸣天 that the RecordReader does not have to be closed try { recordReader.close(); } catch (Exception e) { LOG.warn("recordReader close exception", e); } } catch (DataXException e) { throw e; } catch (Exception e) { // warn: if dirty throw DataXException.asDataXException( OdpsReaderErrorCode.READ_DATA_FAIL, e); } } private Map<String, String> parseCurrentPartitionValue() { Map<String, String> partitionMap = new HashMap<String, String>(); if (this.isPartitionTable) { String[] splitedPartition = this.partition.split(","); for (String eachPartition : splitedPartition) { String[] partitionDetail = eachPartition.split("="); // warn: check partition
like partition=1 if (2 != partitionDetail.length) { throw DataXException .asDataXException( OdpsReaderErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("readerproxy.2", eachPartition)); } // warn: convert to lower case so it can be compared with the user's configured columns String partitionName = partitionDetail[0].toLowerCase(); String partitionValue = partitionDetail[1]; partitionMap.put(partitionName, partitionValue); } } if (IS_DEBUG) { LOG.debug(String.format("partition value details: %s", com.alibaba.fastjson2.JSON.toJSONString(partitionMap))); } return partitionMap; } private String getPartitionColumnValue(Map<String, String> partitionMap, String partitionColumnName) { // warn: to lower case partitionColumnName = partitionColumnName.toLowerCase(); // this should never happen, but check anyway if (!partitionMap.containsKey(partitionColumnName)) { String errorMessage = MESSAGE_SOURCE.message("readerproxy.3", com.alibaba.fastjson2.JSON.toJSONString(partitionMap), partitionColumnName); throw DataXException.asDataXException( OdpsReaderErrorCode.READ_DATA_FAIL, errorMessage); } return partitionMap.get(partitionColumnName); } /** * TODO warn: a String read from odpsRecord may actually be binary * * warn: there is no dirty data in reader plugin, so do not handle dirty * data with TaskPluginCollector * * warn: odps actually only supports BIGINT && String partition columns * * @param odpsRecord * every line record of odps table * @param dataXRecord * every datax record, to be sent to the writer.
the getXXX() methods are case sensitive * @param typeInfo * odps column type * @param columnNameValue * for partition column it's column value, for normal column it's * column name * @param isPartitionColumn * true means partition column and false means normal column * */ private void odpsColumnToDataXField(Record odpsRecord, com.alibaba.datax.common.element.Record dataXRecord, TypeInfo typeInfo, String columnNameValue, boolean isPartitionColumn) { ArrayRecord record = (ArrayRecord) odpsRecord; OdpsType type = typeInfo.getOdpsType(); switch (type) { case BIGINT: { if (isPartitionColumn) { dataXRecord.addColumn(new LongColumn(columnNameValue)); } else { dataXRecord.addColumn(new LongColumn(record .getBigint(columnNameValue))); } break; } case BOOLEAN: { if (isPartitionColumn) { dataXRecord.addColumn(new BoolColumn(columnNameValue)); } else { dataXRecord.addColumn(new BoolColumn(record .getBoolean(columnNameValue))); } break; } case DATE: case DATETIME: { // ODPS partition columns currently support the TINYINT, SMALLINT, INT, BIGINT, VARCHAR and STRING types if (isPartitionColumn) { try { dataXRecord.addColumn(new DateColumn(ColumnCast .string2Date(new StringColumn(columnNameValue)))); } catch (ParseException e) { String errMessage = MESSAGE_SOURCE.message("readerproxy.4", this.partition, columnNameValue); LOG.error(errMessage); throw DataXException.asDataXException( OdpsReaderErrorCode.READ_DATA_FAIL, errMessage, e); } } else { if (com.aliyun.odps.OdpsType.DATETIME == type) { dataXRecord.addColumn(new DateColumn(record .getDatetime(columnNameValue))); } else { if (this.useDateWithCalendar) { dataXRecord.addColumn(new DateColumn(record.
getDate(columnNameValue, this.calendarForDate))); } else { dataXRecord.addColumn(new DateColumn(record .getDate(columnNameValue))); } } } break; } case DOUBLE: { if (isPartitionColumn) { dataXRecord.addColumn(new DoubleColumn(columnNameValue)); } else { dataXRecord.addColumn(new DoubleColumn(record .getDouble(columnNameValue))); } break; } case DECIMAL: { if(isPartitionColumn) { dataXRecord.addColumn(new DoubleColumn(columnNameValue)); } else { dataXRecord.addColumn(new DoubleColumn(record.getDecimal(columnNameValue))); } break; } case STRING: { if (isPartitionColumn) { dataXRecord.addColumn(new StringColumn(columnNameValue)); } else { dataXRecord.addColumn(new StringColumn(record .getString(columnNameValue))); } break; } case TINYINT: if (isPartitionColumn) { dataXRecord.addColumn(new LongColumn(columnNameValue)); } else { Byte value = record.getTinyint(columnNameValue); Integer intValue = value != null ? value.intValue() : null; dataXRecord.addColumn(new LongColumn(intValue)); } break; case SMALLINT: { if (isPartitionColumn) { dataXRecord.addColumn(new LongColumn(columnNameValue)); } else { Short value = record.getSmallint(columnNameValue); Long valueInLong = null; if (null != value) { valueInLong = value.longValue(); } dataXRecord.addColumn(new LongColumn(valueInLong)); } break; } case INT: { if (isPartitionColumn) { dataXRecord.addColumn(new LongColumn(columnNameValue)); } else { dataXRecord.addColumn(new LongColumn(record .getInt(columnNameValue))); } break; } case FLOAT: { if (isPartitionColumn) { dataXRecord.addColumn(new DoubleColumn(columnNameValue)); } else { dataXRecord.addColumn(new DoubleColumn(record .getFloat(columnNameValue))); } break; } case VARCHAR: { if (isPartitionColumn) { dataXRecord.addColumn(new StringColumn(columnNameValue)); } else { Varchar value = record.getVarchar(columnNameValue); String columnValue = value != null ? 
value.getValue() : null; dataXRecord.addColumn(new StringColumn(columnValue)); } break; } case TIMESTAMP: { if (isPartitionColumn) { try { dataXRecord.addColumn(new DateColumn(ColumnCast .string2Date(new StringColumn(columnNameValue)))); } catch (ParseException e) { String errMessage = MESSAGE_SOURCE.message("readerproxy.4", this.partition, columnNameValue); LOG.error(errMessage); throw DataXException.asDataXException( OdpsReaderErrorCode.READ_DATA_FAIL, errMessage, e); } } else { dataXRecord.addColumn(new DateColumn(record .getTimestamp(columnNameValue))); } break; } case BINARY: { if (isPartitionColumn) { dataXRecord.addColumn(new BytesColumn(columnNameValue.getBytes())); } else { // dataXRecord.addColumn(new BytesColumn(record // .getBinary(columnNameValue).data())); Binary binaryData = record.getBinary(columnNameValue); if (null == binaryData) { dataXRecord.addColumn(new BytesColumn(null)); } else { dataXRecord.addColumn(new BytesColumn(binaryData.data())); } } break; } case ARRAY: { if (isPartitionColumn) { dataXRecord.addColumn(new StringColumn(columnNameValue)); } else { List arrayValue = record.getArray(columnNameValue); if (arrayValue == null) { dataXRecord.addColumn(new StringColumn(null)); } else { dataXRecord.addColumn(new StringColumn(JSON.toJSONString(transOdpsArrayToJavaList(arrayValue, (ArrayTypeInfo)typeInfo)))); } } break; } case MAP: { if (isPartitionColumn) { dataXRecord.addColumn(new StringColumn(columnNameValue)); } else { Map mapValue = record.getMap(columnNameValue); if (mapValue == null) { dataXRecord.addColumn(new StringColumn(null)); } else { dataXRecord.addColumn(new StringColumn(JSON.toJSONString(transOdpsMapToJavaMap(mapValue, (MapTypeInfo)typeInfo)))); } } break; } case STRUCT: { if (isPartitionColumn) { dataXRecord.addColumn(new StringColumn(columnNameValue)); } else { Struct structValue = record.getStruct(columnNameValue); if (structValue == null) { dataXRecord.addColumn(new StringColumn(null)); } else { dataXRecord.addColumn(new 
StringColumn(JSON.toJSONString(transOdpsStructToJavaMap(structValue)))); } } break; } default: throw DataXException.asDataXException( OdpsReaderErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("readerproxy.5", type)); } } private List transOdpsArrayToJavaList(List odpsArray, ArrayTypeInfo typeInfo) { TypeInfo eleType = typeInfo.getElementTypeInfo(); List result = new ArrayList(); switch (eleType.getOdpsType()) { // warn: array [1.2, 3.4] used to be converted to "["1.2", "3.4"]" when it should be converted to "[1.2, 3.4]" // make sure regression cases cover this case BIGINT: case DOUBLE: case INT: case FLOAT: case DECIMAL: case TINYINT: case SMALLINT: for (Object item : odpsArray) { Object object = item; result.add(object == null ? NULL_INDICATOR : object); } return result; case BOOLEAN: // array issue not addressed here: values are still converted to strings case STRING: case VARCHAR: case CHAR: case TIMESTAMP: case DATE: for (Object item : odpsArray) { Object object = item; result.add(object == null ? NULL_INDICATOR : object.toString()); } return result; /** * datetime type */ case DATETIME: for (Object item : odpsArray) { Date dateVal = (Date) item; result.add(dateVal == null ? NULL_INDICATOR : dateFormat.format(dateVal)); } return result; /** * byte array */ case BINARY: for (Object item : odpsArray) { Binary binaryVal = (Binary) item; result.add(binaryVal == null ? NULL_INDICATOR : Base64.encodeBase64(binaryVal.data())); } return result; /** * day-time interval */ case INTERVAL_DAY_TIME: for (Object item : odpsArray) { IntervalDayTime dayTimeVal = (IntervalDayTime) item; result.add(dayTimeVal == null ? NULL_INDICATOR : transIntervalDayTimeToJavaMap(dayTimeVal)); } return result; /** * year-month interval */ case INTERVAL_YEAR_MONTH: for (Object item : odpsArray) { IntervalYearMonth yearMonthVal = (IntervalYearMonth) item; result.add(yearMonthVal == null ? NULL_INDICATOR : transIntervalYearMonthToJavaMap(yearMonthVal)); } return result; /** * struct */ case STRUCT: for (Object item : odpsArray) { Struct structVal = (Struct) item; result.add(structVal == null ?
NULL_INDICATOR : transOdpsStructToJavaMap(structVal)); } return result; /** * MAP type */ case MAP: for (Object item : odpsArray) { Map mapVal = (Map) item; result.add(mapVal == null ? NULL_INDICATOR : transOdpsMapToJavaMap(mapVal, (MapTypeInfo) eleType)); } return result; /** * ARRAY type */ case ARRAY: for (Object item : odpsArray) { List arrayVal = (List) item; result.add(arrayVal == null ? NULL_INDICATOR : transOdpsArrayToJavaList(arrayVal, (ArrayTypeInfo) eleType)); } return result; default: throw new IllegalArgumentException("decode record failed. column type: " + eleType.getTypeName()); } } private Map transOdpsMapToJavaMap(Map odpsMap, MapTypeInfo typeInfo) { TypeInfo keyType = typeInfo.getKeyTypeInfo(); TypeInfo valueType = typeInfo.getValueTypeInfo(); Map result = new HashMap(); Set entrySet = null; switch (valueType.getOdpsType()) { case BIGINT: case DOUBLE: case BOOLEAN: case STRING: case DECIMAL: case TINYINT: case SMALLINT: case INT: case FLOAT: case CHAR: case VARCHAR: case DATE: case TIMESTAMP: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Object value = item.getValue(); result.put(dateFormat.format((Date)item.getKey()), value == null ? NULL_INDICATOR : value.toString()); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Object value = item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), value == null ? NULL_INDICATOR : value.toString()); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Object value = item.getValue(); result.put(item.getKey(), value == null ? NULL_INDICATOR : value.toString()); } return result; } /** * datetime type */ case DATETIME: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Date dateVal = (Date) item.getValue(); result.put(dateFormat.format((Date)item.getKey()), dateVal == null ?
NULL_INDICATOR : dateFormat.format(dateVal)); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Date dateVal = (Date) item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), dateVal == null ? NULL_INDICATOR : dateFormat.format(dateVal)); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Date dateVal = (Date) item.getValue(); result.put(item.getKey(), dateVal == null ? NULL_INDICATOR : dateFormat.format(dateVal)); } return result; } /** * byte array */ case BINARY: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Binary binaryVal = (Binary) item.getValue(); result.put(dateFormat.format((Date)item.getKey()), binaryVal == null ? NULL_INDICATOR : Base64.encodeBase64(binaryVal.data())); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Binary binaryVal = (Binary) item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), binaryVal == null ? NULL_INDICATOR : Base64.encodeBase64(binaryVal.data())); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Binary binaryVal = (Binary) item.getValue(); result.put(item.getKey(), binaryVal == null ? NULL_INDICATOR : Base64.encodeBase64(binaryVal.data())); } return result; } /** * day-time interval */ case INTERVAL_DAY_TIME: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { IntervalDayTime dayTimeVal = (IntervalDayTime) item.getValue(); result.put(dateFormat.format((Date)item.getKey()), dayTimeVal == null ? NULL_INDICATOR : transIntervalDayTimeToJavaMap(dayTimeVal)); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { IntervalDayTime dayTimeVal = (IntervalDayTime) item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), dayTimeVal == null ?
NULL_INDICATOR : transIntervalDayTimeToJavaMap(dayTimeVal)); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { IntervalDayTime dayTimeVal = (IntervalDayTime) item.getValue(); result.put(item.getKey(), dayTimeVal == null ? NULL_INDICATOR : transIntervalDayTimeToJavaMap(dayTimeVal)); } return result; } /** * year-month interval */ case INTERVAL_YEAR_MONTH: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { IntervalYearMonth yearMonthVal = (IntervalYearMonth) item.getValue(); result.put(dateFormat.format((Date)item.getKey()), yearMonthVal == null ? NULL_INDICATOR : transIntervalYearMonthToJavaMap(yearMonthVal)); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { IntervalYearMonth yearMonthVal = (IntervalYearMonth) item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), yearMonthVal == null ? NULL_INDICATOR : transIntervalYearMonthToJavaMap(yearMonthVal)); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { IntervalYearMonth yearMonthVal = (IntervalYearMonth) item.getValue(); result.put(item.getKey(), yearMonthVal == null ? NULL_INDICATOR : transIntervalYearMonthToJavaMap(yearMonthVal)); } return result; } /** * struct */ case STRUCT: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Struct structVal = (Struct) item.getValue(); result.put(dateFormat.format((Date)item.getKey()), structVal == null ? NULL_INDICATOR : transOdpsStructToJavaMap(structVal)); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Struct structVal = (Struct) item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), structVal == null ?
NULL_INDICATOR : transOdpsStructToJavaMap(structVal)); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Struct structVal = (Struct) item.getValue(); result.put(item.getKey(), structVal == null ? NULL_INDICATOR : transOdpsStructToJavaMap(structVal)); } return result; } /** * MAP type */ case MAP: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Map mapVal = (Map) item.getValue(); result.put(dateFormat.format((Date)item.getKey()),mapVal == null ? NULL_INDICATOR : transOdpsMapToJavaMap(mapVal, (MapTypeInfo) valueType)); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Map mapVal = (Map) item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), mapVal == null ? NULL_INDICATOR : transOdpsMapToJavaMap(mapVal, (MapTypeInfo) valueType)); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { Map mapVal = (Map) item.getValue(); result.put(item.getKey(), mapVal == null ? NULL_INDICATOR : transOdpsMapToJavaMap(mapVal, (MapTypeInfo) valueType)); } return result; } /** * ARRAY type */ case ARRAY: switch (keyType.getOdpsType()) { case DATETIME: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { List arrayVal = (List) item.getValue(); result.put(dateFormat.format((Date)item.getKey()),arrayVal == null ? NULL_INDICATOR : transOdpsArrayToJavaList(arrayVal, (ArrayTypeInfo) valueType)); } return result; case BINARY: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { List arrayVal = (List) item.getValue(); result.put(Base64.encodeBase64(((Binary)item.getKey()).data()), arrayVal == null ? NULL_INDICATOR : transOdpsArrayToJavaList(arrayVal, (ArrayTypeInfo) valueType)); } return result; default: entrySet = odpsMap.entrySet(); for (Map.Entry item : entrySet) { List arrayVal = (List) item.getValue(); result.put(item.getKey(), arrayVal == null ?
NULL_INDICATOR : transOdpsArrayToJavaList(arrayVal, (ArrayTypeInfo) valueType)); } return result; } default: throw new IllegalArgumentException("decode record failed. column type: " + valueType.getTypeName()); } } private Map transIntervalDayTimeToJavaMap(IntervalDayTime dayTime) { Map result = new HashMap(); result.put("totalSeconds", dayTime.getTotalSeconds()); result.put("nanos", (long)dayTime.getNanos()); return result; } private Map transOdpsStructToJavaMap(Struct odpsStruct) { Map result = new HashMap(); for (int i = 0; i < odpsStruct.getFieldCount(); i++) { String fieldName = odpsStruct.getFieldName(i); Object fieldValue = odpsStruct.getFieldValue(i); TypeInfo fieldType = odpsStruct.getFieldTypeInfo(i); switch (fieldType.getOdpsType()) { case BIGINT: case DOUBLE: case BOOLEAN: case STRING: case DECIMAL: case TINYINT: case SMALLINT: case INT: case FLOAT: case VARCHAR: case CHAR: case TIMESTAMP: case DATE: result.put(fieldName, fieldValue == null ? NULL_INDICATOR : fieldValue.toString()); break; /** * datetime type */ case DATETIME: Date dateVal = (Date) fieldValue; result.put(fieldName, dateVal == null ? NULL_INDICATOR : dateFormat.format(dateVal)); break; /** * byte array */ case BINARY: Binary binaryVal = (Binary) fieldValue; result.put(fieldName, binaryVal == null ? NULL_INDICATOR : Base64.encodeBase64(binaryVal.data())); break; /** * day-time interval */ case INTERVAL_DAY_TIME: IntervalDayTime dayTimeVal = (IntervalDayTime) fieldValue; result.put(fieldName, dayTimeVal == null ? NULL_INDICATOR : transIntervalDayTimeToJavaMap(dayTimeVal)); break; /** * year-month interval */ case INTERVAL_YEAR_MONTH: IntervalYearMonth yearMonthVal = (IntervalYearMonth) fieldValue; result.put(fieldName, yearMonthVal == null ? NULL_INDICATOR : transIntervalYearMonthToJavaMap(yearMonthVal)); break; /** * struct */ case STRUCT: Struct structVal = (Struct) fieldValue; result.put(fieldName, structVal == null ?
NULL_INDICATOR : transOdpsStructToJavaMap(structVal)); break; /** * MAP类型 */ case MAP: Map mapVal = (Map) fieldValue; result.put(fieldName, mapVal == null ? NULL_INDICATOR : transOdpsMapToJavaMap(mapVal, (MapTypeInfo) fieldType)); break; /** * ARRAY类型 */ case ARRAY: List arrayVal = (List) fieldValue; result.put(fieldName, arrayVal == null ? NULL_INDICATOR : transOdpsArrayToJavaList(arrayVal, (ArrayTypeInfo) fieldType)); break; default: throw new IllegalArgumentException("decode record failed. column type: " + fieldType.getTypeName()); } } return result; } private Map transIntervalYearMonthToJavaMap(IntervalYearMonth yearMonth) { Map result = new HashMap(); result.put("years", yearMonth.getYears()); result.put("months", yearMonth.getMonths()); return result; } } ================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/LocalStrings.properties ================================================ descipher.1=\u957F\u5EA6\u4E0D\u662F\u5076\u6570 idandkeyutil.1=\u4ECE\u73AF\u5883\u53D8\u91CF\u4E2D\u83B7\u53D6accessId/accessKey \u5931\u8D25, accessId=[{0}] idandkeyutil.2=\u65E0\u6CD5\u83B7\u53D6\u5230accessId/accessKey. \u5B83\u4EEC\u65E2\u4E0D\u5B58\u5728\u4E8E\u60A8\u7684\u914D\u7F6E\u4E2D\uFF0C\u4E5F\u4E0D\u5B58\u5728\u4E8E\u73AF\u5883\u53D8\u91CF\u4E2D. odpssplitutil.1=\u60A8\u6240\u914D\u7F6E\u7684\u5206\u533A\u4E0D\u80FD\u4E3A\u7A7A\u767D. odpssplitutil.2=\u5207\u5206\u7684 recordCount \u4E0D\u80FD\u4E3A\u8D1F\u6570.recordCount={0} odpssplitutil.3=\u5207\u5206\u7684 adviceNum \u4E0D\u80FD\u4E3A\u8D1F\u6570.adviceNum={0} odpssplitutil.4=\u6CE8\u610F: \u7531\u4E8E\u60A8\u914D\u7F6E\u4E86successOnNoPartition\u503C\u4E3Atrue (\u5373\u5F53\u5206\u533A\u503C\u4E0D\u5B58\u5728\u65F6, \u540C\u6B65\u4EFB\u52A1\u4E0D\u62A5\u9519), \u60A8\u8BBE\u7F6E\u7684\u5206\u533A\u65E0\u6CD5\u5339\u914D\u5230ODPS\u8868\u4E2D\u5BF9\u5E94\u7684\u5206\u533A, \u540C\u6B65\u4EFB\u52A1\u7EE7\u7EED... 
odpsutil.1=datax\u83B7\u53D6\u4E0D\u5230\u6E90\u8868\u7684\u5217\u4FE1\u606F\uFF0C \u7531\u4E8E\u60A8\u672A\u914D\u7F6E\u8BFB\u53D6\u6E90\u5934\u8868\u7684\u5217\u4FE1\u606F. datax\u65E0\u6CD5\u77E5\u9053\u8BE5\u62BD\u53D6\u8868\u7684\u54EA\u4E9B\u5B57\u6BB5\u7684\u6570\u636E\uFF0C \u6B63\u786E\u7684\u914D\u7F6E\u65B9\u5F0F\u662F\u7ED9 column \u914D\u7F6E\u4E0A\u60A8\u9700\u8981\u8BFB\u53D6\u7684\u5217\u540D\u79F0,\u7528\u82F1\u6587\u9017\u53F7\u5206\u9694. odpsutil.2=\u60A8\u6240\u914D\u7F6E\u7684maxRetryTime \u503C\u9519\u8BEF. \u8BE5\u503C\u4E0D\u80FD\u5C0F\u4E8E1, \u4E14\u4E0D\u80FD\u5927\u4E8E {0}. \u63A8\u8350\u7684\u914D\u7F6E\u65B9\u5F0F\u662F\u7ED9maxRetryTime \u914D\u7F6E1-11\u4E4B\u95F4\u7684\u67D0\u4E2A\u503C. \u8BF7\u60A8\u68C0\u67E5\u914D\u7F6E\u5E76\u505A\u51FA\u76F8\u5E94\u4FEE\u6539. odpsutil.3=\u4E0D\u652F\u6301\u7684\u8D26\u53F7\u7C7B\u578B:[{0}]. \u8D26\u53F7\u7C7B\u578B\u76EE\u524D\u4EC5\u652F\u6301aliyun, taobao. odpsutil.4=\u60A8\u6240\u914D\u7F6E\u7684\u5206\u533A\u4E0D\u80FD\u4E3A\u7A7A\u767D. odpsutil.5=\u6E90\u5934\u8868\u7684\u5217\u914D\u7F6E\u9519\u8BEF. \u60A8\u6240\u914D\u7F6E\u7684\u5217 [{0}] \u4E0D\u5B58\u5728. odpsutil.6=open RecordReader\u5931\u8D25. \u8BF7\u8054\u7CFB ODPS \u7BA1\u7406\u5458\u5904\u7406. odpsutil.7=\u52A0\u8F7D ODPS \u6E90\u5934\u8868:{0} \u5931\u8D25. \u8BF7\u68C0\u67E5\u60A8\u914D\u7F6E\u7684 ODPS \u6E90\u5934\u8868\u7684 [project] \u662F\u5426\u6B63\u786E. odpsutil.8=\u52A0\u8F7D ODPS \u6E90\u5934\u8868:{0} \u5931\u8D25. \u8BF7\u68C0\u67E5\u60A8\u914D\u7F6E\u7684 ODPS \u6E90\u5934\u8868\u7684 [table] \u662F\u5426\u6B63\u786E. odpsutil.9=\u52A0\u8F7D ODPS \u6E90\u5934\u8868:{0} \u5931\u8D25. \u8BF7\u68C0\u67E5\u60A8\u914D\u7F6E\u7684 ODPS \u6E90\u5934\u8868\u7684 [accessId] [accessKey]\u662F\u5426\u6B63\u786E. odpsutil.10=\u52A0\u8F7D ODPS \u6E90\u5934\u8868:{0} \u5931\u8D25. \u8BF7\u68C0\u67E5\u60A8\u914D\u7F6E\u7684 ODPS \u6E90\u5934\u8868\u7684 [accessKey] \u662F\u5426\u6B63\u786E. 
odpsutil.11=\u52A0\u8F7D ODPS \u6E90\u5934\u8868:{0} \u5931\u8D25. \u8BF7\u68C0\u67E5\u60A8\u914D\u7F6E\u7684 ODPS \u6E90\u5934\u8868\u7684 [accessId] [accessKey] [project]\u662F\u5426\u5339\u914D. odpsutil.12=\u52A0\u8F7D ODPS \u6E90\u5934\u8868:{0} \u5931\u8D25. \u8BF7\u68C0\u67E5\u60A8\u914D\u7F6E\u7684 ODPS \u6E90\u5934\u8868\u7684 project,table,accessId,accessKey,odpsServer\u7B49\u503C. odpsutil.13=\u6267\u884C ODPS SQL\u5931\u8D25, \u8FD4\u56DE\u503C\u4E3A:{0}. \u8BF7\u4ED4\u7EC6\u68C0\u67E5ODPS SQL\u662F\u5426\u6B63\u786E, \u5982\u679C\u68C0\u67E5\u65E0\u8BEF, \u8BF7\u8054\u7CFB ODPS \u503C\u73ED\u540C\u5B66\u5904\u7406. SQL \u5185\u5BB9\u4E3A:[\n{1}\n]. odpsutil.14=\u6267\u884C ODPS SQL \u65F6\u629B\u51FA\u5F02\u5E38, \u8BF7\u4ED4\u7EC6\u68C0\u67E5ODPS SQL\u662F\u5426\u6B63\u786E, \u5982\u679C\u68C0\u67E5\u65E0\u8BEF, \u8BF7\u8054\u7CFB ODPS \u503C\u73ED\u540C\u5B66\u5904\u7406. SQL \u5185\u5BB9\u4E3A:[\n{0}\n]. ================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/OdpsExceptionMsg.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader.util; /** * Created by hongjiao.hj on 2015/6/9. 
*/ public class OdpsExceptionMsg { public static final String ODPS_PROJECT_NOT_FOUNT = "ODPS-0420111: Project not found"; public static final String ODPS_TABLE_NOT_FOUNT = "ODPS-0130131:Table not found"; public static final String ODPS_ACCESS_KEY_ID_NOT_FOUND = "ODPS-0410051:Invalid credentials - accessKeyId not found"; public static final String ODPS_ACCESS_KEY_INVALID = "ODPS-0410042:Invalid signature value - User signature dose not match"; public static final String ODPS_ACCESS_DENY = "ODPS-0420095: Access Denied - Authorization Failed [4002], You doesn't exist in project"; } ================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/OdpsSplitUtil.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.common.util.RangeSplitUtil; import com.alibaba.datax.plugin.reader.odpsreader.Constant; import com.alibaba.datax.plugin.reader.odpsreader.Key; import com.alibaba.datax.plugin.reader.odpsreader.OdpsReaderErrorCode; import com.aliyun.odps.Odps; import com.aliyun.odps.tunnel.TableTunnel.DownloadSession; import org.apache.commons.lang3.tuple.ImmutablePair; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; public final class OdpsSplitUtil { private static final Logger LOG = LoggerFactory.getLogger(OdpsSplitUtil.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OdpsSplitUtil.class); public static List doSplit(Configuration originalConfig, Odps odps, int adviceNum) { boolean isPartitionedTable = originalConfig.getBool(Constant.IS_PARTITIONED_TABLE); if (isPartitionedTable) { // partitioned table return splitPartitionedTable(odps,
originalConfig, adviceNum); } else { // non-partitioned table return splitForNonPartitionedTable(odps, adviceNum, originalConfig); } } private static List splitPartitionedTable(Odps odps, Configuration originalConfig, int adviceNum) { List splittedConfigs = new ArrayList(); List partitions = originalConfig.getList(Key.PARTITION, String.class); if ((null == partitions || partitions.isEmpty()) && originalConfig.getBool(Key.SUCCESS_ON_NO_PATITION, false)) { Configuration tempConfig = originalConfig.clone(); tempConfig.set(Key.PARTITION, null); splittedConfigs.add(tempConfig); LOG.warn(MESSAGE_SOURCE.message("odpssplitutil.4")); return splittedConfigs; } if (null == partitions || partitions.isEmpty()) { throw DataXException.asDataXException(OdpsReaderErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpssplitutil.1")); } // splitMode defaults to record String splitMode = originalConfig.getString(Key.SPLIT_MODE); Configuration tempConfig = null; if (partitions.size() > adviceNum || Constant.PARTITION_SPLIT_MODE.equals(splitMode)) { // In this case no further splitting is needed, regardless of splitMode // Note: sessionId is not put into the config here, so when a task later looks up the sessionId it must create a new one for this case for (String onePartition : partitions) { tempConfig = originalConfig.clone(); tempConfig.set(Key.PARTITION, onePartition); splittedConfigs.add(tempConfig); } return splittedConfigs; } else { // Otherwise, compute how many splits each partition should get int eachPartitionShouldSplittedNumber = calculateEachPartitionShouldSplittedNumber( adviceNum, partitions.size()); for (String onePartition : partitions) { List configs = splitOnePartition(odps, onePartition, eachPartitionShouldSplittedNumber, originalConfig); splittedConfigs.addAll(configs); } return splittedConfigs; } } private static int calculateEachPartitionShouldSplittedNumber( int adviceNumber, int partitionNumber) { double tempNum = 1.0 * adviceNumber / partitionNumber; return (int) Math.ceil(tempNum); } private static List splitForNonPartitionedTable(Odps odps, int adviceNum, Configuration sliceConfig) { List params = new ArrayList();
String tunnelServer = sliceConfig.getString(Key.TUNNEL_SERVER); String tableName = sliceConfig.getString(Key.TABLE); String projectName = sliceConfig.getString(Key.PROJECT); DownloadSession session = OdpsUtil.createMasterSessionForNonPartitionedTable(odps, tunnelServer, projectName, tableName); String id = session.getId(); long count = session.getRecordCount(); List> splitResult = splitRecordCount(count, adviceNum); for (Pair pair : splitResult) { Configuration iParam = sliceConfig.clone(); iParam.set(Constant.SESSION_ID, id); iParam.set(Constant.START_INDEX, pair.getLeft().longValue()); iParam.set(Constant.STEP_COUNT, pair.getRight().longValue()); params.add(iParam); } return params; } private static List splitOnePartition(Odps odps, String onePartition, int adviceNum, Configuration sliceConfig) { List params = new ArrayList(); String tunnelServer = sliceConfig.getString(Key.TUNNEL_SERVER); String tableName = sliceConfig.getString(Key.TABLE); String projectName = sliceConfig.getString(Key.PROJECT); DownloadSession session = OdpsUtil.createMasterSessionForPartitionedTable(odps, tunnelServer, projectName, tableName, onePartition); String id = session.getId(); long count = session.getRecordCount(); List> splitResult = splitRecordCount(count, adviceNum); for (Pair pair : splitResult) { Configuration iParam = sliceConfig.clone(); iParam.set(Key.PARTITION, onePartition); iParam.set(Constant.SESSION_ID, id); iParam.set(Constant.START_INDEX, pair.getLeft().longValue()); iParam.set(Constant.STEP_COUNT, pair.getRight().longValue()); params.add(iParam); } return params; } /** * Pair left: startIndex, right: stepCount */ private static List> splitRecordCount(long recordCount, int adviceNum) { if(recordCount<0){ throw new IllegalArgumentException(MESSAGE_SOURCE.message("odpssplitutil.2", recordCount)); } if(adviceNum<1){ throw new IllegalArgumentException(MESSAGE_SOURCE.message("odpssplitutil.3", adviceNum)); } List> result = new ArrayList>(); // To match RangeSplitUtil's logic, the start index is counted from 0
if (recordCount == 0) { result.add(ImmutablePair.of(0L, 0L)); return result; } long[] tempResult = RangeSplitUtil.doLongSplit(0L, recordCount - 1, adviceNum); tempResult[tempResult.length - 1]++; for (int i = 0; i < tempResult.length - 1; i++) { result.add(ImmutablePair.of(tempResult[i], (tempResult[i + 1] - tempResult[i]))); } return result; } } ================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/OdpsUtil.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.DataXCaseEnvUtil; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.reader.odpsreader.ColumnType; import com.alibaba.datax.plugin.reader.odpsreader.Constant; import com.alibaba.datax.plugin.reader.odpsreader.InternalColumnInfo; import com.alibaba.datax.plugin.reader.odpsreader.Key; import com.alibaba.datax.plugin.reader.odpsreader.OdpsReaderErrorCode; import com.aliyun.odps.*; import com.aliyun.odps.Column; import com.aliyun.odps.account.Account; import com.aliyun.odps.account.AliyunAccount; import com.aliyun.odps.account.StsAccount; import com.aliyun.odps.data.RecordReader; import com.aliyun.odps.task.SQLTask; import com.aliyun.odps.tunnel.TableTunnel; import com.aliyun.odps.type.TypeInfo; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.MutablePair; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; import java.util.concurrent.Callable; public final class OdpsUtil { private static final Logger LOG = LoggerFactory.getLogger(OdpsUtil.class); private static final MessageSource MESSAGE_SOURCE =
MessageSource.loadResourceBundle(OdpsUtil.class); public static int MAX_RETRY_TIME = 10; public static void checkNecessaryConfig(Configuration originalConfig) { originalConfig.getNecessaryValue(Key.ODPS_SERVER, OdpsReaderErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.PROJECT, OdpsReaderErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.TABLE, OdpsReaderErrorCode.REQUIRED_VALUE); if (null == originalConfig.getList(Key.COLUMN) || originalConfig.getList(Key.COLUMN, String.class).isEmpty()) { throw DataXException.asDataXException(OdpsReaderErrorCode.REQUIRED_VALUE, MESSAGE_SOURCE.message("odpsutil.1")); } } public static void dealMaxRetryTime(Configuration originalConfig) { int maxRetryTime = originalConfig.getInt(Key.MAX_RETRY_TIME, OdpsUtil.MAX_RETRY_TIME); if (maxRetryTime < 1 || maxRetryTime > OdpsUtil.MAX_RETRY_TIME) { throw DataXException.asDataXException(OdpsReaderErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpsutil.2", OdpsUtil.MAX_RETRY_TIME)); } MAX_RETRY_TIME = maxRetryTime; } public static Odps initOdps(Configuration originalConfig) { String odpsServer = originalConfig.getString(Key.ODPS_SERVER); String accessId = originalConfig.getString(Key.ACCESS_ID); String accessKey = originalConfig.getString(Key.ACCESS_KEY); String project = originalConfig.getString(Key.PROJECT); String securityToken = originalConfig.getString(Key.SECURITY_TOKEN); String packageAuthorizedProject = originalConfig.getString(Key.PACKAGE_AUTHORIZED_PROJECT); String defaultProject; if (StringUtils.isBlank(packageAuthorizedProject)) { defaultProject = project; } else { defaultProject = packageAuthorizedProject; } Account account = null; if (StringUtils.isNotBlank(securityToken)) { account = new StsAccount(accessId, accessKey, securityToken); } else { account = new AliyunAccount(accessId, accessKey); } Odps odps = new Odps(account); boolean isPreCheck = originalConfig.getBool("dryRun", false); if (isPreCheck) { odps.getRestClient().setConnectTimeout(3); 
odps.getRestClient().setReadTimeout(3); odps.getRestClient().setRetryTimes(2); } odps.setDefaultProject(defaultProject); odps.setEndpoint(odpsServer); odps.setUserAgent("DATAX"); return odps; } public static Table getTable(Odps odps, String projectName, String tableName) { final Table table = odps.tables().get(projectName, tableName); try { //通过这种方式检查表是否存在,失败重试。重试策略:每秒钟重试一次,最大重试3次 return RetryUtil.executeWithRetry(new Callable() { @Override public Table call() throws Exception { table.reload(); return table; } }, DataXCaseEnvUtil.getRetryTimes(3), DataXCaseEnvUtil.getRetryInterval(1000), DataXCaseEnvUtil.getRetryExponential(false)); } catch (Exception e) { throwDataXExceptionWhenReloadTable(e, tableName); } return table; } public static boolean isPartitionedTable(Table table) { return getPartitionDepth(table) > 0; } public static int getPartitionDepth(Table table) { TableSchema tableSchema = table.getSchema(); return tableSchema.getPartitionColumns().size(); } public static List getTableAllPartitions(Table table) { List tableAllPartitions = table.getPartitions(); List retPartitions = new ArrayList(); if (null != tableAllPartitions) { for (Partition partition : tableAllPartitions) { retPartitions.add(partition.getPartitionSpec().toString()); } } return retPartitions; } public static List getTableAllColumns(Table table) { TableSchema tableSchema = table.getSchema(); return tableSchema.getColumns(); } public static List getTableOriginalColumnNameList( List columns) { List tableOriginalColumnNameList = new ArrayList(); for (Column column : columns) { tableOriginalColumnNameList.add(column.getName()); } return tableOriginalColumnNameList; } public static String formatPartition(String partition) { if (StringUtils.isBlank(partition)) { throw DataXException.asDataXException(OdpsReaderErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpsutil.4")); } else { return partition.trim().replaceAll(" *= *", "=") .replaceAll(" */ *", ",").replaceAll(" *, *", ",") .replaceAll("'", 
""); } } public static List formatPartitions(List partitions) { if (null == partitions || partitions.isEmpty()) { return Collections.emptyList(); } else { List formattedPartitions = new ArrayList(); for (String partition : partitions) { formattedPartitions.add(formatPartition(partition)); } return formattedPartitions; } } /** * 将用户配置的分区分类成两类: * (1) 包含 HINT 的区间过滤; * (2) 不包含 HINT 的普通模式 * @param userConfiguredPartitions * @return */ public static UserConfiguredPartitionClassification classifyUserConfiguredPartitions(List userConfiguredPartitions){ UserConfiguredPartitionClassification userConfiguredPartitionClassification = new UserConfiguredPartitionClassification(); List userConfiguredHintPartition = new ArrayList(); List userConfiguredNormalPartition = new ArrayList(); boolean isIncludeHintPartition = false; for (String userConfiguredPartition : userConfiguredPartitions){ if (StringUtils.isNotBlank(userConfiguredPartition)){ if (userConfiguredPartition.trim().toLowerCase().startsWith(Constant.PARTITION_FILTER_HINT)) { userConfiguredHintPartition.add(userConfiguredPartition.trim()); isIncludeHintPartition = true; }else { userConfiguredNormalPartition.add(userConfiguredPartition.trim()); } } } userConfiguredPartitionClassification.setIncludeHintPartition(isIncludeHintPartition); userConfiguredPartitionClassification.setUserConfiguredHintPartition(userConfiguredHintPartition); userConfiguredPartitionClassification.setUserConfiguredNormalPartition(userConfiguredNormalPartition); return userConfiguredPartitionClassification; } public static List parseColumns( List allNormalColumns, List allPartitionColumns, List userConfiguredColumns) { List parsededColumns = new ArrayList(); // warn: upper & lower case for (String column : userConfiguredColumns) { InternalColumnInfo pair = new InternalColumnInfo(); // if constant column if (OdpsUtil.checkIfConstantColumn(column)) { // remove first and last ' pair.setColumnName(column.substring(1, column.length() - 1)); 
pair.setColumnType(ColumnType.CONSTANT); parsededColumns.add(pair); continue; } // if normal column; note: in ODPS a normal column cannot // also appear among the partition columns int index = OdpsUtil.indexOfIgnoreCase(allNormalColumns, column); if (0 <= index) { pair.setColumnName(allNormalColumns.get(index)); pair.setColumnType(ColumnType.NORMAL); parsededColumns.add(pair); continue; } // if partition column index = OdpsUtil.indexOfIgnoreCase(allPartitionColumns, column); if (0 <= index) { pair.setColumnName(allPartitionColumns.get(index)); pair.setColumnType(ColumnType.PARTITION); parsededColumns.add(pair); continue; } // column does not exist throw DataXException.asDataXException( OdpsReaderErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpsutil.5", column)); } return parsededColumns; } private static int indexOfIgnoreCase(List columnCollection, String column) { int index = -1; for (int i = 0; i < columnCollection.size(); i++) { if (columnCollection.get(i).equalsIgnoreCase(column)) { index = i; break; } } return index; } public static boolean checkIfConstantColumn(String column) { if (column.length() >= 2 && column.startsWith(Constant.COLUMN_CONSTANT_FLAG) && column.endsWith(Constant.COLUMN_CONSTANT_FLAG)) { return true; } else { return false; } } public static TableTunnel.DownloadSession createMasterSessionForNonPartitionedTable(Odps odps, String tunnelServer, final String projectName, final String tableName) { final TableTunnel tunnel = new TableTunnel(odps); if (StringUtils.isNoneBlank(tunnelServer)) { tunnel.setEndpoint(tunnelServer); } try { return RetryUtil.executeWithRetry(new Callable() { @Override public TableTunnel.DownloadSession call() throws Exception { return tunnel.createDownloadSession( projectName, tableName); } }, DataXCaseEnvUtil.getRetryTimes(MAX_RETRY_TIME), DataXCaseEnvUtil.getRetryInterval(1000), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw
DataXException.asDataXException(OdpsReaderErrorCode.CREATE_DOWNLOADSESSION_FAIL, e); } } public static TableTunnel.DownloadSession getSlaveSessionForNonPartitionedTable(Odps odps, final String sessionId, String tunnelServer, final String projectName, final String tableName) { final TableTunnel tunnel = new TableTunnel(odps); if (StringUtils.isNoneBlank(tunnelServer)) { tunnel.setEndpoint(tunnelServer); } try { return RetryUtil.executeWithRetry(new Callable() { @Override public TableTunnel.DownloadSession call() throws Exception { return tunnel.getDownloadSession( projectName, tableName, sessionId); } }, DataXCaseEnvUtil.getRetryTimes(MAX_RETRY_TIME), DataXCaseEnvUtil.getRetryInterval(1000), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.GET_DOWNLOADSESSION_FAIL, e); } } public static TableTunnel.DownloadSession createMasterSessionForPartitionedTable(Odps odps, String tunnelServer, final String projectName, final String tableName, String partition) { final TableTunnel tunnel = new TableTunnel(odps); if (StringUtils.isNoneBlank(tunnelServer)) { tunnel.setEndpoint(tunnelServer); } final PartitionSpec partitionSpec = new PartitionSpec(partition); try { return RetryUtil.executeWithRetry(new Callable() { @Override public TableTunnel.DownloadSession call() throws Exception { return tunnel.createDownloadSession( projectName, tableName, partitionSpec); } }, DataXCaseEnvUtil.getRetryTimes(MAX_RETRY_TIME), DataXCaseEnvUtil.getRetryInterval(1000), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.CREATE_DOWNLOADSESSION_FAIL, e); } } public static TableTunnel.DownloadSession getSlaveSessionForPartitionedTable(Odps odps, final String sessionId, String tunnelServer, final String projectName, final String tableName, String partition) { final TableTunnel tunnel = new TableTunnel(odps); if (StringUtils.isNoneBlank(tunnelServer)) 
{ tunnel.setEndpoint(tunnelServer); } final PartitionSpec partitionSpec = new PartitionSpec(partition); try { return RetryUtil.executeWithRetry(new Callable() { @Override public TableTunnel.DownloadSession call() throws Exception { return tunnel.getDownloadSession( projectName, tableName, partitionSpec, sessionId); } }, DataXCaseEnvUtil.getRetryTimes(MAX_RETRY_TIME), DataXCaseEnvUtil.getRetryInterval(1000), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.GET_DOWNLOADSESSION_FAIL, e); } } /** * Opens a RecordReader that reads all columns of the downloadSession (used by odpsreader). */ public static RecordReader getRecordReader(final TableTunnel.DownloadSession downloadSession, final long start, final long count, final boolean isCompress) { try { return RetryUtil.executeWithRetry(new Callable() { @Override public RecordReader call() throws Exception { return downloadSession.openRecordReader(start, count, isCompress); } }, DataXCaseEnvUtil.getRetryTimes(MAX_RETRY_TIME), DataXCaseEnvUtil.getRetryInterval(1000), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.OPEN_RECORD_READER_FAILED, MESSAGE_SOURCE.message("odpsutil.6"), e); } } /** * Opens a RecordReader that reads only the specified columns of the downloadSession (used by odpsreader). */ public static RecordReader getRecordReader(final TableTunnel.DownloadSession downloadSession, final long start, final long count, final boolean isCompress, final List columns) { try { return RetryUtil.executeWithRetry(new Callable() { @Override public RecordReader call() throws Exception { return downloadSession.openRecordReader(start, count, isCompress, columns); } }, DataXCaseEnvUtil.getRetryTimes(MAX_RETRY_TIME), DataXCaseEnvUtil.getRetryInterval(1000), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.OPEN_RECORD_READER_FAILED, MESSAGE_SOURCE.message("odpsutil.6"), e); } }
/** * Converts the ODPS exception thrown by table.reload() into a clearer DataX exception and throws it. */ public static void throwDataXExceptionWhenReloadTable(Exception e, String tableName) { if (e.getMessage() != null) { if (e.getMessage().contains(OdpsExceptionMsg.ODPS_PROJECT_NOT_FOUNT)) { throw DataXException.asDataXException(OdpsReaderErrorCode.ODPS_PROJECT_NOT_FOUNT, MESSAGE_SOURCE.message("odpsutil.7", tableName), e); } else if (e.getMessage().contains(OdpsExceptionMsg.ODPS_TABLE_NOT_FOUNT)) { throw DataXException.asDataXException(OdpsReaderErrorCode.ODPS_TABLE_NOT_FOUNT, MESSAGE_SOURCE.message("odpsutil.8", tableName), e); } else if (e.getMessage().contains(OdpsExceptionMsg.ODPS_ACCESS_KEY_ID_NOT_FOUND)) { throw DataXException.asDataXException(OdpsReaderErrorCode.ODPS_ACCESS_KEY_ID_NOT_FOUND, MESSAGE_SOURCE.message("odpsutil.9", tableName), e); } else if (e.getMessage().contains(OdpsExceptionMsg.ODPS_ACCESS_KEY_INVALID)) { throw DataXException.asDataXException(OdpsReaderErrorCode.ODPS_ACCESS_KEY_INVALID, MESSAGE_SOURCE.message("odpsutil.10", tableName), e); } else if (e.getMessage().contains(OdpsExceptionMsg.ODPS_ACCESS_DENY)) { throw DataXException.asDataXException(OdpsReaderErrorCode.ODPS_ACCESS_DENY, MESSAGE_SOURCE.message("odpsutil.11", tableName), e); } } throw DataXException.asDataXException(OdpsReaderErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpsutil.12", tableName), e); } public static List getNormalColumns(List parsedColumns, Map columnTypeMap) { List userConfigNormalColumns = new ArrayList(); Set columnNameSet = new HashSet(); for (InternalColumnInfo columnInfo : parsedColumns) { if (columnInfo.getColumnType() == ColumnType.NORMAL) { String columnName = columnInfo.getColumnName(); if (!columnNameSet.contains(columnName)) { Column column = new Column(columnName, columnTypeMap.get(columnName)); userConfigNormalColumns.add(column); columnNameSet.add(columnName); } } } return userConfigNormalColumns; }
/** * Runs the ODPS preSql and postSql. * * @param odps ODPS client * @param sql the ODPS SQL statement to run; because it may be retried, it must be idempotent * @param tag "preSql" or "postSql" */ public static void runSqlTaskWithRetry(final Odps odps, final String sql, final String tag){ // retry count int retryTimes = 10; // retry interval (ms) long sleepTimeInMilliSecond = 1000L; try { RetryUtil.executeWithRetry(new Callable() { @Override public Void call() throws Exception { long beginTime = System.currentTimeMillis(); runSqlTask(odps, sql, tag); long endTime = System.currentTimeMillis(); LOG.info(String.format("execute odps sql: %s finished, cost time : %s ms", sql, (endTime - beginTime))); return null; } }, DataXCaseEnvUtil.getRetryTimes(retryTimes), DataXCaseEnvUtil.getRetryInterval(sleepTimeInMilliSecond), DataXCaseEnvUtil.getRetryExponential(true)); } catch (Exception e) { String errMessage = String.format("Retry %s times to execute sql: [%s] failed! Exception: %s", retryTimes, sql, e.getMessage()); throw DataXException.asDataXException(OdpsReaderErrorCode.RUN_SQL_ODPS_EXCEPTION, errMessage, e); } } public static void runSqlTask(Odps odps, String sql, String tag) { if (StringUtils.isBlank(sql)) { return; } String taskName = String.format("datax_odpsreader_%s_%s", tag, UUID.randomUUID().toString().replace('-', '_')); LOG.info("Try to start sqlTask:[{}] to run odps sql:[\n{}\n] .", taskName, sql); Instance instance; Instance.TaskStatus status; try { Map hints = new HashMap(); hints.put("odps.sql.submit.mode", "script"); instance = SQLTask.run(odps, odps.getDefaultProject(), sql, taskName, hints, null); instance.waitForSuccess(); status = instance.getTaskStatus().get(taskName); if (!Instance.TaskStatus.Status.SUCCESS.equals(status.getStatus())) { throw DataXException.asDataXException(OdpsReaderErrorCode.RUN_SQL_FAILED, MESSAGE_SOURCE.message("odpsutil.13", sql)); } } catch (DataXException e) { throw e; } catch (Exception e) { throw DataXException.asDataXException(OdpsReaderErrorCode.RUN_SQL_ODPS_EXCEPTION, MESSAGE_SOURCE.message("odpsutil.14", sql), e); } } } ================================================ FILE:
odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/SqliteUtil.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader.util; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.ResultSetMetaData; import java.sql.SQLException; import java.sql.Statement; import java.util.ArrayList; import java.util.List; import com.alibaba.datax.plugin.reader.odpsreader.Constant; import com.aliyun.odps.Partition; import com.aliyun.odps.Table; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class SqliteUtil { private static final Logger LOGGER = LoggerFactory.getLogger(SqliteUtil.class); private Connection connection = null; private Statement stmt = null; private String partitionName = "partitionName"; private String createSQLTemplate = "Create Table DataXODPSReaderPPR (" + partitionName +" String, %s)"; private String insertSQLTemplate = "Insert Into DataXODPSReaderPPR Values (%s)"; private String selectSQLTemplate = "Select * From DataXODPSReaderPPR Where %s"; public SqliteUtil() throws ClassNotFoundException, SQLException { Class.forName("org.sqlite.JDBC"); this.connection = DriverManager.getConnection("jdbc:sqlite::memory:"); this.stmt = this.connection.createStatement(); } public void loadAllPartitionsIntoSqlite(Table table, List allOriginPartitions) throws SQLException { List partitionColumnList = new ArrayList(); String partition = allOriginPartitions.get(0); String[] partitionSpecs = partition.split(","); List partitionKeyList = new ArrayList(); for (String partitionKeyValue : partitionSpecs) { String partitionKey = partitionKeyValue.split("=")[0]; partitionColumnList.add(String.format("%s String", partitionKey)); partitionKeyList.add(partitionKey); } String createSQL = String.format(createSQLTemplate, StringUtils.join(partitionColumnList.toArray(), ",")); LOGGER.info(createSQL); 
this.stmt.execute(createSQL); insertAllOriginPartitionIntoSqlite(table, partitionKeyList); } /** * Selects from SQLite the partitions that match the user-configured filter conditions. * @param userHintConfiguredPartitions the partition filters that carry the hint * @return the matching partitions */ public List selectUserConfiguredPartition(List userHintConfiguredPartitions) throws SQLException { List selectedPartitionsFromSqlite = new ArrayList(); for (String partitionWhereConditions : userHintConfiguredPartitions) { String selectUserConfiguredPartitionsSql = String.format(selectSQLTemplate, StringUtils.remove(partitionWhereConditions, Constant.PARTITION_FILTER_HINT)); LOGGER.info(selectUserConfiguredPartitionsSql); try (ResultSet rs = stmt.executeQuery(selectUserConfiguredPartitionsSql)) { while (rs.next()) { selectedPartitionsFromSqlite.add(getPartitionsValue(rs)); } } } return selectedPartitionsFromSqlite; } private String getPartitionsValue (ResultSet rs) throws SQLException { List partitions = new ArrayList(); ResultSetMetaData rsMetaData = rs.getMetaData(); Integer columnCounter = rs.getMetaData().getColumnCount(); for (int columnIndex = 2; columnIndex <= columnCounter; columnIndex++) { partitions.add(String.format("%s=%s", rsMetaData.getColumnName(columnIndex), rs.getString(columnIndex))); } return StringUtils.join(partitions, ","); } /** * Loads all partition values of the ODPS table into SQLite. * @param table the ODPS table * @param partitionKeyList the partition keys, in table order * @throws SQLException if an insert fails */ private void insertAllOriginPartitionIntoSqlite(Table table, List partitionKeyList) throws SQLException { List partitions = table.getPartitions(); for (Partition partition : partitions){ List partitionColumnValue = new ArrayList(); partitionColumnValue.add("\""+partition.getPartitionSpec().toString()+"\""); for (String partitionKey : partitionKeyList) { partitionColumnValue.add("\""+partition.getPartitionSpec().get(partitionKey)+"\""); } String insertPartitionValueSql = String.format(insertSQLTemplate, StringUtils.join(partitionColumnValue, ",")); this.stmt.execute(insertPartitionValueSql); } } }
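SqliteUtil implements `/*query*/` partition filtering by loading every partition spec of the table into an in-memory SQLite table (one column for the full spec, one per partition key) and then running the user's condition, with the hint removed, as a WHERE clause. The sketch below reproduces only the SQL-string construction, without opening a database, so the generated statements can be inspected. It is not the plugin's code: the class and method names are hypothetical, the `/*query*/` marker is inferred from the examples in UserConfiguredPartitionClassification, and it quotes values with single quotes (doubling embedded quotes) instead of the double quotes SqliteUtil uses, because SQL treats double-quoted tokens as identifiers and SQLite accepts them as string literals only as a legacy fallback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds the statements an in-memory partition filter would run.
// No database is opened; only SQL strings are produced.
public class PartitionFilterSqlSketch {
    // Assumed hint marker, taken from the "/*query*/ dt>=20170101 ..." example in the source.
    static final String HINT = "/*query*/";
    static final String TABLE = "DataXODPSReaderPPR";

    // One column for the full spec, then one column per partition key of the first spec.
    static String createSql(String firstSpec) {
        List<String> cols = new ArrayList<>();
        cols.add("partitionName String");
        for (String kv : firstSpec.split(",")) {
            cols.add(kv.split("=")[0] + " String");
        }
        return "Create Table " + TABLE + " (" + String.join(", ", cols) + ")";
    }

    // One row per partition: the full spec followed by each key's value (assumes "k=v,k=v").
    static String insertSql(String spec) {
        List<String> values = new ArrayList<>();
        values.add(quote(spec));
        for (String kv : spec.split(",")) {
            values.add(quote(kv.split("=", 2)[1]));
        }
        return "Insert Into " + TABLE + " Values (" + String.join(",", values) + ")";
    }

    // The user's condition, minus the leading hint, becomes the WHERE clause.
    static String selectSql(String hintPartition) {
        String condition = hintPartition.trim();
        if (!condition.toLowerCase().startsWith(HINT)) {
            throw new IllegalArgumentException("not a hint partition: " + hintPartition);
        }
        return "Select * From " + TABLE + " Where " + condition.substring(HINT.length()).trim();
    }

    // Standard SQL string literal: wrap in single quotes and double any embedded quote.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(createSql("pt=20170101,type=1"));
        System.out.println(insertSql("pt=20170101,type=1"));
        System.out.println(selectSql("/*query*/ pt>=20170101 and pt<=20170109"));
    }
}
```

As in SqliteUtil, the WHERE condition is pasted in verbatim, so this approach is only reasonable when the filter comes from a trusted job configuration.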
================================================ FILE: odpsreader/src/main/java/com/alibaba/datax/plugin/reader/odpsreader/util/UserConfiguredPartitionClassification.java ================================================ package com.alibaba.datax.plugin.reader.odpsreader.util; import java.util.List; public class UserConfiguredPartitionClassification { // partitions containing /*query*/, e.g. /*query*/ dt>=20170101 and dt<= 20170109 private List userConfiguredHintPartition; // partitions without /*query*/, e.g. dt=20170101 or dt=201701* private List userConfiguredNormalPartition; // whether any configured partition contains the hint private boolean isIncludeHintPartition; public List getUserConfiguredHintPartition() { return userConfiguredHintPartition; } public void setUserConfiguredHintPartition(List userConfiguredHintPartition) { this.userConfiguredHintPartition = userConfiguredHintPartition; } public List getUserConfiguredNormalPartition() { return userConfiguredNormalPartition; } public void setUserConfiguredNormalPartition(List userConfiguredNormalPartition) { this.userConfiguredNormalPartition = userConfiguredNormalPartition; } public boolean isIncludeHintPartition() { return isIncludeHintPartition; } public void setIncludeHintPartition(boolean includeHintPartition) { isIncludeHintPartition = includeHintPartition; } } ================================================ FILE: odpsreader/src/main/resources/plugin.json ================================================ { "name": "odpsreader", "class": "com.alibaba.datax.plugin.reader.odpsreader.OdpsReader", "description": { "useScene": "prod.", "mechanism": "TODO", "warn": "TODO" }, "developer": "alibaba" } ================================================ FILE: odpsreader/src/main/resources/plugin_job_template.json ================================================ { "name": "odpsreader", "parameter": { "accessId": "", "accessKey": "", "project": "", "table": "", "partition": [], "column": [], "packageAuthorizedProject": "", "splitMode": "", "odpsServer": "" } }
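OdpsUtil.formatPartition normalizes a configured partition string with a fixed chain of regex replacements, and classifyUserConfiguredPartitions splits the configured list into hint and normal entries. The sketch below restates both steps as standalone methods so their effect on sample input can be checked. It is an approximation, not the plugin's code: it uses plain JDK string checks instead of commons-lang `StringUtils`, throws `IllegalArgumentException` instead of a DataXException, returns a pair of lists instead of a UserConfiguredPartitionClassification, and the class name and the `/*query*/` hint value are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionConfigSketch {
    // Assumed hint marker, inferred from the "/*query*/" examples in the source comments.
    static final String HINT = "/*query*/";

    // Same replacement chain as OdpsUtil.formatPartition: tidy spaces around '=',
    // turn '/' separators into ',', tidy spaces around ',', then drop single quotes.
    static String normalize(String partition) {
        if (partition == null || partition.trim().isEmpty()) {
            throw new IllegalArgumentException("partition must not be blank");
        }
        return partition.trim()
                .replaceAll(" *= *", "=")
                .replaceAll(" */ *", ",")
                .replaceAll(" *, *", ",")
                .replaceAll("'", "");
    }

    // Mirrors classifyUserConfiguredPartitions: blank entries are skipped, entries starting
    // with the hint (case-insensitively) go to index 0, all others go to index 1.
    static List<List<String>> classify(List<String> configured) {
        List<String> hint = new ArrayList<>();
        List<String> normal = new ArrayList<>();
        for (String p : configured) {
            if (p == null || p.trim().isEmpty()) {
                continue;
            }
            String trimmed = p.trim();
            if (trimmed.toLowerCase().startsWith(HINT)) {
                hint.add(trimmed);
            } else {
                normal.add(trimmed);
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(hint);
        result.add(normal);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" pt = '20150101' / type = 1 ")); // pt=20150101,type=1
        System.out.println(classify(Arrays.asList(
                "/*query*/ dt>=20170101 and dt<=20170109", "dt=20170101", "  ")));
    }
}
```

Normalizing to the comma form means both `pt=20150101/type=1` and `pt=20150101,type=1` end up as the same partition spec string.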
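================================================ FILE: odpswriter/doc/odpswriter.md ================================================
# DataX ODPS Writer

---

## 1 Overview

The ODPSWriter plugin inserts or updates data in ODPS. It is mainly intended for ETL developers who import business data into ODPS, and it suits transfers at the GB to TB scale. For PB-scale transfers, use the dt task tool instead.

## 2 How It Works

Under the hood, ODPSWriter writes to ODPS through DT Tunnel. For more technical details about ODPS, see the ODPS homepage https://data.aliyun.com/product/odps and the ODPS product documentation https://help.aliyun.com/product/27797.html

The SDK that DataX3 currently depends on is `com.aliyun.odps:odps-sdk-core-internal:0.13.2`.

Note: **If you need to use the ODPSReader/Writer plugins, be sure to use JDK 1.6-32 or later.** Run `java -version` to check your Java version.

## 3 Features

### 3.1 Sample Configuration

* This example imports data generated in memory into ODPS.

```json
{
  "job": {
    "setting": {
      "speed": {
        "byte": 1048576
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "value": "DataX", "type": "string" },
              { "value": "test", "type": "bytes" }
            ],
            "sliceRecordCount": 100000
          }
        },
        "writer": {
          "name": "odpswriter",
          "parameter": {
            "project": "chinan_test",
            "table": "odps_write_test00_partitioned",
            "partition": "school=SiChuan-School,class=1",
            "column": ["id", "name"],
            "accessId": "xxx",
            "accessKey": "xxxx",
            "truncate": true,
            "odpsServer": "http://sxxx/api",
            "tunnelServer": "http://xxx"
          }
        }
      }
    ]
  }
}
```

### 3.2 Parameters

* **accessId**
  * Description: the ODPS login ID.
  * Required: yes
  * Default: none
* **accessKey**
  * Description: the ODPS login key.
  * Required: yes
  * Default: none
* **project**
  * Description: the project that the ODPS table belongs to. A project name may contain only letters and digits, so enter its English identifier. The Chinese project name shown to users in the cloud console is only a display name; be sure to enter the underlying English project identifier.
  * Required: yes
  * Default: none
* **table**
  * Description: the name of the table to write to. Only one table may be specified, because DataX does not support importing into multiple tables at once.
  * Required: yes
  * Default: none
* **partition**
  * Description: the partition of the target table to write to. It must be specified down to the last partition level; for example, to write into a table with three partition levels, configure the full path such as pt=20150101/type=1/biz=2.
  * Required: **required for partitioned tables; must not be set for non-partitioned tables.**
  * Default: empty
* **column**
  * Description: the list of columns to import. To import all columns, configure "column": ["*"]. To insert into only some ODPS columns, list them, e.g. "column": ["id", "name"]. ODPSWriter supports column selection and reordering: if a table has columns a, b and c and the user syncs only c and b, configure ["c","b"]; during the import, column a is filled with null.
  * Required: no
  * Default: none
* **truncate**
  * Description: setting "truncate": true makes ODPSWriter's writes idempotent: when a failed write is rerun, ODPSWriter first clears the previously written data and then imports the new data, so the data is consistent after every rerun.

    **The truncate operation is not atomic, because ODPS SQL cannot guarantee atomicity. When multiple jobs clear the same table or partition concurrently, race conditions may occur, so take care!** To avoid this, we recommend not running DDL from multiple jobs against the same partition at the same time, or creating the partitions in advance before launching concurrent jobs.
  * Required: yes
  * Default: none
* **odpsServer**
  * Description: the ODPS server address. The production address is http://service.odps.aliyun.com/api
  * Required: yes
  * Default: none
* **tunnelServer**
  * Description: the ODPS tunnel server address. The production address is http://dt.odps.aliyun.com
  * Required: yes
  * Default: none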
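
The `column` behavior described above (selection, reordering, null-filling of unlisted columns, plus the error for surplus source columns covered in section 4.2) can be pictured as a position mapping from the configured list onto the table schema. The following Java sketch is a hypothetical illustration of that behavior, not ODPSWriter's implementation; the class and method names are invented, and case-insensitive name matching is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ColumnMappingSketch {
    // Resolves configured names to table positions; ["*"] selects every column in table order.
    static int[] resolvePositions(List<String> tableColumns, List<String> configured) {
        if (configured.size() == 1 && "*".equals(configured.get(0))) {
            int[] all = new int[tableColumns.size()];
            for (int i = 0; i < all.length; i++) {
                all[i] = i;
            }
            return all;
        }
        int[] positions = new int[configured.size()];
        for (int i = 0; i < configured.size(); i++) {
            int idx = -1;
            for (int j = 0; j < tableColumns.size(); j++) {
                if (tableColumns.get(j).equalsIgnoreCase(configured.get(i))) {
                    idx = j;
                    break;
                }
            }
            if (idx < 0) {
                throw new IllegalArgumentException("column not in table: " + configured.get(i));
            }
            positions[i] = idx;
        }
        return positions;
    }

    // Puts the i-th source value into table slot positions[i]; every other slot stays null.
    static List<Object> mapRecord(int tableWidth, int[] positions, List<Object> sourceValues) {
        if (sourceValues.size() > positions.length) {
            throw new IllegalArgumentException("source has " + sourceValues.size()
                    + " columns but only " + positions.length + " are configured");
        }
        List<Object> row = new ArrayList<>(Collections.nCopies(tableWidth, null));
        for (int i = 0; i < sourceValues.size(); i++) {
            row.set(positions[i], sourceValues.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("a", "b", "c");
        int[] pos = resolvePositions(table, Arrays.asList("c", "b"));
        System.out.println(mapRecord(table.size(), pos, Arrays.<Object>asList("v1", "v2")));
    }
}
```

With the table (a, b, c) and "column": ["c","b"], the reader's first value lands in c, the second in b, and a stays null, matching the example in the parameter description.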
### 3.3 Type Conversion

Like ODPSReader, ODPSWriter currently supports most ODPS types, but a few types are not supported, so please check your column types.

The following table lists ODPSWriter's type mapping:

| DataX internal type | ODPS data type |
| -------- | ----- |
| Long | bigint |
| Double | double |
| String | string |
| Date | datetime |
| Boolean | bool |

## 4 Plugin Characteristics

### 4.1 Column Selection

* ODPS itself does not support column selection, reordering or null-filling, but DataX ODPSWriter implements all three. To import all columns, configure "column": ["*"]. If the ODPS table has columns a, b and c and the user syncs only c and b, configure "column": ["c","b"]: the reader's first and second columns are imported into the ODPS columns c and b, and column a of each newly inserted record is set to null.

### 4.2 Handling Column Configuration Errors

* To keep writes reliable and to avoid data-quality incidents caused by silently losing extra columns, ODPSWriter reports an error when asked to write more columns than the table has. For example, if the ODPS table has columns a, b and c and ODPSWriter receives more than 3 columns, it fails with an error.

### 4.3 Partition Configuration

* ODPSWriter only supports **writing to a last-level partition**; it does not support routing records to partitions based on a field value. If a table has 3 partition levels, the partition configuration must name a specific third-level partition, e.g. pt=20150101/type=1/biz=2; pt=20150101/type=1 or pt=20150101 is not accepted.

### 4.4 Job Reruns and Failover

* Setting "truncate": true makes ODPSWriter's writes idempotent: when a failed write is rerun, ODPSWriter first clears the previously written data and then imports the new data, so the data is consistent after every rerun. If the job is interrupted by some other exception while running, atomicity is not guaranteed: the data is neither rolled back nor automatically re-imported, and the user must rely on this idempotency and rerun the job to ensure the data is complete. **When truncate is true, all data in the specified partition or table is deleted, so use it with caution!**

## 5 Performance Report (measured in the production environment)

### 5.1 Environment

#### 5.1.1 Data Characteristics

Table DDL:

    use cdo_datasync;
    create table datax3_odpswriter_perf_10column_1kb_00(
        s_0 string,
        bool_1 boolean,
        bi_2 bigint,
        dt_3 datetime,
        db_4 double,
        s_5 string,
        s_6 string,
        s_7 string,
        s_8 string,
        s_9 string
    )PARTITIONED by (pt string,year string);

A single record looks like:

    s_0 : 485924f6ab7f272af361cd3f7f2d23e0d764942351#$%^&fdafdasfdas%%^(*&^^&*
    bool_1 : true
    bi_2 : 1696248667889
    dt_3 : 2013-07-06 00:00:00
    db_4 : 3.141592653578
    s_5 : 100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209
    s_6 : 100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11fdsafdsfdsa209
    s_7 : 100DAFDSAFDSAHOFJDPSAWIFDISHAF;dsadsafdsahfdsajf;dsfdsa;FJDSAL;11209
    s_8 : 100dafdsafdsahofjdpsawifdishaf;DSADSAFDSAHFDSAJF;dsfdsa;fjdsal;11209
    s_9 : 12~!2345100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209

#### 5.1.2 Machine Specifications

* The machine running DataX:
  1. cpu: 24 Core Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz cache 15.36MB
  2. mem: 50GB
  3. net: dual gigabit NICs
  4. jvm: -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError
  5. disk: DataX does not write data to disk, so this item is not measured

* Job configuration:

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": "1,2,4,5,6,8,16,32,64"
      }
    },
    "content": [
      {
        "reader": {
          "name": "odpsreader",
          "parameter": {
            "accessId": "******************************",
            "accessKey": "*****************************",
            "column": ["*"],
            "partition": ["pt=20141010000000,year=2014"],
            "odpsServer": "http://service.odps.aliyun.com/api",
            "project": "cdo_datasync",
            "table": "datax3_odpswriter_perf_10column_1kb_00",
            "tunnelServer": "http://dt.odps.aliyun.com"
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": false,
            "column": [
              { "value": "485924f6ab7f272af361cd3f7f2d23e0d764942351#$%^&fdafdasfdas%%^(*&^^&*" },
              { "value": "true", "type": "bool" },
              { "value": "1696248667889", "type": "long" },
              { "type": "date", "value": "2013-07-06 00:00:00", "dateFormat": "yyyy-mm-dd hh:mm:ss" },
              { "value": "3.141592653578", "type": "double" },
              { "value": "100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209" },
              { "value": "100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11fdsafdsfdsa209" },
              { "value": "100DAFDSAFDSAHOFJDPSAWIFDISHAF;dsadsafdsahfdsajf;dsfdsa;FJDSAL;11209" },
              { "value": "100dafdsafdsahofjdpsawifdishaf;DSADSAFDSAHFDSAJF;dsfdsa;fjdsal;11209" },
              { "value": "12~!2345100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209" }
            ]
          }
        }
      }
    ]
  }
}
```

### 5.2 Test Results

| Concurrent tasks | blockSizeInMB | DataX speed (records/s) | DataX throughput (MB/s) | NIC traffic (MB/s) | DataX load |
|--------|--------|--------|--------|--------|--------|
|1|32|30303|13.03|14.5|0.12|
|1|64|38461|16.54|16.5|0.44|
|1|128|46454|20.55|26.7|0.47|
|1|256|52631|22.64|26.7|0.47|
|1|512|58823|25.30|28.7|0.44|
|4|32|114816|49.38|55.3|0.75|
|4|64|147577|63.47|71.3|0.82|
|4|128|177744|76.45|83.2|0.97|
|4|256|173913|74.80|80.1|1.01|
|4|512|200000|86.02|95.1|1.41|
|8|32|204480|87.95|92.7|1.16|
|8|64|294224|126.55|135.3|1.65|
|8|128|365475|157.19|163.7|2.89|
|8|256|394713|169.83|176.7|2.72|
|8|512|241691|103.95|125.7|2.29|
|16|32|420838|181.01|198.0|2.56|
|16|64|458144|197.05|217.4|2.85|
|16|128|443219|190.63|210.5|3.29|
|16|256|315235|135.58|140.0|0.95|
|16|512|OOM||||

Notes:

1. OdpsWriter's speed is governed by channel and blockSizeInMB. With blockSizeInMB set to `32` or `64` the speed is fairly stable; an overly large blockSizeInMB can cause speed fluctuations and out-of-memory (OOM) errors.
2. Both channel and blockSizeInMB have a clear effect on speed, so choose them together.
3. Choose the number of channels based on the characteristics of the source data; with StreamReader, 16 channels saturated the NIC.

## 6 FAQ

#### 1 The log of a job importing data into ODPS contains the following error. How do I fix it? `ODPS-0420095: Access Denied - Authorization Failed [4002], You doesn't exist in project example_dev`

Solution: ask the owner of the ODPS project to grant permissions to the user's cloud account, using a statement such as:

    grant Describe,Select,Alter,Update on table [tableName] to user XXX

#### 2 Can I import data into an ODPS view?

Importing data into ODPS views is not currently supported. A view is a non-materialized object in ODPS, so data cannot technically be loaded into it.

================================================ FILE: odpswriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT odpswriter odpswriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.aliyun.odps odps-sdk-core 0.38.4-public commons-httpclient commons-httpclient 3.1 org.mockito mockito-core 1.8.5 test org.powermock powermock-api-mockito 1.4.10 test org.powermock powermock-module-junit4 1.4.10 test org.aspectj aspectjweaver 1.8.10 commons-codec commons-codec 1.8 src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: odpswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/odpswriter target/ odpswriter-0.0.1-SNAPSHOT.jar plugin/writer/odpswriter false plugin/writer/odpswriter/libs runtime
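
The comment in Constant.java describes reclaiming per-partition upload proxies that have been idle longer than `PROXY_MAX_IDLE_TIME_MS` (60 s) and bounding the partition count with `MAX_PARTITION_CNT` (200), because each proxy holds a buffer of roughly blockSizeInMB. A minimal sketch of such an idle-evicting, size-capped cache follows. It is hypothetical: the class and method names are invented, the clock is passed in explicitly for testability, and it simply rejects new partitions beyond the cap, whereas OdpsWriter's actual behavior at the cap may differ.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-task cache of per-partition proxies with idle-time eviction.
public class IdleProxyCacheSketch<P> {
    private static final class Entry<T> {
        final T proxy;
        long lastUsedMs;

        Entry(T proxy, long nowMs) {
            this.proxy = proxy;
            this.lastUsedMs = nowMs;
        }
    }

    private final Map<String, Entry<P>> byPartition = new HashMap<>();
    private final long maxIdleMs;
    private final int maxPartitions;

    public IdleProxyCacheSketch(long maxIdleMs, int maxPartitions) {
        this.maxIdleMs = maxIdleMs;
        this.maxPartitions = maxPartitions;
    }

    // Returns the proxy for a partition, creating it on first use and refreshing its timestamp.
    public P get(String partition, long nowMs, Function<String, P> factory) {
        evictIdle(nowMs);
        Entry<P> e = byPartition.get(partition);
        if (e == null) {
            if (byPartition.size() >= maxPartitions) {
                throw new IllegalStateException("too many open partitions: " + byPartition.size());
            }
            e = new Entry<>(factory.apply(partition), nowMs);
            byPartition.put(partition, e);
        }
        e.lastUsedMs = nowMs;
        return e.proxy;
    }

    // Drops proxies unused for longer than maxIdleMs.
    public void evictIdle(long nowMs) {
        Iterator<Map.Entry<String, Entry<P>>> it = byPartition.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue().lastUsedMs > maxIdleMs) {
                it.remove(); // assumption: a real writer would flush and close the proxy first
            }
        }
    }

    public int size() {
        return byPartition.size();
    }

    public static void main(String[] args) {
        IdleProxyCacheSketch<StringBuilder> cache = new IdleProxyCacheSketch<>(60_000L, 200);
        cache.get("pt=2020-08", 0L, StringBuilder::new);
        cache.get("pt=2020-09", 30_000L, StringBuilder::new);
        cache.evictIdle(90_000L); // pt=2020-08 idle 90 s is evicted; pt=2020-09 idle 60 s is kept
        System.out.println(cache.size());
    }
}
```

Keeping one cache per task, rather than sharing proxies across tasks, avoids the lock contention that the Constant.java comment warns about; the trade-off is that memory is bounded only by how many proxies each task may hold.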
================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Constant.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter; public class Constant { public static final String COLUMN_POSITION = "columnPosition"; /* * Each task maintains its own list of proxies, so (task concurrency * number of partitions) proxies are created in total, and each proxy allocates an array of blockSizeInMB (usually 64 MB). * This makes OOM very likely. * With the default 768 MB of memory, at most 12 proxies can actually be created, and with 8 GB at most 126, so only a limited number of proxies may be created, mapping 1:1 to partitions. * * Reducing blockSizeInMB lowers memory usage, but means more frequent network requests, which puts heavier load on the ODPS servers. * * Alternatively, proxies need not stay resident in memory, but that requires more complex control logic. * In practice, the data users partition by usually follows a pattern, e.g. by time: once the 2020-08 data has been synced and no more data arrives for that partition, keeping its proxy in memory * wastes a lot of memory. So some proxies need to be reclaimed. * * The criterion used here for reclaiming a proxy is whether it has transferred any data recently. * * * Caveat! * If multiple tasks share one proxy, they must contend for a lock when writing, which severely hurts concurrent performance; it is equivalent to writing a single partition serially. * This has a large performance impact and should be avoided. Each task should keep its own proxies, and memory usage can then only be controlled by limiting the number of proxies each task holds. * * Another option is to change the proxy's array size, but it is unclear whether a very small size would hurt performance; this should be tested. */ public static final Long PROXY_MAX_IDLE_TIME_MS =60 * 1000L; // reclaim a proxy after 60s without activity public static final Long MAX_PARTITION_CNT = 200L; public static final int UTF8_ENCODED_CHAR_MAX_SIZE = 6; public static final int DEFAULT_FIELD_MAX_SIZE = 8 * 1024 * 1024; } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/DateTransForm.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter; public class DateTransForm { /** * Column name */ private String colName; /** * Source date format */ private String fromFormat; /** * Target date format */ private String toFormat; public DateTransForm(String colName, String fromFormat, String toFormat) { this.colName = colName; this.fromFormat = fromFormat; this.toFormat = toFormat; } public String getColName() { return colName; } public void setColName(String colName) { this.colName = colName; } public String getFromFormat() { return fromFormat; } public void setFromFormat(String fromFormat) { this.fromFormat =
fromFormat; } public String getToFormat() { return toFormat; } public void setToFormat(String toFormat) { this.toFormat = toFormat; } @Override public String toString() { return "DateTransForm{" + "colName='" + colName + '\'' + ", fromFormat='" + fromFormat + '\'' + ", toFormat='" + toFormat + '\'' + '}'; } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter; public final class Key { public final static String ODPS_SERVER = "odpsServer"; public final static String TUNNEL_SERVER = "tunnelServer"; public final static String ACCESS_ID = "accessId"; public final static String ACCESS_KEY = "accessKey"; public final static String SECURITY_TOKEN = "securityToken"; public final static String PROJECT = "project"; public final static String TABLE = "table"; public final static String PARTITION = "partition"; public final static String COLUMN = "column"; public final static String TRUNCATE = "truncate"; public final static String MAX_RETRY_TIME = "maxRetryTime"; public final static String BLOCK_SIZE_IN_MB = "blockSizeInMB"; // boolean, default: false public final static String EMPTY_AS_NULL = "emptyAsNull"; public final static String IS_COMPRESS = "isCompress"; // preSql public final static String PRE_SQL="preSql"; // postSql public final static String POST_SQL="postSql"; public final static String CONSISTENCY_COMMIT = "consistencyCommit"; public final static String UPLOAD_ID = "uploadId"; public final static String TASK_COUNT = "taskCount"; /** * Supports dynamic partitioning: the partition a record is written to is determined by one or more of its columns. * 1. Which columns to use: the target table's partition columns determine which configured columns are used for routing. * 2. When to create the upload session: with dynamic partitions the partition is unknown at initialization, so the upload session cannot be created then; it can be created only after a concrete record has been read. * 3. Caching upload sessions: whenever a new partition appears, a new session is created and cached for that partition so that later records destined for it can reuse it. * 4. Parameter check: there is no need to check whether a partition is configured. */ public final static String SUPPORT_DYNAMIC_PARTITION = "supportDynamicPartition"; /** * With dynamic partitioning, a user may map a time column of the source table to a partition column. A typical scenario: the source column has second precision, but when syncing to ODPS only the day should be kept and the record stored in the corresponding daily partition. * Format: * "partitionColumnMapping":[ * { * "name":"pt", // required * "srcDateFormat":"yyyy-MM-dd HH:mm:ss", // optional; if the source time column is a String, srcDateFormat must specify its date format * "dateFormat":"yyyy-MM-dd" // required * }, * { * ... * }, * * ... * ] */ public final static String PARTITION_COL_MAPPING = "partitionColumnMapping"; public final static String PARTITION_COL_MAPPING_NAME = "name"; public final static String PARTITION_COL_MAPPING_SRC_COL_DATEFORMAT = "srcDateFormat"; public final static String PARTITION_COL_MAPPING_DATEFORMAT = "dateFormat"; public final static String WRITE_TIMEOUT_IN_MS = "writeTimeoutInMs"; public static final String OVER_LENGTH_RULE = "overLengthRule"; // maximum length kept after truncation public static final String MAX_FIELD_LENGTH = "maxFieldLength"; // maximum field length supported by ODPS itself public static final String MAX_ODPS_FIELD_LENGTH = "maxOdpsFieldLength"; public static final String ENABLE_OVER_LENGTH_OUTPUT = "enableOverLengthOutput"; public static final String MAX_OVER_LENGTH_OUTPUT_COUNT = "maxOverLengthOutputCount"; // in dynamic-partition write mode, the flush interval (in minutes) applied once memory usage reaches 80% public static final String DYNAMIC_PARTITION_MEM_USAGE_FLUSH_INTERVAL_IN_MINUTE = "dynamicPartitionMemUsageFlushIntervalInMinute"; } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/LocalStrings.properties ================================================ errorcode.required_value=\u60a8\u7f3a\u5931\u4e86\u5fc5\u987b\u586b\u5199\u7684\u53c2\u6570\u503c. errorcode.illegal_value=\u60a8\u914d\u7f6e\u7684\u503c\u4e0d\u5408\u6cd5. errorcode.unsupported_column_type=DataX \u4e0d\u652f\u6301\u5199\u5165 ODPS \u7684\u76ee\u7684\u8868\u7684\u6b64\u79cd\u6570\u636e\u7c7b\u578b. errorcode.table_truncate_error=\u6e05\u7a7a ODPS \u76ee\u7684\u8868\u65f6\u51fa\u9519.
errorcode.create_master_upload_fail=\u521b\u5efa ODPS \u7684 uploadSession \u5931\u8d25. errorcode.get_slave_upload_fail=\u83b7\u53d6 ODPS \u7684 uploadSession \u5931\u8d25. errorcode.get_id_key_fail=\u83b7\u53d6 accessId/accessKey \u5931\u8d25. errorcode.get_partition_fail=\u83b7\u53d6 ODPS \u76ee\u7684\u8868\u7684\u6240\u6709\u5206\u533a\u5931\u8d25. errorcode.add_partition_failed=\u6dfb\u52a0\u5206\u533a\u5230 ODPS \u76ee\u7684\u8868\u5931\u8d25. errorcode.writer_record_fail=\u5199\u5165\u6570\u636e\u5230 ODPS \u76ee\u7684\u8868\u5931\u8d25. errorcode.commit_block_fail=\u63d0\u4ea4 block \u5230 ODPS \u76ee\u7684\u8868\u5931\u8d25. errorcode.run_sql_failed=\u6267\u884c ODPS Sql \u5931\u8d25. errorcode.check_if_partitioned_table_failed=\u68c0\u67e5 ODPS \u76ee\u7684\u8868:%s \u662f\u5426\u4e3a\u5206\u533a\u8868\u5931\u8d25. errorcode.run_sql_odps_exception=\u6267\u884c ODPS Sql \u65f6\u629b\u51fa\u5f02\u5e38, \u53ef\u91cd\u8bd5 errorcode.account_type_error=\u8d26\u53f7\u7c7b\u578b\u9519\u8bef. errorcode.partition_error=\u5206\u533a\u914d\u7f6e\u9519\u8bef. errorcode.column_not_exist=\u7528\u6237\u914d\u7f6e\u7684\u5217\u4e0d\u5b58\u5728. errorcode.odps_project_not_fount=\u60a8\u914d\u7f6e\u7684\u503c\u4e0d\u5408\u6cd5, odps project \u4e0d\u5b58\u5728. 
errorcode.odps_table_not_fount=\u60a8\u914d\u7f6e\u7684\u503c\u4e0d\u5408\u6cd5, odps table \u4e0d\u5b58\u5728 errorcode.odps_access_key_id_not_found=\u60a8\u914d\u7f6e\u7684\u503c\u4e0d\u5408\u6cd5, odps accessId,accessKey \u4e0d\u5b58\u5728 errorcode.odps_access_key_invalid=\u60a8\u914d\u7f6e\u7684\u503c\u4e0d\u5408\u6cd5, odps accessKey \u9519\u8bef errorcode.odps_access_deny=\u62d2\u7edd\u8bbf\u95ee, \u60a8\u4e0d\u5728 \u60a8\u914d\u7f6e\u7684 project \u4e2d odpswriter.1=\u8d26\u53f7\u7c7b\u578b\u9519\u8bef\uff0c\u56e0\u4e3a\u4f60\u7684\u8d26\u53f7 [{0}] \u4e0d\u662fdatax\u76ee\u524d\u652f\u6301\u7684\u8d26\u53f7\u7c7b\u578b\uff0c\u76ee\u524d\u4ec5\u652f\u6301aliyun, taobao\u8d26\u53f7\uff0c\u8bf7\u4fee\u6539\u60a8\u7684\u8d26\u53f7\u4fe1\u606f. odpswriter.2=\u8fd9\u662f\u4e00\u6761\u9700\u8981\u6ce8\u610f\u7684\u4fe1\u606f \u7531\u4e8e\u60a8\u7684\u4f5c\u4e1a\u914d\u7f6e\u4e86\u5199\u5165 ODPS \u7684\u76ee\u7684\u8868\u65f6emptyAsNull=true, \u6240\u4ee5 DataX\u5c06\u4f1a\u628a\u957f\u5ea6\u4e3a0\u7684\u7a7a\u5b57\u7b26\u4e32\u4f5c\u4e3a java \u7684 null \u5199\u5165 ODPS. odpswriter.3=\u60a8\u914d\u7f6e\u7684blockSizeInMB:{0} \u53c2\u6570\u9519\u8bef. \u6b63\u786e\u7684\u914d\u7f6e\u662f[1-512]\u4e4b\u95f4\u7684\u6574\u6570. \u8bf7\u4fee\u6539\u6b64\u53c2\u6570\u7684\u503c\u4e3a\u8be5\u533a\u95f4\u5185\u7684\u6570\u503c odpswriter.4=\u5199\u5165 ODPS \u76ee\u7684\u8868\u5931\u8d25. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpswriterproxy.1=\u4eb2\uff0c\u914d\u7f6e\u4e2d\u7684\u6e90\u8868\u7684\u5217\u4e2a\u6570\u548c\u76ee\u7684\u7aef\u8868\u4e0d\u4e00\u81f4\uff0c\u6e90\u8868\u4e2d\u60a8\u914d\u7f6e\u7684\u5217\u6570\u662f:{0} \u5927\u4e8e\u76ee\u7684\u7aef\u7684\u5217\u6570\u662f:{1} , \u8fd9\u6837\u4f1a\u5bfc\u81f4\u6e90\u5934\u6570\u636e\u65e0\u6cd5\u6b63\u786e\u5bfc\u5165\u76ee\u7684\u7aef, \u8bf7\u68c0\u67e5\u60a8\u7684\u914d\u7f6e\u5e76\u4fee\u6539. 
odpswriterproxy.2=\u6e90\u8868\u7684\u5217\u4e2a\u6570\u5c0f\u4e8e\u76ee\u7684\u8868\u7684\u5217\u4e2a\u6570\uff0c\u6e90\u8868\u5217\u6570\u662f:{0} \u76ee\u7684\u8868\u5217\u6570\u662f:{1} , \u6570\u76ee\u4e0d\u5339\u914d. DataX \u4f1a\u628a\u76ee\u7684\u7aef\u591a\u51fa\u7684\u5217\u7684\u503c\u8bbe\u7f6e\u4e3a\u7a7a\u503c. \u5982\u679c\u8fd9\u4e2a\u9ed8\u8ba4\u914d\u7f6e\u4e0d\u7b26\u5408\u60a8\u7684\u671f\u671b\uff0c\u8bf7\u4fdd\u6301\u6e90\u8868\u548c\u76ee\u7684\u8868\u914d\u7f6e\u7684\u5217\u6570\u76ee\u4fdd\u6301\u4e00\u81f4. odpswriterproxy.3=Odps decimal \u7c7b\u578b\u7684\u6574\u6570\u4f4d\u4e2a\u6570\u4e0d\u80fd\u8d85\u8fc735 odpswriterproxy.4=\u5199\u5165 ODPS \u76ee\u7684\u8868\u65f6\u9047\u5230\u4e86\u810f\u6570\u636e: \u7b2c[{0}]\u4e2a\u5b57\u6bb5 {1} \u7684\u6570\u636e\u51fa\u73b0\u9519\u8bef\uff0c\u8bf7\u68c0\u67e5\u8be5\u6570\u636e\u5e76\u4f5c\u51fa\u4fee\u6539 \u6216\u8005\u60a8\u53ef\u4ee5\u589e\u5927\u9600\u503c\uff0c\u5ffd\u7565\u8fd9\u6761\u8bb0\u5f55. ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriter.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter; import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.statistics.PerfRecord; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.ListUtil; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.writer.odpswriter.model.PartitionInfo; import com.alibaba.datax.plugin.writer.odpswriter.model.UserDefinedFunction; import com.alibaba.datax.plugin.writer.odpswriter.util.*; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; 
import com.alibaba.fastjson2.JSONObject; import com.aliyun.odps.Odps; import com.aliyun.odps.Table; import com.aliyun.odps.TableSchema; import com.aliyun.odps.tunnel.TableTunnel; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.MutablePair; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.lang.management.ManagementFactory; import java.lang.management.MemoryUsage; import java.util.*; import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicLong; import java.util.stream.Collectors; import static com.alibaba.datax.plugin.writer.odpswriter.util.CustomPartitionUtils.getListWithJson; /** * Changed so that each task creates its own upload session with its own uploadId and commits its corresponding blocks within the task. */ public class OdpsWriter extends Writer { public static HashSet<String> partitionsDealedTruncate = new HashSet<>(); static final Object lockForPartitionDealedTruncate = new Object(); public static AtomicInteger partitionCnt = new AtomicInteger(0); public static Long maxPartitionCnt; public static AtomicLong globalTotalTruncatedRecordNumber = new AtomicLong(0); public static Long maxOutputOverLengthRecord; public static int maxOdpsFieldLength = Constant.DEFAULT_FIELD_MAX_SIZE; public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OdpsWriter.class); private static final boolean IS_DEBUG = LOG.isDebugEnabled(); private Configuration originalConfig; private Odps odps; private Table table; private String projectName; private String tableName; private String tunnelServer; private String partition; private boolean truncate; private String uploadId; private TableTunnel.UploadSession masterUpload; private int blockSizeInMB; private boolean consistencyCommit; private boolean supportDynamicPartition; public void preCheck() { this.init(); this.doPreCheck(); }
public void doPreCheck() { // check that the column configuration is valid List<String> allColumns = OdpsUtil.getAllColumns(this.table.getSchema()); LOG.info("allColumnList: {} .", StringUtils.join(allColumns, ',')); List<String> allPartColumns = OdpsUtil.getAllPartColumns(this.table.getSchema()); LOG.info("allPartColumnsList: {} .", StringUtils.join(allPartColumns, ',')); dealColumn(this.originalConfig, allColumns, allPartColumns); // check that the partition configuration is valid if (!supportDynamicPartition) { OdpsUtil.preCheckPartition(this.odps, this.table, this.partition, this.truncate); } } @Override public void init() { this.originalConfig = super.getPluginJobConf(); OdpsUtil.checkNecessaryConfig(this.originalConfig); OdpsUtil.dealMaxRetryTime(this.originalConfig); this.projectName = this.originalConfig.getString(Key.PROJECT); this.tableName = this.originalConfig.getString(Key.TABLE); this.tunnelServer = this.originalConfig.getString(Key.TUNNEL_SERVER, null); // init odps config this.odps = OdpsUtil.initOdpsProject(this.originalConfig); // check that the table and related settings are valid this.table = OdpsUtil.getTable(odps, this.projectName, this.tableName); // handle the dynamic partition parameter and validate its related settings; if supportDynamicPartition is not configured, the column mapping decides whether it is enabled this.dealDynamicPartition(); //check isCompress this.originalConfig.getBool(Key.IS_COMPRESS, false); // only static partition writes need their partition config checked; dynamic partition writes skip this check if (!this.supportDynamicPartition) { this.partition = OdpsUtil.formatPartition(this.originalConfig .getString(Key.PARTITION, ""), true); this.originalConfig.set(Key.PARTITION, this.partition); } this.truncate = this.originalConfig.getBool(Key.TRUNCATE); this.consistencyCommit = this.originalConfig.getBool(Key.CONSISTENCY_COMMIT, false); boolean emptyAsNull = this.originalConfig.getBool(Key.EMPTY_AS_NULL, false); this.originalConfig.set(Key.EMPTY_AS_NULL, emptyAsNull); if (emptyAsNull) { LOG.warn(MESSAGE_SOURCE.message("odpswriter.2")); } this.blockSizeInMB = this.originalConfig.getInt(Key.BLOCK_SIZE_IN_MB, 64); if (this.blockSizeInMB < 8) { this.blockSizeInMB = 8; } this.originalConfig.set(Key.BLOCK_SIZE_IN_MB, this.blockSizeInMB);
LOG.info("blockSizeInMB={}.", this.blockSizeInMB); maxPartitionCnt = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax() / 1024 / 1024 / this.blockSizeInMB; if (maxPartitionCnt < Constant.MAX_PARTITION_CNT) { maxPartitionCnt = Constant.MAX_PARTITION_CNT; } LOG.info("maxPartitionCnt={}", maxPartitionCnt); if (IS_DEBUG) { LOG.debug("After master init(), job config now is: [\n{}\n] .", this.originalConfig.toJSON()); } } private void dealDynamicPartition() { /* * If supportDynamicPartition is explicitly configured, the configured value wins. * Otherwise it is enabled only when the table is partitioned and the column mapping contains all partition columns */ List<String> partitionCols = OdpsUtil.getAllPartColumns(this.table.getSchema()); List<String> configCols = this.originalConfig.getList(Key.COLUMN, String.class); LOG.info("partition columns:{}", partitionCols); LOG.info("config columns:{}", configCols); LOG.info("support dynamic partition:{}",this.originalConfig.getBool(Key.SUPPORT_DYNAMIC_PARTITION)); LOG.info("partition format type:{}",this.originalConfig.getString("partitionFormatType")); if (this.originalConfig.getKeys().contains(Key.SUPPORT_DYNAMIC_PARTITION)) { this.supportDynamicPartition = this.originalConfig.getBool(Key.SUPPORT_DYNAMIC_PARTITION); if (supportDynamicPartition) { // custom partition if("custom".equalsIgnoreCase(originalConfig.getString("partitionFormatType"))){ List<PartitionInfo> partitions = getListWithJson(originalConfig,"customPartitionColumns",PartitionInfo.class); // the custom partition config must match the table's actual partition columns exactly if (!ListUtil.checkIfAllSameValue(partitions.stream().map(item->item.getName()).collect(Collectors.toList()), partitionCols)) { throw DataXException.asDataXException("custom partition config does not match the real partition info."); } } else { // dynamic partition writing is enabled: every partition column must appear in the column mapping, otherwise throw if (!ListUtil.checkIfBInA(configCols, partitionCols, false)) { throw DataXException.asDataXException("You configured supportDynamicPartition as true, but did not configure all partition columns"); } } } else { // dynamic partition writing is disabled: the column mapping must not contain any partition column, otherwise throw if (ListUtil.checkIfHasSameValue(configCols, partitionCols)) { throw
DataXException.asDataXException("You should configure all partition columns in the column param, or specify a static partition param"); } } } else { if (OdpsUtil.isPartitionedTable(table)) { // partitioned table: the column mapping must contain either all partition columns or none of them if (ListUtil.checkIfBInA(configCols, partitionCols, false)) { // all partition columns are present in column this.supportDynamicPartition = true; } else { // not all partition columns are present in column; if only some of them are, report an error if (ListUtil.checkIfHasSameValue(configCols, partitionCols)) { throw DataXException.asDataXException("You should configure all partition columns in the column param, or specify a static partition param"); } // no partition column is configured, so disable dynamic partition writing this.supportDynamicPartition = false; } } else { LOG.info("{} is not a partitioned table, set supportDynamicPartition to false", this.tableName); this.supportDynamicPartition = false; } } // dynamic partition writing is not supported in distribute mode; abort if running in that mode LOG.info("current run mode: {}", System.getProperty("datax.executeMode")); if (supportDynamicPartition && StringUtils.equalsIgnoreCase("distribute", System.getProperty("datax.executeMode"))) { LOG.error("Distribute mode doesn't support dynamic partition writing"); System.exit(1); } } @Override public void prepare() { // init odps config this.odps = OdpsUtil.initOdpsProject(this.originalConfig); List<String> preSqls = this.originalConfig.getList(Key.PRE_SQL, String.class); if (preSqls != null && !preSqls.isEmpty()) { LOG.info(String.format("Begin to execute preSql : %s. \n Attention: these preSqls must be idempotent!!!", JSON.toJSONString(preSqls))); long beginTime = System.currentTimeMillis(); for (String preSql : preSqls) { preSql = preSql.trim(); if (!preSql.endsWith(";")) { preSql = String.format("%s;", preSql); } OdpsUtil.runSqlTaskWithRetry(this.odps, preSql, "preSql"); } long endTime = System.currentTimeMillis(); LOG.info(String.format("Execute odpswriter preSql successfully!
cost time: %s ms.", (endTime - beginTime))); } // check that the table and related settings are valid this.table = OdpsUtil.getTable(odps, this.projectName, this.tableName); // in dynamic partition mode no partition is configured up front, so truncation cannot happen during job initialization if (!supportDynamicPartition) { OdpsUtil.dealTruncate(this.odps, this.table, this.partition, this.truncate); } } /** * Mainly sets up the uploadId and the starting value of blockId. *

* For blockId, both the starting value and the step to the next blockId (INTERVAL_STEP) must be set. */ @Override public List<Configuration> split(int mandatoryNumber) { List<Configuration> configurations = new ArrayList<Configuration>(); // the masterUpload obtained here is only used to get the RecordSchema for column processing TableTunnel tableTunnel = new TableTunnel(this.odps); if (StringUtils.isNoneBlank(tunnelServer)) { tableTunnel.setEndpoint(tunnelServer); } TableSchema schema = this.table.getSchema(); List<String> allColumns = OdpsUtil.getAllColumns(schema); LOG.info("allColumnList: {} .", StringUtils.join(allColumns, ',')); List<String> allPartColumns = OdpsUtil.getAllPartColumns(this.table.getSchema()); LOG.info("allPartColumnsList: {} .", StringUtils.join(allPartColumns, ',')); dealColumn(this.originalConfig, allColumns, allPartColumns); this.originalConfig.set("allColumns", allColumns); // in dynamic partition mode, sessions cannot be created in advance per partition if (!supportDynamicPartition) { this.masterUpload = OdpsUtil.createMasterTunnelUpload( tableTunnel, this.projectName, this.tableName, this.partition); this.uploadId = this.masterUpload.getId(); LOG.info("Master uploadId:[{}].", this.uploadId); } for (int i = 0; i < mandatoryNumber; i++) { Configuration tempConfig = this.originalConfig.clone(); // in static partition mode with consistencyCommit, every task shares the master upload session; otherwise each task works independently with its own session if (!supportDynamicPartition && this.consistencyCommit) { tempConfig.set(Key.UPLOAD_ID, uploadId); tempConfig.set(Key.TASK_COUNT, mandatoryNumber); } // pass supportDynamicPartition down to the task tempConfig.set(Key.SUPPORT_DYNAMIC_PARTITION, this.supportDynamicPartition); configurations.add(tempConfig); } if (IS_DEBUG) { LOG.debug("After master split, the job config now is:[\n{}\n].", this.originalConfig); } return configurations; } private void dealColumn(Configuration originalConfig, List<String> allColumns, List<String> allPartColumns) { // userConfiguredColumns has already been checked to be non-empty List<String> userConfiguredColumns = originalConfig.getList(Key.COLUMN, String.class); // column cannot be * in dynamic partition mode if (supportDynamicPartition && userConfiguredColumns.contains("*")) { throw
DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, "In dynamic partition write mode you can't specify column with *."); } if (1 == userConfiguredColumns.size() && "*".equals(userConfiguredColumns.get(0))) { userConfiguredColumns = allColumns; originalConfig.set(Key.COLUMN, allColumns); } else { // check for duplicate columns, case-insensitively (no writer allows duplicate columns on the write side) ListUtil.makeSureNoValueDuplicate(userConfiguredColumns, false); // check that the columns exist, case-insensitively if (supportDynamicPartition) { List<String> allColumnList = new ArrayList<String>(); allColumnList.addAll(allColumns); allColumnList.addAll(allPartColumns); ListUtil.makeSureBInA(allColumnList, userConfiguredColumns, false); } else { ListUtil.makeSureBInA(allColumns, userConfiguredColumns, false); } } // compute each configured column's actual position among the target table's data columns; -1 means the column is a partition column List<Integer> columnPositions = OdpsUtil.parsePosition(allColumns, allPartColumns, userConfiguredColumns); originalConfig.set(Constant.COLUMN_POSITION, columnPositions); } @Override public void post() { if (supportDynamicPartition) { LOG.info("Total created partition count:{}", partitionCnt); } if (!supportDynamicPartition && this.consistencyCommit) { LOG.info("Master which uploadId=[{}] begin to commit blocks.", this.uploadId); OdpsUtil.masterComplete(this.masterUpload); LOG.info("Master which uploadId=[{}] commit blocks ok.", this.uploadId); } List<String> postSqls = this.originalConfig.getList(Key.POST_SQL, String.class); if (postSqls != null && !postSqls.isEmpty()) { LOG.info(String.format("Begin to execute postSql : %s. \n Attention: these postSqls must be idempotent!!!", JSON.toJSONString(postSqls))); long beginTime = System.currentTimeMillis(); for (String postSql : postSqls) { postSql = postSql.trim(); if (!postSql.endsWith(";")) { postSql = String.format("%s;", postSql); } OdpsUtil.runSqlTaskWithRetry(this.odps, postSql, "postSql"); } long endTime = System.currentTimeMillis(); LOG.info(String.format("Execute odpswriter postSql successfully!
cost time: %s ms.", (endTime - beginTime))); } LOG.info("truncated record count: {}", globalTotalTruncatedRecordNumber.get()); } @Override public void destroy() { } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory .getLogger(Task.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OdpsWriter.class); private static final boolean IS_DEBUG = LOG.isDebugEnabled(); private Configuration sliceConfig; private Odps odps; private String projectName; private String tableName; private String tunnelServer; private String partition; private boolean emptyAsNull; private boolean isCompress; private TableTunnel.UploadSession managerUpload; private TableTunnel.UploadSession workerUpload; private String uploadId = null; private List<Long> blocks; private int blockSizeInMB; private boolean consistencyCommit; private int taskId; private int taskCount; private Integer failoverState = 0; // 0: not failed over; 1: preparing failover; 2: committed, failover no longer possible private byte[] lock = new byte[0]; private List<String> allColumns; /* * Maps each partition to its upload session: when a record is routed to a partition, it is uploaded through that partition's proxy. * The key is the concatenation of all partition column values in configured order */ private HashMap<String, Pair<OdpsWriterProxy, List<Long>>> partitionUploadSessionHashMap; private Boolean supportDynamicPartition; private TableTunnel tableTunnel; private Table table; /** * Stores the partition column format conversion rules; only supported when the source column is a Date column or a String column containing a date */ private HashMap<String, DateTransForm> dateTransFormMap; private Long writeTimeOutInMs; private String overLengthRule; private int maxFieldLength; private Boolean enableOverLengthOutput; /** * In dynamic partition write mode, the minimum interval in minutes between flushes triggered by heap usage reaching 80%. * Defaults to 1 minute; the interval avoids frequent flushes that would produce many small files */ private int dynamicPartitionMemUsageFlushIntervalInMinute = 1; private long latestFlushTime = 0; @Override public void init() { this.sliceConfig = super.getPluginJobConf(); // default timeout: 10 minutes this.writeTimeOutInMs = this.sliceConfig.getLong(Key.WRITE_TIMEOUT_IN_MS, 10 * 60 * 1000); this.projectName = this.sliceConfig.getString(Key.PROJECT); this.tableName = this.sliceConfig.getString(Key.TABLE);
this.tunnelServer = this.sliceConfig.getString(Key.TUNNEL_SERVER, null); this.partition = OdpsUtil.formatPartition(this.sliceConfig .getString(Key.PARTITION, ""), true); this.sliceConfig.set(Key.PARTITION, this.partition); this.emptyAsNull = this.sliceConfig.getBool(Key.EMPTY_AS_NULL); this.blockSizeInMB = this.sliceConfig.getInt(Key.BLOCK_SIZE_IN_MB); this.isCompress = this.sliceConfig.getBool(Key.IS_COMPRESS, false); if (this.blockSizeInMB < 1 || this.blockSizeInMB > 512) { throw DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpswriter.3", this.blockSizeInMB)); } this.taskId = this.getTaskId(); this.taskCount = this.sliceConfig.getInt(Key.TASK_COUNT, 0); this.supportDynamicPartition = this.sliceConfig.getBool(Key.SUPPORT_DYNAMIC_PARTITION, false); if (!supportDynamicPartition) { this.consistencyCommit = this.sliceConfig.getBool(Key.CONSISTENCY_COMMIT, false); if (consistencyCommit) { this.uploadId = this.sliceConfig.getString(Key.UPLOAD_ID); if (this.uploadId == null || this.uploadId.isEmpty()) { throw DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, "uploadId is required when consistencyCommit is true, but got: " + this.uploadId); } } } else { this.partitionUploadSessionHashMap = new HashMap<>(); // initialize dateTransFormMap from the partColFormats parameter String dateTransListStr = this.sliceConfig.getString(Key.PARTITION_COL_MAPPING); if (StringUtils.isNotBlank(dateTransListStr)) { this.dateTransFormMap = new HashMap<>(); JSONArray dateTransFormJsonArray = JSONArray.parseArray(dateTransListStr); for (Object dateTransFormJson : dateTransFormJsonArray) { DateTransForm dateTransForm = new DateTransForm( ((JSONObject)dateTransFormJson).getString(Key.PARTITION_COL_MAPPING_NAME), ((JSONObject)dateTransFormJson).getString(Key.PARTITION_COL_MAPPING_SRC_COL_DATEFORMAT), ((JSONObject)dateTransFormJson).getString(Key.PARTITION_COL_MAPPING_DATEFORMAT)); this.dateTransFormMap.put(((JSONObject)dateTransFormJson).getString(Key.PARTITION_COL_MAPPING_NAME),
dateTransForm); } } } this.allColumns = this.sliceConfig.getList("allColumns", String.class); this.overLengthRule = this.sliceConfig.getString(Key.OVER_LENGTH_RULE, "keepOn").toUpperCase(); this.maxFieldLength = this.sliceConfig.getInt(Key.MAX_FIELD_LENGTH, Constant.DEFAULT_FIELD_MAX_SIZE); this.enableOverLengthOutput = this.sliceConfig.getBool(Key.ENABLE_OVER_LENGTH_OUTPUT, true); maxOutputOverLengthRecord = this.sliceConfig.getLong(Key.MAX_OVER_LENGTH_OUTPUT_COUNT); maxOdpsFieldLength = this.sliceConfig.getInt(Key.MAX_ODPS_FIELD_LENGTH, Constant.DEFAULT_FIELD_MAX_SIZE); this.dynamicPartitionMemUsageFlushIntervalInMinute = this.sliceConfig.getInt(Key.DYNAMIC_PARTITION_MEM_USAGE_FLUSH_INTERVAL_IN_MINUTE, 1); if (IS_DEBUG) { LOG.debug("After init in task, sliceConfig now is:[\n{}\n].", this.sliceConfig); } } @Override public void prepare() { this.odps = OdpsUtil.initOdpsProject(this.sliceConfig); this.tableTunnel = new TableTunnel(this.odps); // apply the configured tunnel endpoint in both static and dynamic partition modes if (StringUtils.isNoneBlank(tunnelServer)) { tableTunnel.setEndpoint(tunnelServer); } if (!supportDynamicPartition) { if (this.consistencyCommit) { this.managerUpload = OdpsUtil.getSlaveTunnelUpload(this.tableTunnel, this.projectName, this.tableName, this.partition, this.uploadId); } else { this.managerUpload = OdpsUtil.createMasterTunnelUpload(this.tableTunnel, this.projectName, this.tableName, this.partition); this.uploadId = this.managerUpload.getId(); } LOG.info("task uploadId:[{}].", this.uploadId); this.workerUpload = OdpsUtil.getSlaveTunnelUpload(this.tableTunnel, this.projectName, this.tableName, this.partition, uploadId); } else { this.table = OdpsUtil.getTable(this.odps, this.projectName, this.tableName); } } @Override public void startWrite(RecordReceiver recordReceiver) { blocks = new ArrayList<Long>(); List<Long> currentWriteBlocks; AtomicLong blockId = new AtomicLong(0); List<Integer> columnPositions = this.sliceConfig.getList(Constant.COLUMN_POSITION, Integer.class); try { TaskPluginCollector taskPluginCollector =
super.getTaskPluginCollector(); OdpsWriterProxy proxy; // made configurable as a safety switch boolean checkWithGetSize = this.sliceConfig.getBool("checkWithGetSize", true); if (!supportDynamicPartition) { if (this.consistencyCommit) { proxy = new OdpsWriterProxy(this.workerUpload, this.blockSizeInMB, blockId, taskId, taskCount, columnPositions, taskPluginCollector, this.emptyAsNull, this.isCompress, checkWithGetSize, this.allColumns, this.writeTimeOutInMs, this.sliceConfig, this.overLengthRule, this.maxFieldLength, this.enableOverLengthOutput); } else { proxy = new OdpsWriterProxy(this.workerUpload, this.blockSizeInMB, blockId, columnPositions, taskPluginCollector, this.emptyAsNull, this.isCompress, checkWithGetSize, this.allColumns, false, this.writeTimeOutInMs, this.sliceConfig, this.overLengthRule, this.maxFieldLength, this.enableOverLengthOutput); } currentWriteBlocks = blocks; } else { proxy = null; currentWriteBlocks = null; } com.alibaba.datax.common.element.Record dataXRecord = null; PerfRecord blockClose = new PerfRecord(super.getTaskGroupId(), super.getTaskId(), PerfRecord.PHASE.ODPS_BLOCK_CLOSE); blockClose.start(); long blockCloseUsedTime = 0; boolean columnCntChecked = false; while ((dataXRecord = recordReceiver.getFromReader()) != null) { if (supportDynamicPartition) { if (!columnCntChecked) { // in dynamic partition mode the reader and writer must have the same number of columns if (dataXRecord.getColumnNumber() != this.sliceConfig.getList(Key.COLUMN).size()) { throw DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, "In dynamic partition write mode you must make sure reader and writer have the same column count."); } columnCntChecked = true; } // in dynamic partition mode, the proxy is chosen based on the record's content String partitionFormatType = sliceConfig.getString("partitionFormatType"); String partition; if("custom".equalsIgnoreCase(partitionFormatType)){ List<PartitionInfo> partitions = getListWithJson(sliceConfig,"customPartitionColumns",PartitionInfo.class); List<UserDefinedFunction> functions = getListWithJson(sliceConfig,"customPartitionFunctions",UserDefinedFunction.class); partition =
CustomPartitionUtils.generate(dataXRecord,functions, partitions,sliceConfig.getList(Key.COLUMN, String.class)); }else{ partition = OdpsUtil.getPartColValFromDataXRecord(dataXRecord, columnPositions, this.sliceConfig.getList(Key.COLUMN, String.class), this.dateTransFormMap); partition = OdpsUtil.formatPartition(partition, false); } Pair<OdpsWriterProxy, List<Long>> proxyBlocksPair = this.partitionUploadSessionHashMap.get(partition); if (null != proxyBlocksPair) { proxy = proxyBlocksPair.getLeft(); currentWriteBlocks = proxyBlocksPair.getRight(); if (null == proxy || null == currentWriteBlocks) { throw DataXException.asDataXException("Get OdpsWriterProxy failed."); } } else { /* * First write to this target partition: handle truncate. * If truncate is true and the partition has not been truncated yet, truncate it under a mutex */ Boolean truncate = this.sliceConfig.getBool(Key.TRUNCATE); if (truncate && !partitionsDealedTruncate.contains(partition)) { synchronized (lockForPartitionDealedTruncate) { if (!partitionsDealedTruncate.contains(partition)) { LOG.info("Start to truncate partition {}", partition); OdpsUtil.dealTruncate(this.odps, this.table, partition, truncate); partitionsDealedTruncate.add(partition); } /* * Fail if too many partitions have been created */ if (partitionCnt.addAndGet(1) > maxPartitionCnt) { throw new DataXException("Create too many partitions.
Please make sure you configured the right partition column"); } } } TableTunnel.UploadSession uploadSession = OdpsUtil.createMasterTunnelUpload(tableTunnel, this.projectName, this.tableName, partition); proxy = new OdpsWriterProxy(uploadSession, this.blockSizeInMB, blockId, columnPositions, taskPluginCollector, this.emptyAsNull, this.isCompress, checkWithGetSize, this.allColumns, true, this.writeTimeOutInMs, this.sliceConfig, this.overLengthRule, this.maxFieldLength, this.enableOverLengthOutput); currentWriteBlocks = new ArrayList<>(); partitionUploadSessionHashMap.put(partition, new MutablePair<>(proxy, currentWriteBlocks)); } } blockCloseUsedTime += proxy.writeOneRecord(dataXRecord, currentWriteBlocks); // in dynamic partition mode, once heap usage exceeds 80%, release partitions that have been idle for a while or are buffering a lot of data if (supportDynamicPartition) { boolean isNeedFlush = checkIfNeedFlush(); if (isNeedFlush) { LOG.info("====Memory usage exceeds 80%, start to clear...==="); int releaseCnt = 0; int remainCnt = 0; for (String onePartition : partitionUploadSessionHashMap.keySet()) { OdpsWriterProxy oneIdleProxy = partitionUploadSessionHashMap.get(onePartition) == null ?
null : partitionUploadSessionHashMap.get(onePartition).getLeft(); if (oneIdleProxy == null) { continue; } Long idleTime = System.currentTimeMillis() - oneIdleProxy.getLastActiveTime(); if (idleTime > Constant.PROXY_MAX_IDLE_TIME_MS || oneIdleProxy.getCurrentTotalBytes() > (this.blockSizeInMB * 1024 * 1024 / 2)) { // the partition is idle or buffers more than half a block: flush its data first LOG.info("{} partition has been idle for more than {} seconds or is buffering more than half a block, so release its uploadSession", onePartition, Constant.PROXY_MAX_IDLE_TIME_MS / 1000); currentWriteBlocks = partitionUploadSessionHashMap.get(onePartition).getRight(); blockCloseUsedTime += oneIdleProxy.writeRemainingRecord(currentWriteBlocks); // then release it partitionUploadSessionHashMap.put(onePartition, null); releaseCnt++; } else { remainCnt++; } } // if not enough partitions were released, release more in arbitrary (map iteration) order until at least half are released for (String onePartition : partitionUploadSessionHashMap.keySet()) { if (releaseCnt >= remainCnt) { break; } if (partitionUploadSessionHashMap.get(onePartition) != null) { OdpsWriterProxy oneIdleProxy = partitionUploadSessionHashMap.get(onePartition).getLeft(); currentWriteBlocks = partitionUploadSessionHashMap.get(onePartition).getRight(); blockCloseUsedTime += oneIdleProxy.writeRemainingRecord(currentWriteBlocks); partitionUploadSessionHashMap.put(onePartition, null); releaseCnt++; remainCnt--; } } this.latestFlushTime = System.currentTimeMillis(); LOG.info("===complete==="); } } } // write the remaining records of every partition if (supportDynamicPartition) { for (String partition : partitionUploadSessionHashMap.keySet()) { if (partitionUploadSessionHashMap.get(partition) == null) { continue; } proxy = partitionUploadSessionHashMap.get(partition).getLeft(); currentWriteBlocks = partitionUploadSessionHashMap.get(partition).getRight(); blockCloseUsedTime += proxy.writeRemainingRecord(currentWriteBlocks); } blockClose.end(blockCloseUsedTime); } else { blockCloseUsedTime += proxy.writeRemainingRecord(blocks); blockClose.end(blockCloseUsedTime); } } catch (Exception e) { throw
DataXException.asDataXException(OdpsWriterErrorCode.WRITER_RECORD_FAIL, MESSAGE_SOURCE.message("odpswriter.4"), e); } } private boolean checkIfNeedFlush() { // check whether the flush interval has elapsed since the last flush boolean isArriveFlushTime = (System.currentTimeMillis() - this.latestFlushTime) > this.dynamicPartitionMemUsageFlushIntervalInMinute * 60 * 1000; if (!isArriveFlushTime) { // the interval has not elapsed yet: skip the memory check return false; } MemoryUsage memoryUsage = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage(); boolean isMemUsageExceed = (double)memoryUsage.getUsed() / memoryUsage.getMax() > 0.8f; return isMemUsageExceed; } @Override public void post() { synchronized (lock) { if (failoverState == 0) { failoverState = 2; if (! supportDynamicPartition) { if (! this.consistencyCommit) { LOG.info("Slave which uploadId=[{}] begin to commit blocks:[\n{}\n].", this.uploadId, StringUtils.join(blocks, ",")); OdpsUtil.masterCompleteBlocks(this.managerUpload, blocks.toArray(new Long[0])); LOG.info("Slave which uploadId=[{}] commit blocks ok.", this.uploadId); } else { LOG.info("Slave which uploadId=[{}] begin to check blocks:[\n{}\n].", this.uploadId, StringUtils.join(blocks, ",")); OdpsUtil.checkBlockComplete(this.managerUpload, blocks.toArray(new Long[0])); LOG.info("Slave which uploadId=[{}] check blocks ok.", this.uploadId); } } else { for (String partition : partitionUploadSessionHashMap.keySet()) { OdpsWriterProxy proxy = partitionUploadSessionHashMap.get(partition).getLeft(); List<Long> blocks = partitionUploadSessionHashMap.get(partition).getRight(); TableTunnel.UploadSession uploadSession = proxy.getSlaveUpload(); LOG.info("Slave which uploadId=[{}] begin to commit blocks:[\n{}\n].", uploadSession.getId(), StringUtils.join(blocks, ",")); OdpsUtil.masterCompleteBlocks(uploadSession, blocks.toArray(new Long[0])); LOG.info("Slave which uploadId=[{}] commit blocks ok.", uploadSession.getId()); } } } else { throw DataXException.asDataXException(CommonErrorCode.SHUT_DOWN_TASK, ""); } } } @Override public void
destroy() { } @Override public boolean supportFailOver() { synchronized (lock) { if (failoverState == 0) { failoverState = 1; return true; } return false; } } } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter; import com.alibaba.datax.common.spi.ErrorCode; import com.alibaba.datax.common.util.MessageSource; public enum OdpsWriterErrorCode implements ErrorCode { REQUIRED_VALUE("OdpsWriter-00", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.required_value")), ILLEGAL_VALUE("OdpsWriter-01", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.illegal_value")), UNSUPPORTED_COLUMN_TYPE("OdpsWriter-02", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.unsupported_column_type")), TABLE_TRUNCATE_ERROR("OdpsWriter-03", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.table_truncate_error")), CREATE_MASTER_UPLOAD_FAIL("OdpsWriter-04", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.create_master_upload_fail")), GET_SLAVE_UPLOAD_FAIL("OdpsWriter-05", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.get_slave_upload_fail")), GET_ID_KEY_FAIL("OdpsWriter-06", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.get_id_key_fail")), GET_PARTITION_FAIL("OdpsWriter-07", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.get_partition_fail")), ADD_PARTITION_FAILED("OdpsWriter-08", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.add_partition_failed")), WRITER_RECORD_FAIL("OdpsWriter-09", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.writer_record_fail")), COMMIT_BLOCK_FAIL("OdpsWriter-10", 
MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.commit_block_fail")), RUN_SQL_FAILED("OdpsWriter-11", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.run_sql_failed")), CHECK_IF_PARTITIONED_TABLE_FAILED("OdpsWriter-12", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.check_if_partitioned_table_failed")), RUN_SQL_ODPS_EXCEPTION("OdpsWriter-13", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.run_sql_odps_exception")), ACCOUNT_TYPE_ERROR("OdpsWriter-30", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.account_type_error")), PARTITION_ERROR("OdpsWriter-31", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.partition_error")), COLUMN_NOT_EXIST("OdpsWriter-32", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.column_not_exist")), ODPS_PROJECT_NOT_FOUNT("OdpsWriter-100", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.odps_project_not_fount")), //ODPS-0420111: Project not found ODPS_TABLE_NOT_FOUNT("OdpsWriter-101", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.odps_table_not_fount")), // ODPS-0130131:Table not found ODPS_ACCESS_KEY_ID_NOT_FOUND("OdpsWriter-102", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.odps_access_key_id_not_found")), //ODPS-0410051:Invalid credentials - accessKeyId not found ODPS_ACCESS_KEY_INVALID("OdpsWriter-103", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.odps_access_key_invalid")), //ODPS-0410042:Invalid signature value - User signature dose not match; ODPS_ACCESS_DENY("OdpsWriter-104", MessageSource.loadResourceBundle(OdpsWriterErrorCode.class).message("errorcode.odps_access_deny")) //ODPS-0420095: Access Denied - Authorization Failed [4002], You doesn't exist in project ; private final String code; 
private final String description; private OdpsWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/OdpsWriterProxy.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.plugin.writer.odpswriter.util.OdpsUtil; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import com.aliyun.odps.OdpsType; import com.aliyun.odps.TableSchema; import com.aliyun.odps.data.ArrayRecord; import com.aliyun.odps.data.Binary; import com.aliyun.odps.data.Char; import com.aliyun.odps.data.IntervalDayTime; import com.aliyun.odps.data.IntervalYearMonth; import com.aliyun.odps.data.Record; import com.aliyun.odps.data.SimpleStruct; import com.aliyun.odps.data.Struct; import com.aliyun.odps.data.Varchar; import com.aliyun.odps.tunnel.TableTunnel; import com.aliyun.odps.tunnel.TunnelException; import com.aliyun.odps.tunnel.io.ProtobufRecordPack; import com.aliyun.odps.type.ArrayTypeInfo; import com.aliyun.odps.type.CharTypeInfo; import com.aliyun.odps.type.MapTypeInfo; import com.aliyun.odps.type.StructTypeInfo; import com.aliyun.odps.type.TypeInfo; import com.aliyun.odps.type.VarcharTypeInfo; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import 
java.io.IOException; import java.math.BigDecimal; import java.sql.Timestamp; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Calendar; import java.util.Date; import org.apache.commons.codec.binary.Base64; import org.apache.commons.lang3.StringUtils; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Set; import java.util.TimeZone; import java.util.concurrent.atomic.AtomicLong; public class OdpsWriterProxy { private static final Logger LOG = LoggerFactory.getLogger(OdpsWriterProxy.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OdpsWriterProxy.class); private volatile boolean printColumnLess;// whether to log rows that have fewer source columns than the ODPS target table private TaskPluginCollector taskPluginCollector; private TableTunnel.UploadSession slaveUpload; private TableSchema schema; private int maxBufferSize; private ProtobufRecordPack protobufRecordPack; private int protobufCapacity; private AtomicLong blockId; private List<Integer> columnPositions; private List<TypeInfo> tableOriginalColumnTypeList; private boolean emptyAsNull; private boolean isCompress; private int taskId; private int taskCOUNT; private boolean consistencyCommit = false; private boolean checkWithGetSize = true; private List<String> allColumns; private String overLengthRule; private int maxFieldLength; private Boolean enableOverLengthOutput; /** * Time of the most recent activity. In dynamic-partition write mode, this proxy is closed once it has been idle for longer than a set period. */ private Long lastActiveTime; /** * Timeout for writing a block */ private Long writeTimeoutInMs; private SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); // reads the JVM default time zone private Calendar calendarForDate = null; private boolean useDateWithCalendar = true; private Calendar initCalendar(Configuration config) { // In theory there is no other option; keeping it configurable allows an emergency override String calendarType = config.getString("calendarType", "iso8601"); Boolean lenient = config.getBool("calendarLenient", true); // default to the JVM time zone TimeZone timeZone = TimeZone.getDefault(); String
timeZoneStr = config.getString("calendarTimeZone"); if (StringUtils.isNotBlank(timeZoneStr)) { // if the user explicitly specifies a time zone, use it timeZone = TimeZone.getTimeZone(timeZoneStr); } Calendar calendarForDate = new Calendar.Builder().setCalendarType(calendarType).setLenient(lenient) .setTimeZone(timeZone).build(); return calendarForDate; } public OdpsWriterProxy(TableTunnel.UploadSession slaveUpload, int blockSizeInMB, AtomicLong blockId, List<Integer> columnPositions, TaskPluginCollector taskPluginCollector, boolean emptyAsNull, boolean isCompress, boolean checkWithGetSize, List<String> allColumns, boolean initBufSizeZero, Long writeTimeoutInMs, Configuration taskConfig, String overLengthRule, int maxFieldLength, Boolean enableOverLengthOutput) throws IOException, TunnelException { this.slaveUpload = slaveUpload; this.schema = this.slaveUpload.getSchema(); this.tableOriginalColumnTypeList = OdpsUtil.getTableOriginalColumnTypeList(this.schema); this.blockId = blockId; this.columnPositions = columnPositions; this.taskPluginCollector = taskPluginCollector; this.emptyAsNull = emptyAsNull; this.isCompress = isCompress; // initialize buffer-related values this.maxBufferSize = (blockSizeInMB - 4) * 1024 * 1024; if (initBufSizeZero) { // in dynamic-partition mode, start at 0 and let the buffer grow as more records are written this.protobufCapacity = 0; } else { this.protobufCapacity = blockSizeInMB * 1024 * 1024; } this.protobufRecordPack = new ProtobufRecordPack(this.schema, null, this.protobufCapacity); this.printColumnLess = true; this.checkWithGetSize = checkWithGetSize; this.allColumns = allColumns; this.overLengthRule = overLengthRule; this.maxFieldLength = maxFieldLength; this.enableOverLengthOutput = enableOverLengthOutput; this.writeTimeoutInMs = writeTimeoutInMs; this.calendarForDate = this.initCalendar(taskConfig); this.useDateWithCalendar = taskConfig.getBool("useDateWithCalendar", true); } public OdpsWriterProxy(TableTunnel.UploadSession slaveUpload, int blockSizeInMB, AtomicLong blockId, int taskId, int taskCount, List<Integer> columnPositions, TaskPluginCollector
taskPluginCollector, boolean emptyAsNull, boolean isCompress, boolean checkWithGetSize, List<String> allColumns, Long writeTimeoutInMs, Configuration taskConfig, String overLengthRule, int maxFieldLength, Boolean enableOverLengthOutput) throws IOException, TunnelException { this.slaveUpload = slaveUpload; this.schema = this.slaveUpload.getSchema(); this.tableOriginalColumnTypeList = OdpsUtil.getTableOriginalColumnTypeList(this.schema); this.blockId = blockId; this.columnPositions = columnPositions; this.taskPluginCollector = taskPluginCollector; this.emptyAsNull = emptyAsNull; this.isCompress = isCompress; // initialize buffer-related values this.maxBufferSize = (blockSizeInMB - 4) * 1024 * 1024; this.protobufCapacity = blockSizeInMB * 1024 * 1024; this.protobufRecordPack = new ProtobufRecordPack(this.schema, null, this.protobufCapacity); this.printColumnLess = true; this.taskId = taskId; this.taskCOUNT = taskCount; this.consistencyCommit = true; this.checkWithGetSize = checkWithGetSize; this.allColumns = allColumns; this.overLengthRule = overLengthRule; this.maxFieldLength = maxFieldLength; this.enableOverLengthOutput = enableOverLengthOutput; this.writeTimeoutInMs = writeTimeoutInMs; this.calendarForDate = this.initCalendar(taskConfig); this.useDateWithCalendar = taskConfig.getBool("useDateWithCalendar", true); } public long getCurrentBlockId() { if (this.consistencyCommit) { return this.taskId + this.taskCOUNT * (this.blockId.get()); } else { return this.blockId.get(); } } public TableTunnel.UploadSession getSlaveUpload() { return this.slaveUpload; } public long writeOneRecord(com.alibaba.datax.common.element.Record dataXRecord, List<Long> blocks) throws Exception { this.lastActiveTime = System.currentTimeMillis(); Record record = dataxRecordToOdpsRecord(dataXRecord); if (null == record) { return 0; } protobufRecordPack.append(record); if (protobufRecordPack.getProtobufStream().size() >= maxBufferSize) { long startTimeInNs = System.nanoTime(); OdpsUtil.slaveWriteOneBlock(this.slaveUpload,
protobufRecordPack, getCurrentBlockId(), this.writeTimeoutInMs); LOG.info("write block {} ok.", getCurrentBlockId()); blocks.add(getCurrentBlockId()); protobufRecordPack.reset(); this.blockId.incrementAndGet(); return System.nanoTime() - startTimeInNs; } return 0; } public long writeRemainingRecord(List<Long> blocks) throws Exception { // complete protobuf stream, then write to http // protobufRecordPack.getTotalBytes(): note from 慕明, getTotalBytes is not guaranteed to return the number of bytes written; by this logic getTotalBytesWritten should be used instead // if (protobufRecordPack.getTotalBytes() != 0) { boolean hasRemainingData = false; if (this.checkWithGetSize) { hasRemainingData = protobufRecordPack.getSize() != 0; } else { hasRemainingData = protobufRecordPack.getTotalBytes() != 0; } if (hasRemainingData) { long startTimeInNs = System.nanoTime(); OdpsUtil.slaveWriteOneBlock(this.slaveUpload, protobufRecordPack, getCurrentBlockId(), this.writeTimeoutInMs); LOG.info("write block {} ok.", getCurrentBlockId()); blocks.add(getCurrentBlockId()); // reset the buffer for next block protobufRecordPack.reset(); return System.nanoTime() - startTimeInNs; } return 0; } public Record dataxRecordToOdpsRecord(com.alibaba.datax.common.element.Record dataXRecord) throws Exception { int sourceColumnCount = dataXRecord.getColumnNumber(); ArrayRecord odpsRecord = (ArrayRecord) slaveUpload.newRecord(); int userConfiguredColumnNumber = this.columnPositions.size(); if (sourceColumnCount > userConfiguredColumnNumber) { throw DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpswriterproxy.1", sourceColumnCount, userConfiguredColumnNumber)); } else if (sourceColumnCount < userConfiguredColumnNumber) { if (printColumnLess) { LOG.warn(MESSAGE_SOURCE.message("odpswriterproxy.2", sourceColumnCount, userConfiguredColumnNumber)); } printColumnLess = false; } int currentIndex = 0; int sourceIndex = 0; try { com.alibaba.datax.common.element.Column columnValue; for (; sourceIndex < sourceColumnCount; sourceIndex++) { // skip partition columns if
(this.columnPositions.get(sourceIndex) == -1) { continue; } currentIndex = columnPositions.get(sourceIndex); TypeInfo typeInfo = this.tableOriginalColumnTypeList.get(currentIndex); OdpsType type = typeInfo.getOdpsType(); String typeName = typeInfo.getTypeName(); columnValue = dataXRecord.getColumn(sourceIndex); if (columnValue == null) { continue; } // for compatible dt lib, "" as null if (this.emptyAsNull && columnValue instanceof StringColumn && "".equals(columnValue.asString())) { continue; } switch (type) { case STRING: String newValue = (String)OdpsUtil.processOverLengthData(columnValue.asString(), OdpsType.STRING, this.overLengthRule, this.maxFieldLength, this.enableOverLengthOutput); odpsRecord.setString(currentIndex, newValue); break; case BIGINT: odpsRecord.setBigint(currentIndex, columnValue.asLong()); break; case BOOLEAN: odpsRecord.setBoolean(currentIndex, columnValue.asBoolean()); break; case DATETIME: odpsRecord.setDatetime(currentIndex, columnValue.asDate()); // Date datetimeData = columnValue.asDate(); // if (null == datetimeData) { // odpsRecord.setDatetime(currentIndex, null); // } else { // Timestamp dateDataForOdps = new Timestamp(datetimeData.getTime()); // if (datetimeData instanceof java.sql.Timestamp) { // dateDataForOdps.setNanos(((java.sql.Timestamp)datetimeData).getNanos()); // } // odpsRecord.setDatetime(currentIndex, dateDataForOdps); // } break; case DATE: Date dateData = columnValue.asDate(); if (null == dateData) { odpsRecord.setDatetime(currentIndex, null); } else { if (this.useDateWithCalendar) { odpsRecord.setDate(currentIndex, new java.sql.Date(dateData.getTime()), this.calendarForDate); } else { odpsRecord.setDatetime(currentIndex, new java.sql.Date(dateData.getTime())); } } break; case DOUBLE: odpsRecord.setDouble(currentIndex, columnValue.asDouble()); break; case FLOAT: Double floatValue = columnValue.asDouble(); if (null == floatValue) { ((ArrayRecord) odpsRecord).setFloat(currentIndex, null); } else { ((ArrayRecord) 
odpsRecord).setFloat(currentIndex, floatValue.floatValue()); } break; case DECIMAL: odpsRecord.setDecimal(currentIndex, columnValue.asBigDecimal()); String columnStr = columnValue.asString(); if (columnStr != null && columnStr.indexOf(".") >= 36) { throw new Exception(MESSAGE_SOURCE.message("odpswriterproxy.3")); } break; case TINYINT: Long tinyintValue = columnValue.asLong(); if (null == tinyintValue) { ((ArrayRecord) odpsRecord).setTinyint(currentIndex, null); } else { ((ArrayRecord) odpsRecord).setTinyint(currentIndex, Byte.valueOf(String.valueOf(tinyintValue))); } break; case SMALLINT: Long smallIntValue = columnValue.asLong(); if (null == smallIntValue) { ((ArrayRecord) odpsRecord).setSmallint(currentIndex, null); } else { ((ArrayRecord) odpsRecord).setSmallint(currentIndex, smallIntValue.shortValue()); } break; case INT: Long intValue = columnValue.asLong(); if (null == intValue) { ((ArrayRecord) odpsRecord).setInt(currentIndex, null); } else { ((ArrayRecord) odpsRecord).setInt(currentIndex, intValue.intValue()); } break; case VARCHAR: // warn: the ODPS SDK has a bug when columnValue.asString() is null; // do not construct a Varchar from a null value, or an NPE is thrown String varcharValueStr = columnValue.asString(); Varchar varcharData = null; if (varcharValueStr != null){ varcharData = new Varchar(varcharValueStr); } ((ArrayRecord) odpsRecord).setVarchar(currentIndex, varcharData); break; case CHAR: String charValueStr = columnValue.asString(); Char charData = null; if (charValueStr != null ){ charData = new Char(charValueStr); } ((ArrayRecord) odpsRecord).setChar(currentIndex, charData); break; case TIMESTAMP: Date timestampData = columnValue.asDate(); if (null == timestampData) { ((ArrayRecord) odpsRecord).setTimestamp(currentIndex, null); } else { Timestamp timestampDataForOdps = new Timestamp(timestampData.getTime()); if (timestampData instanceof java.sql.Timestamp) { // nanoseconds timestampDataForOdps.setNanos(((java.sql.Timestamp)timestampData).getNanos()); } //
warn (optimization): if the source value is already a Timestamp, using it directly would avoid creating an extra object ((ArrayRecord) odpsRecord).setTimestamp(currentIndex, timestampDataForOdps); } break; case BINARY: byte[] binaryBytes = columnValue.asBytes(); if (null == binaryBytes) { ((ArrayRecord) odpsRecord).setBinary(currentIndex, null); } else { Binary newBinaryData = (Binary) OdpsUtil.processOverLengthData(new Binary(binaryBytes), OdpsType.BINARY, this.overLengthRule, this.maxFieldLength, this.enableOverLengthOutput); ((ArrayRecord) odpsRecord).setBinary(currentIndex, newBinaryData); } break; case ARRAY: JSONArray arrayJson = JSON.parseArray(columnValue.asString()); ((ArrayRecord) odpsRecord).setArray(currentIndex, parseArray(arrayJson, (ArrayTypeInfo) typeInfo)); break; case MAP: JSONObject mapJson = JSON.parseObject(columnValue.asString()); ((ArrayRecord) odpsRecord).setMap(currentIndex, parseMap(mapJson, (MapTypeInfo) typeInfo)); break; case STRUCT: JSONObject structJson = JSON.parseObject(columnValue.asString()); ((ArrayRecord) odpsRecord).setStruct(currentIndex, parseStruct(structJson, (StructTypeInfo) typeInfo)); break; default: break; } } return odpsRecord; } catch (Exception e) { String dirtyColumnName = ""; try { dirtyColumnName = this.allColumns.get(currentIndex); } catch (Exception ignoreEx) { // ignore } String message = MESSAGE_SOURCE.message("odpswriterproxy.4", sourceIndex, dirtyColumnName); this.taskPluginCollector.collectDirtyRecord(dataXRecord, e, message); return null; } } private List<Object> parseArray(JSONArray jsonArray, ArrayTypeInfo arrayTypeInfo) throws ParseException { if (null == jsonArray) { return null; } List<Object> result = new ArrayList<Object>(); switch (arrayTypeInfo.getElementTypeInfo().getOdpsType()) { case BIGINT: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getLong(i)); } return result; /** * Double-precision floating point */ case DOUBLE: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getDouble(i)); } return result; /** * Boolean */ case BOOLEAN: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getBoolean(i)); } return result; /** * Datetime type */ case DATETIME: // TODO precision for (int i = 0;
i < jsonArray.size(); i++) { result.add(dateFormat.parse(jsonArray.getString(i))); } return result; /** * String type */ case STRING: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getString(i)); } return result; /** * Exact decimal type */ case DECIMAL: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getBigDecimal(i)); } return result; /** * 1-byte signed integer */ case TINYINT: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getByte(i)); } return result; /** * 2-byte signed integer */ case SMALLINT: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getShort(i)); } return result; /** * 4-byte signed integer */ case INT: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getInteger(i)); } return result; /** * Single-precision floating point */ case FLOAT: for (int i = 0; i < jsonArray.size(); i++) { result.add(jsonArray.getFloat(i)); } return result; /** * Fixed-length string */ case CHAR: for (int i = 0; i < jsonArray.size(); i++) { result.add(new Char(jsonArray.getString(i), ((CharTypeInfo) arrayTypeInfo.getElementTypeInfo()).getLength())); } return result; /** * Variable-length string */ case VARCHAR: for (int i = 0; i < jsonArray.size(); i++) { result.add(new Varchar(jsonArray.getString(i), ((VarcharTypeInfo) arrayTypeInfo.getElementTypeInfo()).getLength())); } return result; /** * Date type */ case DATE: // TODO string -> date needs a time zone // TODO how to use odps Record for (int i = 0; i < jsonArray.size(); i++) { result.add(java.sql.Date.valueOf(jsonArray.getString(i))); } return result; /** * Timestamp */ case TIMESTAMP: for (int i = 0; i < jsonArray.size(); i++) { result.add(Timestamp.valueOf(jsonArray.getString(i))); } return result; /** * Byte array */ case BINARY: for (int i = 0; i < jsonArray.size(); i++) { result.add(Base64.decodeBase64(jsonArray.getString(i))); } return result; /** * Day-time interval */ case INTERVAL_DAY_TIME: for (int i = 0; i < jsonArray.size(); i++) { JSONObject json = jsonArray.getJSONObject(i); result.add(new IntervalDayTime(json.getInteger("totalSeconds"),
json.getInteger("nanos"))); } return result; /** * Year-month interval */ case INTERVAL_YEAR_MONTH: for (int i = 0; i < jsonArray.size(); i++) { JSONObject json = jsonArray.getJSONObject(i); result.add(new IntervalYearMonth(json.getInteger("years"), json.getInteger("months"))); } return result; /** * Struct */ case STRUCT: for (int i = 0; i < jsonArray.size(); i++) { result.add( parseStruct(jsonArray.getJSONObject(i), (StructTypeInfo) arrayTypeInfo.getElementTypeInfo())); } return result; /** * MAP type */ case MAP: for (int i = 0; i < jsonArray.size(); i++) { result.add(parseMap(jsonArray.getJSONObject(i), (MapTypeInfo) arrayTypeInfo.getElementTypeInfo())); } return result; /** * ARRAY type */ case ARRAY: for (int i = 0; i < jsonArray.size(); i++) { result.add(parseArray(jsonArray.getJSONArray(i), (ArrayTypeInfo) arrayTypeInfo.getElementTypeInfo())); } return result; default: return result; } } private Map<Object, Object> parseMap(JSONObject json, MapTypeInfo typeInfo) throws ParseException { if (json == null) { return null; } Map<Object, String> keyMap = new HashMap<Object, String>(); Set<String> keys = json.keySet(); switch (typeInfo.getKeyTypeInfo().getOdpsType()) { case BIGINT: for (String item : keys) { keyMap.put(Long.parseLong(item), item); } break; /** * Double-precision floating point */ case DOUBLE: for (String item : keys) { keyMap.put(Double.parseDouble(item), item); } break; /** * Boolean */ case BOOLEAN: for (String item : keys) { keyMap.put(Boolean.parseBoolean(item), item); } break; /** * Datetime type */ case DATETIME: // TODO precision for (String item : keys) { keyMap.put(dateFormat.parse(item), item); } break; /** * String type */ case STRING: for (String item : keys) { keyMap.put(item, item); } break; /** * Exact decimal type */ case DECIMAL: for (String item : keys) { keyMap.put(new BigDecimal(item), item); } break; /** * 1-byte signed integer */ case TINYINT: for (String item : keys) { keyMap.put(Byte.parseByte(item), item); } break; /** * 2-byte signed integer */ case SMALLINT: for (String item : keys) { keyMap.put(Short.parseShort(item), item); } break; /** * 4-byte signed integer */ case INT: for (String item : keys)
{ keyMap.put(Integer.parseInt(item), item); } break; /** * Single-precision floating point */ case FLOAT: for (String item : keys) { keyMap.put(Float.parseFloat(item), item); } break; /** * Fixed-length string */ case CHAR: for (String item : keys) { keyMap.put(new Char(item, ((CharTypeInfo) typeInfo.getKeyTypeInfo()).getLength()), item); } break; /** * Variable-length string */ case VARCHAR: for (String item : keys) { keyMap.put(new Varchar(item, ((VarcharTypeInfo) typeInfo.getKeyTypeInfo()).getLength()), item); } break; /** * Date type */ case DATE: // TODO string -> date needs a time zone // TODO how to use odps Record for (String item : keys) { keyMap.put(java.sql.Date.valueOf(item), item); } break; /** * Timestamp */ case TIMESTAMP: for (String item : keys) { keyMap.put(Timestamp.valueOf(item), item); } break; /** * Byte array */ case BINARY: for (String item : keys) { keyMap.put(new Binary(Base64.decodeBase64(item)), item); } break; /** * Day-time interval */ case INTERVAL_DAY_TIME: for (String item : keys) { JSONObject jsonObject = JSON.parseObject(item); keyMap.put(new IntervalDayTime(jsonObject.getInteger("totalSeconds"), jsonObject.getInteger("nanos")), item); } break; /** * Year-month interval */ case INTERVAL_YEAR_MONTH: for (String item : keys) { JSONObject jsonObject = JSON.parseObject(item); keyMap.put(new IntervalYearMonth(jsonObject.getInteger("years"), jsonObject.getInteger("months")), item); } break; default: break; // TODO throw an exception } Map<Object, Object> result = new HashMap<Object, Object>(); // process map value switch (typeInfo.getValueTypeInfo().getOdpsType()) { case BIGINT: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getLong(item.getValue())); } return result; /** * Double-precision floating point */ case DOUBLE: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getDouble(item.getValue())); } return result; /** * Boolean */ case BOOLEAN: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getBoolean(item.getValue())); } return result; /** * Datetime type */ case DATETIME: // TODO precision for (Map.Entry<Object, String> item : keyMap.entrySet()) {
result.put(item.getKey(), dateFormat.parse(json.getString(item.getValue()))); } return result; /** * String type */ case STRING: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getString(item.getValue())); } return result; /** * Exact decimal type */ case DECIMAL: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getBigDecimal(item.getValue())); } return result; /** * 1-byte signed integer */ case TINYINT: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getByte(item.getValue())); } return result; /** * 2-byte signed integer */ case SMALLINT: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getShort(item.getValue())); } return result; /** * 4-byte signed integer */ case INT: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getInteger(item.getValue())); } return result; /** * Single-precision floating point */ case FLOAT: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), json.getFloat(item.getValue())); } return result; /** * Fixed-length string */ case CHAR: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), new Char(json.getString(item.getValue()), ((CharTypeInfo) typeInfo.getValueTypeInfo()).getLength())); } return result; /** * Variable-length string */ case VARCHAR: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), new Varchar(json.getString(item.getValue()), ((VarcharTypeInfo) typeInfo.getValueTypeInfo()).getLength())); } return result; /** * Date type */ case DATE: // TODO string -> date needs a time zone // TODO how to use odps Record for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), java.sql.Date.valueOf(json.getString(item.getValue()))); } return result; /** * Timestamp */ case TIMESTAMP: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), Timestamp.valueOf(json.getString(item.getValue()))); } return result; /** * Byte array */ case BINARY: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), new
Binary(Base64.decodeBase64(json.getString(item.getValue())))); } return result; /** * Day-time interval */ case INTERVAL_DAY_TIME: for (Map.Entry<Object, String> item : keyMap.entrySet()) { JSONObject jsonObject = json.getJSONObject(item.getValue()); result.put(item.getKey(), new IntervalDayTime(jsonObject.getInteger("totalSeconds"), jsonObject.getInteger("nanos"))); } return result; /** * Year-month interval */ case INTERVAL_YEAR_MONTH: for (Map.Entry<Object, String> item : keyMap.entrySet()) { JSONObject jsonObject = json.getJSONObject(item.getValue()); result.put(item.getKey(), new IntervalYearMonth(jsonObject.getInteger("years"), jsonObject.getInteger("months"))); } return result; /** * Struct */ case STRUCT: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), parseStruct(json.getJSONObject(item.getValue()), (StructTypeInfo) typeInfo.getValueTypeInfo())); } return result; /** * MAP type */ case MAP: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), parseMap(json.getJSONObject(item.getValue()), (MapTypeInfo) typeInfo.getValueTypeInfo())); } return result; /** * ARRAY type */ case ARRAY: for (Map.Entry<Object, String> item : keyMap.entrySet()) { result.put(item.getKey(), parseArray(json.getJSONArray(item.getValue()), (ArrayTypeInfo) typeInfo.getValueTypeInfo())); } return result; default: throw new IllegalArgumentException("decode record failed.
column type: " + typeInfo.getTypeName()); } } public Struct parseStruct(JSONObject json, StructTypeInfo struct) throws ParseException { if (null == json) { return null; } List<String> fieldNames = struct.getFieldNames(); List<TypeInfo> typeInfos = struct.getFieldTypeInfos(); List<Object> structValues = new ArrayList<Object>(); for (int i = 0; i < fieldNames.size(); i++) { String fieldName = fieldNames.get(i); switch (typeInfos.get(i).getOdpsType()) { case BIGINT: structValues.add(json.getLong(fieldName)); break; /** * Double-precision floating point */ case DOUBLE: structValues.add(json.getDouble(fieldName)); break; /** * Boolean */ case BOOLEAN: structValues.add(json.getBoolean(fieldName)); break; /** * Datetime type */ case DATETIME: // TODO precision structValues.add(dateFormat.parse(json.getString(fieldName))); break; /** * String type */ case STRING: structValues.add(json.getString(fieldName)); break; /** * Exact decimal type */ case DECIMAL: structValues.add(json.getBigDecimal(fieldName)); break; /** * 1-byte signed integer */ case TINYINT: structValues.add(json.getByte(fieldName)); break; /** * 2-byte signed integer */ case SMALLINT: structValues.add(json.getShort(fieldName)); break; /** * 4-byte signed integer */ case INT: structValues.add(json.getInteger(fieldName)); break; /** * Single-precision floating point */ case FLOAT: structValues.add(json.getFloat(fieldName)); break; /** * Fixed-length string */ case CHAR: structValues.add(new Char(json.getString(fieldName), ((CharTypeInfo) typeInfos.get(i)).getLength())); break; /** * Variable-length string */ case VARCHAR: structValues .add(new Varchar(json.getString(fieldName), ((VarcharTypeInfo) typeInfos.get(i)).getLength())); break; /** * Date type */ case DATE: // TODO string -> date needs a time zone // TODO how to use odps Record structValues.add(java.sql.Date.valueOf(json.getString(fieldName))); break; /** * Timestamp */ case TIMESTAMP: structValues.add(Timestamp.valueOf(json.getString(fieldName))); break; /** * Byte array */ case BINARY: structValues.add(Base64.decodeBase64(json.getString(fieldName))); break; /** * Day-time interval, read from the field's nested object {"totalSeconds": ..., "nanos": ...} */ case INTERVAL_DAY_TIME: JSONObject dayTimeJson = json.getJSONObject(fieldName); structValues.add(null == dayTimeJson ? null : new IntervalDayTime(dayTimeJson.getInteger("totalSeconds"), dayTimeJson.getInteger("nanos"))); break; /** * Year-month interval, read from the field's nested object {"years": ..., "months": ...} */ case INTERVAL_YEAR_MONTH: JSONObject yearMonthJson = json.getJSONObject(fieldName); structValues.add(null == yearMonthJson ? null : new IntervalYearMonth(yearMonthJson.getInteger("years"), yearMonthJson.getInteger("months"))); break; /** * Struct */ case STRUCT: structValues.add(parseStruct(json.getJSONObject(fieldName), (StructTypeInfo) typeInfos.get(i))); break; /** * MAP type */ case MAP: structValues.add(parseMap(json.getJSONObject(fieldName), (MapTypeInfo) typeInfos.get(i))); break; /** * ARRAY type */ case ARRAY: structValues.add(parseArray(json.getJSONArray(fieldName), (ArrayTypeInfo) typeInfos.get(i))); break; } } SimpleStruct simpleStruct = new SimpleStruct(struct, structValues); return simpleStruct; } public Long getLastActiveTime() { return lastActiveTime; } public void setLastActiveTime(Long lastActiveTime) { this.lastActiveTime = lastActiveTime; } public Long getCurrentTotalBytes() throws IOException { return this.protobufRecordPack.getTotalBytes(); } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/model/PartitionInfo.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter.model; public class PartitionInfo { /** * Column name */ private String name; /** * String */ private String type; /** * eventTime or function * yyyy/MM/dd/HH/mm * components can be combined freely */ private String valueMode; private String value; private String comment; /** * Only meaningful for custom partitions * eventTime / constant * function */ private String category; /** * When partitionType is function, * functionExpression is the expression corresponding to valueMode */ private String functionExpression; public String getFunctionExpression() { return functionExpression; } public void setFunctionExpression(String functionExpression) { this.functionExpression = functionExpression; } public String getCategory() { return category; } public void setCategory(String category) { this.category = category; } public String getComment() { return comment; } public void
setComment(String comment) { this.comment = comment; } public String getType() { return type; } public void setType(String type) { this.type = type; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getValueMode() { return valueMode; } public void setValueMode(String valueMode) { this.valueMode = valueMode; } public String getValue() { return value; } public void setValue(String value) { this.value = value; } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/model/UserDefinedFunction.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter.model; import java.io.Serializable; import java.util.List; public class UserDefinedFunction implements Serializable { private static final long serialVersionUID = 1L; private String name; private String expression; private String inputColumn; private List variableRule; public String getName() { return name; } public void setName(String name) { this.name = name; } public String getExpression() { return expression; } public void setExpression(String expression) { this.expression = expression; } public String getInputColumn() { return inputColumn; } public void setInputColumn(String inputColumn) { this.inputColumn = inputColumn; } public List getVariableRule() { return variableRule; } public void setVariableRule(List variableRule) { this.variableRule = variableRule; } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/model/UserDefinedFunctionRule.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter.model; import java.io.Serializable; import java.util.List; public class UserDefinedFunctionRule implements Serializable { private static final long serialVersionUID = 1L; private String type; private List params; public String 
getType() { return type; } public void setType(String type) { this.type = type; } public List getParams() { return params; } public void setParams(List params) { this.params = params; } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/CustomPartitionUtils.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter.util; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.odpswriter.model.PartitionInfo; import com.alibaba.datax.plugin.writer.odpswriter.model.UserDefinedFunction; import com.alibaba.fastjson2.JSON; import com.google.common.base.Joiner; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.Serializable; import java.util.*; import java.util.stream.Collectors; public class CustomPartitionUtils implements Serializable { private static final long serialVersionUID = 1L; protected static Logger logger = LoggerFactory.getLogger(CustomPartitionUtils.class); public static <T> List<T> getListWithJson(Configuration config, String path, Class<T> clazz) { Object object = config.get(path, List.class); if (null == object) { return null; } return JSON.parseArray(JSON.toJSONString(object), clazz); } public static String generate(Record record, List<UserDefinedFunction> functions, List<PartitionInfo> partitions, List<String> allColumns) { for (PartitionInfo partitionInfo : partitions) { partitionInfo.setValue(buildPartitionValue(partitionInfo, functions, record, allColumns)); } List<String> partitionList = partitions.stream() .map(item -> String.format("%s='%s'", item.getName(), item.getValue())) .collect(Collectors.toList()); return Joiner.on(",").join(partitionList); } private static String buildPartitionValue(PartitionInfo partitionInfo, List<UserDefinedFunction> functions, Record record, List<String> allColumns) { // logger.info("try build partition
value:partitionInfo:\n{},functions:\n{}", // JSON.toJSONString(partitionInfo), JSON.toJSONString(functions)); if (StringUtils.isBlank(partitionInfo.getCategory()) || "eventTime".equalsIgnoreCase(partitionInfo.getCategory()) || "constant".equalsIgnoreCase(partitionInfo.getCategory())) { // output the configured string as-is return partitionInfo.getValueMode(); // throw new RuntimeException("not support partition category:" + partitionInfo.getCategory()); } throw new RuntimeException("unsupported partition info type:" + partitionInfo.getCategory()); } } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/LocalStrings.properties ================================================ descipher.1=\u957f\u5ea6\u4e0d\u662f\u5076\u6570 idandkeyutil.1=\u4ece\u73af\u5883\u53d8\u91cf\u4e2d\u83b7\u53d6accessId/accessKey \u5931\u8d25, accessId=[{0}] idandkeyutil.2=\u65e0\u6cd5\u83b7\u53d6\u5230accessId/accessKey. \u5b83\u4eec\u65e2\u4e0d\u5b58\u5728\u4e8e\u60a8\u7684\u914d\u7f6e\u4e2d\uff0c\u4e5f\u4e0d\u5b58\u5728\u4e8e\u73af\u5883\u53d8\u91cf\u4e2d. odpsutil.1=\u60a8\u672a\u914d\u7f6e\u5199\u5165 ODPS \u76ee\u7684\u8868\u7684\u5217\u4fe1\u606f. \u6b63\u786e\u7684\u914d\u7f6e\u65b9\u5f0f\u662f\u7ed9datax\u7684 column \u9879\u914d\u7f6e\u4e0a\u60a8\u9700\u8981\u8bfb\u53d6\u7684\u5217\u540d\u79f0,\u7528\u82f1\u6587\u9017\u53f7\u5206\u9694 \u4f8b\u5982: \"column\": [\"id\",\"name\"]. odpsutil.2=[truncate]\u662f\u5fc5\u586b\u914d\u7f6e\u9879, \u610f\u601d\u662f\u5199\u5165 ODPS \u76ee\u7684\u8868\u524d\u662f\u5426\u6e05\u7a7a\u8868/\u5206\u533a. \u8bf7\u60a8\u589e\u52a0 truncate \u7684\u914d\u7f6e\uff0c\u6839\u636e\u4e1a\u52a1\u9700\u8981\u9009\u62e9\u4e0atrue \u6216\u8005 false. odpsutil.3=\u60a8\u6240\u914d\u7f6e\u7684maxRetryTime \u503c\u9519\u8bef. \u8be5\u503c\u4e0d\u80fd\u5c0f\u4e8e1, \u4e14\u4e0d\u80fd\u5927\u4e8e {0}.
\u63a8\u8350\u7684\u914d\u7f6e\u65b9\u5f0f\u662f\u7ed9maxRetryTime \u914d\u7f6e1-11\u4e4b\u95f4\u7684\u67d0\u4e2a\u503c. \u8bf7\u60a8\u68c0\u67e5\u914d\u7f6e\u5e76\u505a\u51fa\u76f8\u5e94\u4fee\u6539. odpsutil.4=\u4e0d\u652f\u6301\u7684\u8d26\u53f7\u7c7b\u578b:[{0}]. \u8d26\u53f7\u7c7b\u578b\u76ee\u524d\u4ec5\u652f\u6301aliyun, taobao. odpsutil.5=\u83b7\u53d6 ODPS \u76ee\u7684\u8868:{0} \u7684\u6240\u6709\u5206\u533a\u5931\u8d25. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.6=\u68c0\u67e5 ODPS \u76ee\u7684\u8868:{0} \u662f\u5426\u4e3a\u5206\u533a\u8868\u5931\u8d25, \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.7=\u6e05\u7a7a ODPS \u76ee\u7684\u8868:{0} \u5931\u8d25, \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.8=\u6dfb\u52a0 ODPS \u76ee\u7684\u8868\u7684\u5206\u533a\u5931\u8d25. \u9519\u8bef\u53d1\u751f\u5728\u6dfb\u52a0 ODPS \u7684\u9879\u76ee:{0} \u7684\u8868:{1} \u7684\u5206\u533a:{2}. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.9=\u521b\u5efaTunnelUpload\u5931\u8d25. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.10=\u521b\u5efaTunnelUpload\u5931\u8d25. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.11=\u83b7\u53d6TunnelUpload\u5931\u8d25. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.12=\u83b7\u53d6TunnelUpload\u5931\u8d25. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.13=Drop ODPS \u76ee\u7684\u8868\u5206\u533a\u5931\u8d25. \u9519\u8bef\u53d1\u751f\u5728\u9879\u76ee:{0} \u7684\u8868:{1} \u7684\u5206\u533a:{2} .\u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.14=ODPS \u76ee\u7684\u8868\u81ea\u8eab\u7684 partition:{0} \u683c\u5f0f\u4e0d\u5bf9. \u6b63\u786e\u7684\u683c\u5f0f\u5f62\u5982: pt=1,ds=hangzhou odpsutil.15=ODPS \u76ee\u7684\u8868\u5728\u8fd0\u884c ODPS SQL\u5931\u8d25, \u8fd4\u56de\u503c\u4e3a:{0}. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. 
SQL \u5185\u5bb9\u4e3a:[\n{1}\n]. odpsutil.16=ODPS \u76ee\u7684\u8868\u5728\u8fd0\u884c ODPS SQL \u65f6\u629b\u51fa\u5f02\u5e38, \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. SQL \u5185\u5bb9\u4e3a:[\n{0}\n]. odpsutil.17=ODPS \u76ee\u7684\u8868\u5728\u63d0\u4ea4 block:[\n{0}\n] \u65f6\u5931\u8d25, uploadId=[{1}]. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.18=ODPS \u76ee\u7684\u8868\u5199 block:{0} \u5931\u8d25\uff0c uploadId=[{1}]. \u8bf7\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5904\u7406. odpsutil.19=ODPS \u76ee\u7684\u8868\u7684\u5217\u914d\u7f6e\u9519\u8bef. \u7531\u4e8e\u60a8\u6240\u914d\u7f6e\u7684\u5217:{0} \u4e0d\u5b58\u5728\uff0c\u4f1a\u5bfc\u81f4datax\u65e0\u6cd5\u6b63\u5e38\u63d2\u5165\u6570\u636e\uff0c\u8bf7\u68c0\u67e5\u8be5\u5217\u662f\u5426\u5b58\u5728\uff0c\u5982\u679c\u5b58\u5728\u8bf7\u68c0\u67e5\u5927\u5c0f\u5199\u7b49\u914d\u7f6e. odpsutil.20=DataX \u5199\u5165 ODPS \u8868\u4e0d\u652f\u6301\u8be5\u5b57\u6bb5\u7c7b\u578b:[{0}]. \u76ee\u524d\u652f\u6301\u62bd\u53d6\u7684\u5b57\u6bb5\u7c7b\u578b\u6709\uff1abigint, boolean, datetime, double, string. \u60a8\u53ef\u4ee5\u9009\u62e9\u4e0d\u62bd\u53d6 DataX \u4e0d\u652f\u6301\u7684\u5b57\u6bb5\u6216\u8005\u8054\u7cfb ODPS \u7ba1\u7406\u5458\u5bfb\u6c42\u5e2e\u52a9. odpsutil.21=\u60a8\u6ca1\u6709\u914d\u7f6e\u5206\u533a\u4fe1\u606f\uff0c\u56e0\u4e3a\u4f60\u914d\u7f6e\u7684\u8868\u662f\u5206\u533a\u8868:{0} \u5982\u679c\u9700\u8981\u8fdb\u884c truncate \u64cd\u4f5c\uff0c\u5fc5\u987b\u6307\u5b9a\u9700\u8981\u6e05\u7a7a\u7684\u5177\u4f53\u5206\u533a. \u8bf7\u4fee\u6539\u5206\u533a\u914d\u7f6e\uff0c\u683c\u5f0f\u5f62\u5982 pt=$'{bizdate'} . odpsutil.22=\u5206\u533a\u4fe1\u606f\u914d\u7f6e\u9519\u8bef\uff0c\u4f60\u7684ODPS\u8868\u662f\u975e\u5206\u533a\u8868:{0} \u8fdb\u884c truncate \u64cd\u4f5c\u65f6\u4e0d\u9700\u8981\u6307\u5b9a\u5177\u4f53\u5206\u533a\u503c. \u8bf7\u68c0\u67e5\u60a8\u7684\u5206\u533a\u914d\u7f6e\uff0c\u5220\u9664\u8be5\u914d\u7f6e\u9879\u7684\u503c. 
odpsutil.23=\u60a8\u7684\u76ee\u7684\u8868\u662f\u5206\u533a\u8868\uff0c\u5199\u5165\u5206\u533a\u8868:{0} \u65f6\u5fc5\u987b\u6307\u5b9a\u5177\u4f53\u5206\u533a\u503c. \u8bf7\u4fee\u6539\u60a8\u7684\u5206\u533a\u914d\u7f6e\u4fe1\u606f\uff0c\u683c\u5f0f\u5f62\u5982 \u683c\u5f0f\u5f62\u5982 pt=$'{bizdate'}. odpsutil.24=\u60a8\u7684\u76ee\u7684\u8868\u662f\u975e\u5206\u533a\u8868\uff0c\u5199\u5165\u975e\u5206\u533a\u8868:{0} \u65f6\u4e0d\u9700\u8981\u6307\u5b9a\u5177\u4f53\u5206\u533a\u503c. \u8bf7\u5220\u9664\u5206\u533a\u914d\u7f6e\u4fe1\u606f odpsutil.25=\u60a8\u6ca1\u6709\u914d\u7f6e\u5206\u533a\u4fe1\u606f\uff0c\u56e0\u4e3a\u4f60\u914d\u7f6e\u7684\u8868\u662f\u5206\u533a\u8868:{0} \u5982\u679c\u9700\u8981\u8fdb\u884c truncate \u64cd\u4f5c\uff0c\u5fc5\u987b\u6307\u5b9a\u9700\u8981\u6e05\u7a7a\u7684\u5177\u4f53\u5206\u533a. \u8bf7\u4fee\u6539\u5206\u533a\u914d\u7f6e\uff0c\u683c\u5f0f\u5f62\u5982 pt=$'{bizdate'} . odpsutil.26=\u5206\u533a\u4fe1\u606f\u914d\u7f6e\u9519\u8bef\uff0c\u4f60\u7684ODPS\u8868\u662f\u975e\u5206\u533a\u8868:{0} \u8fdb\u884c truncate \u64cd\u4f5c\u65f6\u4e0d\u9700\u8981\u6307\u5b9a\u5177\u4f53\u5206\u533a\u503c. \u8bf7\u68c0\u67e5\u60a8\u7684\u5206\u533a\u914d\u7f6e\uff0c\u5220\u9664\u8be5\u914d\u7f6e\u9879\u7684\u503c. odpsutil.27=\u60a8\u7684\u76ee\u7684\u8868\u662f\u5206\u533a\u8868\uff0c\u5199\u5165\u5206\u533a\u8868:{0} \u65f6\u5fc5\u987b\u6307\u5b9a\u5177\u4f53\u5206\u533a\u503c. \u8bf7\u4fee\u6539\u60a8\u7684\u5206\u533a\u914d\u7f6e\u4fe1\u606f\uff0c\u683c\u5f0f\u5f62\u5982 \u683c\u5f0f\u5f62\u5982 pt=$'{bizdate'}. odpsutil.28=\u60a8\u7684\u76ee\u7684\u8868\u662f\u975e\u5206\u533a\u8868\uff0c\u5199\u5165\u975e\u5206\u533a\u8868:{0} \u65f6\u4e0d\u9700\u8981\u6307\u5b9a\u5177\u4f53\u5206\u533a\u503c. \u8bf7\u5220\u9664\u5206\u533a\u914d\u7f6e\u4fe1\u606f odpsutil.29=\u52a0\u8f7d ODPS \u76ee\u7684\u8868:{0} \u5931\u8d25. \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684 ODPS \u76ee\u7684\u8868\u7684 [project] \u662f\u5426\u6b63\u786e. 
odpsutil.30=\u52a0\u8f7d ODPS \u76ee\u7684\u8868:{0} \u5931\u8d25. \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684 ODPS \u76ee\u7684\u8868\u7684 [table] \u662f\u5426\u6b63\u786e. odpsutil.31=\u52a0\u8f7d ODPS \u76ee\u7684\u8868:{0} \u5931\u8d25. \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684 ODPS \u76ee\u7684\u8868\u7684 [accessId] [accessKey]\u662f\u5426\u6b63\u786e. odpsutil.32=\u52a0\u8f7d ODPS \u76ee\u7684\u8868:{0} \u5931\u8d25. \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684 ODPS \u76ee\u7684\u8868\u7684 [accessKey] \u662f\u5426\u6b63\u786e. odpsutil.33=\u52a0\u8f7d ODPS \u76ee\u7684\u8868:{0} \u5931\u8d25. \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684 ODPS \u76ee\u7684\u8868\u7684 [accessId] [accessKey] [project]\u662f\u5426\u5339\u914d. odpsutil.34=\u52a0\u8f7d ODPS \u76ee\u7684\u8868:{0} \u5931\u8d25. \u8bf7\u68c0\u67e5\u60a8\u914d\u7f6e\u7684 ODPS \u76ee\u7684\u8868\u7684 project,table,accessId,accessKey,odpsServer\u7b49\u503c. ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/OdpsExceptionMsg.java ================================================ package com.alibaba.datax.plugin.writer.odpswriter.util; public class OdpsExceptionMsg { public static final String ODPS_PROJECT_NOT_FOUNT = "ODPS-0420111: Project not found"; public static final String ODPS_TABLE_NOT_FOUNT = "ODPS-0130131:Table not found"; public static final String ODPS_ACCESS_KEY_ID_NOT_FOUND = "ODPS-0410051:Invalid credentials - accessKeyId not found"; public static final String ODPS_ACCESS_KEY_INVALID = "ODPS-0410042:Invalid signature value - User signature dose not match"; public static final String ODPS_ACCESS_DENY = "ODPS-0420095: Access Denied - Authorization Failed [4002], You doesn't exist in project"; } ================================================ FILE: odpswriter/src/main/java/com/alibaba/datax/plugin/writer/odpswriter/util/OdpsUtil.java ================================================ package 
com.alibaba.datax.plugin.writer.odpswriter.util; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.MessageSource; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.writer.odpswriter.*; import com.aliyun.odps.*; import com.aliyun.odps.Column; import com.aliyun.odps.account.Account; import com.aliyun.odps.account.AliyunAccount; import com.aliyun.odps.data.ResultSet; import com.aliyun.odps.data.Binary; import com.aliyun.odps.task.SQLTask; import com.aliyun.odps.tunnel.TableTunnel; import com.aliyun.odps.tunnel.io.ProtobufRecordPack; import com.aliyun.odps.tunnel.io.TunnelRecordWriter; import com.aliyun.odps.type.TypeInfo; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.time.DateFormatUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.SimpleDateFormat; import java.util.*; import java.util.concurrent.Callable; public class OdpsUtil { private static final Logger LOG = LoggerFactory.getLogger(OdpsUtil.class); private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OdpsUtil.class); public static int MAX_RETRY_TIME = 10; public static void checkNecessaryConfig(Configuration originalConfig) { originalConfig.getNecessaryValue(Key.ODPS_SERVER, OdpsWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.PROJECT, OdpsWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.TABLE, OdpsWriterErrorCode.REQUIRED_VALUE); if (null == originalConfig.getList(Key.COLUMN) || originalConfig.getList(Key.COLUMN, String.class).isEmpty()) { throw DataXException.asDataXException(OdpsWriterErrorCode.REQUIRED_VALUE, MESSAGE_SOURCE.message("odpsutil.1")); } // getBool only accepts the strings true or false (case-insensitive) and fails on anything else; there is no default value // For dynamic-partition writes, truncate is not performed Boolean truncate = originalConfig.getBool(Key.TRUNCATE); if (null == truncate)
{ throw DataXException.asDataXException(OdpsWriterErrorCode.REQUIRED_VALUE, MESSAGE_SOURCE.message("odpsutil.2")); } } public static void dealMaxRetryTime(Configuration originalConfig) { int maxRetryTime = originalConfig.getInt(Key.MAX_RETRY_TIME, OdpsUtil.MAX_RETRY_TIME); if (maxRetryTime < 1 || maxRetryTime > OdpsUtil.MAX_RETRY_TIME) { throw DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpsutil.3", OdpsUtil.MAX_RETRY_TIME)); } MAX_RETRY_TIME = maxRetryTime; } public static String formatPartition(String partitionString, Boolean printLog) { if (null == partitionString) { return null; } String parsedPartition = partitionString.trim().replaceAll(" *= *", "=").replaceAll(" */ *", ",") .replaceAll(" *, *", ",").replaceAll("'", ""); if (printLog) { LOG.info("format partition with rules: remove spaces around '=', '/' and ','; remove all '; replace / with ,"); LOG.info("original partition {} parsed partition {}", partitionString, parsedPartition); } return parsedPartition; } public static Odps initOdpsProject(Configuration originalConfig) { String accessId = originalConfig.getString(Key.ACCESS_ID); String accessKey = originalConfig.getString(Key.ACCESS_KEY); String odpsServer = originalConfig.getString(Key.ODPS_SERVER); String project = originalConfig.getString(Key.PROJECT); String securityToken = originalConfig.getString(Key.SECURITY_TOKEN); Account account; if (StringUtils.isNotBlank(securityToken)) { account = new com.aliyun.odps.account.StsAccount(accessId, accessKey, securityToken); } else { account = new AliyunAccount(accessId, accessKey); } Odps odps = new Odps(account); boolean isPreCheck = originalConfig.getBool("dryRun", false); if(isPreCheck) { odps.getRestClient().setConnectTimeout(3); odps.getRestClient().setReadTimeout(3); odps.getRestClient().setRetryTimes(2); } odps.setDefaultProject(project); odps.setEndpoint(odpsServer); odps.setUserAgent("DATAX"); return odps; } public static Table getTable(Odps odps, String projectName, String 
tableName) { final Table table = odps.tables().get(projectName, tableName); try { // Check whether the table exists by reloading it; on failure, retry once per second, at most 3 times return RetryUtil.executeWithRetry(new Callable<Table>
() { @Override public Table call() throws Exception { table.reload(); return table; } }, 3, 1000, false); } catch (Exception e) { throwDataXExceptionWhenReloadTable(e, tableName); } return table; } public static List<String> listOdpsPartitions(Table table) { List<String> parts = new ArrayList<String>(); try { List<Partition> partitions = table.getPartitions(); for(Partition partition : partitions) { parts.add(partition.getPartitionSpec().toString()); } } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.GET_PARTITION_FAIL, MESSAGE_SOURCE.message("odpsutil.5", table.getName()), e); } return parts; } public static boolean isPartitionedTable(Table table) { // Only a non-partitioned table can be truncated as a whole List<Column> partitionKeys; try { partitionKeys = table.getSchema().getPartitionColumns(); if (null != partitionKeys && !partitionKeys.isEmpty()) { return true; } } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.CHECK_IF_PARTITIONED_TABLE_FAILED, MESSAGE_SOURCE.message("odpsutil.6", table.getName()), e); } return false; } public static void truncateNonPartitionedTable(Odps odps, Table tab) { truncateNonPartitionedTable(odps, tab.getName()); } public static void truncateNonPartitionedTable(Odps odps, String tableName) { String truncateNonPartitionedTableSql = "truncate table " + tableName + ";"; try { LOG.info("truncate non partitioned table with sql: {}", truncateNonPartitionedTableSql); runSqlTaskWithRetry(odps, truncateNonPartitionedTableSql, MAX_RETRY_TIME, 1000, true, "truncate", null); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.TABLE_TRUNCATE_ERROR, MESSAGE_SOURCE.message("odpsutil.7", tableName), e); } } public static void truncatePartition(Odps odps, Table table, String partition) { if (isPartitionExist(table, partition)) { LOG.info("partition {} already exists, truncate it to clean old data", partition); dropPart(odps, table, partition); } LOG.info("begin to add partition {}", partition); addPart(odps, table, partition); } 
private static boolean isPartitionExist(Table table, String partition) { // check whether the partition exists; listOdpsPartitions never returns null List<String> odpsParts = OdpsUtil.listOdpsPartitions(table); int j = 0; for (; j < odpsParts.size(); j++) { if (odpsParts.get(j).replaceAll("'", "").equals(partition)) { LOG.info("found a partition {} equal to (ignoring ' if present) configured partition {}", odpsParts.get(j), partition); break; } } return j != odpsParts.size(); } public static void addPart(Odps odps, Table table, String partition) { String partSpec = getPartSpec(partition); // add the partition if it does not exist StringBuilder addPart = new StringBuilder(); addPart.append("alter table ").append(table.getName()).append(" add IF NOT EXISTS partition(") .append(partSpec).append(");"); try { Map<String, String> hints = new HashMap<String, String>(); // enable the ODPS SQL 2.0 type system hints.put("odps.sql.type.system.odps2", "true"); LOG.info("add partition with sql: {}", addPart.toString()); runSqlTaskWithRetry(odps, addPart.toString(), MAX_RETRY_TIME, 1000, true, "addPart", hints); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.ADD_PARTITION_FAILED, MESSAGE_SOURCE.message("odpsutil.8", table.getProject(), table.getName(), partition), e); } } public static TableTunnel.UploadSession createMasterTunnelUpload(final TableTunnel tunnel, final String projectName, final String tableName, final String partition) { if(StringUtils.isBlank(partition)) { try { return RetryUtil.executeWithRetry(new Callable<TableTunnel.UploadSession>() { @Override public TableTunnel.UploadSession call() throws Exception { return tunnel.createUploadSession(projectName, tableName); } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.CREATE_MASTER_UPLOAD_FAIL, MESSAGE_SOURCE.message("odpsutil.9"), e); } } else { final PartitionSpec partitionSpec = new PartitionSpec(partition); try { return RetryUtil.executeWithRetry(new Callable<TableTunnel.UploadSession>() { @Override public TableTunnel.UploadSession call() throws Exception { return 
tunnel.createUploadSession(projectName, tableName, partitionSpec); } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.CREATE_MASTER_UPLOAD_FAIL, MESSAGE_SOURCE.message("odpsutil.10"), e); } } } public static TableTunnel.UploadSession getSlaveTunnelUpload(final TableTunnel tunnel, final String projectName, final String tableName, final String partition, final String uploadId) { if(StringUtils.isBlank(partition)) { try { return RetryUtil.executeWithRetry(new Callable<TableTunnel.UploadSession>() { @Override public TableTunnel.UploadSession call() throws Exception { return tunnel.getUploadSession(projectName, tableName, uploadId); } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.GET_SLAVE_UPLOAD_FAIL, MESSAGE_SOURCE.message("odpsutil.11"), e); } } else { final PartitionSpec partitionSpec = new PartitionSpec(partition); try { return RetryUtil.executeWithRetry(new Callable<TableTunnel.UploadSession>() { @Override public TableTunnel.UploadSession call() throws Exception { return tunnel.getUploadSession(projectName, tableName, partitionSpec, uploadId); } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.GET_SLAVE_UPLOAD_FAIL, MESSAGE_SOURCE.message("odpsutil.12"), e); } } } private static void dropPart(Odps odps, Table table, String partition) { String partSpec = getPartSpec(partition); StringBuilder dropPart = new StringBuilder(); dropPart.append("alter table ").append(table.getName()) .append(" drop IF EXISTS partition(").append(partSpec) .append(");"); try { Map<String, String> hints = new HashMap<String, String>(); // enable the ODPS SQL 2.0 type system hints.put("odps.sql.type.system.odps2", "true"); LOG.info("drop partition with sql: {}", dropPart.toString()); runSqlTaskWithRetry(odps, dropPart.toString(), MAX_RETRY_TIME, 1000, true, "truncate", hints); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.ADD_PARTITION_FAILED, 
MESSAGE_SOURCE.message("odpsutil.13", table.getProject(), table.getName(), partition), e); } } private static String getPartSpec(String partition) { StringBuilder partSpec = new StringBuilder(); String[] parts = partition.split(","); for (int i = 0; i < parts.length; i++) { String part = parts[i]; String[] kv = part.split("="); if (kv.length != 2) { throw DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpsutil.14", partition)); } partSpec.append(kv[0]).append("="); partSpec.append("'").append(kv[1].replace("'", "")).append("'"); if (i != parts.length - 1) { partSpec.append(","); } } return partSpec.toString(); } public static Instance runSqlTaskWithRetry(final Odps odps, final String sql, String tag) { try { long beginTime = System.currentTimeMillis(); Instance instance = runSqlTaskWithRetry(odps, sql, MAX_RETRY_TIME, 1000, true, tag, null); long endTime = System.currentTimeMillis(); LOG.info(String.format("executed odps sql: %s, cost time: %s ms", sql, (endTime - beginTime))); return instance; } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.RUN_SQL_ODPS_EXCEPTION, MESSAGE_SOURCE.message("odpsutil.16", sql), e); } } public static ResultSet getSqlTaskRecordsWithRetry(final Odps odps, final String sql, String tag) { Instance instance = runSqlTaskWithRetry(odps, sql, tag); if (instance == null) { LOG.error("can not get odps instance from sql {}", sql); throw DataXException.asDataXException(OdpsWriterErrorCode.RUN_SQL_ODPS_EXCEPTION, MESSAGE_SOURCE.message("odpsutil.16", sql)); } try { return SQLTask.getResultSet(instance, instance.getTaskNames().iterator().next()); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.RUN_SQL_ODPS_EXCEPTION, MESSAGE_SOURCE.message("odpsutil.16", sql), e); } } /** * Use this method only when the SQL is idempotent. It retries only when the ODPS client throws an exception (RUN_SQL_ODPS_EXCEPTION), not when the SQL task itself finishes unsuccessfully. * * @param odps odps * @param query SQL to execute * @param retryTimes maximum number of attempts * @param sleepTimeInMilliSecond base sleep time between attempts * @param exponential whether to double the sleep time after each attempt (capped at 128 s) * @param tag tag used in the SQL task name * @param hints SQL hints, may be null * @return the finished instance, or null if the query is blank * @throws Exception */ public static Instance runSqlTaskWithRetry(final Odps 
odps, final String query, int retryTimes, long sleepTimeInMilliSecond, boolean exponential, String tag, Map<String, String> hints) throws Exception { for(int i = 0; i < retryTimes; i++) { try { return runSqlTask(odps, query, tag, hints); } catch (DataXException e) { if (OdpsWriterErrorCode.RUN_SQL_ODPS_EXCEPTION.equals(e.getErrorCode())) { LOG.debug("Exception when calling callable", e); if (i + 1 < retryTimes && sleepTimeInMilliSecond > 0) { LOG.warn(String.format("will do retry #%s, current exception=%s", i + 1, e.getMessage())); long timeToSleep; if (exponential) { timeToSleep = sleepTimeInMilliSecond * (long) Math.pow(2, i); if(timeToSleep >= 128 * 1000) { timeToSleep = 128 * 1000; } } else { timeToSleep = sleepTimeInMilliSecond; if(timeToSleep >= 128 * 1000) { timeToSleep = 128 * 1000; } } try { Thread.sleep(timeToSleep); } catch (InterruptedException ignored) { } } else { throw e; } } else { throw e; } } catch (Exception e) { throw e; } } return null; } public static Instance runSqlTask(Odps odps, String query, String tag, Map<String, String> hints) { if (StringUtils.isBlank(query)) { return null; } String taskName = String.format("datax_odpswriter_%s_%s", tag, UUID.randomUUID().toString().replace('-', '_')); LOG.info("Try to start sqlTask:[{}] to run odps sql:[\n{}\n] .", taskName, query); // TODO: set biz_id (not done for DDL yet) Instance instance; Instance.TaskStatus status; try { instance = SQLTask.run(odps, odps.getDefaultProject(), query, taskName, hints, null); instance.waitForSuccess(); status = instance.getTaskStatus().get(taskName); if (!Instance.TaskStatus.Status.SUCCESS.equals(status.getStatus())) { throw DataXException.asDataXException(OdpsWriterErrorCode.RUN_SQL_FAILED, MESSAGE_SOURCE.message("odpsutil.15", status.getStatus(), query)); } return instance; } catch (DataXException e) { throw e; } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.RUN_SQL_ODPS_EXCEPTION, MESSAGE_SOURCE.message("odpsutil.16", query), e); } } public static String generateTaskName(String tag) { 
return String.format("datax_odpswriter_%s_%s", tag, UUID.randomUUID().toString().replace('-', '_')); } public static void checkBlockComplete(final TableTunnel.UploadSession masterUpload, final Long[] blocks) { Long[] serverBlocks; try { serverBlocks = RetryUtil.executeWithRetry(new Callable<Long[]>() { @Override public Long[] call() throws Exception { return masterUpload.getBlockList(); } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.COMMIT_BLOCK_FAIL, MESSAGE_SOURCE.message("odpsutil.17", StringUtils.join(blocks, ","), masterUpload.getId()), e); } HashMap<Long, Boolean> serverBlockMap = new HashMap<Long, Boolean>(); for (Long blockId : serverBlocks) { serverBlockMap.put(blockId, true); } for (Long blockId : blocks) { if (!serverBlockMap.containsKey(blockId)) { throw DataXException.asDataXException(OdpsWriterErrorCode.COMMIT_BLOCK_FAIL, "BlockId[" + blockId + "] upload failed!"); } } } public static void masterComplete(final TableTunnel.UploadSession masterUpload) { try { RetryUtil.executeWithRetry(new Callable<Void>() { @Override public Void call() throws Exception { masterUpload.commit(); return null; } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.COMMIT_BLOCK_FAIL, MESSAGE_SOURCE.message("odpsutil.17", masterUpload.getId()), e); } } public static void masterCompleteBlocks(final TableTunnel.UploadSession masterUpload, final Long[] blocks) { try { RetryUtil.executeWithRetry(new Callable<Void>() { @Override public Void call() throws Exception { masterUpload.commit(blocks); return null; } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.COMMIT_BLOCK_FAIL, MESSAGE_SOURCE.message("odpsutil.17", StringUtils.join(blocks, ","), masterUpload.getId()), e); } } public static void slaveWriteOneBlock(final TableTunnel.UploadSession slaveUpload, final ProtobufRecordPack protobufRecordPack, final long blockId, final Long timeoutInMs) { try { 
RetryUtil.executeWithRetry(new Callable<Void>() { @Override public Void call() throws Exception { slaveUpload.writeBlock(blockId, protobufRecordPack, timeoutInMs); return null; } }, MAX_RETRY_TIME, 1000L, true); } catch (Exception e) { throw DataXException.asDataXException(OdpsWriterErrorCode.WRITER_RECORD_FAIL, MESSAGE_SOURCE.message("odpsutil.18", blockId, slaveUpload.getId()), e); } } public static List<Integer> parsePosition(List<String> allColumnList, List<String> allPartColumnList, List<String> userConfiguredColumns) { List<Integer> retList = new ArrayList<Integer>(); boolean hasColumn; for (String col : userConfiguredColumns) { hasColumn = false; for (int i = 0, len = allColumnList.size(); i < len; i++) { if (allColumnList.get(i).equalsIgnoreCase(col)) { retList.add(i); hasColumn = true; break; } } if (null != allPartColumnList) { for (int i = 0, len = allPartColumnList.size(); i < len; i++) { if (allPartColumnList.get(i).equalsIgnoreCase(col)) { retList.add(-1); hasColumn = true; break; } } } if (!hasColumn) { throw DataXException.asDataXException(OdpsWriterErrorCode.COLUMN_NOT_EXIST, MESSAGE_SOURCE.message("odpsutil.19", col)); } } return retList; } public static List<String> getAllColumns(TableSchema schema) { if (null == schema) { throw new IllegalArgumentException("parameter schema can not be null."); } List<String> allColumns = new ArrayList<String>(); List<Column> columns = schema.getColumns(); for(Column column: columns) { allColumns.add(column.getName()); } return allColumns; } public static List<String> getAllPartColumns(TableSchema schema) { if (null == schema) { throw new IllegalArgumentException("parameter schema can not be null."); } List<String> allPartColumns = new ArrayList<>(); List<Column> partCols = schema.getPartitionColumns(); for (Column column : partCols) { allPartColumns.add(column.getName()); } return allPartColumns; } public static String getPartColValFromDataXRecord(com.alibaba.datax.common.element.Record dataxRecord, List<Integer> positions, List<String> userConfiguredColumns, Map<String, DateTransForm> dateTransFormMap) { StringBuilder 
partition = new StringBuilder(); for (int i = 0, len = dataxRecord.getColumnNumber(); i < len; i++) { if (positions.get(i) == -1) { if (partition.length() > 0) { partition.append(","); } String partName = userConfiguredColumns.get(i); // TODO: convert according to the partition column's type; for now the raw value is converted with toString() com.alibaba.datax.common.element.Column partitionCol = dataxRecord.getColumn(i); Object rawData = partitionCol.getRawData(); String partVal = null == rawData ? null : rawData.toString(); if (StringUtils.isBlank(partVal)) { throw new DataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, String.format( "value of column %s is null or blank, it can not be used as a partition column", partName)); } // If the partition column holds a date and the user configured a conversion rule for this column DateTransForm dateTransForm = null; if (null != dateTransFormMap) { dateTransForm = dateTransFormMap.get(partName); } if (null != dateTransForm) { try { // DATE column if (partitionCol.getType().equals(com.alibaba.datax.common.element.Column.Type.DATE)) { partVal = OdpsUtil.date2StringWithFormat(partitionCol.asDate(), dateTransForm.getToFormat()); } // STRING column: first parse it into a date with fromFormat if (partitionCol.getType().equals(com.alibaba.datax.common.element.Column.Type.STRING)) { partVal = OdpsUtil.date2StringWithFormat(partitionCol.asDate(dateTransForm.getFromFormat()), dateTransForm.getToFormat()); } } catch (DataXException e) { LOG.warn("Parse {} with format {} error! Please check the column config and {} config. So use original value '{}'. 
Detail info: {}", partVal, dateTransForm.toString(), Key.PARTITION_COL_MAPPING, partVal, e); } } partition.append(partName).append("=").append(partVal); } } return partition.toString(); } public static String date2StringWithFormat(Date date, String dateFormat) { return DateFormatUtils.format(date, dateFormat, TimeZone.getTimeZone("GMT+8")); } public static List<TypeInfo> getTableOriginalColumnTypeList(TableSchema schema) { List<TypeInfo> tableOriginalColumnTypeList = new ArrayList<TypeInfo>(); List<Column> columns = schema.getColumns(); for (Column column : columns) { tableOriginalColumnTypeList.add(column.getTypeInfo()); } return tableOriginalColumnTypeList; } public static void dealTruncate(Odps odps, Table table, String partition, boolean truncate) { boolean isPartitionedTable = OdpsUtil.isPartitionedTable(table); if (truncate) { // truncate requested if (isPartitionedTable) { // partitioned table if (StringUtils.isBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.21", table.getName())); } else { LOG.info("Try to truncate partition=[{}] in table=[{}].", partition, table.getName()); OdpsUtil.truncatePartition(odps, table, partition); } } else { // non-partitioned table if (StringUtils.isNotBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.22", table.getName())); } else { LOG.info("Try to truncate table:[{}].", table.getName()); OdpsUtil.truncateNonPartitionedTable(odps, table); } } } else { // no truncate if (isPartitionedTable) { // partitioned table if (StringUtils.isBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.23", table.getName())); } else { boolean isPartitionExists = OdpsUtil.isPartitionExist(table, partition); if (!isPartitionExists) { LOG.info("Try to add partition:[{}] in table:[{}].", partition, table.getName()); OdpsUtil.addPart(odps, table, partition); } } } else { // non-partitioned table if 
(StringUtils.isNotBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.24", table.getName())); } } } } /** * Check the partition configuration of the odpswriter plugin (validation only; nothing is created or truncated) * * @param odps ODPS client * @param table target table * @param partition configured partition, e.g. pt=1,ds=hangzhou * @param truncate whether truncate is configured */ public static void preCheckPartition(Odps odps, Table table, String partition, boolean truncate) { boolean isPartitionedTable = OdpsUtil.isPartitionedTable(table); if (truncate) { // truncate requested if (isPartitionedTable) { // partitioned table if (StringUtils.isBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.25", table.getName())); } } else { // non-partitioned table if (StringUtils.isNotBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.26", table.getName())); } } } else { // no truncate if (isPartitionedTable) { // partitioned table if (StringUtils.isBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.27", table.getName())); } } else { // non-partitioned table if (StringUtils.isNotBlank(partition)) { throw DataXException.asDataXException(OdpsWriterErrorCode.PARTITION_ERROR, MESSAGE_SOURCE.message("odpsutil.28", table.getName())); } } } } /** * Convert the ODPS exception thrown by table.reload() into a clearer DataX exception */ public static void throwDataXExceptionWhenReloadTable(Exception e, String tableName) { if(e.getMessage() != null) { if(e.getMessage().contains(OdpsExceptionMsg.ODPS_PROJECT_NOT_FOUNT)) { throw DataXException.asDataXException(OdpsWriterErrorCode.ODPS_PROJECT_NOT_FOUNT, MESSAGE_SOURCE.message("odpsutil.29", tableName), e); } else if(e.getMessage().contains(OdpsExceptionMsg.ODPS_TABLE_NOT_FOUNT)) { throw DataXException.asDataXException(OdpsWriterErrorCode.ODPS_TABLE_NOT_FOUNT, MESSAGE_SOURCE.message("odpsutil.30", tableName), e); } else if(e.getMessage().contains(OdpsExceptionMsg.ODPS_ACCESS_KEY_ID_NOT_FOUND)) { throw 
DataXException.asDataXException(OdpsWriterErrorCode.ODPS_ACCESS_KEY_ID_NOT_FOUND, MESSAGE_SOURCE.message("odpsutil.31", tableName), e); } else if(e.getMessage().contains(OdpsExceptionMsg.ODPS_ACCESS_KEY_INVALID)) { throw DataXException.asDataXException(OdpsWriterErrorCode.ODPS_ACCESS_KEY_INVALID, MESSAGE_SOURCE.message("odpsutil.32", tableName), e); } else if(e.getMessage().contains(OdpsExceptionMsg.ODPS_ACCESS_DENY)) { throw DataXException.asDataXException(OdpsWriterErrorCode.ODPS_ACCESS_DENY, MESSAGE_SOURCE.message("odpsutil.33", tableName), e); } } throw DataXException.asDataXException(OdpsWriterErrorCode.ILLEGAL_VALUE, MESSAGE_SOURCE.message("odpsutil.34", tableName), e); } /** * count统计数据,自动创建统计表 * @param tableName 统计表名字 * @return */ public static String getCreateSummaryTableDDL(String tableName) { return String.format("CREATE TABLE IF NOT EXISTS %s " + "(src_table_name STRING, " + "dest_table_name STRING, " + "src_row_num BIGINT, " + "src_query_time DATETIME, " + "read_succeed_records BIGINT," + "write_succeed_records BIGINT," + "dest_row_num BIGINT, " + "write_time DATETIME);", tableName); } /** * count统计数据,获取count dml * @param tableName * @return */ public static String countTableSql(final String tableName, final String partition) { if (StringUtils.isNotBlank(partition)) { String[] partitions = partition.split("\\,"); String p = String.join(" and ", partitions); return String.format("SELECT COUNT(1) AS odps_num FROM %s WHERE %s;", tableName, p); } else { return String.format("SELECT COUNT(1) AS odps_num FROM %s;", tableName); } } /** * count统计数据 dml 对应字段,用于查询 * @return */ public static String countName() { return "odps_num"; } /** * count统计数据dml * @param summaryTableName 统计数据写入表 * @param sourceTableName datax reader 表 * @param destTableName datax writer 表 * @param srcCount reader表行数 * @param queryTime reader表查询时间 * @param destCount writer 表行书 * @return insert dml sql */ public static String getInsertSummaryTableSql(String summaryTableName, String 
            sourceTableName, String destTableName, Long srcCount, String queryTime,
            Number readSucceedRecords, Number writeSucceedRecords, Long destCount) {
        final String sql = "INSERT INTO %s (src_table_name,dest_table_name," +
                " src_row_num, src_query_time, read_succeed_records, write_succeed_records, dest_row_num, write_time) VALUES ( %s );";

        String insertData = String.format("'%s', '%s', %s, %s, %s, %s, %s, getdate()",
                sourceTableName, destTableName, srcCount, queryTime, readSucceedRecords, writeSucceedRecords, destCount);
        return String.format(sql, summaryTableName, insertData);
    }

    public static void createTable(Odps odps, String tableName, final String sql) {
        try {
            LOG.info("create table with sql: {}", sql);
            runSqlTaskWithRetry(odps, sql, MAX_RETRY_TIME, 1000, true, "create", null);
        } catch (Exception e) {
            throw DataXException.asDataXException(OdpsWriterErrorCode.RUN_SQL_FAILED,
                    MESSAGE_SOURCE.message("odpsutil.7", tableName), e);
        }
    }

    public static void createTableFromTable(Odps odps, String resourceTable, String targetTable) {
        TableSchema schema = odps.tables().get(resourceTable).getSchema();
        StringBuilder builder = new StringBuilder();
        Iterator<Column> iterator = schema.getColumns().iterator();
        while (iterator.hasNext()) {
            Column c = iterator.next();
            builder.append(String.format(" %s %s ", c.getName(), c.getTypeInfo().getTypeName()));
            if (iterator.hasNext()) {
                builder.append(",");
            }
        }
        String createTableSql = String.format("CREATE TABLE IF NOT EXISTS %s (%s);", targetTable, builder.toString());

        try {
            LOG.info("create table with sql: {}", createTableSql);
            runSqlTaskWithRetry(odps, createTableSql, MAX_RETRY_TIME, 1000, true, "create", null);
        } catch (Exception e) {
            throw DataXException.asDataXException(OdpsWriterErrorCode.RUN_SQL_FAILED,
                    MESSAGE_SOURCE.message("odpsutil.7", targetTable), e);
        }
    }

    public static Object truncateSingleFieldData(OdpsType type, Object data, int limit, Boolean enableOverLengthOutput) {
        if (data == null) {
            return data;
        }
        if (OdpsType.STRING.equals(type)) {
            if (enableOverLengthOutput) {
                LOG.warn("InvalidData: The string's length is more than " + limit + " bytes. content:" + data);
            }
            LOG.info("before truncate string length:" + ((String) data).length());
            // reserve room so the cut stays within the limit when multi-byte characters are present
            limit -= Constant.UTF8_ENCODED_CHAR_MAX_SIZE;
            data = cutString((String) data, limit);
            LOG.info("after truncate string length:" + ((String) data).length());
        } else if (OdpsType.BINARY.equals(type)) {
            byte[] oriDataBytes = ((Binary) data).data();
            if (oriDataBytes == null) {
                return data;
            }
            int originLength = oriDataBytes.length;
            if (originLength <= limit) {
                return data;
            }
            if (enableOverLengthOutput) {
                LOG.warn("InvalidData: The binary's length is more than " + limit + " bytes. content:" + byteArrToHex(oriDataBytes));
            }
            LOG.info("before truncate binary length:" + oriDataBytes.length);
            byte[] newData = new byte[limit];
            System.arraycopy(oriDataBytes, 0, newData, 0, limit);
            LOG.info("after truncate binary length:" + newData.length);
            return new Binary(newData);
        }
        return data;
    }

    public static Object setNull(OdpsType type, Object data, int limit, Boolean enableOverLengthOutput) {
        if (data == null) {
            return null;
        }
        if (OdpsType.STRING.equals(type)) {
            if (enableOverLengthOutput) {
                LOG.warn("InvalidData: The string's length is more than " + limit + " bytes. content:" + data);
            }
            return null;
        } else if (OdpsType.BINARY.equals(type)) {
            byte[] oriDataBytes = ((Binary) data).data();
            int originLength = oriDataBytes.length;
            if (originLength > limit) {
                if (enableOverLengthOutput) {
                    // log binary content as hex, consistent with truncateSingleFieldData
                    LOG.warn("InvalidData: The binary's length is more than " + limit + " bytes. content:" + byteArrToHex(oriDataBytes));
                }
                return null;
            }
        }
        return data;
    }

    public static boolean validateStringLength(String value, long limit) {
        try {
            // cheap upper bound first; the string is only encoded when it could exceed the limit
            if (value.length() * Constant.UTF8_ENCODED_CHAR_MAX_SIZE > limit && value.getBytes("utf-8").length > limit) {
                return false;
            }
        } catch (Exception e) {
            e.printStackTrace();
            return true;
        }
        return true;
    }

    public static String cutString(String sourceString, int cutBytes) {
        if (sourceString == null || "".equals(sourceString.trim()) || cutBytes < 1) {
            return "";
        }
        int lastIndex = 0;
        boolean stopFlag = false;
        int totalBytes = 0;
        for (int i = 0; i < sourceString.length(); i++) {
            // UTF-8 size of one char: 1 byte up to U+007F, 2 bytes up to U+07FF, 3 bytes otherwise
            // (each half of a surrogate pair counts as 3, i.e. 6 bytes for a 4-byte code point)
            char c = sourceString.charAt(i);
            if (c < 0x80) {
                totalBytes += 1;
            } else if (c < 0x800) {
                totalBytes += 2;
            } else {
                totalBytes += 3;
            }
            if (totalBytes == cutBytes) {
                lastIndex = i;
                stopFlag = true;
                break;
            } else if (totalBytes > cutBytes) {
                lastIndex = i - 1;
                stopFlag = true;
                break;
            }
        }
        if (!stopFlag) {
            return sourceString;
        } else {
            return sourceString.substring(0, lastIndex + 1);
        }
    }

    public static boolean dataOverLength(OdpsType type, Object data, int limit) {
        if (data == null) {
            return false;
        }
        if (OdpsType.STRING.equals(type)) {
            if (!OdpsUtil.validateStringLength((String) data, limit)) {
                return true;
            }
        } else if (OdpsType.BINARY.equals(type)) {
            byte[] oriDataBytes = ((Binary) data).data();
            if (oriDataBytes == null) {
                return false;
            }
            int originLength = oriDataBytes.length;
            if (originLength > limit) {
                return true;
            }
        }
        return false;
    }

    public static Object processOverLengthData(Object data, OdpsType type, String overLengthRule, int maxFieldLength, Boolean enableOverLengthOutput) {
        try {
            // over-length data check: stop printing offending values once maxOutputOverLengthRecord is reached
            if (OdpsWriter.maxOutputOverLengthRecord != null && OdpsWriter.globalTotalTruncatedRecordNumber.get() >= OdpsWriter.maxOutputOverLengthRecord) {
                enableOverLengthOutput = false;
            }
            if ("truncate".equalsIgnoreCase(overLengthRule)) {
                if (OdpsUtil.dataOverLength(type, data, OdpsWriter.maxOdpsFieldLength)) {
                    Object newData = OdpsUtil.truncateSingleFieldData(type, data, maxFieldLength, enableOverLengthOutput);
                    OdpsWriter.globalTotalTruncatedRecordNumber.incrementAndGet();
                    return newData;
                }
            } else if ("setNull".equalsIgnoreCase(overLengthRule)) {
                if (OdpsUtil.dataOverLength(type, data, OdpsWriter.maxOdpsFieldLength)) {
                    OdpsWriter.globalTotalTruncatedRecordNumber.incrementAndGet();
                    return OdpsUtil.setNull(type, data, maxFieldLength, enableOverLengthOutput);
                }
            }
        } catch (Throwable e) {
            LOG.warn("truncate overLength data failed!", e);
        }
        return data;
    }

    private static final char HEX_CHAR_ARR[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};

    /**
     * Converts a byte array to a lowercase hexadecimal string.
     *
     * @param btArr bytes to encode
     * @return two hex characters per byte
     */
    public static String byteArrToHex(byte[] btArr) {
        char strArr[] = new char[btArr.length * 2];
        int i = 0;
        for (byte bt : btArr) {
            strArr[i++] = HEX_CHAR_ARR[bt >>> 4 & 0xf];
            strArr[i++] = HEX_CHAR_ARR[bt & 0xf];
        }
        return new String(strArr);
    }

    /**
     * Converts a hexadecimal string (upper or lower case) back to a byte array.
     */
    public static byte[] hexToByteArr(String hexStr) {
        char[] charArr = hexStr.toCharArray();
        byte btArr[] = new byte[charArr.length / 2];
        for (int i = 0; i + 1 < charArr.length; i += 2) {
            // the nibble value comes from the hex alphabet, not from the character's position in the input
            int highBit = Character.digit(charArr[i], 16);
            int lowBit = Character.digit(charArr[i + 1], 16);
            btArr[i / 2] = (byte) (highBit << 4 | lowBit);
        }
        return btArr;
    }
}

================================================
FILE: odpswriter/src/main/resources/plugin.json
================================================
{
    "name": "odpswriter",
    "class": "com.alibaba.datax.plugin.writer.odpswriter.OdpsWriter",
    "description": {
        "useScene": "prod.",
        "mechanism": "TODO",
        "warn": "TODO"
    },
    "developer": "alibaba"
}

================================================
FILE: odpswriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "odpswriter",
    "parameter": {
        "project": "",
        "table": "",
        "partition": "",
        "column": [],
        "accessId": "",
        "accessKey": "",
        "truncate": true,
        "odpsServer": "",
        "tunnelServer": ""
    }
}

================================================
FILE: 
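opentsdbreader/doc/opentsdbreader.md
================================================

# OpenTSDBReader Plugin Documentation

___

## 1 Quick Introduction

The OpenTSDBReader plugin reads data from OpenTSDB. OpenTSDB is a scalable, distributed time-series database maintained mainly by Yahoo. For how it relates to and differs from Alibaba's in-house TSDB, see the Alibaba Cloud documentation: "[Advantages over OpenTSDB](https://help.aliyun.com/document_detail/113368.html)".

## 2 How It Works

Under the hood, OpenTSDBReader connects to the OpenTSDB instance over HTTP. It calls the `/api/config` endpoint to obtain the connection settings of the underlying HBase storage. It then connects to HBase through the AsyncHBase framework and scans the data points out. The synchronization is split by metric and by time range: the data of one metric within one hour forms one migration task.

## 3 Features

### 3.1 Sample Configuration

* A job that extracts data from an OpenTSDB database to the local machine:

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "opentsdbreader",
          "parameter": {
            "endpoint": "http://localhost:4242",
            "column": [
              "m"
            ],
            "beginDateTime": "2019-01-01 00:00:00",
            "endDateTime": "2019-01-01 03:00:00"
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

### 3.2 Parameters

* **name**
  * Description: name of this plugin
  * Required: yes
  * Default: opentsdbreader

* **parameter**
  * **endpoint**
    * Description: HTTP address of OpenTSDB
    * Required: yes
    * Format: http://IP:Port
    * Default: none

  * **column**
    * Description: list of metrics to migrate
    * Required: yes
    * Default: none

  * **beginDateTime**
    * Description: used together with endDateTime to specify the time range whose data points are migrated
    * Required: yes
    * Format: `yyyy-MM-dd HH:mm:ss`
    * Default: none
    * Note: the minutes and seconds of the start and end times are ignored, i.e. both are truncated to the hour. For example, [3:35, 4:55) on 2019-4-18 becomes [3:00, 4:00)

  * **endDateTime**
    * Description: used together with beginDateTime to specify the time range whose data points are migrated
    * Required: yes
    * Format: `yyyy-MM-dd HH:mm:ss`
    * Default: none
    * Note: the minutes and seconds of the start and end times are ignored, i.e. both are truncated to the hour. For example, [3:35, 4:55) on 2019-4-18 becomes [3:00, 4:00)

### 3.3 Type Conversion

| DataX internal type | TSDB data type |
| ------------------- | -------------- |
| String | Serialized TSDB data point string, including timestamp, metric, tags and value |

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

The data is described by four aspects: metric, time series, value and collection interval.

##### metric

A single fixed metric, `m`.

##### tagkv

All combinations of the first four tags produce `10 * 20 * 100 * 100 = 2000000` time series. The last tag, `ip`, numbers these 2000000 series incrementally starting from 1.

| **tag_k** | **tag_v** |
| --------- | ------------- |
| zone | z1~z10 |
| cluster | c1~c20 |
| group | g1~g100 |
| app | a1~a100 |
| ip | ip1~ip2000000 |

##### value

Values are random numbers in the range [1, 100].

##### interval

The collection interval is 10 seconds, and data is ingested continuously for 3 hours. This gives a total of `3 * 60 * 60 / 10 * 2000000 = 2,160,000,000` data points.

#### 4.1.2 Machine Specs

OpenTSDB Reader machine: 64C256G (64 cores, 256 GB memory)

HBase machines: 8C16G * 5 (five machines with 8 cores and 16 GB memory each)

#### 4.1.3 DataX JVM Parameters

"-Xms4096m -Xmx4096m"

### 4.2 Test Results

| Channels | DataX speed (Rec/s) | DataX throughput (MB/s) |
|--------|--------|--------|
| 1 | 215428 | 25.65 |
| 2 | 424994 | 50.60 |
| 3 | 603132 | 71.81 |

## 5 Limitations

### 5.1 The network path to OpenTSDB's underlying storage must be reachable

The reason is explained in the first question of section 6 (FAQ).

### 5.2 If one metric holds too much data within a single hour, you may need to increase JVM memory with the `-j` option

The downstream writer may write more slowly than OpenTSDB Reader queries data, in which case records can pile up. Tune the JVM parameters accordingly. Taking the "extract data from an OpenTSDB database to the local machine" job above as an example, the launch command is:

```bash
python datax/bin/datax.py opentsdb2stream.json -j "-Xms4096m -Xmx4096m"
```

### 5.3 Start and end times are automatically truncated to the hour

Start and end times are automatically truncated to the hour. For example, `[3:35, 4:55)` on 2019-4-18 becomes `[3:00, 4:00)`, so data points in `[4:00, 4:55)` are not migrated. An interval that falls inside a single hour, such as `[3:35, 3:55)`, collapses to an empty range.

### 5.4 Only OpenTSDB 2.3.x is currently supported

Compatibility with other versions is not guaranteed for now.

## 6 FAQ

***

**Q: Why connect to OpenTSDB's underlying storage instead of fetching data points directly through `/api/query`?**

A: Internal stress tests showed that reading large data volumes through OpenTSDB's HTTP interface (`/api/query`) made OpenTSDB's asynchronous framework report too many callbacks. Connecting directly to the underlying HBase storage and scanning data points avoids this problem. In addition, because each task is bounded by a metric and a time range, the HBase table can be scanned sequentially, which improves query efficiency.

================================================
FILE: opentsdbreader/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT opentsdbreader opentsdbreader jar UTF-8 3.3.2 4.5 2.4 2.3.2 4.13.1 2.9.9 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j commons-math3 org.apache.commons org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.commons commons-lang3 ${commons-lang3.version} org.apache.httpcomponents httpclient ${httpclient.version} commons-io commons-io ${commons-io.version} org.apache.httpcomponents fluent-hc ${httpclient.version} com.alibaba.fastjson2 fastjson2 net.opentsdb opentsdb ${opentsdb.version} joda-time joda-time ${joda-time.version} junit junit ${junit4.version} test maven-compiler-plugin ${jdk-version} ${jdk-version}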
${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: opentsdbreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/opentsdbreader target/ opentsdbreader-0.0.1-SNAPSHOT.jar plugin/reader/opentsdbreader false plugin/reader/opentsdbreader/libs runtime ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/CliQuery.java ================================================ package com.alibaba.datax.plugin.reader.conn; import net.opentsdb.core.*; import net.opentsdb.utils.DateTime; import java.util.ArrayList; import java.util.HashMap; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd. // //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . final class CliQuery { /** * Parses the query from the command lines. * * @param args The command line arguments. * @param tsdb The TSDB to use. * @param queries The list in which {@link Query}s will be appended. 
     */
    static void parseCommandLineQuery(final String[] args,
                                      final TSDB tsdb,
                                      final ArrayList<Query> queries) {
        long start_ts = DateTime.parseDateTimeString(args[0], null);
        if (start_ts >= 0) {
            start_ts /= 1000;
        }
        long end_ts = -1;
        if (args.length > 3) {
            // see if we can detect an end time
            try {
                if (args[1].charAt(0) != '+'
                        && (args[1].indexOf(':') >= 0
                        || args[1].indexOf('/') >= 0
                        || args[1].indexOf('-') >= 0
                        || Long.parseLong(args[1]) > 0)) {
                    end_ts = DateTime.parseDateTimeString(args[1], null);
                }
            } catch (NumberFormatException ignore) {
                // ignore it as it means the third parameter is likely the aggregator
            }
        }
        // temp fixup to seconds from ms until the rest of TSDB supports ms
        // Note you can't append this to the DateTime.parseDateTimeString() call as
        // it clobbers -1 results
        if (end_ts >= 0) {
            end_ts /= 1000;
        }

        int i = end_ts < 0 ? 1 : 2;
        while (i < args.length && args[i].charAt(0) == '+') {
            i++;
        }

        while (i < args.length) {
            final Aggregator agg = Aggregators.get(args[i++]);
            final boolean rate = "rate".equals(args[i]);
            RateOptions rate_options = new RateOptions(false, Long.MAX_VALUE,
                    RateOptions.DEFAULT_RESET_VALUE);
            if (rate) {
                i++;

                long counterMax = Long.MAX_VALUE;
                long resetValue = RateOptions.DEFAULT_RESET_VALUE;
                if (args[i].startsWith("counter")) {
                    String[] parts = Tags.splitString(args[i], ',');
                    if (parts.length >= 2 && parts[1].length() > 0) {
                        counterMax = Long.parseLong(parts[1]);
                    }
                    if (parts.length >= 3 && parts[2].length() > 0) {
                        resetValue = Long.parseLong(parts[2]);
                    }
                    rate_options = new RateOptions(true, counterMax, resetValue);
                    i++;
                }
            }
            final boolean downsample = "downsample".equals(args[i]);
            if (downsample) {
                i++;
            }
            final long interval = downsample ? Long.parseLong(args[i++]) : 0;
            final Aggregator sampler = downsample ? Aggregators.get(args[i++]) : null;
            final String metric = args[i++];
            final HashMap<String, String> tags = new HashMap<String, String>();
            while (i < args.length && args[i].indexOf(' ', 1) < 0
                    && args[i].indexOf('=', 1) > 0) {
                Tags.parse(tags, args[i++]);
            }
            final Query query = tsdb.newQuery();
            query.setStartTime(start_ts);
            if (end_ts > 0) {
                query.setEndTime(end_ts);
            }
            query.setTimeSeries(metric, tags, agg, rate, rate_options);
            if (downsample) {
                query.downsample(interval, sampler);
            }
            queries.add(query);
        }
    }
}

================================================
FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/Connection4TSDB.java
================================================
package com.alibaba.datax.plugin.reader.conn;

import com.alibaba.datax.common.plugin.RecordSender;

import java.util.List;

//This file is part of OpenTSDB.
//Copyright (C) 2010-2012 The OpenTSDB Authors.
//Copyright(C)2019 Alibaba Group Holding Ltd.
//
//This program is free software: you can redistribute it and/or modify it
//under the terms of the GNU Lesser General Public License as published by
//the Free Software Foundation, either version 2.1 of the License, or (at your
//option) any later version. This program is distributed in the hope that it
//will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty
//of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
//General Public License for more details. You should have received a copy
//of the GNU Lesser General Public License along with this program. If not,
//see <http://www.gnu.org/licenses/>.
public interface Connection4TSDB {

    /**
     * Get the address of the database.
     *
     * @return host+ip
     */
    String address();

    /**
     * Get the version of the database.
     *
     * @return version
     */
    String version();

    /**
     * Get the configurations.
     *
     * @return configs
     */
    String config();

    /**
     * Get the list of supported version prefixes.
     *
     * @return version list
     */
    String[] getSupportVersionPrefix();

    /**
     * Send data points by metric & start time & end time.
* * @param metric metric * @param start startTime * @param end endTime * @param recordSender sender */ void sendDPs(String metric, Long start, Long end, RecordSender recordSender) throws Exception; /** * Put data point. * * @param dp data point * @return whether the data point is written successfully */ boolean put(DataPoint4TSDB dp); /** * Put data points. * * @param dps data points * @return whether the data point is written successfully */ boolean put(List dps); /** * Whether current version is supported. * * @return true: supported; false: not yet! */ boolean isSupported(); } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DataPoint4TSDB.java ================================================ package com.alibaba.datax.plugin.reader.conn; import com.alibaba.fastjson2.JSON; import java.util.Map; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd. // //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . 
public class DataPoint4TSDB { private long timestamp; private String metric; private Map tags; private Object value; public DataPoint4TSDB() { } public DataPoint4TSDB(long timestamp, String metric, Map tags, Object value) { this.timestamp = timestamp; this.metric = metric; this.tags = tags; this.value = value; } public long getTimestamp() { return timestamp; } public void setTimestamp(long timestamp) { this.timestamp = timestamp; } public String getMetric() { return metric; } public void setMetric(String metric) { this.metric = metric; } public Map getTags() { return tags; } public void setTags(Map tags) { this.tags = tags; } public Object getValue() { return value; } public void setValue(Object value) { this.value = value; } @Override public String toString() { return JSON.toJSONString(this); } } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/DumpSeries.java ================================================ package com.alibaba.datax.plugin.reader.conn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.plugin.RecordSender; import net.opentsdb.core.*; import net.opentsdb.core.Internal.Cell; import org.hbase.async.KeyValue; import org.hbase.async.Scanner; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd. // //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU Lesser
//General Public License for more details. You should have received a copy
//of the GNU Lesser General Public License along with this program. If not,
//see <http://www.gnu.org/licenses/>.
final class DumpSeries {

    private static final Logger LOG = LoggerFactory.getLogger(DumpSeries.class);

    /**
     * Dumps all data points of a specific metric and time range, then sends them via {@link RecordSender}.
     */
    static void doDump(TSDB tsdb, String[] args, RecordSender sender) throws Exception {
        final ArrayList<Query> queries = new ArrayList<Query>();
        CliQuery.parseCommandLineQuery(args, tsdb, queries);

        List<DataPoint4TSDB> dps = new LinkedList<DataPoint4TSDB>();
        for (final Query query : queries) {
            final List<Scanner> scanners = Internal.getScanners(query);
            for (Scanner scanner : scanners) {
                ArrayList<ArrayList<KeyValue>> rows;
                while ((rows = scanner.nextRows().join()) != null) {
                    for (final ArrayList<KeyValue> row : rows) {
                        final byte[] key = row.get(0).key();
                        final long baseTime = Internal.baseTime(tsdb, key);
                        final String metric = Internal.metricName(tsdb, key);
                        for (final KeyValue kv : row) {
                            formatKeyValue(dps, tsdb, kv, baseTime, metric);
                            for (DataPoint4TSDB dp : dps) {
                                StringColumn tsdbColumn = new StringColumn(dp.toString());
                                Record record = sender.createRecord();
                                record.addColumn(tsdbColumn);
                                sender.sendToWriter(record);
                            }
                            dps.clear();
                        }
                    }
                }
            }
        }
    }

    /**
     * Parse KeyValue into data points.
     */
    private static void formatKeyValue(final List<DataPoint4TSDB> dps, final TSDB tsdb,
                                       final KeyValue kv, final long baseTime, final String metric) {
        Map<String, String> tagKVs = Internal.getTags(tsdb, kv.key());

        final byte[] qualifier = kv.qualifier();
        final int q_len = qualifier.length;

        if (!AppendDataPoints.isAppendDataPoints(qualifier) && q_len % 2 != 0) {
            // custom data object, not a data point
            if (LOG.isDebugEnabled()) {
                LOG.debug("Not a data point");
            }
        } else if (q_len == 2 || (q_len == 4 && Internal.inMilliseconds(qualifier))) {
            // regular data point
            final Cell cell = Internal.parseSingleValue(kv);
            if (cell == null) {
                throw new IllegalDataException("Unable to parse row: " + kv);
            }
            dps.add(new DataPoint4TSDB(cell.absoluteTimestamp(baseTime), metric, tagKVs, cell.parseValue()));
        } else {
            final Collection<Cell> cells;
            if (q_len == 3) {
                // append data points
                cells = new AppendDataPoints().parseKeyValue(tsdb, kv);
            } else {
                // compacted column
                cells = Internal.extractDataPoints(kv);
            }
            for (Cell cell : cells) {
                dps.add(new DataPoint4TSDB(cell.absoluteTimestamp(baseTime), metric, tagKVs, cell.parseValue()));
            }
        }
    }
}

================================================
FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBConnection.java
================================================
package com.alibaba.datax.plugin.reader.conn;

import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.plugin.reader.util.TSDBUtils;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.lang3.StringUtils;

import java.util.List;

//This file is part of OpenTSDB.
//Copyright (C) 2010-2012 The OpenTSDB Authors.
//Copyright(C)2019 Alibaba Group Holding Ltd.
//
//This program is free software: you can redistribute it and/or modify it
//under the terms of the GNU Lesser General Public License as published by
//the Free Software Foundation, either version 2.1 of the License, or (at your
//option) any later version.
This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . public class OpenTSDBConnection implements Connection4TSDB { private String address; public OpenTSDBConnection(String address) { this.address = address; } @Override public String address() { return address; } @Override public String version() { return TSDBUtils.version(address); } @Override public String config() { return TSDBUtils.config(address); } @Override public String[] getSupportVersionPrefix() { return new String[]{"2.3"}; } @Override public void sendDPs(String metric, Long start, Long end, RecordSender recordSender) throws Exception { OpenTSDBDump.dump(this, metric, start, end, recordSender); } @Override public boolean put(DataPoint4TSDB dp) { return false; } @Override public boolean put(List dps) { return false; } @Override public boolean isSupported() { String versionJson = version(); if (StringUtils.isBlank(versionJson)) { throw new RuntimeException("Cannot get the version!"); } String version = JSON.parseObject(versionJson).getString("version"); if (StringUtils.isBlank(version)) { return false; } for (String prefix : getSupportVersionPrefix()) { if (version.startsWith(prefix)) { return true; } } return false; } } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBDump.java ================================================ package com.alibaba.datax.plugin.reader.conn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.fastjson2.JSON; import net.opentsdb.core.TSDB; import net.opentsdb.utils.Config; import java.util.Map; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. 
//Copyright(C)2019 Alibaba Group Holding Ltd.
//
//This program is free software: you can redistribute it and/or modify it
//under the terms of the GNU Lesser General Public License as published by
//the Free Software Foundation, either version 2.1 of the License, or (at your
//option) any later version. This program is distributed in the hope that it
//will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty
//of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
//General Public License for more details. You should have received a copy
//of the GNU Lesser General Public License along with this program. If not,
//see <http://www.gnu.org/licenses/>.
final class OpenTSDBDump {

    // volatile is required for the double-checked locking in getTSDB() to publish the instance safely
    private static volatile TSDB TSDB_INSTANCE;

    private OpenTSDBDump() {
    }

    static void dump(OpenTSDBConnection conn, String metric, Long start, Long end, RecordSender sender) throws Exception {
        DumpSeries.doDump(getTSDB(conn), new String[]{start + "", end + "", "none", metric}, sender);
    }

    private static TSDB getTSDB(OpenTSDBConnection conn) {
        if (TSDB_INSTANCE == null) {
            synchronized (OpenTSDBDump.class) {
                if (TSDB_INSTANCE == null) {
                    try {
                        Config config = new Config(false);
                        Map configurations = JSON.parseObject(conn.config(), Map.class);
                        for (Object key : configurations.keySet()) {
                            config.overrideConfig(key.toString(), configurations.get(key.toString()).toString());
                        }
                        TSDB_INSTANCE = new TSDB(config);
                    } catch (Exception e) {
                        // keep the root cause so that connection problems can be diagnosed
                        throw new RuntimeException("Cannot init OpenTSDB connection!", e);
                    }
                }
            }
        }
        return TSDB_INSTANCE;
    }
}

================================================
FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/Constant.java
================================================
package com.alibaba.datax.plugin.reader.opentsdbreader;

//This file is part of OpenTSDB.
//Copyright (C) 2010-2012 The OpenTSDB Authors.
//Copyright(C)2019 Alibaba Group Holding Ltd.
// //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . public final class Constant { static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/Key.java ================================================ package com.alibaba.datax.plugin.reader.opentsdbreader; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd. // //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . 
public class Key { static final String ENDPOINT = "endpoint"; static final String COLUMN = "column"; static final String BEGIN_DATE_TIME = "beginDateTime"; static final String END_DATE_TIME = "endDateTime"; } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReader.java ================================================ package com.alibaba.datax.plugin.reader.opentsdbreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.conn.OpenTSDBConnection; import com.alibaba.datax.plugin.reader.util.TimeUtils; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.joda.time.DateTime; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Collections; import java.util.List; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd. // //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . 
@SuppressWarnings("unused") public class OpenTSDBReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig; @Override public void init() { this.originalConfig = super.getPluginJobConf(); String address = originalConfig.getString(Key.ENDPOINT); if (StringUtils.isBlank(address)) { throw DataXException.asDataXException( OpenTSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.ENDPOINT + "] is not set."); } List columns = originalConfig.getList(Key.COLUMN, String.class); if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException( OpenTSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.COLUMN + "] is not set."); } SimpleDateFormat format = new SimpleDateFormat(Constant.DEFAULT_DATA_FORMAT); String startTime = originalConfig.getString(Key.BEGIN_DATE_TIME); Long startDate; if (startTime == null || startTime.trim().length() == 0) { throw DataXException.asDataXException( OpenTSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.BEGIN_DATE_TIME + "] is not set."); } else { try { startDate = format.parse(startTime).getTime(); } catch (ParseException e) { throw DataXException.asDataXException(OpenTSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.BEGIN_DATE_TIME + "] needs to conform to the [" + Constant.DEFAULT_DATA_FORMAT + "] format."); } } String endTime = originalConfig.getString(Key.END_DATE_TIME); Long endDate; if (endTime == null || endTime.trim().length() == 0) { throw DataXException.asDataXException( OpenTSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.END_DATE_TIME + "] is not set."); } else { try { endDate = format.parse(endTime).getTime(); } catch (ParseException e) { throw DataXException.asDataXException(OpenTSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.END_DATE_TIME + "] needs to conform to the [" + Constant.DEFAULT_DATA_FORMAT + "] format."); } } if (startDate >= 
endDate) { throw DataXException.asDataXException(OpenTSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.BEGIN_DATE_TIME + "] should be less than the parameter [" + Key.END_DATE_TIME + "]."); } } @Override public void prepare() { } @Override public List<Configuration> split(int adviceNumber) { List<Configuration> configurations = new ArrayList<Configuration>(); // get metrics List<String> columns = originalConfig.getList(Key.COLUMN, String.class); // get time range SimpleDateFormat format = new SimpleDateFormat(Constant.DEFAULT_DATA_FORMAT); long startTime; try { startTime = format.parse(originalConfig.getString(Key.BEGIN_DATE_TIME)).getTime(); } catch (ParseException e) { throw DataXException.asDataXException( OpenTSDBReaderErrorCode.ILLEGAL_VALUE, "Failed to parse [" + Key.BEGIN_DATE_TIME + "].", e); } long endTime; try { endTime = format.parse(originalConfig.getString(Key.END_DATE_TIME)).getTime(); } catch (ParseException e) { throw DataXException.asDataXException( OpenTSDBReaderErrorCode.ILLEGAL_VALUE, "Failed to parse [" + Key.END_DATE_TIME + "].", e); } if (TimeUtils.isSecond(startTime)) { startTime *= 1000; } if (TimeUtils.isSecond(endTime)) { endTime *= 1000; } DateTime startDateTime = new DateTime(TimeUtils.getTimeInHour(startTime)); DateTime endDateTime = new DateTime(TimeUtils.getTimeInHour(endTime)); // split by metric for (String column : columns) { // split by time in hour; restart from the first hour for every metric, otherwise only the first metric gets any slices DateTime cursor = startDateTime; while (cursor.isBefore(endDateTime)) { Configuration clone = this.originalConfig.clone(); clone.set(Key.COLUMN, Collections.singletonList(column)); clone.set(Key.BEGIN_DATE_TIME, cursor.getMillis()); cursor = cursor.plusHours(1); // Make sure the time interval is [start, end). // Because net.opentsdb.core.Query.setEndTime means less than or equal to the end time.
clone.set(Key.END_DATE_TIME, cursor.getMillis() - 1); configurations.add(clone); LOG.info("Configuration: {}", JSON.toJSONString(clone)); } } return configurations; } @Override public void post() { } @Override public void destroy() { } } public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private List<String> columns; private OpenTSDBConnection conn; private Long startTime; private Long endTime; @Override public void init() { Configuration readerSliceConfig = super.getPluginJobConf(); LOG.info("getPluginJobConf: {}", JSON.toJSONString(readerSliceConfig)); this.columns = readerSliceConfig.getList(Key.COLUMN, String.class); String address = readerSliceConfig.getString(Key.ENDPOINT); conn = new OpenTSDBConnection(address); this.startTime = readerSliceConfig.getLong(Key.BEGIN_DATE_TIME); this.endTime = readerSliceConfig.getLong(Key.END_DATE_TIME); } @Override public void prepare() { } @Override public void startRead(RecordSender recordSender) { try { for (String column : columns) { conn.sendDPs(column, this.startTime, this.endTime, recordSender); } } catch (Exception e) { throw DataXException.asDataXException( OpenTSDBReaderErrorCode.ILLEGAL_VALUE, "Error while fetching or sending data points!", e); } } @Override public void post() { } @Override public void destroy() { } } } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/opentsdbreader/OpenTSDBReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.opentsdbreader; import com.alibaba.datax.common.spi.ErrorCode; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd.
// //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . public enum OpenTSDBReaderErrorCode implements ErrorCode { REQUIRED_VALUE("OpenTSDBReader-00", "Missing required value"), ILLEGAL_VALUE("OpenTSDBReader-01", "Illegal value"); private final String code; private final String description; OpenTSDBReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/HttpUtils.java ================================================ package com.alibaba.datax.plugin.reader.util; import com.alibaba.fastjson2.JSON; import org.apache.http.client.fluent.Content; import org.apache.http.client.fluent.Request; import org.apache.http.entity.ContentType; import java.nio.charset.Charset; import java.util.Map; import java.util.concurrent.TimeUnit; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd.
// //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . public final class HttpUtils { public final static Charset UTF_8 = Charset.forName("UTF-8"); public final static int CONNECT_TIMEOUT_DEFAULT_IN_MILL = (int) TimeUnit.SECONDS.toMillis(60); public final static int SOCKET_TIMEOUT_DEFAULT_IN_MILL = (int) TimeUnit.SECONDS.toMillis(60); private HttpUtils() { } public static String get(String url) throws Exception { Content content = Request.Get(url) .connectTimeout(CONNECT_TIMEOUT_DEFAULT_IN_MILL) .socketTimeout(SOCKET_TIMEOUT_DEFAULT_IN_MILL) .execute() .returnContent(); if (content == null) { return null; } return content.asString(UTF_8); } public static String post(String url, Map params) throws Exception { return post(url, JSON.toJSONString(params), CONNECT_TIMEOUT_DEFAULT_IN_MILL, SOCKET_TIMEOUT_DEFAULT_IN_MILL); } public static String post(String url, String params) throws Exception { return post(url, params, CONNECT_TIMEOUT_DEFAULT_IN_MILL, SOCKET_TIMEOUT_DEFAULT_IN_MILL); } public static String post(String url, Map params, int connectTimeoutInMill, int socketTimeoutInMill) throws Exception { return post(url, JSON.toJSONString(params), connectTimeoutInMill, socketTimeoutInMill); } public static String post(String url, String params, int connectTimeoutInMill, int socketTimeoutInMill) throws Exception { Content content = Request.Post(url) .connectTimeout(connectTimeoutInMill) 
.socketTimeout(socketTimeoutInMill) .addHeader("Content-Type", "application/json") .bodyString(params, ContentType.APPLICATION_JSON) .execute() .returnContent(); if (content == null) { return null; } return content.asString(UTF_8); } } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TSDBUtils.java ================================================ package com.alibaba.datax.plugin.reader.util; import com.alibaba.datax.plugin.reader.conn.DataPoint4TSDB; import com.alibaba.fastjson2.JSON; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd. // //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . 
public final class TSDBUtils { private static final Logger LOG = LoggerFactory.getLogger(TSDBUtils.class); private TSDBUtils() { } public static String version(String address) { String url = String.format("%s/api/version", address); String rsp; try { rsp = HttpUtils.get(url); } catch (Exception e) { throw new RuntimeException(e); } return rsp; } public static String config(String address) { String url = String.format("%s/api/config", address); String rsp; try { rsp = HttpUtils.get(url); } catch (Exception e) { throw new RuntimeException(e); } return rsp; } public static boolean put(String address, List dps) { return put(address, JSON.toJSON(dps)); } public static boolean put(String address, DataPoint4TSDB dp) { return put(address, JSON.toJSON(dp)); } private static boolean put(String address, Object o) { String url = String.format("%s/api/put", address); String rsp; try { rsp = HttpUtils.post(url, o.toString()); // If successful, the returned content should be null. assert rsp == null; } catch (Exception e) { LOG.error("Address: {}, DataPoints: {}", url, o); throw new RuntimeException(e); } return true; } } ================================================ FILE: opentsdbreader/src/main/java/com/alibaba/datax/plugin/reader/util/TimeUtils.java ================================================ package com.alibaba.datax.plugin.reader.util; import java.util.concurrent.TimeUnit; //This file is part of OpenTSDB. //Copyright (C) 2010-2012 The OpenTSDB Authors. //Copyright(C)2019 Alibaba Group Holding Ltd. // //This program is free software: you can redistribute it and/or modify it //under the terms of the GNU Lesser General Public License as published by //the Free Software Foundation, either version 2.1 of the License, or (at your //option) any later version. This program is distributed in the hope that it //will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty //of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU Lesser //General Public License for more details. You should have received a copy //of the GNU Lesser General Public License along with this program. If not, //see . public final class TimeUtils { private TimeUtils() { } private static final long SECOND_MASK = 0xFFFFFFFF00000000L; private static final long HOUR_IN_MILL = TimeUnit.HOURS.toMillis(1); /** * Whether the timestamp is in seconds (i.e. its upper 32 bits are all zero). * * @param ts timestamp */ public static boolean isSecond(long ts) { return (ts & SECOND_MASK) == 0; } /** * Truncate the timestamp to the start of its hour. * * @param ms time in milliseconds */ public static long getTimeInHour(long ms) { return ms - ms % HOUR_IN_MILL; } } ================================================ FILE: opentsdbreader/src/main/resources/plugin.json ================================================ { "name": "opentsdbreader", "class": "com.alibaba.datax.plugin.reader.opentsdbreader.OpenTSDBReader", "description": { "useScene": "Ingest data points from OpenTSDB", "mechanism": "Connects directly to the underlying HBase storage and scans out the data points that match the given time range and metric", "warn": "The start and end times are truncated to the hour (minutes and seconds are ignored); for example, [3:35, 4:55) on 2019-4-18 becomes [3:00, 4:00)" }, "developer": "alibaba" } ================================================ FILE: opentsdbreader/src/main/resources/plugin_job_template.json ================================================ { "name": "opentsdbreader", "parameter": { "endpoint": "http://localhost:8242", "column": [ "m" ], "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } } ================================================ FILE: opentsdbreader/src/test/java/com/alibaba/datax/plugin/reader/conn/OpenTSDBConnectionTest.java ================================================ package com.alibaba.datax.plugin.reader.conn; import org.junit.Assert; import org.junit.Ignore; import org.junit.Test; /** * Copyright @ 
2019 alibaba.com * All right reserved.
* Function:OpenTSDB Connection4TSDB Test * * @author Benedict Jin * @since 2019-03-29 */ @Ignore public class OpenTSDBConnectionTest { private static final String OPENTSDB_ADDRESS = "http://localhost:8242"; @Test public void testVersion() { String version = new OpenTSDBConnection(OPENTSDB_ADDRESS).version(); Assert.assertNotNull(version); } @Test public void testIsSupported() { Assert.assertTrue(new OpenTSDBConnection(OPENTSDB_ADDRESS).isSupported()); } } ================================================ FILE: opentsdbreader/src/test/java/com/alibaba/datax/plugin/reader/util/Const.java ================================================ package com.alibaba.datax.plugin.reader.util; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:Const * * @author Benedict Jin * @since 2019-03-29 */ final class Const { private Const() { } static final String OPENTSDB_ADDRESS = "http://localhost:8242"; static final String TSDB_ADDRESS = "http://localhost:8240"; } ================================================ FILE: opentsdbreader/src/test/java/com/alibaba/datax/plugin/reader/util/HttpUtilsTest.java ================================================ package com.alibaba.datax.plugin.reader.util; import org.junit.Assert; import org.junit.Ignore; import org.junit.Test; import java.util.HashMap; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:HttpUtils Test * * @author Benedict Jin * @since 2019-03-29 */ @Ignore public class HttpUtilsTest { @Test public void testSimpleCase() throws Exception { String url = "https://httpbin.org/post"; Map params = new HashMap(); params.put("foo", "bar"); String rsp = HttpUtils.post(url, params); System.out.println(rsp); Assert.assertNotNull(rsp); } @Test public void testGet() throws Exception { String url = String.format("%s/api/version", Const.OPENTSDB_ADDRESS); String rsp = HttpUtils.get(url); System.out.println(rsp); Assert.assertNotNull(rsp); } } ================================================ FILE: opentsdbreader/src/test/java/com/alibaba/datax/plugin/reader/util/TSDBTest.java ================================================ package com.alibaba.datax.plugin.reader.util; import org.junit.Assert; import org.junit.Ignore; import org.junit.Test; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:TSDB Test * * @author Benedict Jin * @since 2019-04-11 */ @Ignore public class TSDBTest { @Test public void testVersion() { String version = TSDBUtils.version(Const.TSDB_ADDRESS); Assert.assertNotNull(version); System.out.println(version); version = TSDBUtils.version(Const.OPENTSDB_ADDRESS); Assert.assertNotNull(version); System.out.println(version); } } ================================================ FILE: opentsdbreader/src/test/java/com/alibaba/datax/plugin/reader/util/TimeUtilsTest.java ================================================ package com.alibaba.datax.plugin.reader.util; import org.junit.Assert; import org.junit.Test; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:com.alibaba.datax.common.util * * @author Benedict Jin * @since 2019-04-22 */ public class TimeUtilsTest { @Test public void testIsSecond() { Assert.assertFalse(TimeUtils.isSecond(System.currentTimeMillis())); Assert.assertTrue(TimeUtils.isSecond(System.currentTimeMillis() / 1000)); } @Test public void testGetTimeInHour() throws ParseException { SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); Date date = sdf.parse("2019-04-18 15:32:33"); long timeInHour = TimeUtils.getTimeInHour(date.getTime()); Assert.assertEquals("2019-04-18 15:00:00", sdf.format(timeInHour)); } }
================================================
FILE: oraclereader/doc/oraclereader.md
================================================

# OracleReader Plugin Documentation

___

## 1 Quick Introduction

The OracleReader plugin reads data from Oracle. Under the hood, OracleReader connects to a remote Oracle database via JDBC and executes the corresponding SQL statements to SELECT data out of the Oracle database.

## 2 How It Works

In short, OracleReader connects to a remote Oracle database through a JDBC connector and generates a SELECT query from the user's configuration. It sends that query to the remote Oracle database, assembles the returned result set into an abstract dataset using DataX's own data types, and passes it to the downstream Writer.

For user-configured Table, Column, and Where settings, OracleReader concatenates them into a SQL statement and sends it to the Oracle database. For a user-configured querySql, OracleReader sends the statement to the Oracle database as-is.

## 3 Features

### 3.1 Sample Configurations

* A job that extracts data from an Oracle database and syncs it to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed in byte/s: DataX tries to approach this rate but never exceeds it.
                // channel is the number of channels and byte is the channel speed; if a single channel runs at 1 MB/s, setting byte to 1048576 corresponds to one channel.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // record count limit (checked first)
                "record": 0,
                // percentage; 1 means 100%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        // database username
                        "username": "root",
                        // database password
                        "password": "root",
                        "column": [
                            "id","name"
                        ],
                        // split key
                        "splitPk": "db_id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@[HOST_NAME]:PORT:[DATABASE_NAME]"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // writer type
                    "name": "streamwriter",
                    // whether to print the content
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that syncs data to local output using a custom SQL query:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@[HOST_NAME]:PORT:[DATABASE_NAME]"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "visible": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
* Description: JDBC connection information for the source database, given as a JSON array; several connection addresses may be listed for one database. A JSON array is used because Alibaba Group internally supports probing multiple IPs: if several are configured, OracleReader checks their connectivity in turn until it finds a usable one. If every connection fails, OracleReader reports an error. Note that jdbcUrl must be placed inside the connection block. For use outside Alibaba Group, a single JDBC connection in the JSON array is enough.

  jdbcUrl follows the official Oracle format and may include additional connection control parameters. See the [Oracle official documentation](http://www.oracle.com/technetwork/database/enterprise-edition/documentation/index.html) for details.
* Required: yes
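* Default: none
* **username**
* Description: username for the data source
* Required: yes
* Default: none
* **password**
* Description: password for the specified username
* Required: yes
* Default: none
* **table**
* Description: the tables to synchronize, given as a JSON array, so several tables can be extracted at once. When several tables are configured, the user must make sure they share the same schema; OracleReader does not check whether they form one logical table. Note that table must be placed inside the connection block.
* Required: yes
* Default: none
* **column**
* Description: the names of the columns to synchronize from the configured table, given as a JSON array. Use \* to select all columns, e.g. ['\*'].

  Column pruning is supported: you can export only a subset of the columns.

  Column reordering is supported: columns need not follow the order of the table schema.

  Constants are supported, using the JSON format:
  ``["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"]``
  Here id is an ordinary column name, \`table\` is a column name that is a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean.

  column must be specified explicitly and must not be empty!
* Required: yes
* Default: none
* **splitPk**
* Description: if splitPk is specified when OracleReader extracts data, the data is sharded by the field that splitPk names, and DataX launches concurrent tasks to synchronize it, which can greatly improve sync throughput.

  Using the table's primary key as splitPk is recommended, because primary keys are usually evenly distributed, so the resulting shards are less likely to create data hotspots.

  splitPk currently supports only integer and string columns; `floating-point, date, and other types are not supported`. If an unsupported type is specified, OracleReader reports an error!

  If splitPk is left empty, the table is not split and OracleReader synchronizes the full data set over a single channel.
* Required: no
* Default: none
* **where**
* Description: filter condition. OracleReader builds a SQL statement from the specified column, table, and where settings and extracts data with it. In practice you often sync only the current day's data, in which case the where condition can be set to gmt_create > $bizdate. Note: where cannot be set to limit 10, because limit is not a valid clause in a SQL WHERE condition.

  A where condition is an effective way to perform incremental synchronization.
* Required: no
* Default: none
* **querySql**
* Description: in some business scenarios the where setting is not expressive enough, so you can define a custom filtering SQL statement with this setting. Once it is configured, DataX ignores the table and column settings and filters data directly with this statement. For example, to sync the result of a multi-table join, use `select a,b from table_a join table_b on table_a.id = table_b.id`.

  `When querySql is configured, OracleReader ignores the table, column, and where settings.`
* Required: no
* Default: none
* **fetchSize**
* Description: the number of records fetched from the database server per batch. This value determines the number of network round trips between DataX and the server and can significantly improve extraction performance.

  `Note: setting this value too high (>2048) may cause the DataX process to run out of memory (OOM).`
* Required: no
* Default: 1024
* **session**
* Description: configures the session's time format, time zone, and similar settings. If the table has time columns, set this to tell Oracle explicitly which time format to use. Commonly configured parameters are NLS_DATE_FORMAT and NLS_TIME_FORMAT. The value is in JSON format, for example:

```
"session": [
    "alter session set NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'",
    "alter session set NLS_TIMESTAMP_FORMAT='yyyy-mm-dd hh24:mi:ss'",
    "alter session set NLS_TIMESTAMP_TZ_FORMAT='yyyy-mm-dd hh24:mi:ss'",
    "alter session set TIME_ZONE='US/Pacific'"
]
```

  `(Note: &quot; is the escaped form of ")`.
* Required: no
* Default: none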
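The table, column, where, and splitPk settings above combine into one SELECT statement per slice. The Java sketch below shows one way this could work for an integer splitPk whose minimum and maximum are already known. It is an illustration under assumptions, not the DataX source: `SplitPkSketch` and its methods are hypothetical names. In the plugin itself, splitting is delegated to `CommonRdbmsReader` in the shared plugin-rdbms-util module, whose range choice and string-key handling may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not DataX's CommonRdbmsReader code: it combines the
// table/column/where settings with an integer splitPk into per-slice SELECTs.
final class SplitPkSketch {

    private SplitPkSketch() {
    }

    // SELECT <columns> FROM <table> [WHERE (<where>)]
    static String baseQuery(String table, List<String> columns, String where) {
        String sql = "SELECT " + String.join(",", columns) + " FROM " + table;
        if (where != null && !where.trim().isEmpty()) {
            sql += " WHERE (" + where + ")";
        }
        return sql;
    }

    // Divides the key range [min, max] into at most `slices` contiguous ranges.
    // Every range is half-open except the last, which includes max. One extra
    // slice picks up rows whose key is NULL, because no range predicate matches them.
    static List<String> splitQueries(String table, List<String> columns, String where,
                                     String splitPk, long min, long max, int slices) {
        String base = baseQuery(table, columns, where);
        boolean hasWhere = where != null && !where.trim().isEmpty();
        String glue = hasWhere ? " AND " : " WHERE ";

        long span = max - min + 1;                          // number of key values (assumes no overflow)
        int n = (int) Math.max(1, Math.min(slices, span));  // never more slices than key values
        long step = span / n;

        List<String> queries = new ArrayList<>();
        long lower = min;
        for (int i = 0; i < n; i++) {
            String range = (i == n - 1)
                    ? splitPk + " >= " + lower + " AND " + splitPk + " <= " + max
                    : splitPk + " >= " + lower + " AND " + splitPk + " < " + (lower + step);
            queries.add(base + glue + "(" + range + ")");
            lower += step;
        }
        queries.add(base + glue + "(" + splitPk + " IS NULL)");
        return queries;
    }
}
```

For example, a key range of 1 to 10 split three ways yields the predicates `id >= 1 AND id < 4`, `id >= 4 AND id < 7`, and `id >= 7 AND id <= 10`, plus an `id IS NULL` slice. Each slice runs as a separate query, so together they do not form one consistent snapshot (see section 5.2).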
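### 3.3 Type Conversion

OracleReader supports most Oracle types, but a few are not supported, so please check the types you use.

The table below lists OracleReader's type mappings for Oracle:

| DataX internal type | Oracle data type |
| -------- | ----- |
| Long | NUMBER, INTEGER, INT, SMALLINT |
| Double | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, REAL |
| String | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NCHAR VARYING |
| Date | TIMESTAMP, DATE |
| Boolean | bit, bool |
| Bytes | BLOB, BFILE, RAW, LONG RAW |

Please note:

* `Types other than those listed above are not supported`.

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

To simulate real production data, we designed two Oracle tables:

#### 4.1.2 Machine Specifications

* Machine running DataX:
* Oracle database machine:

### 4.2 Test Results

#### 4.2.1 Table 1 Results

| Concurrent tasks | DataX speed (Rec/s) | DataX throughput | NIC traffic | DataX load | DB load |
|--------|--------|--------|--------|--------|--------|
| 1 | DataX measured speed (Rec/s) | DataX measured throughput | NIC traffic | DataX load | DB load |

## 5 Limitations

### 5.1 Primary/Standby Replication Lag

This refers to an Oracle primary/standby disaster-recovery setup in which the standby continuously recovers data from the primary's redo logs. There is always some lag between primary and standby, and in certain situations, such as network latency, it can become large. Data synced from the standby may therefore not be a complete, up-to-date image of the primary.

A preSql feature is provided for this problem; its documentation is still to be added.

### 5.2 Consistency Constraints

Oracle is an RDBMS and offers a strongly consistent query interface. For example, if other writers write to the database while a sync job is running, OracleReader does not see those updates at all, because of the database's snapshot behavior. For details on database snapshots, see [MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control).

That describes consistency under OracleReader's single-threaded model. Because OracleReader can extract data concurrently based on the user's configuration, strict consistency cannot be guaranteed. After OracleReader splits the data by splitPk, it starts several concurrent tasks one after another to complete the sync. These tasks do not share a read transaction, and they start at different times, so the resulting data is not a `complete`, `consistent` snapshot.

A consistent snapshot across multiple threads is not currently achievable; it can only be approached through engineering trade-offs. Here are a few options to choose from:

1. Use single-threaded sync, i.e. do not split the data. This is slower but preserves consistency well.
2. Stop all other writers so that the data is static, for example by locking the table or pausing standby replication. The downside is that this may affect online business.

### 5.3 Database Encoding

OracleReader extracts data through JDBC, which handles all kinds of encodings natively and converts between them internally. OracleReader therefore does not require the user to specify an encoding; it detects and converts the encoding automatically.

If the encoding actually used to write data into Oracle differs from Oracle's configured encoding, OracleReader cannot detect this and cannot offer a fix; in such cases `the exported data may be garbled`.

### 5.4 Incremental Data Sync

OracleReader extracts data with JDBC SELECT statements, so incremental extraction can be done with SELECT...WHERE... in several ways:

* If the online application fills a modify field with a change timestamp on every write (insert, update, and logical delete), OracleReader only needs a WHERE condition on the timestamp of the previous sync.
* For append-only (log-style) data, OracleReader can use a WHERE condition on the largest auto-increment ID from the previous sync.

If the business data has no field that distinguishes new rows from modified ones, OracleReader cannot sync incrementally and can only sync the full data set.

### 5.5 SQL Security

OracleReader lets users write their own SELECT statements through querySql and performs no security checks on querySql. Ensuring its safety is the responsibility of the DataX user.

## 6 FAQ

***

**Q: OracleReader fails with the error message XXX**

A: This is a network or permission problem. Test with the Oracle command line: `sqlplus username/password@//host:port/sid`. If this command also fails, the environment is at fault; please contact your DBA.

**Q: What can I do if OracleReader extraction is slow?**

A: The main factors affecting extraction time are (courtesy of DBA 卫绾):

1. A poor SQL execution plan causes long extraction times. When extracting, prefer full table scans over index scans where possible.
2. Choose a reasonable SQL parallelism to reduce extraction time. Depending on table size: below 50 GB, no parallelism is needed; below 100 GB, add the hint parallel(a,2); above 100 GB, add the hint parallel(a,4).
3. Keep the extraction SQL simple and avoid functions such as replace, which are very CPU-intensive and severely slow down extraction.

================================================ FILE: oraclereader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 oraclereader oraclereader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.oracle ojdbc6 11.2.0.3 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: oraclereader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/oraclereader target/ oraclereader-0.0.1-SNAPSHOT.jar plugin/reader/oraclereader false plugin/reader/oraclereader/libs runtime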
================================================ FILE: oraclereader/src/main/java/com/alibaba/datax/plugin/reader/oraclereader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.oraclereader; public class Constant { public static final int DEFAULT_FETCH_SIZE = 1024; } ================================================ FILE: oraclereader/src/main/java/com/alibaba/datax/plugin/reader/oraclereader/OracleReader.java ================================================ package com.alibaba.datax.plugin.reader.oraclereader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.reader.util.HintUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public class OracleReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.Oracle; public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(OracleReader.Job.class); private Configuration originalConfig = null; private CommonRdbmsReader.Job commonRdbmsReaderJob; @Override public void init() { this.originalConfig = super.getPluginJobConf(); dealFetchSize(this.originalConfig); this.commonRdbmsReaderJob = new CommonRdbmsReader.Job( DATABASE_TYPE); this.commonRdbmsReaderJob.init(this.originalConfig); // Note: this must run after this.commonRdbmsReaderJob.init(this.originalConfig); so that querySql mode can be detected directly dealHint(this.originalConfig); } @Override public void preCheck(){ init();
this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE); } @Override public List<Configuration> split(int adviceNumber) { return this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber); } @Override public void post() { this.commonRdbmsReaderJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderJob.destroy(this.originalConfig); } private void dealFetchSize(Configuration originalConfig) { int fetchSize = originalConfig.getInt( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, Constant.DEFAULT_FETCH_SIZE); if (fetchSize < 1) { throw DataXException .asDataXException(DBUtilErrorCode.REQUIRED_VALUE, String.format("Invalid fetchSize configured: fetchSize [%d] must not be less than 1.", fetchSize)); } originalConfig.set( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); } private void dealHint(Configuration originalConfig) { String hint = originalConfig.getString(Key.HINT); if (StringUtils.isNotBlank(hint)) { boolean isTableMode = originalConfig.getBool(com.alibaba.datax.plugin.rdbms.reader.Constant.IS_TABLE_MODE).booleanValue(); if(!isTableMode){ throw DataXException.asDataXException(OracleReaderErrorCode.HINT_ERROR, "HINT can be configured only when reading Oracle in table mode (not querySql mode)."); } HintUtil.initHintConf(DATABASE_TYPE, originalConfig); } } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderTask; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderTask = new CommonRdbmsReader.Task( DATABASE_TYPE ,super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderTask.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig .getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, 
recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() {
this.commonRdbmsReaderTask.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); } } } ================================================ FILE: oraclereader/src/main/java/com/alibaba/datax/plugin/reader/oraclereader/OracleReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.oraclereader; import com.alibaba.datax.common.spi.ErrorCode; public enum OracleReaderErrorCode implements ErrorCode { HINT_ERROR("Oraclereader-00", "Invalid Hint configuration."), ; private final String code; private final String description; private OracleReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: oraclereader/src/main/resources/plugin.json ================================================ { "name": "oraclereader", "class": "com.alibaba.datax.plugin.reader.oraclereader.OracleReader", "description": "useScene: prod. mechanism: connects to the database via JDBC, executes SELECT SQL, and retrieves data from the ResultSet.
warn: The more you know about the database, the fewer problems you will encounter.", "developer": "alibaba" } ================================================ FILE: oraclereader/src/main/resources/plugin_job_template.json ================================================ { "name": "oraclereader", "parameter": { "username": "", "password": "", "column": [], "connection": [ { "table": [], "jdbcUrl": [] } ] } }
================================================
FILE: oraclewriter/doc/oraclewriter.md
================================================

# DataX OracleWriter

---

## 1 Quick Introduction

The OracleWriter plugin writes data into a destination table on the Oracle primary database. Under the hood, OracleWriter connects to a remote Oracle database via JDBC and executes the corresponding `insert into ...` SQL statements to write data into Oracle, committing in batches internally.

OracleWriter is aimed at ETL developers, who use it to load data from the data warehouse into Oracle. OracleWriter can also serve DBAs and other users as a data migration tool.

## 2 How It Works

OracleWriter receives the protocol data produced by the Reader through the DataX framework and generates the corresponding SQL statement from your configuration:

* `insert into...` (rows that conflict on a primary key or unique index are not written)
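The insert-and-batch-commit flow described above can be sketched with plain JDBC. This is an illustration under stated assumptions, not the OracleWriter source. Rows are modeled as `Object[]`, `BatchInsertSketch` and its methods are hypothetical names, and `batchSize` plays the role of the batchSize parameter described in section 3.2.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "insert into ... with batched commits"; not the actual
// OracleWriter/CommonRdbmsWriter code. Each record is modeled as an Object[] row.
final class BatchInsertSketch {

    private BatchInsertSketch() {
    }

    // INSERT INTO <table> (c1,c2,...) VALUES (?,?,...)
    static String insertTemplate(String table, List<String> columns) {
        String marks = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(",", columns) + ") VALUES (" + marks + ")";
    }

    // Splits rows into consecutive chunks of at most batchSize rows.
    static List<List<Object[]>> batches(List<Object[]> rows, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<Object[]>> out = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            out.add(rows.subList(from, Math.min(from + batchSize, rows.size())));
        }
        return out;
    }

    // Sends each chunk as one JDBC batch and commits it; on failure the current
    // chunk is rolled back and the exception is rethrown. Assumes an open connection.
    static void write(Connection conn, String table, List<String> columns,
                      List<Object[]> rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(insertTemplate(table, columns))) {
            for (List<Object[]> chunk : batches(rows, batchSize)) {
                try {
                    for (Object[] row : chunk) {
                        for (int i = 0; i < row.length; i++) {
                            ps.setObject(i + 1, row[i]);   // JDBC parameters are 1-based
                        }
                        ps.addBatch();
                    }
                    ps.executeBatch();                    // also clears the statement's batch
                    conn.commit();
                } catch (SQLException e) {
                    conn.rollback();
                    throw e;
                }
            }
        }
    }
}
```

In this sketch, a primary-key or unique-index conflict makes `executeBatch` throw, and the whole current batch is rolled back. The real writer may handle failed batches differently.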
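Note: 1. The destination database must be a primary database for data to be written. The job needs at least the privilege to run `insert into...`; whether other privileges are needed depends on the statements specified in preSql and postSql in your job configuration. 2. Unlike MysqlWriter, OracleWriter does not support the writeMode parameter. ## 3 Features ### 3.1 Sample Configuration * This sample loads data generated in memory into Oracle. ```json { "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column" : [ { "value": "DataX", "type": "string" }, { "value": 19880808, "type": "long" }, { "value": "1988-08-08 08:08:08", "type": "date" }, { "value": true, "type": "bool" }, { "value": "test", "type": "bytes" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "oraclewriter", "parameter": { "username": "root", "password": "root", "column": [ "id", "name" ], "preSql": [ "delete from test" ], "connection": [ { "jdbcUrl": "jdbc:oracle:thin:@[HOST_NAME]:PORT:[DATABASE_NAME]", "table": [ "test" ] } ] } } } ] } } ``` ### 3.2 Parameters * **jdbcUrl** * Description: JDBC connection information for the destination database; jdbcUrl must be placed inside the connection block. Notes: 1. Only one value can be configured per database. This differs from OracleReader, which can probe multiple standby databases, because writing to multiple primaries of the same database (dual-primary loading) is not supported here. 2. jdbcUrl follows Oracle's official format and may include additional connection parameters. For details, see the Oracle documentation or consult your DBA. * Required: yes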
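* Default: none
* **username** * Description: username for the destination database
* Required: yes
* Default: none
* **password** * Description: password for the destination database
* Required: yes
* Default: none
* **table** * Description: name of the destination table. One or more tables can be written; when several tables are configured, all of them must have the same schema. Note: table and jdbcUrl must be placed inside the connection block. * Required: yes
* Default: none
* **column** * Description: the destination table columns to write, separated by commas, e.g. "column": ["id","name","age"]. To write all columns in order, use `*`, e.g. "column": ["*"]. **column must be specified and cannot be left empty!** Notes: 1. We strongly advise against the `*` form, because if the number or types of the destination table's columns change, your job may run incorrectly or fail. 2. column cannot contain constant values. * Required: yes
* Default: none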
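* **preSql** * Description: standard SQL statements executed before data is written to the destination table. If a statement needs to reference the table being written, write the table name as `@table`; at execution time the placeholder is replaced with the actual table name. For example, if your job writes to 100 identically structured shard tables on the destination (named datax_00, datax_01, ... datax_98, datax_99) and you want to delete their existing data before loading, configure `"preSql":["delete from @table"]`: before writing to each table, the corresponding `delete from <table name>` is executed first.
* Required: no
* Default: none
* **postSql** * Description: standard SQL statements executed after data is written to the destination table (same mechanism as preSql).
* Required: no
* Default: none
* **batchSize** * Description: number of records committed per batch. A larger value can greatly reduce the number of network round trips between DataX and Oracle and improve overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
* Required: no
* Default: 1024
* **session** * Description: session settings applied when connecting to Oracle, for example:
``` "session":[ "alter session set nls_date_format = 'dd.mm.yyyy hh24:mi:ss';", "alter session set NLS_LANGUAGE = 'AMERICAN';" ] ``` * Required: no
* Default: none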
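The `@table` substitution described for preSql/postSql and the grouping controlled by batchSize can be sketched as below. This is a minimal illustration under stated assumptions, not the plugin's implementation: `renderSqls` and `toBatches` are hypothetical names, plain string replacement is assumed for `@table`, and the batching ignores any other flush conditions (such as byte-size limits) the real writer may apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WriterConfigSketch {

    // Hypothetical helper: replaces every "@table" placeholder with the concrete table name.
    static List<String> renderSqls(List<String> sqls, String table) {
        List<String> rendered = new ArrayList<>();
        for (String sql : sqls) {
            rendered.add(sql.replace("@table", table));
        }
        return rendered;
    }

    // Hypothetical helper: splits records into groups of at most batchSize;
    // each group stands for one JDBC batch submission.
    static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> preSql = Arrays.asList("delete from @table");
        for (String table : Arrays.asList("datax_00", "datax_01")) {
            System.out.println(renderSqls(preSql, table)); // [delete from datax_00], then [delete from datax_01]
        }
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            records.add(i);
        }
        List<List<Integer>> batches = toBatches(records, 1024);
        // 2500 records with batchSize 1024 -> 3 batches; the last holds 2500 - 2048 = 452 records
        System.out.println(batches.size() + " batches, last has " + batches.get(batches.size() - 1).size());
    }
}
```

With the default batchSize of 1024, 2500 records would need three round trips instead of 2500, which is why the parameter has such a visible effect in the performance table; the trade-off is that each pending batch is held in memory.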
### 3.3 Type Conversion Like OracleReader, OracleWriter currently supports most Oracle types, but a few types are not supported, so please check your types. OracleWriter's type mapping for Oracle is listed below: | DataX internal type| Oracle data type | | -------- | ----- | | Long |NUMBER,INTEGER,INT,SMALLINT| | Double |NUMERIC,DECIMAL,FLOAT,DOUBLE PRECISION,REAL| | String |LONG,CHAR,NCHAR,VARCHAR,VARCHAR2,NVARCHAR2,CLOB,NCLOB,CHARACTER,CHARACTER VARYING,CHAR VARYING,NATIONAL CHARACTER,NATIONAL CHAR,NATIONAL CHARACTER VARYING,NATIONAL CHAR VARYING,NCHAR VARYING | | Date |TIMESTAMP,DATE | | Boolean |bit, bool | | Bytes |BLOB,BFILE,RAW,LONG RAW | ## 4 Performance Report ### 4.1 Environment Setup #### 4.1.1 Data Characteristics Table DDL: ``` --DROP TABLE PERF_ORACLE_WRITER; CREATE TABLE PERF_ORACLE_WRITER ( COL1 VARCHAR2(255 BYTE) NULL , COL2 NUMBER(32) NULL , COL3 NUMBER(32) NULL , COL4 DATE NULL , COL5 FLOAT NULL , COL6 VARCHAR2(255 BYTE) NULL , COL7 VARCHAR2(255 BYTE) NULL , COL8 VARCHAR2(255 BYTE) NULL , COL9 VARCHAR2(255 BYTE) NULL , COL10 VARCHAR2(255 BYTE) NULL ) LOGGING NOCOMPRESS NOCACHE; ``` A single record looks like: ``` col1:485924f6ab7f272af361cd3f7f2d23e0d764942351#$%^&fdafdasfdas%%^(*&^^&* col2:1 col3:1696248667889 col4:2013-01-06 00:00:00 col5:3.141592653578 col6:100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209 col7:100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11fdsafdsfdsa209 col8:100DAFDSAFDSAHOFJDPSAWIFDISHAF;dsadsafdsahfdsajf;dsfdsa;FJDSAL;11209 col9:100dafdsafdsahofjdpsawifdishaf;DSADSAFDSAHFDSAJF;dsfdsa;fjdsal;11209 col10:12~!2345100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209 ``` #### 4.1.2 Machine Specifications * Machine running DataX: 1. cpu: 24 Core Intel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz 2. mem: 94GB 3. net: dual gigabit NICs 4. disc: DataX data is never written to disk, so disk is not measured * Oracle database machine: 1. cpu: 4 Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz 2.
mem: 7GB #### 4.1.3 DataX JVM Parameters -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError #### 4.1.4 Benchmark Job Configuration ``` { "job": { "setting": { "speed": { "channel": 4 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "sliceRecordCount": 1000000000, "column": [ { "value": "485924f6ab7f272af361cd3f7f2d23e0d764942351#$%^&fdafdasfdas%%^(*&^^&*" }, { "value": 1, "type": "long" }, { "value": "1696248667889", "type": "long" }, { "type": "date", "value": "2013-07-06 00:00:00", "dateFormat": "yyyy-mm-dd hh:mm:ss" }, { "value": "3.141592653578", "type": "double" }, { "value": "100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209" }, { "value": "100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11fdsafdsfdsa209" }, { "value": "100DAFDSAFDSAHOFJDPSAWIFDISHAF;dsadsafdsahfdsajf;dsfdsa;FJDSAL;11209" }, { "value": "100dafdsafdsahofjdpsawifdishaf;DSADSAFDSAHFDSAJF;dsfdsa;fjdsal;11209" }, { "value": "12~!2345100dafdsafdsahofjdpsawifdishaf;dsadsafdsahfdsajf;dsfdsa;fjdsal;11209" } ] } }, "writer": { "name": "oraclewriter", "parameter": { "username": "username", "password": "password", "truncate": "true", "batchSize": "512", "column": [ "col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9", "col10" ], "connection": [ { "table": [ "PERF_ORACLE_WRITER" ], "jdbcUrl": "jdbc:oracle:thin:@ip:port:dataplat" } ] } } } ] } } ``` ### 4.2 Test Report #### 4.2.1 Results | Channels| Batch size (rows)| DataX speed (rec/s)|DataX throughput (MB/s)| DataX host NIC outbound (MB/s)|DataX host load|DB NIC inbound (MB/s)|DB load| |--------|--------| --------|--------|--------|--------|--------|--------| |1|128|15564|6.51|7.5|0.02|7.4|1.08| |1|512|29491|10.90|12.6|0.05|12.4|1.55| |1|1024|31529|11.87|13.5|0.22|13.3|1.58| |1|2048|33469|12.57|14.3|0.17|14.3|1.53| |1|4096|31363|12.48|13.4|0.10|10.0|1.72| |4|10|9440|4.05|5.6|0.01|5.0|3.75| |4|128|42832|16.48|18.3|0.07|18.5|2.89| |4|512|46643|20.02|22.7|0.35|21.1|3.31| 
|4|1024|39116|16.79|18.7|0.10|18.1|3.05| |4|2048|39526|16.96|18.5|0.32|17.1|2.86|
|4|4096|37683|16.17|17.2|0.23|15.5|2.26| |8|128|38336|16.45|17.5|0.13|16.2|3.87| |8|512|31078|13.34|14.9|0.11|13.4|2.09| |8|1024|37888|16.26|18.5|0.20|18.5|3.14| |8|2048|38502|16.52|18.5|0.18|18.5|2.96| |8|4096|38092|16.35|18.3|0.10|17.8|3.19| |16|128|35366|15.18|16.9|0.13|15.6|3.49| |16|512|35584|15.27|16.8|0.23|17.4|3.05| |16|1024|38297|16.44|17.5|0.20|17.0|3.42| |16|2048|28467|12.22|13.8|0.10|12.4|3.38| |16|4096|27852|11.95|12.3|0.11|12.3|3.86| |32|1024|34406|14.77|15.4|0.09|15.4|3.55| 1. `batchSize and the number of channels have a large impact on performance` 2. `In general, using more than 32 channels is not recommended when writing to a database` ## 5 Constraints and Limitations ## 6 FAQ *** **Q: OracleWriter reports an error while executing the postSql statements. Has the data been loaded into the target database?** A: A DataX load has three stages: the pre operation, the load itself, and the post operation. If any of them fails, the DataX job fails. Because DataX cannot guarantee that these operations complete within a single transaction, the data may already have landed in the target. *** **Q: In that case, some dirty data ends up in the database. What if it affects the production database?** A: There are currently two solutions. The first is to configure a pre statement that cleans up the data imported that day, so that each DataX run removes the previous load before importing the complete data. The second is to load the data into a temporary table and, once the load is complete, rename it to the production table. *** **Q: The second approach avoids affecting production data. How exactly do I do it?** A: Configure the job to load into a temporary table. 
================================================ FILE: oraclewriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT oraclewriter oraclewriter jar writer data into oracle database com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.oracle ojdbc6 11.2.0.3 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: oraclewriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/oraclewriter target/ oraclewriter-0.0.1-SNAPSHOT.jar plugin/writer/oraclewriter false plugin/writer/oraclewriter/libs runtime ================================================ FILE: 
oraclewriter/src/main/java/com/alibaba/datax/plugin/writer/oraclewriter/OracleWriter.java ================================================ package com.alibaba.datax.plugin.writer.oraclewriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import java.util.List; public class OracleWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.Oracle; public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterJob; public void preCheck() { this.init(); this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE); } @Override public void init() { this.originalConfig = super.getPluginJobConf(); // warn:not like mysql, oracle only support insert mode, don't use String writeMode = this.originalConfig.getString(Key.WRITE_MODE); if (null != writeMode) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "写入模式(writeMode)配置错误. 因为Oracle不支持配置项 writeMode: %s, Oracle只能使用insert sql 插入数据. 
请检查您的配置并作出修改", writeMode)); } this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job( DATABASE_TYPE); this.commonRdbmsWriterJob.init(this.originalConfig); } @Override public void prepare() { //oracle实跑先不做权限检查 //this.commonRdbmsWriterJob.privilegeValid(this.originalConfig, DATABASE_TYPE); this.commonRdbmsWriterJob.prepare(this.originalConfig); } @Override public List split(int mandatoryNumber) { return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber); } @Override public void post() { this.commonRdbmsWriterJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterJob.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterTask; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task(DATABASE_TYPE); this.commonRdbmsWriterTask.init(this.writerSliceConfig); } @Override public void prepare() { this.commonRdbmsWriterTask.prepare(this.writerSliceConfig); } public void startWrite(RecordReceiver recordReceiver) { this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); } @Override public void post() { this.commonRdbmsWriterTask.post(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterTask.destroy(this.writerSliceConfig); } } } ================================================ FILE: oraclewriter/src/main/java/com/alibaba/datax/plugin/writer/oraclewriter/OracleWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.oraclewriter; import com.alibaba.datax.common.spi.ErrorCode; public enum OracleWriterErrorCode implements ErrorCode { ; private final String code; private final String describe; private OracleWriterErrorCode(String code, String describe) { this.code = code; this.describe = describe; } 
@Override public String getCode() { return this.code; } @Override public String getDescription() { return this.describe; } @Override public String toString() { return String.format("Code:[%s], Describe:[%s]. ", this.code, this.describe); } } ================================================ FILE: oraclewriter/src/main/resources/plugin.json ================================================ { "name": "oraclewriter", "class": "com.alibaba.datax.plugin.writer.oraclewriter.OracleWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: oraclewriter/src/main/resources/plugin_job_template.json ================================================ { "name": "oraclewriter", "parameter": { "username": "", "password": "", "column": [], "preSql": [], "connection": [ { "jdbcUrl": "", "table": [] } ] } } ================================================ FILE: oscarwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 oscarwriter oscarwriter jar writer data into oscar database com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.csicit.thirdparty oscar 1.0.1 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: oscarwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/oscarwriter src/main/lib oscarJDBC.jar plugin/writer/oscarwriter/libs target/ oscarwriter-0.0.1-SNAPSHOT.jar plugin/writer/oscarwriter false 
plugin/writer/oscarwriter/libs runtime ================================================ FILE: oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriter.java ================================================ package com.alibaba.datax.plugin.writer.oscarwriter; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import java.util.List; public class OscarWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.Oscar; public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterJob; @Override public void preCheck() { this.init(); this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE); } @Override public void init() { this.originalConfig = super.getPluginJobConf(); this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job( DATABASE_TYPE); this.commonRdbmsWriterJob.init(this.originalConfig); } @Override public void prepare() { this.commonRdbmsWriterJob.prepare(this.originalConfig); } @Override public List split(int mandatoryNumber) { return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber); } @Override public void post() { this.commonRdbmsWriterJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterJob.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterTask; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task(DATABASE_TYPE); this.commonRdbmsWriterTask.init(this.writerSliceConfig); } @Override public void prepare() { 
this.commonRdbmsWriterTask.prepare(this.writerSliceConfig); } @Override public void startWrite(RecordReceiver recordReceiver) { this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); } @Override public void post() { this.commonRdbmsWriterTask.post(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterTask.destroy(this.writerSliceConfig); } } } ================================================ FILE: oscarwriter/src/main/java/com/alibaba/datax/plugin/writer/oscarwriter/OscarWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.oscarwriter; import com.alibaba.datax.common.spi.ErrorCode; public enum OscarWriterErrorCode implements ErrorCode { ; private final String code; private final String describe; private OscarWriterErrorCode(String code, String describe) { this.code = code; this.describe = describe; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.describe; } @Override public String toString() { return String.format("Code:[%s], Describe:[%s]. ", this.code, this.describe); } } ================================================ FILE: oscarwriter/src/main/resources/plugin.json ================================================ { "name": "oscarwriter", "class": "com.alibaba.datax.plugin.writer.oscarwriter.OscarWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. 
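warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: oscarwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "oscarwriter", "parameter": { "username": "", "password": "", "column": [], "preSql": [], "connection": [ { "jdbcUrl": "", "table": [] } ] } } ================================================ FILE: ossreader/doc/ossreader.md ================================================ # DataX OSSReader Guide ------------ ## 1 Quick Introduction OSSReader provides the ability to read data stored in OSS. Under the hood, OSSReader uses the official OSS Java SDK to fetch data from OSS and converts it to the DataX transport protocol for the Writer. * For an OSS product overview, see [[Alibaba Cloud OSS Portal](http://www.aliyun.com/product/oss)] * For the OSS Java SDK, see [[Alibaba Cloud OSS Java SDK](http://oss.aliyuncs.com/aliyun_portal_storage/help/oss/OSS_Java_SDK_Dev_Guide_20141113.pdf)] ## 2 Features and Limitations OSSReader reads data from OSS and converts it to the DataX protocol. OSS itself is unstructured storage; from DataX's point of view, OSSReader's implementation is analogous to TxtFileReader and shares much with it. OSSReader currently supports: 1. Reading TXT files, where the content must form a two-dimensional table (schema). 2. CSV-like files with a custom delimiter. 3. Reading multiple data types (represented as String), column pruning, and constant columns. 4. Recursive reading and file-name filtering. 5. Text compression, with zip, gzip and bzip2 currently supported. Note that a single archive must not contain multiple files. 6. Concurrent reading of multiple objects. 7. Reading parquet and orc files. Not yet supported: 1. Multi-threaded concurrent reading of a single Object (File), which would require an algorithm for splitting a single Object internally; this is being considered for phase two. 2.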
Multi-threaded concurrent reading of a single compressed Object, which is technically impossible. ## 3 Features ### 3.1 Sample Configurations Sample for reading txt and csv files: ```json { "job": { "setting": {}, "content": [ { "reader": { "name": "ossreader", "parameter": { "endpoint": "http://oss.aliyuncs.com", "accessId": "", "accessKey": "", "bucket": "myBucket", "object": [ "bazhen/*" ], "column": [ { "type": "long", "index": 0 }, { "type": "string", "value": "alibaba" }, { "type": "date", "index": 1, "format": "yyyy-MM-dd" } ], "encoding": "UTF-8", "fieldDelimiter": "\t", "compress": "gzip" } }, "writer": {} } ] } } ``` Sample for reading orc files: ```json { "stepType": "oss", "parameter": { "endpoint": "http://oss.aliyuncs.com", "accessId": "", "accessKey": "", "bucket": "myBucket", "fileFormat": "orc", "path": "/tests/case61/orc__691b6815_9260_4037_9899_****", "column": [ { "index": 0, "type": "long" }, { "index": "1", "type": "string" }, { "index": "2", "type": "string" } ] } } ``` Sample for reading parquet files: ```json { "stepType": "oss", "parameter": { "endpoint": "http://oss.aliyuncs.com", "accessId": "", "accessKey": "", "bucket": "myBucket", "fileFormat": "parquet", "path": "/parquet", "parquetSchema":"message m { optional BINARY registration_dttm (UTF8); optional Int64 id; optional BINARY first_name (UTF8); optional BINARY last_name (UTF8); optional BINARY email (UTF8); optional BINARY gender (UTF8); optional BINARY ip_address (UTF8); optional BINARY cc (UTF8); optional BINARY country (UTF8); optional BINARY birthdate (UTF8); optional DOUBLE salary; optional BINARY title (UTF8); optional BINARY comments (UTF8); }", "column": [ { "index": 0, "type": "long" }, { "index": "1", "type": "string" }, { "index": "2", "type": "string" } ] } } ``` ### 3.2 Parameters * **endpoint** * Description: the EndPoint address of the OSS server, e.g. http://oss.aliyuncs.com. * Required: yes
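* Default: none
* **accessId** * Description: the OSS accessId
* Required: yes
* Default: none
* **accessKey** * Description: the OSS accessKey
* Required: yes
* Default: none
* **bucket** * Description: the OSS bucket
* Required: yes
* Default: none
* **object** * Description: the OSS object(s) to read; multiple Objects can be specified.
When a single OSS Object is specified, OSSReader can currently extract data with only one thread; phase two may add multi-threaded reading of a single uncompressed Object. When multiple OSS Objects are specified, OSSReader can extract data with multiple threads, with the concurrency set by the number of channels. When a wildcard is specified, OSSReader tries to enumerate the matching Objects. For example, /* reads all Objects in the bucket, and /bazhen/\* reads all Objects under the bazhen directory. **Note in particular that DataX treats all Objects synchronized by one job as a single table. Users must make sure that all Objects fit the same schema.** * Required: yes
* Default: none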
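* **column** * Description: list of fields to read. type specifies the source data type; index specifies which column of the text the field comes from (0-based); value makes the field a constant, generated from the given value instead of being read from the source file.
By default, users can read all data as String with the following configuration: ```json "column": ["*"] ``` Users can also specify column information as follows: ```json { "type": "long", "index": 0 //read a long field from the first column of the OSS text }, { "type": "string", "value": "alibaba" //OSSReader generates the string "alibaba" as this field } ``` When column information is specified, type is required and exactly one of index/value must be set. * Required: yes
* Default: read all columns as String
* **fieldDelimiter** * Description: the field delimiter used when reading
* Required: yes
* Default: ,
* **compress** * Description: text compression type; leaving it unset means no compression. Supported types are zip, gzip and bzip2.
* Required: no
* Default: no compression
* **encoding** * Description: encoding of the files to read; currently only utf-8 and gbk are supported.
* Required: no
* Default: utf-8
* **nullFormat** * Description: text files have no standard string for null, so DataX provides nullFormat to define which string represents null.
For example, if the user configures nullFormat="\N" and a source value is "\N", DataX treats it as a null field. * Required: no
* Default: \N
* **skipHeader** * Description: CSV-like files may begin with a header row that needs to be skipped. Not skipped by default.
* Required: no
* Default: false
* **csvReaderConfig** * Description: parameters for reading CSV files, as a Map. CSV files are read with CsvReader, which has many options; any option left unset uses its default.
* Required: no
* Default: none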
Common configuration: ```json "csvReaderConfig":{ "safetySwitch": false, "skipEmptyRecords": false, "useTextQualifier": false } ``` All options and their defaults are listed below; when configuring the csvReaderConfig map, use **exactly these field names**: ``` boolean caseSensitive = true; char textQualifier = 34; boolean trimWhitespace = true; boolean useTextQualifier = true;//whether to use the CSV text qualifier char delimiter = 44;//field delimiter char recordDelimiter = 0; char comment = 35; boolean useComments = false; int escapeMode = 1; boolean safetySwitch = true;//whether to limit a single column to 100000 characters boolean skipEmptyRecords = true;//whether to skip empty lines boolean captureRawRecord = true; ``` ### 3.3 Type Conversion OSS itself provides no data types; the following types are defined by DataX OSSReader: | DataX internal type| OSS data type | | -------- | ----- | | Long |Long | | Double |Double| | String |String| | Boolean |Boolean | | Date |Date | Where: * OSS Long is an integer written as a string in the OSS text, e.g. "19901219". * OSS Double is a Double written as a string in the OSS text, e.g. "3.1415". * OSS Boolean is a Boolean written as a string in the OSS text, e.g. "true" or "false" (case-insensitive). * OSS Date is a Date written as a string in the OSS text, e.g. "2014-12-31"; a format can be specified for Date. ## 4 Performance Report |Concurrency|DataX throughput|DataX record rate| |--------|--------| --------| |1| 971.40KB/s |10047 rec/s | |2| 1.81MB/s | 19181 rec/s | |4| 3.46MB/s| 36695 rec/s | |8| 6.57MB/s | 69289 rec/s | |16|7.92MB/s| 83920 rec/s| |32|7.87MB/s| 83350 rec/s| ## 5 Constraints and Limitations Omitted. ## 6 FAQ Omitted. 
================================================ FILE: ossreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT ossreader ossreader jar org.apache.logging.log4j log4j-api 2.17.1 org.apache.logging.log4j log4j-core 2.17.1 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 com.aliyun.oss aliyun-sdk-oss 3.4.2 junit junit test com.alibaba.datax hdfsreader 0.0.1-SNAPSHOT compile maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip
package single ================================================ FILE: ossreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/ossreader target/ ossreader-0.0.1-SNAPSHOT.jar plugin/reader/ossreader false plugin/reader/ossreader/libs runtime ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.ossreader; /** * Created by mengxin.liumx on 2014/12/7. */ public class Constant { public static final String OBJECT = "object"; public static final int SOCKETTIMEOUT = 5000000; } ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/Key.java ================================================ package com.alibaba.datax.plugin.reader.ossreader; /** * Created by mengxin.liumx on 2014/12/7. 
*/ public class Key { public static final String ENDPOINT = "endpoint"; public static final String ACCESSID = "accessId"; public static final String ACCESSKEY = "accessKey"; public static final String ENCODING = "encoding"; public static final String BUCKET = "bucket"; public static final String OBJECT = "object"; public static final String CNAME = "cname"; public static final String SUCCESS_ON_NO_Object = "successOnNoObject"; public static final String PROXY_HOST = "proxyHost"; public static final String PROXY_PORT = "proxyPort"; public static final String PROXY_USERNAME = "proxyUsername"; public static final String PROXY_PASSWORD = "proxyPassword"; public static final String PROXY_DOMAIN = "proxyDomain"; public static final String PROXY_WORKSTATION = "proxyWorkstation"; public static final String HDOOP_CONFIG = "hadoopConfig"; public static final String FS_OSS_ACCESSID = "fs.oss.accessKeyId"; public static final String FS_OSS_ACCESSKEY = "fs.oss.accessKeySecret"; public static final String FS_OSS_ENDPOINT = "fs.oss.endpoint"; /*判断分片是否均匀的标准,是否有分片长度超出平均值的百分比*/ public static final String BALANCE_THRESHOLD = "balanceThreshold"; } ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/OssInputStream.java ================================================ package com.alibaba.datax.plugin.reader.ossreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.RetryUtil; import com.aliyun.oss.OSSClient; import com.aliyun.oss.model.GetObjectRequest; import com.aliyun.oss.model.OSSObject; import org.apache.commons.io.IOUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.io.InputStream; import java.util.concurrent.Callable; /** * @Author: guxuan * @Date 2022-05-17 15:52 */ public class OssInputStream extends InputStream { private final OSSClient ossClient; private GetObjectRequest getObjectRequest; private long 
startIndex = 0; private long endIndex = -1; private InputStream inputStream; /** * retryTimes: number of retries, 60 by default; * description: network outage time that can be covered = retryTimes*(socket_timeout+sleepTime); * by default the covered outage time = 60*(5+5) = 600 seconds. */ private int retryTimes = 60; private static final Logger LOG = LoggerFactory.getLogger(OssInputStream.class); /** * If start is 0 and end is 1000, the stream covers the range [0,1000], 1001 bytes in total * * @param ossClient * @param bucket * @param object * @param start inputstream start index * @param end inputstream end index */ public OssInputStream(final OSSClient ossClient, final String bucket, final String object, long start, long end) { this.ossClient = ossClient; this.getObjectRequest = new GetObjectRequest(bucket, object); this.startIndex = start; this.getObjectRequest.setRange(this.startIndex, end); this.endIndex = end; try { RetryUtil.executeWithRetry(new Callable() { @Override public Boolean call() throws Exception { OSSObject ossObject = ossClient.getObject(getObjectRequest); // obtain the InputStream inputStream = ossObject.getObjectContent(); return true; } }, this.retryTimes, 5000, false); } catch (Exception e) { throw DataXException.asDataXException( OssReaderErrorCode.RUNTIME_EXCEPTION,e.getMessage(), e); } } public OssInputStream(final OSSClient ossClient, final String bucket, final String object) { this.ossClient = ossClient; this.getObjectRequest = new GetObjectRequest(bucket, object); this.getObjectRequest.setRange(startIndex, -1); try { RetryUtil.executeWithRetry(new Callable() { @Override public Boolean call() throws Exception { OSSObject ossObject = ossClient.getObject(getObjectRequest); // obtain the InputStream inputStream = ossObject.getObjectContent(); return true; } }, this.retryTimes, 5000, false); } catch (Exception e) { throw DataXException.asDataXException( OssReaderErrorCode.RUNTIME_EXCEPTION, e.getMessage(), e); } } @Override public int read() throws 
IOException { int cbyte; try { cbyte = RetryUtil.executeWithRetry(new Callable() { @Override public Integer call()
throws Exception { try { int c = inputStream.read(); startIndex++; return c; } catch (Exception e) { LOG.warn(e.getMessage(),e); /** * The inputStream must be closed first; otherwise the connection leaks */ IOUtils.closeQuietly(inputStream); // if the network is unreachable, getOssRangeInuptStream throws, and RetryUtil catches the exception and retries inputStream = getOssRangeInuptStream(startIndex); int c = inputStream.read(); startIndex++; return c; } } }, this.retryTimes,5000, false); return cbyte; } catch (Exception e) { throw DataXException.asDataXException( OssReaderErrorCode.RUNTIME_EXCEPTION, e.getMessage(), e); } } private InputStream getOssRangeInuptStream(final long startIndex) { LOG.info("Start to retry reading [inputStream] from Byte {}", startIndex); // the end position is this.endIndex; when it is -1, no end byte is set and everything from startIndex onward is read getObjectRequest.setRange(startIndex, this.endIndex); // range download OSSObject ossObject = ossClient.getObject(getObjectRequest); // obtain the InputStream LOG.info("Start to retry reading [inputStream] from Byte {}", startIndex); return ossObject.getObjectContent(); } } ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/OssReader.java ================================================ package com.alibaba.datax.plugin.reader.ossreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader; import com.alibaba.datax.plugin.reader.ossreader.util.HdfsParquetUtil; import com.alibaba.datax.plugin.reader.ossreader.util.OssSplitUtil; import com.alibaba.datax.plugin.reader.ossreader.util.OssUtil; import com.alibaba.datax.plugin.unstructuredstorage.FileFormat; import 
com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; import com.alibaba.datax.plugin.unstructuredstorage.reader.binaryFileUtil.BinaryFileReaderUtil; import
com.alibaba.datax.plugin.unstructuredstorage.reader.split.StartEndPair; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.TypeReference; import com.aliyun.oss.ClientException; import com.aliyun.oss.OSSClient; import com.aliyun.oss.OSSException; import com.aliyun.oss.model.ListObjectsRequest; import com.aliyun.oss.model.OSSObjectSummary; import com.aliyun.oss.model.ObjectListing; import com.aliyun.oss.model.ObjectMetadata; import org.apache.commons.lang3.tuple.MutablePair; import org.apache.commons.lang3.tuple.Pair; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.InputStream; import java.util.ArrayList; import java.util.List; import java.util.Locale; import java.util.regex.Pattern; public class OssReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(OssReader.Job.class); private Configuration readerOriginConfig = null; private OSSClient ossClient = null; private String endpoint; private String accessId; private String accessKey; private String bucket; private boolean successOnNoObject; private Boolean isBinaryFile; private List objects; private List> objectSizePairs; /*用于任务切分的依据*/ private String fileFormat; private HdfsReader.Job hdfsReaderJob; private boolean useHdfsReaderProxy = false; @Override public void init() { LOG.debug("init() begin..."); this.readerOriginConfig = this.getPluginJobConf(); this.basicValidateParameter(); this.fileFormat = this.readerOriginConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.FILE_FORMAT, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_FILE_FORMAT); this.useHdfsReaderProxy = HdfsParquetUtil.isUseHdfsWriterProxy(this.fileFormat); if(useHdfsReaderProxy){ HdfsParquetUtil.adaptConfiguration(this.readerOriginConfig); this.hdfsReaderJob = new HdfsReader.Job(); 
this.hdfsReaderJob.setJobPluginCollector(this.getJobPluginCollector()); this.hdfsReaderJob.setPeerPluginJobConf(this.getPeerPluginJobConf()); this.hdfsReaderJob.setPeerPluginName(this.getPeerPluginName()); this.hdfsReaderJob.setPluginJobConf(this.getPluginJobConf()); this.hdfsReaderJob.init(); return; } this.isBinaryFile = FileFormat.getFileFormatByConfiguration(this.readerOriginConfig).isBinary(); this.validate(); UnstructuredStorageReaderUtil.validateCsvReaderConfig(this.readerOriginConfig); this.successOnNoObject = this.readerOriginConfig.getBool( Key.SUCCESS_ON_NO_Object, false); LOG.debug("init() ok and end..."); } private void basicValidateParameter(){ endpoint = this.readerOriginConfig.getString(Key.ENDPOINT); if (StringUtils.isBlank(endpoint)) { throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION,"invalid endpoint"); } accessId = this.readerOriginConfig.getString(Key.ACCESSID); if (StringUtils.isBlank(accessId)) { throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "invalid accessId"); } accessKey = this.readerOriginConfig.getString(Key.ACCESSKEY); if (StringUtils.isBlank(accessKey)) { throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "invalid accessKey"); } } // warn: validate endpoint, accessId, accessKey, bucket and object up front private void validate() { // ossClient = new OSSClient(endpoint,accessId,accessKey); ossClient = OssUtil.initOssClient(this.readerOriginConfig); bucket = this.readerOriginConfig.getString(Key.BUCKET); if (StringUtils.isBlank(bucket)) { throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "invalid bucket"); }else if(!ossClient.doesBucketExist(bucket)){ throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "invalid bucket"); } String object = this.readerOriginConfig.getString(Key.OBJECT); if (StringUtils.isBlank(object)) { throw DataXException.asDataXException(
OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "invalid object"); } if (this.isBinaryFile){ return; } UnstructuredStorageReaderUtil.validateParameter(this.readerOriginConfig); } @Override public void prepare() { if(useHdfsReaderProxy){ this.hdfsReaderJob.prepare(); return; } // Treat each individual object as one slice this.objectSizePairs = parseOriginObjectSizePairs(readerOriginConfig.getList(Key.OBJECT, String.class)); this.objects = parseOriginObjects(readerOriginConfig.getList(Key.OBJECT, String.class)); UnstructuredStorageReaderUtil.setSourceFileName(readerOriginConfig, this.objects); UnstructuredStorageReaderUtil.setSourceFile(readerOriginConfig, this.objects); } @Override public void post() { if(useHdfsReaderProxy){ this.hdfsReaderJob.post(); return; } LOG.debug("post()"); } @Override public void destroy() { if(useHdfsReaderProxy){ this.hdfsReaderJob.destroy(); return; } LOG.debug("destroy()"); } @Override public List<Configuration> split(int adviceNumber) { LOG.debug("split() begin..."); if(useHdfsReaderProxy){ return hdfsReaderJob.split(adviceNumber); } List<Configuration> readerSplitConfigs; if (0 == objects.size() && this.successOnNoObject) { readerSplitConfigs = new ArrayList<Configuration>(); Configuration splitedConfig = this.readerOriginConfig.clone(); splitedConfig.set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.SPLIT_SLICE_CONFIG, null); readerSplitConfigs.add(splitedConfig); LOG.info(String.format("no OSS object to be read")); LOG.debug("split() ok and end..."); return readerSplitConfigs; }else if (0 == objects.size()) { throw DataXException.asDataXException( OssReaderErrorCode.EMPTY_BUCKET_EXCEPTION, String.format("Unable to find the object to read.
Please confirm your configured item [bucket]: %s object: %s", this.readerOriginConfig.get(Key.BUCKET), this.readerOriginConfig.get(Key.OBJECT))); } /** * When the file is a plain text file and is not compressed, * it can be split internally so that it is read concurrently; if the user does not want files to be split, set fileFormat to csv * * Note: whether the file is text and whether it is compressed are both determined from the job configuration * * The decision of whether to split is factored out into its own method * */ OssSplitUtil ossFileSplit = new OssSplitUtil(this.ossClient, this.bucket); long t1 = System.currentTimeMillis(); readerSplitConfigs = ossFileSplit.getSplitedConfigurations(this.readerOriginConfig, this.objectSizePairs, adviceNumber); long t2 = System.currentTimeMillis(); LOG.info("all split done, cost {}ms", t2 - t1); /** * Tell the user in the log why the number of channels DataX actually runs is smaller than the configured channel count * Note: the reason reported here is not precise. It says one file maps to one task, so the split count is below the configured number; another real cause is that, when a single file is split, * the file is too small (in theory one block per 64 MB), so few splits are produced */ if(readerSplitConfigs.size() < adviceNumber){ LOG.info("[Note]: During OSSReader data synchronization, one file can only be synchronized in one task. You want to synchronize {} files " + "and the number is less than the number of channels you configured: {}. " + "Therefore, please take note that DataX will actually have {} sub-tasks, that is, the actual concurrent channels = {}", objects.size(), adviceNumber, objects.size(), objects.size()); } LOG.info("split() ok and end..."); return readerSplitConfigs; } private List<String> parseOriginObjects(List<String> originObjects) { List<String> objList = new ArrayList<>(); if (this.objectSizePairs == null) { this.objectSizePairs = parseOriginObjectSizePairs(originObjects); } for (Pair<String, Long> objSizePair : this.objectSizePairs) { objList.add(objSizePair.getKey()); } return objList; } private List<Pair<String, Long>> parseOriginObjectSizePairs(List<String> originObjects) { List<Pair<String, Long>> parsedObjectSizePaires = new ArrayList<Pair<String, Long>>(); for (String object : originObjects) { int starIndex = object.indexOf('*'); int questionIndex = object.indexOf('?'); /* firstMetaChar is the index of the earliest wildcard ('*' or '?'), or -1 if there is none */ int firstMetaChar = (starIndex == -1 || (questionIndex != -1 && questionIndex < starIndex)) ? questionIndex : starIndex; if (firstMetaChar != -1) { int lastDirSeparator = object.lastIndexOf( IOUtils.DIR_SEPARATOR, firstMetaChar); String parentDir = object .substring(0, lastDirSeparator + 1); List<Pair<String, Long>> allRemoteObjectSizePairs = getAllRemoteObjectsKeyAndSizeInDir(parentDir); Pattern pattern = Pattern.compile(object.replace("*", ".*") .replace("?", ".?")); for (Pair<String, Long> remoteObjectSizePair : allRemoteObjectSizePairs) { if (pattern.matcher(remoteObjectSizePair.getKey()).matches()) { parsedObjectSizePaires.add(remoteObjectSizePair); LOG.info(String .format("add object [%s] as a candidate to be read.", remoteObjectSizePair.getKey())); } } } else { // No wildcard configured: check that the object the user configured actually exists try{ ossClient.getObject(bucket, object); ObjectMetadata objMeta = ossClient.getObjectMetadata(bucket, object); parsedObjectSizePaires.add(new MutablePair<String, Long>(object, objMeta.getContentLength() <= OssSplitUtil.SINGLE_FILE_SPLIT_THRESHOLD_IN_SIZE ? -1L : objMeta.getContentLength())); LOG.info(String.format( "add object [%s] as a candidate to be read.", object)); }catch (Exception e){ trackOssDetailException(e, object); } } } return parsedObjectSizePaires; } // Pinpoint the specific OSS configuration error private void trackOssDetailException(Exception e, String object){ // Narrow the cause down from the exception message String errorMessage = e.getMessage(); if(StringUtils.isNotBlank(errorMessage)){ if(errorMessage.contains("UnknownHost")){ // endpoint misconfigured throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "The endpoint you configured is not correct. Please check the endpoint configuration", e); }else if(errorMessage.contains("InvalidAccessKeyId")){ // accessId misconfigured throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "The accessId you configured is not correct.
Please check the accessId configuration", e); }else if(errorMessage.contains("SignatureDoesNotMatch")){ // accessKey misconfigured throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "The accessKey you configured is not correct. Please check the accessKey configuration", e); }else if(errorMessage.contains("NoSuchKey")){ if (e instanceof OSSException) { OSSException ossException = (OSSException) e; if ("NoSuchKey".equalsIgnoreCase(ossException .getErrorCode()) && this.successOnNoObject) { LOG.warn(String.format("oss file %s does not exist; nothing to read:", object), e); return; } } // object misconfigured throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "The object you configured is not correct. Please check the object configuration"); }else{ // other errors throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, String.format("Please check whether the [endpoint], [accessId], [accessKey], [bucket], and [object] settings are correct. Error reason: %s",e.getMessage()), e); } }else{ throw DataXException.asDataXException( OssReaderErrorCode.CONFIG_INVALID_EXCEPTION, "The configured json is invalid", e); } } private List<Pair<String, Long>> getAllRemoteObjectsKeyAndSizeInDir(String parentDir) throws OSSException, ClientException{ List<Pair<String, Long>> objectSizePairs = new ArrayList<Pair<String, Long>>(); List<ObjectListing> objectListings = getRemoteObjectListings(parentDir); if (objectListings.size() == 0) { return objectSizePairs; } for (ObjectListing objectList : objectListings){ for (OSSObjectSummary objectSummary : objectList.getObjectSummaries()) { Pair<String, Long> objNameSize = new MutablePair<String, Long>(objectSummary.getKey(), objectSummary.getSize() <= OssSplitUtil.SINGLE_FILE_SPLIT_THRESHOLD_IN_SIZE ?
-1L : objectSummary.getSize()); objectSizePairs.add(objNameSize); } } return objectSizePairs; } private List<ObjectListing> getRemoteObjectListings(String parentDir) throws OSSException, ClientException { List<ObjectListing> remoteObjectListings = new ArrayList<ObjectListing>(); LOG.debug("Parent folder: {}", parentDir); OSSClient client = OssUtil.initOssClient(readerOriginConfig); try { ListObjectsRequest listObjectsRequest = new ListObjectsRequest( readerOriginConfig.getString(Key.BUCKET)); listObjectsRequest.setPrefix(parentDir); ObjectListing remoteObjectList; do { remoteObjectList = client.listObjects(listObjectsRequest); if (null == remoteObjectList) { /* stop paging instead of dereferencing a null listing */ LOG.info("ListObjectsRequest get null"); break; } LOG.info("ListObjects prefix: {} requestId: {}", remoteObjectList.getPrefix(), remoteObjectList.getRequestId()); remoteObjectListings.add(remoteObjectList); listObjectsRequest.setMarker(remoteObjectList.getNextMarker()); LOG.debug(listObjectsRequest.getMarker()); LOG.debug(String.valueOf(remoteObjectList.isTruncated())); } while (remoteObjectList.isTruncated()); } catch (Exception e) { trackOssDetailException(e, null); } finally { /* the listing client is created per call, so release it here */ client.shutdown(); } return remoteObjectListings; } } public static class Task extends Reader.Task { private static Logger LOG = LoggerFactory.getLogger(OssReader.Task.class); private Configuration readerSliceConfig; private Boolean isBinaryFile; private Integer blockSizeInByte; private List<StartEndPair> allWorksForTask; private boolean originSkipHeader; private OSSClient ossClient; private String fileFormat; private HdfsReader.Task hdfsReaderTask; private boolean useHdfsReaderProxy = false; @Override public void init() { this.readerSliceConfig = this.getPluginJobConf(); this.fileFormat = this.readerSliceConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.FILE_FORMAT, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_FILE_FORMAT); this.useHdfsReaderProxy = HdfsParquetUtil.isUseHdfsWriterProxy(this.fileFormat); if(useHdfsReaderProxy){
this.hdfsReaderTask = new HdfsReader.Task(); this.hdfsReaderTask.setPeerPluginJobConf(this.getPeerPluginJobConf()); this.hdfsReaderTask.setPeerPluginName(this.getPeerPluginName()); this.hdfsReaderTask.setPluginJobConf(this.getPluginJobConf()); this.hdfsReaderTask.setReaderPluginSplitConf(this.getReaderPluginSplitConf()); this.hdfsReaderTask.setTaskGroupId(this.getTaskGroupId()); this.hdfsReaderTask.setTaskId(this.getTaskId()); this.hdfsReaderTask.setTaskPluginCollector(this.getTaskPluginCollector()); this.hdfsReaderTask.init(); return; } String allWorksForTaskStr = this.readerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.SPLIT_SLICE_CONFIG); if (StringUtils.isBlank(allWorksForTaskStr)) { allWorksForTaskStr = "[]"; } this.allWorksForTask = JSON.parseObject(allWorksForTaskStr, new TypeReference<List<StartEndPair>>() { }); this.isBinaryFile = FileFormat.getFileFormatByConfiguration(this.readerSliceConfig).isBinary(); this.blockSizeInByte = this.readerSliceConfig.getInt( com.alibaba.datax.plugin.unstructuredstorage.reader.Key.BLOCK_SIZE_IN_BYTE, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_BLOCK_SIZE_IN_BYTE); this.originSkipHeader = this.readerSliceConfig .getBool(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.SKIP_HEADER, false); } @Override public void prepare() { LOG.info("task prepare() begin..."); if(useHdfsReaderProxy){ this.hdfsReaderTask.prepare(); return; } } @Override public void startRead(RecordSender recordSender) { if(useHdfsReaderProxy){ this.hdfsReaderTask.startRead(recordSender); return; } boolean successOnNoObject = this.readerSliceConfig.getBool(Key.SUCCESS_ON_NO_Object, false); if (this.allWorksForTask.isEmpty() && successOnNoObject) { recordSender.flush(); return; } String bucket = this.readerSliceConfig.getString(Key.BUCKET); this.ossClient = OssUtil.initOssClient(this.readerSliceConfig); for (StartEndPair eachSlice : this.allWorksForTask) { String object = eachSlice.getFilePath(); Long start =
eachSlice.getStart(); Long end = eachSlice.getEnd(); LOG.info(String.format("read bucket=[%s] object=[%s], range: [start=%s, end=%s] start...", bucket, object, start, end)); InputStream ossInputStream = new OssInputStream(ossClient, bucket, object, start, end); // Skip the header only in the slice that starts at offset 0, so the first line is not skipped more than once Boolean skipHeaderValue = this.originSkipHeader && (0L == start); this.readerSliceConfig.set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.SKIP_HEADER, skipHeaderValue); try { if (!this.isBinaryFile) { UnstructuredStorageReaderUtil.readFromStream(ossInputStream, object, this.readerSliceConfig, recordSender, this.getTaskPluginCollector()); } else { BinaryFileReaderUtil.readFromStream(ossInputStream, object, recordSender, this.blockSizeInByte); } } finally { IOUtils.closeQuietly(ossInputStream); } } recordSender.flush(); } @Override public void post() { LOG.info("task post() begin..."); if(useHdfsReaderProxy){ this.hdfsReaderTask.post(); return; } } @Override public void destroy() { if(useHdfsReaderProxy){ this.hdfsReaderTask.destroy(); return; } try { // this.ossClient.shutdown(); } catch (Exception e) { LOG.warn("shutdown ossclient hit an exception: " + e.getMessage(), e); } } } } ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/OssReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.ossreader; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by mengxin.liumx on 2014/12/7.
*/ public enum OssReaderErrorCode implements ErrorCode { // TODO: revise the error code types RUNTIME_EXCEPTION("OssReader-00", "Runtime exception"), OSS_EXCEPTION("OssFileReader-01", "OSS configuration exception"), CONFIG_INVALID_EXCEPTION("OssFileReader-02", "Invalid parameter configuration"), NOT_SUPPORT_TYPE("OssReader-03", "Unsupported type"), CAST_VALUE_TYPE_ERROR("OssFileReader-04", "Unable to convert to the specified type"), SECURITY_EXCEPTION("OssReader-05", "Missing permission"), ILLEGAL_VALUE("OssReader-06", "Invalid value"), REQUIRED_VALUE("OssReader-07", "Required item"), NO_INDEX_VALUE("OssReader-08","No index" ), MIXED_INDEX_VALUE("OssReader-09","Mixed index and value" ), EMPTY_BUCKET_EXCEPTION("OssReader-10", "The bucket you are trying to read is empty"); private final String code; private final String description; private OssReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/HdfsParquetUtil.java ================================================ package com.alibaba.datax.plugin.reader.ossreader.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.ossreader.Key; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONObject; /** * @Author: guxuan * @Date 2022-05-17 15:46 */ public class HdfsParquetUtil { public static boolean isUseHdfsWriterProxy( String fileFormat){ if("orc".equalsIgnoreCase(fileFormat) || "parquet".equalsIgnoreCase(fileFormat)){ return true; } return false; } /** * Adapt readerOriginConfig so that hdfsreader can read OSS parquet files * https://help.aliyun.com/knowledge_detail/74344.html * @param readerOriginConfig */ public static void adaptConfiguration(Configuration readerOriginConfig){ String bucket = readerOriginConfig.getString(Key.BUCKET); String fs
=String.format("oss://%s",bucket); readerOriginConfig.set(com.alibaba.datax.plugin.reader.hdfsreader.Key.DEFAULT_FS,fs); readerOriginConfig.set(com.alibaba.datax.plugin.reader.hdfsreader.Key.FILETYPE, readerOriginConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_FORMAT)); /** * "path" and "column" are already consistent between the two readers */ JSONObject hadoopConfig = new JSONObject(); hadoopConfig.put(Key.FS_OSS_ACCESSID,readerOriginConfig.getString(Key.ACCESSID)); hadoopConfig.put(Key.FS_OSS_ACCESSKEY,readerOriginConfig.getString(Key.ACCESSKEY)); hadoopConfig.put(Key.FS_OSS_ENDPOINT,readerOriginConfig.getString(Key.ENDPOINT)); readerOriginConfig.set(Key.HDOOP_CONFIG,Configuration.from(JSON.toJSONString(hadoopConfig))); } } ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/OssSplitUtil.java ================================================ package com.alibaba.datax.plugin.reader.ossreader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.ossreader.OssInputStream; import com.alibaba.datax.plugin.unstructuredstorage.reader.Key; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode; import com.alibaba.datax.plugin.unstructuredstorage.reader.split.StartEndPair; import com.alibaba.datax.plugin.unstructuredstorage.reader.split.UnstructuredSplitUtil; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import com.aliyun.oss.OSSClient; import com.aliyun.oss.model.GetObjectRequest; import com.aliyun.oss.model.OSSObject; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.io.InputStream; import java.util.ArrayList; import java.util.Comparator; import java.util.List; /** * @Author: guxuan * @Date 2022-05-17 15:48
*/ public class OssSplitUtil extends UnstructuredSplitUtil { private static final Logger LOG = LoggerFactory.getLogger(OssSplitUtil.class); public static final Long SINGLE_FILE_SPLIT_THRESHOLD_IN_SIZE = 64 * 1024 * 1024L; // Files no larger than 64 MB are not split internally private OSSClient ossClient; private String bucketName; private Double balanceThreshold; private Long avgLen = -1L; private Integer splitGroupNum = -1; public OssSplitUtil(OSSClient ossClient, String bucketName) { super(false); this.ossClient = ossClient; this.bucketName = bucketName; } @Override public Long getFileTotalLength(String filePath) { // Total size of the object in bytes; a metadata request avoids opening an object stream that would never be closed return this.ossClient.getObjectMetadata(this.bucketName, filePath).getContentLength(); } @Override public InputStream getFileInputStream(StartEndPair startEndPair) { InputStream inputStream = new OssInputStream(this.ossClient, this.bucketName, startEndPair.getFilePath(), startEndPair.getStart(), startEndPair.getEnd()); return inputStream; } private Boolean canSplitSingleFile(Configuration jobConfig) { Boolean enableInnerSplit = jobConfig.getBool(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.ENABLE_INNER_SPLIT, true); if (!enableInnerSplit) { return false; } // Default to no split unless the format checks below pass String fileFormat = jobConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.FILE_FORMAT, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_FILE_FORMAT); String compressType = jobConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COMPRESS); // If the file is not "text format and uncompressed", return false if (! StringUtils.equalsIgnoreCase(fileFormat, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.FILE_FORMAT_TEXT) || !
StringUtils.isBlank(compressType)) { return false; } // TODO: detect symlinked files; inner splitting is not supported for them return true; } private boolean isGroupsBalance(List<Group> groups) { assert (groups != null); if(groups.size() <= 1) { return true; } double avg = (double) this.avgLen * (1.0 + this.balanceThreshold/100); for (Group group : groups) { if(group.getFilledLenght() > avg) { return false; } } return true; } /* * Split allObjKeySizePares into N groups so that the total file size of each group is as close as possible * */ private List<Group> splitObjectToGroups(List<Pair<String, Long>> allObjKeySizePares, Integer N) { List<Group> groups; // At most N files: put each file in its own group if(allObjKeySizePares.size() <= N) { groups = new ArrayList<>(); for (Pair<String, Long> pair : allObjKeySizePares) { // capacity starts at avgLen Group group = new Group(avgLen); FileBlock fileBlock = new FileBlock(pair.getKey(), 0L, pair.getValue() - 1); group.fill(fileBlock); groups.add(group); } // Fewer than N files: pad with empty groups for (int i = groups.size(); i < N; i++) { groups.add(new Group(avgLen)); } return groups; } // More than N files // Sort allObjKeySizePares by size, largest first allObjKeySizePares.sort(new Comparator<Pair<String, Long>>() { @Override public int compare(Pair<String, Long> o1, Pair<String, Long> o2) { if (o1.getValue().compareTo(o2.getValue()) < 0) { return 1; } if (o1.getValue().equals(o2.getValue())) { return 0; } return -1; } }); groups = new ArrayList<>(N); for (int i = 0; i < N; i++) { Group group = new Group(avgLen); groups.add(group); } for (Pair<String, Long> pair : allObjKeySizePares) { FileBlock fileBlock = new FileBlock(pair.getKey(), 0L, pair.getValue() - 1); // For the extreme case avgLen < 0, the files are meant to be distributed evenly across the groups by count if (avgLen > 0 && pair.getValue() >= avgLen) { // A file of at least avgLen bytes forms its own group (placed in an empty group) for (int index = 0; index < N; index++) { if (groups.get(index).isEmpty()) { groups.get(index).fill(fileBlock); break; } } } else { // A smaller file goes into the group that can still hold it with the least remaining capacity (best fit) int selectedIndex = 0, index = 0; // First, find the first group that can hold it for (; index < N; index++) { if (groups.get(index).getCapacity() >= fileBlock.getSize()) { selectedIndex = index; break; } } // Then look for a group that can hold it with less remaining capacity for (;index < N; index++) { if
(groups.get(index).getCapacity() >= fileBlock.getSize() && groups.get(index).getCapacity() < groups.get(selectedIndex).getCapacity()) { selectedIndex = index; } } groups.get(selectedIndex).fill(fileBlock); } } return groups; } private void reBalanceGroup(List<Group> groups) { LOG.info("reBalance start"); assert (groups != null && groups.size() > 0); // Further split files inside some groups /* 1. Pick the least-loaded and the most-loaded groups */ Group groupMinLoad = groups.get(0); Group groupMaxLoad = groups.get(0); for (Group group : groups) { if (group.getFilledLenght() > groupMaxLoad.getFilledLenght()) { groupMaxLoad = group; } if (group.getFilledLenght() < groupMinLoad.getFilledLenght()) { groupMinLoad = group; } } /* 2. Move part of the largest file in groupMaxLoad into groupMinLoad; * the amount moved is min{groupMaxLoad.length - mean, mean - groupMinLoad.length} */ Long splitLen = Math.min(groupMinLoad.getCapacity(), groupMaxLoad.getOverloadLength()); FileBlock splitOutBlock = groupMaxLoad.split(splitLen, this.ossClient, this.bucketName); groupMinLoad.fill(splitOutBlock); LOG.info("reBalance end"); } private Long getTotoalLenOfObjList(List<Pair<String, Long>> objKeySizePares) { Long totalLen = 0L; for (Pair<String, Long> pair : objKeySizePares) { totalLen += (pair.getValue() < 0 ? 1 : pair.getValue()); } return totalLen; } public List<Configuration> getSplitedConfigurations(Configuration originConfiguration, List<Pair<String, Long>> objKeySizePares, int adviceNumber) { List<Configuration> configurationList = new ArrayList<>(); this.splitGroupNum = adviceNumber; this.avgLen = (long)Math.ceil((double)this.getTotoalLenOfObjList(objKeySizePares) / this.splitGroupNum); this.balanceThreshold = originConfiguration.getDouble(com.alibaba.datax.plugin.reader.ossreader.Key.BALANCE_THRESHOLD, 10.0); List<Group> groups = this.splitObjectToGroups(objKeySizePares, this.splitGroupNum); // After grouping, if the group sizes are already close, no file needs inner splitting; otherwise individual files are split internally to rebalance further if (canSplitSingleFile(originConfiguration)) { // Bound the loop in case a single line inside a file is very large; in theory at most splitGroupNum adjustments are needed Integer i = 0; Long timeStart = System.currentTimeMillis(); while (i++ < splitGroupNum && !
this.isGroupsBalance(groups)) { this.reBalanceGroup(groups); } Long timeEnd = System.currentTimeMillis(); LOG.info("split groups cost {} ms", timeEnd - timeStart); } LOG.info("Split groups:"); for (Group group : groups) { LOG.info(group.toString()); } // Build one task configuration per group for (Group group : groups) { Configuration configuration = originConfiguration.clone(); // Initialize the slices from the group's file blocks List<StartEndPair> startEndPairs = new ArrayList<>(); for (FileBlock fileBlock : group.getFileBLocks()) { if (canSplitSingleFile(originConfiguration)) { startEndPairs.add(new StartEndPair(fileBlock.getStartOffset(), fileBlock.getEndOffset(), fileBlock.getObjName())); } else { // Without inner splitting, set the end offset to -1 and read the whole file // This is required for symlinked files 30190064 startEndPairs.add(new StartEndPair(fileBlock.getStartOffset(), -1L, fileBlock.getObjName())); } } configuration.set(Key.SPLIT_SLICE_CONFIG, startEndPairs); configurationList.add(configuration); } return configurationList; } } class Group { /* * fileBLockList is the list of file blocks assigned to this Group; each block is an (objName, startOffset, endOffset) triple * */ private List<FileBlock> fileBLockList; private Long capacity; private Long filledLenght; private static final Logger LOG = LoggerFactory.getLogger(Group.class); Group (Long capacity) { this(new ArrayList<>(), capacity); } Group (List<FileBlock> fileBLockList, Long capacity) { this.capacity = capacity; this.fileBLockList = fileBLockList; this.filledLenght = 0L; for (FileBlock fileBlock : fileBLockList) { this.filledLenght += fileBlock.getSize(); this.capacity -= fileBlock.getSize(); } } void fill(FileBlock fileBlock) { if (null == fileBlock) { return; } this.fileBLockList.add(fileBlock); this.capacity -= fileBlock.getSize(); this.filledLenght += fileBlock.getSize(); } void take(FileBlock fileBlock) { this.capacity += fileBlock.getSize(); this.filledLenght -= fileBlock.getSize(); this.fileBLockList.remove(fileBlock); } Long getCapacity() { return this.capacity; } void setCapacity(Long capacity) { this.capacity = capacity; } Long getFilledLenght() { return this.filledLenght; } public boolean
isEmpty() { return this.fileBLockList.isEmpty(); } public boolean isFull() { return this.capacity <= 0; } List<FileBlock> getFileBLocks() { return this.fileBLockList; } private Integer getBiggestFileBlock() { Integer index = 0; Long maxSize = -1L; for (int i = 0; i < this.fileBLockList.size(); i++) { /* track the running maximum so the largest block, not the last one, is returned */ Long size = this.fileBLockList.get(i).getSize(); if (size > maxSize) { maxSize = size; index = i; } } return index; } /* * Split this Group: cut the first splitLen bytes of its largest block off into a new block * */ FileBlock split(Long splitLen, OSSClient ossClient, String ossBucketName) { Integer bigBlockIndex = this.getBiggestFileBlock(); FileBlock bigBlock = this.fileBLockList.get(bigBlockIndex); // If the largest block is no larger than SINGLE_FILE_SPLIT_THRESHOLD_IN_SIZE (64 MB), return without inner splitting if (bigBlock.getSize() <= OssSplitUtil.SINGLE_FILE_SPLIT_THRESHOLD_IN_SIZE) { return null; } FileBlock outBlock; FileBlock remainBlock; this.take(bigBlock); // If splitLen is at least the size of the largest block, move the whole block out if (splitLen >= bigBlock.getSize()) { outBlock = new FileBlock(bigBlock); } else { Long originalEnd = bigBlock.getEndOffset(); outBlock = new FileBlock(bigBlock.getObjName(), bigBlock.getStartOffset(), bigBlock.getStartOffset() + splitLen - 1); // Move the end offset of the first block forward to the next newline InputStream inputStream = new OssInputStream(ossClient, ossBucketName, outBlock.getObjName(), outBlock.getEndOffset(), originalEnd); Long endForward; try { endForward = this.getLFIndex(inputStream); } finally { try { inputStream.close(); } catch (IOException ignored) { /* best-effort close */ } } if (endForward < 0) { /* no newline before originalEnd: keep the rest of the block in outBlock so no line is cut in two */ outBlock.setEndOffset(originalEnd); } else { outBlock.setEndOffset(outBlock.getEndOffset() + endForward); } // outBlock takes the leading records; after it is split off the remainder may be empty, in which case no remainBlock is created. A remainBlock is created only when something remains (outBlock.end < originalEnd).
if (outBlock.getEndOffset() < originalEnd) { remainBlock = new FileBlock(bigBlock.getObjName(), outBlock.getEndOffset() + 1, originalEnd); this.fill(remainBlock); } } return outBlock; } Long getOverloadLength() { return Math.max(0, -this.capacity); } /** * Returns the offset of the first '\n' counted from the start of the input stream * * @param inputStream * the input stream * @return the offset of the first '\n', or -1 if the stream ends without one */ public Long getLFIndex(InputStream inputStream) { Long hasReadByteIndex = -1L; int ch = 0; while (ch != -1) { try { ch = inputStream.read(); } catch (IOException e) { throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.READ_FILE_IO_ERROR, String.format("inputstream read Byte has exception: %s", e.getMessage()), e); } hasReadByteIndex++; if (ch == '\n') { return hasReadByteIndex; } } return -1L; } public String toString() { JSONArray fbList = new JSONArray(); int index = 0; for (FileBlock fb : this.fileBLockList) { JSONObject jsonObject = new JSONObject(); jsonObject.put(String.format("block[%d]", index++), fb.toString()); fbList.add(jsonObject); } return fbList.toString(); } } class FileBlock { private String objName; private Long startOffset; private Long endOffset; private Long size; FileBlock(String objName, Long startOffset, Long endOffset) { assert (StringUtils.isNotBlank(objName) && startOffset >= 0 ); assert (endOffset == -1 || startOffset <= endOffset); this.objName = objName; this.startOffset = startOffset; // A negative endOffset is normalized to -1 (read to the end of the file) and size is set to a nominal 1 this.endOffset = endOffset < 0 ? -1 : endOffset; this.size = endOffset < 0 ?
1 : this.endOffset - this.startOffset + 1; } public FileBlock(String objName) { this(objName, 0L, -1L); } public FileBlock(String objName, Pair<Long, Long> starEndPair) { this(objName, starEndPair.getKey(), starEndPair.getValue()); } public FileBlock(FileBlock fileBlock) { assert (fileBlock != null); this.objName = fileBlock.objName; this.startOffset = fileBlock.startOffset; this.endOffset = fileBlock.endOffset; this.size = fileBlock.size; } Long getSize() { return this.size; } Long getStartOffset() { return this.startOffset; } void setStartOffset(Long startOffset) { Long deltaSize = this.startOffset - startOffset; this.startOffset = startOffset; this.size += deltaSize; } Long getEndOffset() { return this.endOffset; } void setEndOffset(Long endOffset) { Long deltaSize = endOffset - this.endOffset; this.endOffset = endOffset; // keep size in sync with the new end offset this.size += deltaSize; } String getObjName() { return this.objName; } public String toString() { return String.format("<%s,%d,%d>", this.objName, this.startOffset, this.endOffset); } } ================================================ FILE: ossreader/src/main/java/com/alibaba/datax/plugin/reader/ossreader/util/OssUtil.java ================================================ package com.alibaba.datax.plugin.reader.ossreader.util; import java.util.ArrayList; import java.util.List; import org.apache.commons.lang3.StringUtils; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.ossreader.Constant; import com.alibaba.datax.plugin.reader.ossreader.Key; import com.alibaba.datax.plugin.reader.ossreader.OssReaderErrorCode; import com.aliyun.oss.ClientConfiguration; import com.aliyun.oss.OSSClient; /** * Created by mengxin.liumx on 2014/12/8.
*/ public class OssUtil { public static OSSClient initOssClient(Configuration conf) { String endpoint = conf.getString(Key.ENDPOINT); String accessId = conf.getString(Key.ACCESSID); String accessKey = conf.getString(Key.ACCESSKEY); ClientConfiguration ossConf = new ClientConfiguration(); ossConf.setSocketTimeout(Constant.SOCKETTIMEOUT); // Endpoints under .aliyun.com are excluded from CNAME handling by default; for another suffix such as .aliyun.ga, set cname to add it to the exclude list String cname = conf.getString(Key.CNAME); if (StringUtils.isNotBlank(cname)) { List<String> cnameExcludeList = new ArrayList<String>(); cnameExcludeList.add(cname); ossConf.setCnameExcludeList(cnameExcludeList); } OSSClient client = null; try { client = new OSSClient(endpoint, accessId, accessKey, ossConf); } catch (IllegalArgumentException e) { throw DataXException.asDataXException( OssReaderErrorCode.ILLEGAL_VALUE, e.getMessage()); } return client; } } ================================================ FILE: ossreader/src/main/resources/plugin.json ================================================ { "name": "ossreader", "class": "com.alibaba.datax.plugin.reader.ossreader.OssReader", "description": "", "developer": "alibaba" } ================================================ FILE: ossreader/src/main/resources/plugin_job_template.json ================================================ { "name": "ossreader", "parameter": { "endpoint": "", "accessId": "", "accessKey": "", "bucket": "", "object": [], "column": [], "encoding": "", "fieldDelimiter": "", "compress": "" } } ================================================ FILE: osswriter/doc/osswriter.md ================================================ # DataX OSSWriter Guide ------------ ## 1 Quick Introduction OSSWriter writes one or more table files in a CSV-like format to OSS. **What is written to OSS is a logical two-dimensional table, for example CSV-formatted text.** * OSS product overview: see [[Alibaba Cloud OSS Portal](http://www.aliyun.com/product/oss)] * OSS Java SDK: see [[Alibaba Cloud OSS Java SDK](http://oss.aliyuncs.com/aliyun_portal_storage/help/oss/OSS_Java_SDK_Dev_Guide_20141113.pdf)] ## 2 Features and Limitations OSSWriter converts data from the DataX protocol into TXT files in OSS. Because OSS itself is unstructured storage, OSSWriter adds the following: 1.
支持写入 TXT的文件,且要求TXT中shema为一张二维表。 2. 支持类CSV格式文件,自定义分隔符。 3. 暂时不支持文本压缩。 6. 支持多线程写入,每个线程写入不同子文件。 7. 文件支持滚动,当文件大于某个size值或者行数值,文件需要切换。 [暂不支持] 8. 支持写 PARQUET、ORC 文件 我们不能做到: 1. 单个文件不能支持并发写入。 ## 3 功能说明 ### 3.1 配置样例 写 txt文件样例 ```json { "job": { "setting": {}, "content": [ { "reader": { }, "writer": { "name": "osswriter", "parameter": { "endpoint": "http://oss.aliyuncs.com", "accessId": "", "accessKey": "", "bucket": "myBucket", "object": "cdo/datax", "encoding": "UTF-8", "fieldDelimiter": ",", "writeMode": "truncate|append|nonConflict" } } } ] } } ``` 写 orc 文件样例 ```json { "job": { "setting": {}, "content": [ { "reader": {}, "writer": { "name": "osswriter", "parameter": { "endpoint": "http://oss.aliyuncs.com", "accessId": "", "accessKey": "", "bucket": "myBucket", "fileName": "test", "encoding": "UTF-8", "column": [ { "name": "col1", "type": "BIGINT" }, { "name": "col2", "type": "DOUBLE" }, { "name": "col3", "type": "STRING" } ], "fileFormat": "orc", "path": "/tests/case61", "writeMode": "append" } } } ] } } ``` 写 parquet 文件样例 ```json { "job": { "setting": {}, "content": [ { "reader": {}, "writer": { "name": "osswriter", "parameter": { "endpoint": "http://oss.aliyuncs.com", "accessId": "", "accessKey": "", "bucket": "myBucket", "fileName": "test", "encoding": "UTF-8", "column": [ { "name": "col1", "type": "BIGINT" }, { "name": "col2", "type": "DOUBLE" }, { "name": "col3", "type": "STRING" } ], "parquetSchema": "message test { required int64 int64_col;\n required binary str_col (UTF8);\nrequired group params (MAP) {\nrepeated group key_value {\nrequired binary key (UTF8);\nrequired binary value (UTF8);\n}\n}\nrequired group params_arr (LIST) {\n repeated group list {\n required binary element (UTF8);\n }\n}\nrequired group params_struct {\n required int64 id;\n required binary name (UTF8);\n }\nrequired group params_arr_complex (LIST) {\n repeated group list {\n required group element {\n required int64 id;\n required binary name (UTF8);\n}\n }\n}\nrequired group params_complex 
(MAP) {\nrepeated group key_value {\nrequired binary key (UTF8);\nrequired group value {\n required int64 id;\n required binary name (UTF8);\n }\n}\n}\nrequired group params_struct_complex {\n required int64 id;\n required group detail {\n required int64 id;\n required binary name (UTF8);\n }\n }\n}", "fileFormat": "parquet", "path": "/tests/case61", "writeMode": "append" } } } ] } } ``` ### 3.2 参数说明 * **endpoint** * 描述:OSS Server的EndPoint地址,例如http://oss.aliyuncs.com。 * 必选:是
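    * Default: none

* **accessId**

    * Description: AccessId for OSS.
    * Required: yes
    * Default: none

* **accessKey**

    * Description: AccessKey for OSS.
    * Required: yes
    * Default: none

* **bucket**

    * Description: OSS bucket.
    * Required: yes
    * Default: none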
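
* **object**

    * Description: Name of the object OSSWriter writes; OSS simulates directories through object names. With "object": "datax", written objects start with datax, followed by a random suffix. With "object": "cdo/datax", written objects start with cdo/datax, followed by a random suffix, and / acts as the simulated directory separator in OSS.
    * Required: yes
    * Default: none

* **writeMode**

    * Description: How OSSWriter cleans up existing data before writing:
        * truncate: before writing, delete every object whose name starts with the configured object name. For example, with "object": "abc", all objects starting with abc are deleted.
        * append: do nothing before writing. DataX OSSWriter writes using the configured object name plus a random UUID suffix, so file names do not collide. For example, if the configured object name is datax, the object actually written is named datax__ followed by a random hexadecimal string.
        * nonConflict: fail if any object whose name starts with the configured object name already exists. For example, with "object": "abc", an existing object named abc123 causes an error.
    * Required: yes
    * Default: none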
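The append-mode naming described above can be sketched in Java. This mirrors the pattern visible in `OssWriter.Job#doSplitForWriteMultiObject` (the format string `"%s__%s"` with a dash-free `UUID`, retried on collision); the class and method names below are hypothetical, and the real plugin fills the set of existing keys from a listing of the bucket rather than from a set the caller passes in.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class AppendObjectNames {

    // Builds "<object>__<32 hex chars>", retrying until the name is not in `existing`,
    // then records it so that later calls (later tasks) cannot reuse it.
    static String uniqueName(String object, Set<String> existing) {
        String name;
        do {
            String suffix = UUID.randomUUID().toString().replace("-", "");
            name = String.format("%s__%s", object, suffix);
        } while (existing.contains(name));
        existing.add(name);
        return name;
    }

    public static void main(String[] args) {
        // Hypothetical set of keys already present in the bucket.
        Set<String> existing = new HashSet<>();
        existing.add("cdo/datax__old");
        for (int task = 0; task < 3; task++) {
            System.out.println(uniqueName("cdo/datax", existing));
        }
    }
}
```

Reserving each generated name in the shared set is what keeps two tasks of the same job from being assigned the same object.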
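
* **fieldDelimiter**

    * Description: Field delimiter used in the written file.
    * Required: no
    * Default: ,

* **encoding**

    * Description: Encoding of the output file.
    * Required: no
    * Default: utf-8

* **nullFormat**

    * Description: Plain text files have no standard string that represents null, so DataX provides nullFormat to define which string stands for null. For example, if you configure nullFormat="\N" and the source data is "\N", DataX treats it as a null field.
    * Required: no
    * Default: \N

* **dateFormat**

    * Description: Format used when serializing date values into the object, for example "dateFormat": "yyyy-MM-dd".
    * Required: no
    * Default: none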
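A minimal sketch of what a dateFormat pattern produces, assuming (as `OssWriter.Task#init` suggests, since it builds `new SimpleDateFormat(dateFormat)`) that the pattern is interpreted by `java.text.SimpleDateFormat`. The class name is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

public class DateFormatDemo {

    // Serializes a date with the given pattern, e.g. "yyyy-MM-dd".
    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call here.
    static String format(Date date, String pattern) {
        return new SimpleDateFormat(pattern).format(date);
    }

    public static void main(String[] args) {
        // Calendar months are zero-based; the named constant avoids off-by-one mistakes.
        Date d = new GregorianCalendar(2014, Calendar.DECEMBER, 31, 8, 5, 9).getTime();
        System.out.println(format(d, "yyyy-MM-dd"));          // 2014-12-31
        System.out.println(format(d, "yyyy-MM-dd HH:mm:ss")); // 2014-12-31 08:05:09
    }
}
```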
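
* **fileFormat**

    * Description: Output file format, either csv (http://zh.wikipedia.org/wiki/%E9%80%97%E5%8F%B7%E5%88%86%E9%9A%94%E5%80%BC) or text. csv is strict CSV: a value that contains the column delimiter is escaped according to CSV rules, with the double quote (") as the escape character. text simply separates values with the column delimiter and does not escape values that contain it. The orc and parquet formats shown in the samples in 3.1 are also accepted.
    * Required: no
    * Default: text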
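To illustrate the difference between csv and text output, here is a hypothetical helper (not the plugin's own writer classes) that renders one row both ways. It assumes common RFC 4180 style quoting: a field containing the delimiter, a double quote, or a line break is wrapped in double quotes, and embedded quotes are doubled. The plugin's actual escaping may differ in edge cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvVsText {

    // "text": fields are joined with the delimiter as-is, with no escaping.
    static String textLine(List<String> fields, char delimiter) {
        return String.join(String.valueOf(delimiter), fields);
    }

    // "csv": each field is escaped if needed, then joined with the delimiter.
    static String csvLine(List<String> fields, char delimiter) {
        return fields.stream()
                .map(f -> csvField(f, delimiter))
                .collect(Collectors.joining(String.valueOf(delimiter)));
    }

    // Quote the field when it contains the delimiter, a double quote or a line break;
    // embedded double quotes are doubled.
    static String csvField(String f, char delimiter) {
        boolean needsQuotes = f.indexOf(delimiter) >= 0 || f.indexOf('"') >= 0
                || f.indexOf('\n') >= 0 || f.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + f.replace("\"", "\"\"") + "\"" : f;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", "Smith, John", "say \"hi\"");
        System.out.println(textLine(row, ',')); // 1,Smith, John,say "hi"
        System.out.println(csvLine(row, ','));  // 1,"Smith, John","say ""hi"""
    }
}
```

The text line cannot be split back into three fields reliably, which is why csv is the safer choice when values may contain the delimiter.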
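
* **header**

    * Description: Header row written to OSS, for example ['id', 'name', 'age'].
    * Required: no
    * Default: none

* **maxFileSize**

    * Description: Maximum size of a single object written to OSS, 10000*10MB by default. Much like log4j rolling log files by size, output rolls over to a new object once this size is reached. OSS multipart uploads use 10 MB parts, and each OSS InitiateMultipartUploadRequest allows at most 10000 parts. On rollover, the new object name is the original object prefix plus the UUID suffix, followed by _1, _2, _3, and so on.
    * Required: no
    * Default: 100000MB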
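The default maxFileSize follows from two numbers: at most 10,000 parts per multipart upload (the `MAX_BLOCK_SIZE` constant in osswriter's `Constant` class) and a 10 MB part size (the default for `blockSizeInMB` in `OssWriter.Task#init`). The sketch below only does that arithmetic; the class and method names are hypothetical.

```java
public class MaxObjectSize {

    // Upper bound on parts per multipart upload (part numbers 1..10000).
    static final int MAX_PARTS = 10_000;

    // Largest object one multipart upload can hold, in MB, for a given part size in MB.
    static long maxObjectSizeMb(long partSizeMb) {
        return MAX_PARTS * partSizeMb;
    }

    // Parts needed to hold an object of the given size (ceiling division).
    static long partsNeeded(long objectSizeMb, long partSizeMb) {
        return (objectSizeMb + partSizeMb - 1) / partSizeMb;
    }

    public static void main(String[] args) {
        long partMb = 10; // default part size
        System.out.println(maxObjectSizeMb(partMb));                  // 100000
        System.out.println(partsNeeded(100_000, partMb));             // 10000
        // One more MB no longer fits in a single upload, so output must roll over.
        System.out.println(partsNeeded(100_001, partMb) > MAX_PARTS); // true
    }
}
```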

### 3.3 Type Conversion

OSS itself provides no data types; the types below are defined by DataX OSSWriter:

| DataX internal type | OSS data type |
| -------- | ----- |
| Long | Long |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Date | Date |

Where:

* OSS Long is an integer written as a string in the OSS text, for example "19901219".
* OSS Double is a double written as a string in the OSS text, for example "3.1415".
* OSS Boolean is a boolean written as a string in the OSS text, for example "true" or "false" (case-insensitive).
* OSS Date is a date written as a string in the OSS text, for example "2014-12-31"; the date format can be specified.

## 4 Performance Report

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

Omitted.

================================================ FILE: osswriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT osswriter osswriter jar org.apache.logging.log4j log4j-api 2.17.1 org.apache.logging.log4j log4j-core 2.17.1 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 com.aliyun.oss aliyun-sdk-oss 2.2.3 org.apache.parquet parquet-column 1.8.1 org.apache.parquet parquet-avro 1.8.1 org.apache.parquet parquet-common 1.8.1 org.apache.parquet parquet-format 2.3.1 org.apache.parquet parquet-jackson 1.8.1 org.apache.parquet parquet-encoding 1.8.1 org.apache.parquet parquet-hadoop 1.8.1 com.twitter parquet-hadoop-bundle 1.6.0 com.alibaba.datax hdfswriter 0.0.1-SNAPSHOT compile com.alibaba.datax datax-core 0.0.1-SNAPSHOT compile maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: osswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/osswriter target/ osswriter-0.0.1-SNAPSHOT.jar plugin/writer/osswriter false plugin/writer/osswriter/libs runtime ================================================ FILE: 
osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/Constant.java ================================================ package com.alibaba.datax.plugin.writer.osswriter; /** * Created by haiwei.luo on 15-02-09. */ public class Constant { public static final String OBJECT = "object"; public static final int SOCKETTIMEOUT = 5000000; public static final String DEFAULT_NULL_FORMAT = "null"; /** * Every uploaded part is identified by a part number in the range 1-10000 * https://help.aliyun.com/document_detail/31993.html */ public static final int MAX_BLOCK_SIZE = 10000; } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.osswriter; /** * Created by haiwei.luo on 15-02-09. */ public class Key { public static final String ENDPOINT = "endpoint"; public static final String ACCESSID = "accessId"; public static final String ACCESSKEY = "accessKey"; public static final String BUCKET = "bucket"; public static final String OBJECT = "object"; public static final String CNAME = "cname"; public static final String PARTITION = "partition"; /** * encrypt: whether to store the data encrypted on OSS */ public static final String ENCRYPT = "encrypt"; public static final String BLOCK_SIZE_IN_MB = "blockSizeInMB"; public static final String OSS_CONFIG = "oss"; public static final String POSTGRESQL_CONFIG = "postgresql"; public static final String PROXY_HOST = "proxyHost"; public static final String PROXY_PORT = "proxyPort"; public static final String PROXY_USERNAME = "proxyUsername"; public static final String PROXY_PASSWORD = "proxyPassword"; public static final String PROXY_DOMAIN = "proxyDomain"; public static final String PROXY_WORKSTATION = "proxyWorkstation"; public static final String HDOOP_CONFIG = "hadoopConfig"; public static final String FS_OSS_ACCESSID = "fs.oss.accessKeyId"; public static final String FS_OSS_ACCESSKEY = "fs.oss.accessKeySecret"; public
static final String FS_OSS_ENDPOINT = "fs.oss.endpoint"; /** * Whether multiple tasks write a single object: * false: multiple tasks write multiple objects (default false, for backward compatibility) * true: multiple tasks write a single object */ public static final String WRITE_SINGLE_OBJECT = "writeSingleObject"; public static final String UPLOAD_ID = "uploadId"; /** * Only for parquet or orc fileType */ public static final String PATH = "path"; /** * Only for parquet or orc fileType */ public static final String FILE_NAME = "fileName"; public static final String GENERATE_EMPTY_FILE = "generateEmptyFile"; } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssSingleObject.java ================================================ package com.alibaba.datax.plugin.writer.osswriter; import com.alibaba.datax.common.exception.DataXException; import com.aliyun.oss.model.PartETag; import org.apache.commons.lang3.ArrayUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.Collections; import java.util.List; import java.util.concurrent.atomic.AtomicInteger; /** * @Author: guxuan * @Date 2022-05-17 16:30 */ public class OssSingleObject { private static Logger logger = LoggerFactory.getLogger(OssSingleObject.class); /** * One uploadId corresponds to one OSS object */ public static String uploadId; /** * Buffers all of the last, not-yet-uploaded blocks in lastBlockBuffer */ private static byte[] lastBlockBuffer; /** * Current part number */ public static AtomicInteger currentPartNumber = new AtomicInteger(1); /** * All blocks that have already been uploaded * Note: allPartETags is a thread-safe list */ public static List allPartETags = Collections.synchronizedList(new ArrayList()); /** * Appends each task's last not-yet-uploaded block to lastBlockBuffer. 
* Once lastBlockBuffer reaches blockSizeInByte, it is uploaded as one part, so that many tasks cannot grow lastBlockBuffer until it runs out of memory (OOM) * * @param lastBlock * @param ossWriterProxy * @param blockSizeInByte * @param object */ public synchronized static void addLastBlockBuffer(byte[] lastBlock, OssWriterProxy ossWriterProxy, long blockSizeInByte, String object,
OssWriterProxy.HeaderProvider headerProvider) { lastBlockBuffer = ArrayUtils.addAll(lastBlockBuffer, lastBlock); // upload a part once lastBlockBuffer reaches blockSizeInByte if (lastBlockBuffer != null && lastBlockBuffer.length >= blockSizeInByte) { logger.info("write last block buffer part size [{}] to object [{}], all has uploaded part size:{}, current part number:{}, uploadId:{}", lastBlockBuffer.length, object, allPartETags.size(), currentPartNumber.intValue(), uploadId); try { ossWriterProxy.uploadOnePartForSingleObject(lastBlockBuffer, uploadId, allPartETags, object, headerProvider); } catch (Exception e) { logger.error("upload part error: {}", e.getMessage(), e); throw DataXException.asDataXException(e.getMessage()); } // increment currentPartNumber currentPartNumber.incrementAndGet(); // clear lastBlockBuffer lastBlockBuffer = null; } } public static byte[] getLastBlockBuffer() { return lastBlockBuffer; } } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssWriter.java ================================================ package com.alibaba.datax.plugin.writer.osswriter; import java.io.*; import java.text.DateFormat; import java.text.SimpleDateFormat; import java.util.*; import java.util.concurrent.Callable; import com.alibaba.datax.common.element.BytesColumn; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.util.RangeSplitUtil; import com.alibaba.datax.plugin.unstructuredstorage.FileFormat; import com.alibaba.datax.plugin.unstructuredstorage.writer.binaryFileUtil.BinaryFileWriterUtil; import com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriter; import com.alibaba.datax.plugin.writer.osswriter.util.HandlerUtil; import com.alibaba.datax.plugin.writer.osswriter.util.HdfsParquetUtil; import com.alibaba.fastjson2.JSON; import com.aliyun.oss.model.*; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import
org.slf4j.LoggerFactory; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.unstructuredstorage.writer.TextCsvWriterManager; import com.alibaba.datax.plugin.unstructuredstorage.writer.UnstructuredStorageWriterUtil; import com.alibaba.datax.plugin.unstructuredstorage.writer.UnstructuredWriter; import com.alibaba.datax.plugin.writer.osswriter.util.OssUtil; import com.aliyun.oss.ClientException; import com.aliyun.oss.OSSClient; import com.aliyun.oss.OSSException; import static com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.*; /** * Created by haiwei.luo on 15-02-09. */ public class OssWriter extends Writer { public static int parseParentPathLength(List path) { if (path == null || path.size() != 1) { throw DataXException.asDataXException( OssWriterErrorCode.CONFIG_INVALID_EXCEPTION, String.format("only support configure one path in binary copy mode, your config: %s", JSON.toJSONString(path))); } String eachPath = path.get(0); int endMark; for (endMark = 0; endMark < eachPath.length(); endMark++) { if ('*' != eachPath.charAt(endMark) && '?' 
!= eachPath.charAt(endMark)) { continue; } else { break; } } int lastDirSeparator = eachPath.lastIndexOf(IOUtils.DIR_SEPARATOR) + 1; if (endMark < eachPath.length()) { lastDirSeparator = eachPath.substring(0, endMark).lastIndexOf(IOUtils.DIR_SEPARATOR) + 1; } return lastDirSeparator; } public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration writerSliceConfig = null; private OSSClient ossClient = null; private Configuration peerPluginJobConf; private Boolean isBinaryFile; private String objectDir; private String syncMode; private String fileFormat; private String encoding; private HdfsWriter.Job hdfsWriterJob; private boolean useHdfsWriterProxy = false; private boolean writeSingleObject; private OssWriterProxy ossWriterProxy; private String bucket; private String object; private List header; @Override public void preHandler(Configuration jobConfiguration) { HandlerUtil.preHandler(jobConfiguration); } @Override public void init() { this.writerSliceConfig = this.getPluginJobConf(); this.basicValidateParameter(); this.fileFormat = this.writerSliceConfig.getString( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_FORMAT, com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.FILE_FORMAT_TEXT); this.encoding = this.writerSliceConfig.getString( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.ENCODING, com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.DEFAULT_ENCODING); this.useHdfsWriterProxy = HdfsParquetUtil.isUseHdfsWriterProxy(this.fileFormat); if(useHdfsWriterProxy){ this.hdfsWriterJob = new HdfsWriter.Job(); HdfsParquetUtil.adaptConfiguration(this.hdfsWriterJob, this.writerSliceConfig); this.hdfsWriterJob.setJobPluginCollector(this.getJobPluginCollector()); this.hdfsWriterJob.setPeerPluginJobConf(this.getPeerPluginJobConf()); this.hdfsWriterJob.setPeerPluginName(this.getPeerPluginName()); this.hdfsWriterJob.setPluginJobConf(this.getPluginJobConf()); 
this.hdfsWriterJob.init(); return; } this.peerPluginJobConf = this.getPeerPluginJobConf(); this.isBinaryFile = FileFormat.getFileFormatByConfiguration(this.peerPluginJobConf).isBinary(); this.syncMode = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.SYNC_MODE, ""); this.writeSingleObject = this.writerSliceConfig.getBool(Key.WRITE_SINGLE_OBJECT, false); this.header = this.writerSliceConfig .getList(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.HEADER, null, String.class); this.validateParameter(); this.ossClient = OssUtil.initOssClient(this.writerSliceConfig); this.ossWriterProxy = new OssWriterProxy(this.writerSliceConfig, this.ossClient); } private void basicValidateParameter(){ this.writerSliceConfig.getNecessaryValue(Key.ENDPOINT, OssWriterErrorCode.REQUIRED_VALUE); this.writerSliceConfig.getNecessaryValue(Key.ACCESSID, OssWriterErrorCode.REQUIRED_VALUE); this.writerSliceConfig.getNecessaryValue(Key.ACCESSKEY, OssWriterErrorCode.REQUIRED_VALUE); this.writerSliceConfig.getNecessaryValue(Key.BUCKET, OssWriterErrorCode.REQUIRED_VALUE); } private void validateParameter() { this.writerSliceConfig.getBool(Key.ENCRYPT); if (this.isBinaryFile){ BinaryFileWriterUtil.validateParameter(this.writerSliceConfig); return; } if (!this.isPeer2PeerCopyMode()) { // required when not in peer-to-peer copy mode this.writerSliceConfig.getNecessaryValue(Key.OBJECT, OssWriterErrorCode.REQUIRED_VALUE); } // warn: do not support compress!! String compress = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.COMPRESS); if (StringUtils.isNotBlank(compress)) { String errorMessage = String.format("OSS writes do not support compression for the moment.
The compressed item %s does not work", compress); LOG.error(errorMessage); throw DataXException.asDataXException( OssWriterErrorCode.ILLEGAL_VALUE, errorMessage); } UnstructuredStorageWriterUtil .validateParameter(this.writerSliceConfig); LOG.info("writeSingleObject is: {}", this.writeSingleObject); } @Override public void prepare() { LOG.info("begin do prepare..."); if(useHdfsWriterProxy){ this.hdfsWriterJob.prepare(); return; } this.bucket = this.writerSliceConfig.getString(Key.BUCKET); this.object = this.writerSliceConfig.getString(Key.OBJECT); String writeMode = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.WRITE_MODE); List sourceFileName = this.peerPluginJobConf.getList(SOURCE_FILE_NAME, new ArrayList(), String.class); this.objectDir = this.getObjectDir(object); // in peer-to-peer copy mode, delete on the destination the files listed by the source if (this.isPeer2PeerCopyMode()) { String fullObjectName = null; String truncateMode = this.writerSliceConfig.getString("truncateMode", "objectMatch"); // prefix deletion mode if ("prefix".equalsIgnoreCase(truncateMode)) { BinaryFileWriterUtil.checkFileNameIfRepeatedThrowException(sourceFileName); if (TRUNCATE.equals(writeMode)) { LOG.info("You have configured [writeMode] [truncate], so the system will start to clear the objects under [{}] whose names start with [{}].
", bucket, object); // warn: by default, if the bucket holds more than 100 objects, a listing returns only 100 of them while (true) { ObjectListing listing = null; LOG.info("list objects with listObject(bucket, object)"); listing = this.ossClient.listObjects(bucket, object); List objectSummarys = listing .getObjectSummaries(); if (objectSummarys.isEmpty()) { break; } List objects2Delete = new ArrayList(); for (OSSObjectSummary objectSummary : objectSummarys) { objects2Delete.add(objectSummary.getKey()); } LOG.info(String.format("[prefix truncate mode]delete oss object [%s].", JSON.toJSONString(objects2Delete))); DeleteObjectsRequest deleteRequest = new DeleteObjectsRequest(bucket); deleteRequest.setKeys(objects2Delete); deleteRequest.setQuiet(true);// quiet mode DeleteObjectsResult deleteResult = this.ossClient.deleteObjects(deleteRequest); assert deleteResult.getDeletedObjects().isEmpty(); LOG.warn("OSS request id:{}, objects delete failed:{}", deleteResult.getRequestId(), JSON.toJSONString(deleteResult.getDeletedObjects())); } }else { throw DataXException.asDataXException(OssWriterErrorCode.ILLEGAL_VALUE, "only support truncate writeMode in copy sync mode."); } } else { if (TRUNCATE.equals(writeMode)) { sourceFileName = this.peerPluginJobConf.getList(com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.SOURCE_FILE, new ArrayList(), String.class); List readerPath = this.peerPluginJobConf.getList(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.PATH, new ArrayList(), String.class); int parentPathLength = OssWriter.parseParentPathLength(readerPath); this.writerSliceConfig.set("__parentPathLength", parentPathLength); BinaryFileWriterUtil.checkFileNameIfRepeatedThrowException(sourceFileName); // delete-by-original-file-name mode int splitCount = sourceFileName.size() / 1000 + 1; List> splitResult = RangeSplitUtil.doListSplit(sourceFileName, splitCount); for (List eachSlice : splitResult) { assert eachSlice.size() <= 1000; if (eachSlice.isEmpty()) { continue; } List 
ossObjFullPath = new ArrayList(); for (String eachObj :
eachSlice) { fullObjectName = String.format("%s%s", objectDir, eachObj.substring(parentPathLength, eachObj.length())); ossObjFullPath.add(fullObjectName); } LOG.info(String.format("[origin object name truncate mode]delete oss object [%s].", JSON.toJSONString(ossObjFullPath))); DeleteObjectsRequest deleteRequest = new DeleteObjectsRequest(bucket); deleteRequest.setKeys(ossObjFullPath); deleteRequest.setQuiet(true);// quiet mode DeleteObjectsResult deleteResult = this.ossClient.deleteObjects(deleteRequest); assert deleteResult.getDeletedObjects().isEmpty(); LOG.warn("OSS request id:{}, objects delete failed:{}", deleteResult.getRequestId(), JSON.toJSONString(deleteResult.getDeletedObjects())); } } else { throw DataXException.asDataXException(OssWriterErrorCode.ILLEGAL_VALUE, "only support truncate writeMode in copy sync mode."); } } return; } else { // warn: if the source is not semi-structured or this is not peer-to-peer copy mode, use the prefix deletion strategy // warn: the bucket does not exist, create it try { // warn: do not create bucket for user if (!this.ossClient.doesBucketExist(bucket)) { // this.ossClient.createBucket(bucket); String errorMessage = String.format("The [bucket]: %s you configured does not exist. Please confirm your configuration items.
", bucket); LOG.error(errorMessage); throw DataXException.asDataXException( OssWriterErrorCode.ILLEGAL_VALUE, errorMessage); } LOG.info(String.format("access control details [%s].", this.ossClient.getBucketAcl(bucket).toString())); if (writeSingleObject) { doPrepareForSingleObject(bucket, object, writeMode); } else { doPrepareForMutliObject(bucket, object, writeMode); } } catch (OSSException e) { throw DataXException.asDataXException( OssWriterErrorCode.OSS_COMM_ERROR, e.getMessage(), e); } catch (ClientException e) { throw DataXException.asDataXException( OssWriterErrorCode.OSS_COMM_ERROR, e.getMessage(), e); } } } /** * Prepare logic for multiple tasks writing a single object * * @param bucket * @param object * @param writeMode */ private void doPrepareForSingleObject(String bucket, String object, String writeMode) { boolean doesObjectExist = this.ossClient.doesObjectExist(bucket, object); LOG.info("does object [{}] exist in bucket {} : {}", object, bucket, doesObjectExist); if (TRUNCATE.equals(writeMode)) { LOG.info("Because you have configured writeMode truncate, and writeSingleObject is true, start cleaning up the existing object [{}] in bucket [{}]", object, bucket); if (doesObjectExist) { LOG.info("object [{}] already exists in bucket [{}], deleting it", object, bucket); this.ossClient.deleteObject(bucket, object); } } else if (APPEND.equals(writeMode)) { throw DataXException .asDataXException( OssWriterErrorCode.ILLEGAL_VALUE, "writeMode append is not supported when writeSingleObject is true"); } else if (NOCONFLICT.equals(writeMode)) { LOG.info("Because you have configured writeMode nonConflict, and writeSingleObject is true, start checking bucket [{}] for an existing object named [{}]", bucket, object); if (doesObjectExist) { throw DataXException .asDataXException( OssWriterErrorCode.ILLEGAL_VALUE, String.format("The [bucket] you configured: %s already contains an object named %s", bucket, object)); } } } /** * Prepare logic for multiple tasks writing multiple objects. This is osswriter's 
pre-existing logic and must stay backward compatible * * @param bucket * @param object * @param writeMode */ private void
doPrepareForMutliObject(String bucket, String object, String writeMode) { // truncate option handler if (TRUNCATE.equals(writeMode)) { LOG.info("You have configured [writeMode] [truncate], so the system will start to clear the objects under [{}] whose names start with [{}]. ", bucket, object); // warn: by default, if the bucket holds more than 100 objects, a listing returns only 100 of them while (true) { ObjectListing listing = null; LOG.info("list objects with listObject(bucket, object)"); listing = this.ossClient.listObjects(bucket, object); List objectSummarys = listing .getObjectSummaries(); for (OSSObjectSummary objectSummary : objectSummarys) { LOG.info(String.format("delete oss object [%s].", objectSummary.getKey())); this.ossClient.deleteObject(bucket, objectSummary.getKey()); } if (objectSummarys.isEmpty()) { break; } } } else if (APPEND.equals(writeMode)) { LOG.info("You have configured [writeMode] [append], so the system won't perform any cleanup before writing. Data is written to objects under the bucket [{}] whose names start with [{}]. ", bucket, object); } else if (NOCONFLICT.equals(writeMode)) { LOG.info("You have configured [writeMode] [nonConflict], so the system will start to check objects under the bucket [{}] whose names start with [{}].
", bucket, object); ObjectListing listing = this.ossClient.listObjects(bucket, object); if (0 < listing.getObjectSummaries().size()) { StringBuilder objectKeys = new StringBuilder(); objectKeys.append("[ "); for (OSSObjectSummary ossObjectSummary : listing .getObjectSummaries()) { objectKeys.append(ossObjectSummary.getKey() + " ,"); } objectKeys.append(" ]"); LOG.info(String.format( "object with prefix [%s] details: %s", object, objectKeys.toString())); throw DataXException .asDataXException( OssWriterErrorCode.ILLEGAL_VALUE, String.format("The [bucket] you configured: %s contains objects with the name prefix of %s.", bucket, object)); } } } @Override public void post() { if(useHdfsWriterProxy){ this.hdfsWriterJob.post(); return; } if (this.writeSingleObject) { try { /** 1. Upload the merged last block */ LOG.info("Uploaded part count: {}", OssSingleObject.allPartETags.size()); if (OssSingleObject.getLastBlockBuffer() != null && OssSingleObject.getLastBlockBuffer().length != 0) { byte[] byteBuffer = OssSingleObject.getLastBlockBuffer(); LOG.info("post writer single object last merge block size is : {}", byteBuffer.length); this.ossWriterProxy.uploadOnePartForSingleObject(byteBuffer, OssSingleObject.uploadId, OssSingleObject.allPartETags, this.object, this::getHeaderBytes); } if (OssSingleObject.allPartETags.size() == 0) { LOG.warn("allPartETags size is 0, no parts were uploaded, " + "skip complete multipart upload!"); this.ossWriterProxy.abortMultipartUpload(this.object,OssSingleObject.uploadId); return; } /** 2.
Complete the multipart upload */ LOG.info("begin complete multi part upload, bucket:{}, object:{}, uploadId:{}, uploaded part count:{}", this.bucket, this.object, OssSingleObject.uploadId, OssSingleObject.allPartETags.size()); orderPartETages(OssSingleObject.allPartETags); CompleteMultipartUploadRequest completeMultipartUploadRequest = new CompleteMultipartUploadRequest( this.bucket, this.object, OssSingleObject.uploadId, OssSingleObject.allPartETags); CompleteMultipartUploadResult completeMultipartUploadResult = this.ossWriterProxy.completeMultipartUpload(completeMultipartUploadRequest); LOG.info(String.format("post final object etag is:[%s]", completeMultipartUploadResult.getETag())); } catch (Exception e) { LOG.error("osswriter post error: {}", e.getMessage(), e); throw DataXException.asDataXException(e.getMessage()); } } } private byte[] getHeaderBytes() throws IOException { if (null != this.header && !this.header.isEmpty()) { // write header to writer try (StringWriter sw = new StringWriter(); UnstructuredWriter headerWriter = UnstructuredStorageWriterUtil.
produceUnstructuredWriter(this.fileFormat, this.writerSliceConfig, sw)) { headerWriter.writeOneRecord(this.header); return sw.toString().getBytes(this.encoding); } } return new byte[0]; } /** * Sort allPartETags in ascending order * * @param allPartETags * @return */ private void orderPartETages(List allPartETags) { Collections.sort(allPartETags, new Comparator() { @Override public int compare(PartETag o1, PartETag o2) { // ascending by partNumber return o1.getPartNumber() - o2.getPartNumber(); } }); } @Override public void destroy() { if(useHdfsWriterProxy){ this.hdfsWriterJob.destroy(); return; } try { // this.ossClient.shutdown(); } catch (Exception e) { LOG.warn("shutdown ossclient encountered an exception:" + e.getMessage(), e); } } @Override public List split(int mandatoryNumber) { LOG.info("begin do split..."); if(useHdfsWriterProxy){ return this.hdfsWriterJob.split(mandatoryNumber); } List writerSplitConfigs = new ArrayList(); // warn: this may actually be buggy: the DataX framework shuffles, and splitting inside a file cannot support this requirement well either if(this.isPeer2PeerCopyMode()){ // known risk: when two source OSS files such as abc/123/data.txt and yixiao.txt are copied peer-to-peer, data.txt // and yixiao.txt can only be placed in a single directory List readerSplitConfigs = this.getReaderPluginSplitConf(); for (int i = 0; i < readerSplitConfigs.size(); i++) { Configuration splitedTaskConfig = writerSliceConfig.clone(); splitedTaskConfig.set(Key.OBJECT, objectDir); splitedTaskConfig.set(com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.BINARY, this.isBinaryFile); writerSplitConfigs.add(splitedTaskConfig); } } else { if (this.writeSingleObject) { writerSplitConfigs = doSplitForWriteSingleObject(mandatoryNumber); } else { writerSplitConfigs = doSplitForWriteMultiObject(mandatoryNumber); } } LOG.info("end do split.
split size: {}", writerSplitConfigs.size()); return writerSplitConfigs; } /** * Split logic added for the mode in which multiple tasks write a single file * * @param mandatoryNumber * @return */ private List doSplitForWriteSingleObject(int mandatoryNumber) { LOG.info("writeSingleObject is true, begin do split for write single object."); List writerSplitConfigs = new ArrayList(); String object = this.writerSliceConfig.getString(Key.OBJECT); InitiateMultipartUploadRequest uploadRequest = this.ossWriterProxy.getInitiateMultipartUploadRequest( object); InitiateMultipartUploadResult uploadResult; try { uploadResult = this.ossWriterProxy.initiateMultipartUpload( uploadRequest); } catch (Exception e) { LOG.error("initiateMultipartUpload error: {}", e.getMessage(), e); throw DataXException.asDataXException(e.getMessage()); } /** * To write the same object, all tasks must use the same upload ID * see: https://help.aliyun.com/document_detail/31993.html */ String uploadId = uploadResult.getUploadId(); OssSingleObject.uploadId = uploadId; LOG.info("writeSingleObject use uploadId: {}", uploadId); for (int i = 0; i < mandatoryNumber; i++) { Configuration splitedTaskConfig = this.writerSliceConfig .clone(); splitedTaskConfig.set(Key.OBJECT, object); splitedTaskConfig.set(Key.UPLOAD_ID, uploadId); writerSplitConfigs.add(splitedTaskConfig); } return writerSplitConfigs; } /** * Split logic for osswriter's multiple tasks writing multiple objects; this logic predates writeSingleObject and is kept for backward compatibility * * @param mandatoryNumber * @return */ private List doSplitForWriteMultiObject(int mandatoryNumber) { List writerSplitConfigs = new ArrayList(); String bucket = this.writerSliceConfig.getString(Key.BUCKET); String object = this.writerSliceConfig.getString(Key.OBJECT); Set allObjects = new HashSet(); try { List ossObjectlisting = this.ossClient .listObjects(bucket).getObjectSummaries(); for (OSSObjectSummary objectSummary : ossObjectlisting) { allObjects.add(objectSummary.getKey()); } } catch 
(OSSException e) { throw DataXException.asDataXException( OssWriterErrorCode.OSS_COMM_ERROR, e.getMessage(), e); } catch (ClientException e) {
throw DataXException.asDataXException( OssWriterErrorCode.OSS_COMM_ERROR, e.getMessage(), e); } String objectSuffix; for (int i = 0; i < mandatoryNumber; i++) { // handle same object name Configuration splitedTaskConfig = this.writerSliceConfig .clone(); String fullObjectName = null; objectSuffix = StringUtils.replace( UUID.randomUUID().toString(), "-", ""); fullObjectName = String.format("%s__%s", object, objectSuffix); while (allObjects.contains(fullObjectName)) { objectSuffix = StringUtils.replace(UUID.randomUUID() .toString(), "-", ""); fullObjectName = String.format("%s__%s", object, objectSuffix); } allObjects.add(fullObjectName); splitedTaskConfig.set(Key.OBJECT, fullObjectName); LOG.info(String.format("splited write object name:[%s]", fullObjectName)); writerSplitConfigs.add(splitedTaskConfig); } return writerSplitConfigs; } private boolean isPeer2PeerCopyMode() { return this.isBinaryFile || com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.SYNC_MODE_VALUE_COPY .equalsIgnoreCase(this.syncMode); } private String getObjectDir(String object) { String dir = null; if (StringUtils.isBlank(object)) { dir = ""; } else { dir = object.trim(); dir = dir.endsWith("/") ? 
dir : String.format("%s/", dir); } return dir; } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private OSSClient ossClient; private Configuration writerSliceConfig; private String bucket; private String object; private String nullFormat; private String encoding; private String dateFormat; private DateFormat dateParse; private String fileFormat; private List<String> header; private Long maxFileSize;// MB private String suffix; private Boolean encrypt;// whether to enable server-side encryption private long blockSizeInByte; private Boolean isBinaryFile; private String objectDir; private String syncMode; private int parentPathLength; private String byteEncoding; private HdfsWriter.Task hdfsWriterTask; private boolean useHdfsWriterProxy = false; private boolean writeSingleObject; private String uploadId; private OssWriterProxy ossWriterProxy; private List<String> partition; private boolean generateEmptyFile; @Override public void init() { this.writerSliceConfig = this.getPluginJobConf(); this.fileFormat = this.writerSliceConfig .getString( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_FORMAT, com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.FILE_FORMAT_TEXT); this.useHdfsWriterProxy = HdfsParquetUtil.isUseHdfsWriterProxy(this.fileFormat); if(useHdfsWriterProxy){ this.hdfsWriterTask = new HdfsWriter.Task(); this.hdfsWriterTask.setPeerPluginJobConf(this.getPeerPluginJobConf()); this.hdfsWriterTask.setPeerPluginName(this.getPeerPluginName()); this.hdfsWriterTask.setPluginJobConf(this.getPluginJobConf()); this.hdfsWriterTask.setReaderPluginSplitConf(this.getReaderPluginSplitConf()); this.hdfsWriterTask.setTaskGroupId(this.getTaskGroupId()); this.hdfsWriterTask.setTaskId(this.getTaskId()); this.hdfsWriterTask.setTaskPluginCollector(this.getTaskPluginCollector()); this.hdfsWriterTask.init(); return; } this.ossClient = OssUtil.initOssClient(this.writerSliceConfig); this.bucket =
this.writerSliceConfig.getString(Key.BUCKET); this.object = this.writerSliceConfig.getString(Key.OBJECT); this.nullFormat = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.NULL_FORMAT); this.dateFormat = this.writerSliceConfig .getString( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.DATE_FORMAT, null); if (StringUtils.isNotBlank(this.dateFormat)) { this.dateParse = new SimpleDateFormat(dateFormat); } this.encoding = this.writerSliceConfig .getString( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.ENCODING, com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.DEFAULT_ENCODING); this.header = this.writerSliceConfig .getList( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.HEADER, null, String.class); this.maxFileSize = this.writerSliceConfig .getLong( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.MAX_FILE_SIZE, com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.MAX_FILE_SIZE); this.suffix = this.writerSliceConfig .getString( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.SUFFIX, com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.DEFAULT_SUFFIX); this.suffix = this.suffix.trim();// warn: need trim this.encrypt = this.writerSliceConfig.getBool(Key.ENCRYPT, false); // size of each uploaded part in bytes (blockSizeInMB, default 10MB) this.blockSizeInByte = this.writerSliceConfig.getLong(Key.BLOCK_SIZE_IN_MB, 10L) * 1024 * 1024; this.isBinaryFile = this.writerSliceConfig.getBool( com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.BINARY, false); this.objectDir = this.getObjectDir(this.object); this.syncMode = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.SYNC_MODE, ""); this.parentPathLength = this.writerSliceConfig.getInt("__parentPathLength", 0); this.byteEncoding = this.writerSliceConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.BYTE_ENCODING); this.writeSingleObject = this.writerSliceConfig.getBool(Key.WRITE_SINGLE_OBJECT, false);
this.uploadId = this.writerSliceConfig.getString(Key.UPLOAD_ID); this.ossWriterProxy = new OssWriterProxy(this.writerSliceConfig, this.ossClient); this.partition = this.writerSliceConfig.getList(Key.PARTITION, new ArrayList<>(), String.class); // whether to generate an empty object when no data is received (default true) this.generateEmptyFile = this.writerSliceConfig.getBool(Key.GENERATE_EMPTY_FILE,true); } @Override public void startWrite(RecordReceiver lineReceiver) { if(useHdfsWriterProxy){ hdfsWriterTask.startWrite(lineReceiver); return; } if (this.isPeer2PeerCopyMode()) { // peer-to-peer copy this.startWriteBinaryFile(lineReceiver); } else if (this.writeSingleObject) { this.startWriteSingleObjectUnstructedStorageFile(lineReceiver); } else { this.startWriteUnstructedStorageFile(lineReceiver,generateEmptyFile); } } /** * Write all records of this task into a single shared object. * * @param lineReceiver */ public void startWriteSingleObjectUnstructedStorageFile(RecordReceiver lineReceiver) { try { Record record; String currentObject = this.object; List<PartETag> currentPartETags = new ArrayList<PartETag>(); //warn: may be StringBuffer->StringBuilder StringWriter sw = new StringWriter(); StringBuffer sb = sw.getBuffer(); UnstructuredWriter unstructuredWriter = UnstructuredStorageWriterUtil.
produceUnstructuredWriter(this.fileFormat, this.writerSliceConfig, sw); while ((record = lineReceiver.getFromReader()) != null) { // single-object sync does not support rolling yet [a single object currently supports up to about 100GB] if (OssSingleObject.currentPartNumber.intValue() > Constant.MAX_BLOCK_SIZE) { throw DataXException.asDataXException(String.format("When writeSingleObject is true, the write size of your single object has exceeded the maximum value of %s MB.", (Constant.MAX_BLOCK_SIZE * this.blockSizeInByte / 1024 / 1024))); } // write: upload data to current object UnstructuredStorageWriterUtil.transportOneRecord(record, this.nullFormat, this.dateParse, this.getTaskPluginCollector(), unstructuredWriter, this.byteEncoding); // once the buffer reaches this.blockSizeInByte, upload it as one part if (sb.length() >= this.blockSizeInByte) { LOG.info(String .format("write to bucket: [%s] object: [%s] with oss uploadId: [%s], currentPartNumber: %s", this.bucket, currentObject, this.uploadId, OssSingleObject.currentPartNumber.intValue())); byte[] byteArray = sw.toString().getBytes(this.encoding); this.ossWriterProxy.uploadOnePartForSingleObject(byteArray, this.uploadId, currentPartETags, currentObject, this::getHeaderBytes); sb.setLength(0); } } // add every part uploaded by this task to allPartETags OssSingleObject.allPartETags.addAll(currentPartETags); // hand this task's unfinished last block to OssSingleObject.lastBlockBuffer; it is merged and uploaded in the job phase if (sb.length() > 0) { byte[] lastBlock = sw.toString().getBytes(this.encoding); LOG.info("begin add last block to buffer, last block size: {}", lastBlock.length); OssSingleObject.addLastBlockBuffer(lastBlock, this.ossWriterProxy, this.blockSizeInByte, this.object, this::getHeaderBytes); } } catch (IOException e) { // dirty data has already been recorded by UnstructuredStorageWriterUtil.transportOneRecord; the header // consists only of strings, so it is not treated as dirty data throw DataXException.asDataXException( OssWriterErrorCode.Write_OBJECT_ERROR, e.getMessage(), e); } catch (Exception e) { throw DataXException.asDataXException( OssWriterErrorCode.Write_OBJECT_ERROR, e.getMessage(), e); } LOG.info("single oss object end do write"); }
private byte[] getHeaderBytes() throws IOException { if (null != this.header && !this.header.isEmpty()) { // write header to writer try (StringWriter sw = new StringWriter(); UnstructuredWriter headerWriter = UnstructuredStorageWriterUtil. produceUnstructuredWriter(this.fileFormat, this.writerSliceConfig, sw)) { headerWriter.writeOneRecord(this.header); return sw.toString().getBytes(this.encoding); } } return new byte[0]; } /** * Sync unstructured files such as audio and video. * warn: this code largely duplicates startWriteUnstructedStorageFile and should be refactored later. */ private void startWriteBinaryFile(RecordReceiver lineReceiver) { Record record; String currentObject = null; InitiateMultipartUploadRequest currentInitiateMultipartUploadRequest; InitiateMultipartUploadResult currentInitiateMultipartUploadResult = null; String lastUploadId = null; boolean gotData = false; List<PartETag> currentPartETags = null; int currentPartNumber = 1; Map<String, String> meta; ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); long currentSize = 0; try { // warn boolean needInitMultipartTransform = true; while ((record = lineReceiver.getFromReader()) != null) { Column column = record.getColumn(0); meta = record.getMeta(); assert meta != null; gotData = true; String objectNameTmp = meta .get(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.META_KEY_FILE_PATH); String fullObjectNameTmp = String.format("%s%s", this.objectDir, objectNameTmp.substring(this.parentPathLength, objectNameTmp.length())); // init: either of two conditions starts a new multipart upload if (needInitMultipartTransform || !StringUtils.equals(currentObject, fullObjectNameTmp)) { // first complete the previous multipart upload request if (null != currentInitiateMultipartUploadResult) { // if some buffered block data has not been uploaded yet, upload it first if (currentSize > 0) { this.ossWriterProxy.uploadOnePart(byteArrayOutputStream.toByteArray(), currentPartNumber, currentInitiateMultipartUploadResult, currentPartETags, currentObject); currentPartNumber++; currentSize = 0; byteArrayOutputStream.reset(); } // TODO handle the case where the current file is empty String commitKey =
currentInitiateMultipartUploadResult.getKey(); LOG.info(String.format( "current object [%s] size %s, complete current multipart upload %s and begin new one", commitKey, currentPartNumber * this.blockSizeInByte, currentInitiateMultipartUploadResult.getUploadId())); CompleteMultipartUploadRequest currentCompleteMultipartUploadRequest = new CompleteMultipartUploadRequest( this.bucket, commitKey, currentInitiateMultipartUploadResult.getUploadId(), currentPartETags); CompleteMultipartUploadResult currentCompleteMultipartUploadResult = this.ossWriterProxy.completeMultipartUpload( currentCompleteMultipartUploadRequest); lastUploadId = currentInitiateMultipartUploadResult.getUploadId(); LOG.info(String.format("final object [%s] etag is:[%s]", commitKey, currentCompleteMultipartUploadResult.getETag())); } // a new file has been encountered: start a new multipart upload for it currentObject = fullObjectNameTmp; currentInitiateMultipartUploadRequest = this.ossWriterProxy.getInitiateMultipartUploadRequest(currentObject); currentInitiateMultipartUploadResult = this.ossWriterProxy.initiateMultipartUpload( currentInitiateMultipartUploadRequest); currentPartETags = new ArrayList<PartETag>(); LOG.info(String.format("write to bucket: [%s] object: [%s] with oss uploadId: [%s]", this.bucket, currentObject, currentInitiateMultipartUploadResult.getUploadId())); // warn needInitMultipartTransform = false; currentPartNumber = 1; } // write: upload data to current object byte[] data; if (column instanceof BytesColumn) { data = column.asBytes(); byteArrayOutputStream.write(data); currentSize += data.length; } else { String message = "the type of column must be BytesColumn!"; throw DataXException.asDataXException(OssWriterErrorCode.Write_OBJECT_ERROR, message); } if (currentSize >= this.blockSizeInByte) { this.ossWriterProxy.uploadOnePart(byteArrayOutputStream.toByteArray(), currentPartNumber, currentInitiateMultipartUploadResult, currentPartETags, currentObject); currentPartNumber++; currentSize = 0; byteArrayOutputStream.reset(); } } // TODO binary
mode: reading is broken when the source file is empty if (!gotData) { LOG.info("Received no data from the source."); currentInitiateMultipartUploadRequest = new InitiateMultipartUploadRequest(this.bucket, currentObject); currentInitiateMultipartUploadResult = this.ossWriterProxy.initiateMultipartUpload( currentInitiateMultipartUploadRequest); currentPartETags = new ArrayList<PartETag>(); } // warn: some data may still be buffered in byteArrayOutputStream if (byteArrayOutputStream.size() > 0) { this.ossWriterProxy.uploadOnePart(byteArrayOutputStream.toByteArray(), currentPartNumber, currentInitiateMultipartUploadResult, currentPartETags, currentObject); currentPartNumber++; } // avoid completing the same upload twice if (!StringUtils.equals(lastUploadId, currentInitiateMultipartUploadResult.getUploadId())) { CompleteMultipartUploadRequest completeMultipartUploadRequest = new CompleteMultipartUploadRequest( this.bucket, currentObject, currentInitiateMultipartUploadResult.getUploadId(), currentPartETags); CompleteMultipartUploadResult completeMultipartUploadResult = this.ossWriterProxy.completeMultipartUpload( completeMultipartUploadRequest); LOG.info(String.format("final object etag is:[%s]", completeMultipartUploadResult.getETag())); } } catch (IOException e) { // dirty data has already been recorded by UnstructuredStorageWriterUtil.transportOneRecord; the header // consists only of strings, so it is not treated as dirty data throw DataXException.asDataXException(OssWriterErrorCode.Write_OBJECT_ERROR, e.getMessage(), e); } catch (Exception e) { throw DataXException.asDataXException(OssWriterErrorCode.Write_OBJECT_ERROR, e.getMessage(), e); } LOG.info("end do write"); } /** * Start writing semi-structured (text-based) files. * * @param lineReceiver */ private void startWriteUnstructedStorageFile(RecordReceiver lineReceiver, boolean generateEmptyFile){ // maximum number of parts per object: maxFileSize (MB) divided by the part size, at least 1 long numberCacul = (this.maxFileSize * 1024 * 1024L) / this.blockSizeInByte; final long maxPartNumber = numberCacul >= 1 ?
numberCacul : 1; int objectRollingNumber = 0; Record record; String currentObject = this.object; if (this.isPeer2PeerCopyMode()) { currentObject = null; } else { // append the suffix currentObject = appedSuffixTo(currentObject); } InitiateMultipartUploadRequest currentInitiateMultipartUploadRequest; InitiateMultipartUploadResult currentInitiateMultipartUploadResult = null; String lastUploadId = null; boolean gotData = false; List<PartETag> currentPartETags = null; // to do: // part-level retries could be based on currentPartNumber; uploading the same currentPartNumber more than once under one InitiateMultipartUploadRequest overwrites the earlier part int currentPartNumber = 1; Map<String, String> meta; //warn: may be StringBuffer->StringBuilder StringWriter sw = new StringWriter(); StringBuffer sb = sw.getBuffer(); UnstructuredWriter unstructuredWriter = UnstructuredStorageWriterUtil. produceUnstructuredWriter(this.fileFormat, this.writerSliceConfig, sw); LOG.info(String.format( "begin do write, each object maxFileSize: [%s]MB...", maxPartNumber * this.blockSizeInByte / 1024 / 1024)); try { // warn: the source may be e.g. MySQL, whose records carry no meta, so this first-initialization flag cannot be dropped boolean needInitMultipartTransform = true; while ((record = lineReceiver.getFromReader()) != null) { meta = record.getMeta(); gotData = true; // init: two conditions start a new multipart upload; their rolling strategies (object naming rules) differ // condition: peer-to-peer copy mode && the file name in the Record's meta changes // condition: log4j-style rolling && !peer-to-peer copy mode boolean realyNeedInitUploadRequest = false; if (this.isPeer2PeerCopyMode()) { assert meta != null; String objectNameTmp = meta .get(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.META_KEY_FILE_PATH); String fullObjectNameTmp = String.format("%s%s", this.objectDir, objectNameTmp.substring(this.parentPathLength, objectNameTmp.length())); if (!StringUtils.equals(currentObject, fullObjectNameTmp)) { currentObject = fullObjectNameTmp; realyNeedInitUploadRequest = true; } } else { if (needInitMultipartTransform || currentPartNumber > maxPartNumber) { currentObject = getCurrentObject(objectRollingNumber, record); objectRollingNumber++; realyNeedInitUploadRequest = true; } } if (realyNeedInitUploadRequest) {
// first complete the previous multipart upload request if (null != currentInitiateMultipartUploadResult) { if (sb.length() > 0) { this.uploadOnePart(sw, currentPartNumber, currentInitiateMultipartUploadResult, currentPartETags, currentObject); currentPartNumber++; sb.setLength(0); } // TODO handle the case where the current file is empty String commitKey = currentInitiateMultipartUploadResult.getKey(); LOG.info(String.format( "current object [%s] size %s, complete current multipart upload %s and begin new one", commitKey, currentPartNumber * this.blockSizeInByte, currentInitiateMultipartUploadResult.getUploadId())); CompleteMultipartUploadRequest currentCompleteMultipartUploadRequest = new CompleteMultipartUploadRequest( this.bucket, commitKey, currentInitiateMultipartUploadResult.getUploadId(), currentPartETags); CompleteMultipartUploadResult currentCompleteMultipartUploadResult = this.ossWriterProxy.completeMultipartUpload( currentCompleteMultipartUploadRequest); lastUploadId = currentInitiateMultipartUploadResult.getUploadId(); LOG.info(String.format("final object [%s] etag is:[%s]", commitKey, currentCompleteMultipartUploadResult.getETag())); } currentInitiateMultipartUploadRequest = this.ossWriterProxy.getInitiateMultipartUploadRequest(currentObject); currentInitiateMultipartUploadResult = this.ossWriterProxy.initiateMultipartUpload(currentInitiateMultipartUploadRequest); currentPartETags = new ArrayList<PartETag>(); LOG.info(String .format("write to bucket: [%s] object: [%s] with oss uploadId: [%s]", this.bucket, currentObject, currentInitiateMultipartUploadResult .getUploadId())); // each object's header if (null != this.header && !this.header.isEmpty()) { unstructuredWriter.writeOneRecord(this.header); } // warn needInitMultipartTransform = false; currentPartNumber = 1; } // write: upload data to current object UnstructuredStorageWriterUtil.transportOneRecord(record, this.nullFormat, this.dateParse, this.getTaskPluginCollector(), unstructuredWriter, this.byteEncoding); if (sb.length() >= this.blockSizeInByte) {
this.uploadOnePart(sw, currentPartNumber, currentInitiateMultipartUploadResult, currentPartETags, currentObject); currentPartNumber++; sb.setLength(0); } } if (!gotData) { LOG.info("Received no data from the source."); currentInitiateMultipartUploadRequest = new InitiateMultipartUploadRequest( this.bucket, currentObject); currentInitiateMultipartUploadResult = this.ossWriterProxy.initiateMultipartUpload(currentInitiateMultipartUploadRequest); currentPartETags = new ArrayList<PartETag>(); // each object's header if (null != this.header && !this.header.isEmpty()) { unstructuredWriter.writeOneRecord(this.header); } } // warn: some data may still be buffered in sb if (0 < sb.length()) { this.uploadOnePart(sw, currentPartNumber, currentInitiateMultipartUploadResult, currentPartETags, currentObject); } // avoid completing the same upload twice if (!StringUtils.equals(lastUploadId, currentInitiateMultipartUploadResult.getUploadId())) { CompleteMultipartUploadRequest completeMultipartUploadRequest = new CompleteMultipartUploadRequest( this.bucket, currentObject, currentInitiateMultipartUploadResult.getUploadId(), currentPartETags); if (gotData) { completeUpload(completeMultipartUploadRequest); } else{ if (generateEmptyFile) { LOG.info("No data was received, so OSS will generate an empty file; " + "generateEmptyFile is {}, set it to false to avoid this.",generateEmptyFile); completeUpload(completeMultipartUploadRequest); } else { LOG.info("generateEmptyFile is false, so DataX will not generate an empty file."); // abort the initiated upload so no incomplete multipart upload is left behind on OSS this.ossWriterProxy.abortMultipartUpload(currentObject, currentInitiateMultipartUploadResult.getUploadId()); } } } } catch (IOException e) { // dirty data has already been recorded by UnstructuredStorageWriterUtil.transportOneRecord; the header // consists only of strings, so it is not treated as dirty data throw DataXException.asDataXException( OssWriterErrorCode.Write_OBJECT_ERROR, e.getMessage(), e); } catch (Exception e) { throw DataXException.asDataXException( OssWriterErrorCode.Write_OBJECT_ERROR, e.getMessage(), e); } LOG.info("end do write"); } private void completeUpload(CompleteMultipartUploadRequest completeMultipartUploadRequest) throws Exception { CompleteMultipartUploadResult
completeMultipartUploadResult = this.ossWriterProxy.completeMultipartUpload(completeMultipartUploadRequest); LOG.info(String.format("final object etag is:[%s]", completeMultipartUploadResult.getETag())); } private String getCurrentObject(int objectRollingNumber, Record record) { String currentObject = this.object; if (!this.partition.isEmpty()) { String partitionValues = getPartitionValues(record); currentObject = String.format("%s_%s", currentObject, partitionValues); } if (objectRollingNumber > 0) { currentObject = String.format("%s_%s", currentObject, objectRollingNumber); } currentObject = appedSuffixTo(currentObject); return currentObject; } private String getPartitionValues(Record record) { // config like "partition": "ds,venture" String partitionValues = ""; // assume the partition columns are the last columns of the record; values are read from the last column backwards and concatenated without a separator for (int i = 0; i < this.partition.size(); i++) { partitionValues += record.getColumn(record.getColumnNumber() - 1 - i).asString(); } return partitionValues; } private String appedSuffixTo(String currentObject) { StringBuilder sbCurrentObject = new StringBuilder(currentObject); if (StringUtils.isNotBlank(this.suffix)) { if (!this.suffix.startsWith(".")) { sbCurrentObject.append("."); } sbCurrentObject.append(suffix); } return sbCurrentObject.toString(); } /** * For a given upload ID, the part number not only uniquely identifies a block of data but also its relative position within the whole object. * If new data is uploaded with a part number that already exists, the part stored on OSS under that number is overwritten. * * @throws Exception * */ private void uploadOnePart( final StringWriter sw, final int partNumber, final InitiateMultipartUploadResult currentInitiateMultipartUploadResult, final List<PartETag> partETags, final String currentObject) throws Exception { final String encoding = this.encoding; final byte[] byteArray = sw.toString().getBytes(encoding); this.ossWriterProxy.uploadOnePart(byteArray, partNumber, currentInitiateMultipartUploadResult, partETags, currentObject); } @Override public void prepare() { if(useHdfsWriterProxy){ hdfsWriterTask.prepare(); return; } } @Override public void post() {
if(useHdfsWriterProxy){ hdfsWriterTask.post(); return; } } @Override public void destroy() { if(useHdfsWriterProxy){ hdfsWriterTask.destroy(); return; } try { // this.ossClient.shutdown(); } catch (Exception e) { LOG.warn("failed to shut down the OSS client: " + e.getMessage(), e); } } private boolean isPeer2PeerCopyMode() { return this.isBinaryFile || com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.SYNC_MODE_VALUE_COPY .equalsIgnoreCase(this.syncMode); } private String getObjectDir(String object) { String dir = null; if (StringUtils.isBlank(object)) { dir = ""; } else { dir = object.trim(); dir = dir.endsWith("/") ? dir : String.format("%s/", dir); } return dir; } } } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.osswriter; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by haiwei.luo on 14-9-17.
*/ public enum OssWriterErrorCode implements ErrorCode { CONFIG_INVALID_EXCEPTION("OssWriter-00", "Your parameter configuration is invalid."), REQUIRED_VALUE("OssWriter-01", "A required parameter value is missing."), ILLEGAL_VALUE("OssWriter-02", "The parameter value you provided is invalid."), Write_OBJECT_ERROR("OssWriter-03", "An error occurred while writing the target Object you configured."), OSS_COMM_ERROR("OssWriter-05", "An error occurred while performing the OSS operation."), ; private final String code; private final String description; private OssWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s].", this.code, this.description); } } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/OssWriterProxy.java ================================================ package com.alibaba.datax.plugin.writer.osswriter; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.RetryUtil; import com.aliyun.oss.OSSClient; import com.aliyun.oss.model.*; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.ArrayUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.ByteArrayInputStream; import java.io.InputStream; import java.util.List; import java.util.concurrent.Callable; /** * @Author: guxuan * @Date 2022-05-17 16:29 */ public class OssWriterProxy { private static Logger logger = LoggerFactory.getLogger(OssWriterProxy.class); private OSSClient ossClient; private Configuration configuration; /** * Whether to enable server-side encryption. */ private Boolean encrypt; private String bucket; public OssWriterProxy (Configuration configuration, OSSClient ossClient) { this.configuration = configuration; this.ossClient = ossClient; this.encrypt = configuration.getBool(Key.ENCRYPT, false); this.bucket = configuration.getString(Key.BUCKET); } public InitiateMultipartUploadRequest
getInitiateMultipartUploadRequest(String currentObject){ InitiateMultipartUploadRequest currentInitiateMultipartUploadRequest; if( !this.encrypt ) { currentInitiateMultipartUploadRequest = new InitiateMultipartUploadRequest( this.bucket, currentObject); } else { // store the data in OSS with AES-256 server-side encryption ObjectMetadata objectMetadata = new ObjectMetadata(); objectMetadata.setHeader("x-oss-server-side-encryption", ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION); currentInitiateMultipartUploadRequest = new InitiateMultipartUploadRequest( this.bucket, currentObject, objectMetadata); } return currentInitiateMultipartUploadRequest; } public InitiateMultipartUploadResult initiateMultipartUpload( final InitiateMultipartUploadRequest currentInitiateMultipartUploadRequest) throws Exception { final OSSClient ossClient = this.ossClient; return RetryUtil.executeWithRetry(new Callable<InitiateMultipartUploadResult>() { @Override public InitiateMultipartUploadResult call() throws Exception { return ossClient.initiateMultipartUpload(currentInitiateMultipartUploadRequest); } }, 10, 1000L, false); } public CompleteMultipartUploadResult completeMultipartUpload( final CompleteMultipartUploadRequest currentCompleteMultipartUploadRequest) throws Exception { final OSSClient ossClient = this.ossClient; return RetryUtil.executeWithRetry(new Callable<CompleteMultipartUploadResult>() { @Override public CompleteMultipartUploadResult call() throws Exception { return ossClient.completeMultipartUpload(currentCompleteMultipartUploadRequest); } }, 10, 1000L, false); } public void uploadOnePart( final byte[] byteArray, final int partNumber, final InitiateMultipartUploadResult currentInitiateMultipartUploadResult, final List<PartETag> partETags, final String currentObject) throws Exception { final String bucket = this.bucket; final OSSClient ossClient = this.ossClient; RetryUtil.executeWithRetry(new Callable<Boolean>() { @Override public Boolean call() throws Exception { InputStream inputStream = new ByteArrayInputStream( byteArray); // create an UploadPartRequest and upload one part UploadPartRequest uploadPartRequest = new
UploadPartRequest(); uploadPartRequest.setBucketName(bucket); uploadPartRequest.setKey(currentObject); uploadPartRequest.setUploadId(currentInitiateMultipartUploadResult.getUploadId()); uploadPartRequest.setInputStream(inputStream); uploadPartRequest.setPartSize(byteArray.length); uploadPartRequest.setPartNumber(partNumber); UploadPartResult uploadPartResult = ossClient .uploadPart(uploadPartRequest); partETags.add(uploadPartResult.getPartETag()); logger.info(String .format("upload part [%s] size [%s] Byte has been completed.", partNumber, byteArray.length)); IOUtils.closeQuietly(inputStream); return true; } }, 10, 1000L, false); } public void abortMultipartUpload(final String currentObject, final String uploadId) { final String bucket = this.bucket; final OSSClient ossClient = this.ossClient; try { RetryUtil.executeWithRetry((Callable<Void>) () -> { AbortMultipartUploadRequest abortMultipartUploadRequest = new AbortMultipartUploadRequest(bucket, currentObject, uploadId); ossClient.abortMultipartUpload(abortMultipartUploadRequest); return null; }, 5, 1, true); } catch (Throwable e) { logger.error(String.format("AbortMultipartUpload failed, msg is %s",e.getMessage()), e); } } public void uploadOnePartForSingleObject( final byte[] byteArray, final String uploadId, final List<PartETag> partETags, final String currentObject, final HeaderProvider headerProvider) throws Exception { final String bucket = this.bucket; final OSSClient ossClient = this.ossClient; // allocate the part number once, outside the retry loop, so a retry re-uploads (overwrites) the same part instead of consuming a new number and skipping the header of part 1 final int partNumber = OssSingleObject.currentPartNumber.getAndIncrement(); RetryUtil.executeWithRetry(new Callable<Boolean>() { @Override public Boolean call() throws Exception { // create an UploadPartRequest and upload one part UploadPartRequest uploadPartRequest = new UploadPartRequest(); uploadPartRequest.setPartNumber(partNumber); byte[] data = byteArray; if (uploadPartRequest.getPartNumber() == 1) { // write header byte[] headerBytes = headerProvider.getHeader(); logger.info("write header to part {}.
header size: {}", uploadPartRequest.getPartNumber(), ArrayUtils.getLength(headerBytes)); data = ArrayUtils.addAll(headerBytes, byteArray); } ByteArrayInputStream inputStream = new ByteArrayInputStream(data); uploadPartRequest.setBucketName(bucket); uploadPartRequest.setKey(currentObject); uploadPartRequest.setUploadId(uploadId); uploadPartRequest.setInputStream(inputStream); uploadPartRequest.setPartSize(data.length); UploadPartResult uploadPartResult = ossClient .uploadPart(uploadPartRequest); partETags.add(uploadPartResult.getPartETag()); logger.info("upload part number [{}] size [{}] Byte has been completed, uploadId: {}.", uploadPartRequest.getPartNumber(), data.length, uploadId); IOUtils.closeQuietly(inputStream); return true; } }, 10, 1000L, false); } public interface HeaderProvider { byte[] getHeader() throws Exception; } } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/parquet/ParquetFileProccessor.java ================================================ package com.alibaba.datax.plugin.writer.osswriter.parquet; import org.apache.hadoop.fs.Path; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import parquet.hadoop.ParquetWriter; import parquet.hadoop.metadata.CompressionCodecName; import parquet.schema.MessageType; import java.io.IOException; /** * @Author: guxuan * @Date 2022-05-17 16:23 */ public class ParquetFileProccessor extends ParquetWriter<Record> { private Path path; public ParquetFileProccessor(Path path, MessageType schema, Configuration taskConfig, TaskPluginCollector taskPluginCollector) throws IOException { this(path, schema, false, taskConfig, taskPluginCollector); this.path = path; } public ParquetFileProccessor(Path path, MessageType schema, boolean enableDictionary, Configuration taskConfig, TaskPluginCollector taskPluginCollector) throws IOException { this(path,
schema, CompressionCodecName.UNCOMPRESSED, enableDictionary, taskConfig, taskPluginCollector); this.path = path; } public ParquetFileProccessor(Path path, MessageType schema, CompressionCodecName codecName, boolean enableDictionary, Configuration taskConfig, TaskPluginCollector taskPluginCollector) throws IOException { super(path, new ParquetFileSupport(schema, taskConfig, taskPluginCollector), codecName, DEFAULT_BLOCK_SIZE, DEFAULT_PAGE_SIZE, enableDictionary, false); this.path = path; } public byte[] getParquetRawData() { if (null == this.path) { return null; } else { return null; } } } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/parquet/ParquetFileSupport.java ================================================ package com.alibaba.datax.plugin.writer.osswriter.parquet; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.plugin.unstructuredstorage.writer.Key; import com.alibaba.datax.plugin.writer.osswriter.Constant; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONArray; import com.alibaba.fastjson2.JSONObject; import org.apache.commons.lang3.StringUtils; import org.apache.hadoop.conf.Configuration; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import parquet.hadoop.api.WriteSupport; import parquet.io.api.Binary; import parquet.io.api.RecordConsumer; import parquet.schema.*; import java.text.DateFormat; import java.text.SimpleDateFormat; import java.util.HashMap; import java.util.List; /** * @Author: guxuan * @Date 2022-05-17 16:25 */ public class ParquetFileSupport extends WriteSupport<Record> { public static final Logger LOGGER = LoggerFactory.getLogger(ParquetFileSupport.class); private MessageType schema; private RecordConsumer recordConsumer; private boolean printStackTrace = true; // nullFormat for the different column types private String nullFormat; private
String dateFormat; private DateFormat dateParse; private Binary binaryForNull; private TaskPluginCollector taskPluginCollector; public ParquetFileSupport(MessageType schema, com.alibaba.datax.common.util.Configuration taskConfig, TaskPluginCollector taskPluginCollector) { this.schema = schema; // nullFormat for the different column types this.nullFormat = taskConfig.getString(Key.NULL_FORMAT, Constant.DEFAULT_NULL_FORMAT); this.binaryForNull = Binary.fromString(this.nullFormat); this.dateFormat = taskConfig.getString(Key.DATE_FORMAT, null); if (StringUtils.isNotBlank(this.dateFormat)) { this.dateParse = new SimpleDateFormat(dateFormat); } this.taskPluginCollector = taskPluginCollector; } @Override public WriteContext init(Configuration configuration) { return new WriteContext(schema, new HashMap<String, String>()); } @Override public void prepareForWrite(RecordConsumer recordConsumer) { this.recordConsumer = recordConsumer; } @Override public void write(Record values) { LOGGER.info("Writing parquet data using fields mode(The correct mode.)"); List<Type> types = this.schema.getFields(); if (values != null && types != null && values.getColumnNumber() == types.size()) { recordConsumer.startMessage(); writeFields(types, values); recordConsumer.endMessage(); } } private void writeFields(List<Type> types, Record values) { for (int i = 0; i < types.size(); i++) { Type type = types.get(i); Column value = values.getColumn(i); if (value != null) { try { if (type.isPrimitive()) { writePrimitiveType(type, value, i); } else { writeGroupType(type, (JSON) JSON.parse(value.asString()), i); } } catch (Exception e) { if (printStackTrace) { printStackTrace = false; LOGGER.warn("write to parquet error: {}", e.getMessage(), e); } // dirty data if (null != this.taskPluginCollector) { // taskPluginCollector is null during the merge in job post this.taskPluginCollector.collectDirtyRecord(values, e, e.getMessage()); } } } } } private void writeFields(List<Type> types, JSONObject values) { for (int i = 0; i < types.size(); i++) { Type type = types.get(i); Object
value = values.get(type.getName()); if (value != null) { try { if (type.isPrimitive()) { writePrimitiveType(type, value, i); } else { writeGroupType(type, (JSON) value, i); } } catch (Exception e) { if (printStackTrace) { printStackTrace = false; LOGGER.warn("write to parquet error: {}", e.getMessage(), e); } } } else { recordConsumer.addBinary(this.binaryForNull); } } } private void writeGroupType(Type type, JSON value, int index) { GroupType groupType = type.asGroupType(); OriginalType originalType = groupType.getOriginalType(); if (originalType != null) { switch (originalType) { case MAP: writeMap(groupType, value, index); break; case LIST: writeList(groupType, value, index); break; default: break; } } else { // struct writeStruct(groupType, value, index); } } private void writeMap(GroupType groupType, JSON value, int index) { if (value == null) { return; } JSONObject json = (JSONObject) value; if (json.isEmpty()) { return; } recordConsumer.startField(groupType.getName(), index); recordConsumer.startGroup(); // map // key_value start recordConsumer.startField("key_value", 0); recordConsumer.startGroup(); List<Type> keyValueFields = groupType.getFields().get(0).asGroupType().getFields(); Type keyType = keyValueFields.get(0); Type valueType = keyValueFields.get(1); for (String key : json.keySet()) { // key writePrimitiveType(keyType, key, 0); // value if (valueType.isPrimitive()) { writePrimitiveType(valueType, json.get(key), 1); } else { writeGroupType(valueType, (JSON) json.get(key), 1); } } recordConsumer.endGroup(); recordConsumer.endField("key_value", 0); // key_value end recordConsumer.endGroup(); recordConsumer.endField(groupType.getName(), index); } private void writeList(GroupType groupType, JSON value, int index) { if (value == null) { return; } JSONArray json = (JSONArray) value; if (json.isEmpty()) { return; } recordConsumer.startField(groupType.getName(), index); // list recordConsumer.startGroup(); // list start recordConsumer.startField("list", 0);
recordConsumer.startGroup(); Type elementType = groupType.getFields().get(0).asGroupType().getFields().get(0); if (elementType.isPrimitive()) { for (Object elementValue : json) { writePrimitiveType(elementType, elementValue, 0); } } else { for (Object elementValue : json) { writeGroupType(elementType, (JSON) elementValue, 0); } } recordConsumer.endGroup(); recordConsumer.endField("list", 0); // list end recordConsumer.endGroup(); recordConsumer.endField(groupType.getName(), index); } private void writeStruct(GroupType groupType, JSON value, int index) { if (value == null) { return; } JSONObject json = (JSONObject) value; if (json.isEmpty()) { return; } recordConsumer.startField(groupType.getName(), index); // struct start recordConsumer.startGroup(); writeFields(groupType.getFields(), json); recordConsumer.endGroup(); // struct end recordConsumer.endField(groupType.getName(), index); } private void writePrimitiveType(Type type, Object value, int index) { if (value == null) { return; } recordConsumer.startField(type.getName(), index); PrimitiveType primitiveType = type.asPrimitiveType(); switch (primitiveType.getPrimitiveTypeName()) { case BOOLEAN: recordConsumer.addBoolean((Boolean) value); break; case FLOAT: if (value instanceof Float) { recordConsumer.addFloat(((Float) value).floatValue()); } else if (value instanceof Double) { recordConsumer.addFloat(((Double) value).floatValue()); } else if (value instanceof Long) { recordConsumer.addFloat(((Long) value).floatValue()); } else if (value instanceof Integer) { recordConsumer.addFloat(((Integer) value).floatValue()); } break; case DOUBLE: if (value instanceof Float) { recordConsumer.addDouble(((Float) value).doubleValue()); } else if (value instanceof Double) { recordConsumer.addDouble(((Double) value).doubleValue()); } else if (value instanceof Long) { recordConsumer.addDouble(((Long) value).doubleValue()); } else if (value instanceof Integer) { recordConsumer.addDouble(((Integer) value).doubleValue()); } break; 
case INT32: if (value instanceof Integer) { recordConsumer.addInteger((Integer) value); } else if (value instanceof Long) { recordConsumer.addInteger(((Long) value).intValue()); } else { throw new IllegalArgumentException( String.format("Invalid value: %s(clazz: %s) for field: %s", value, value.getClass(), type.getName()) ); } break; case INT64: case INT96: if (value instanceof Integer) { recordConsumer.addLong(((Integer) value).longValue()); } else if (value instanceof Long) { recordConsumer.addLong((Long) value); } else { throw new IllegalArgumentException( String.format("Invalid value: %s(clazz: %s) for field: %s", value, value.getClass(), type.getName()) ); } break; case BINARY: default: recordConsumer.addBinary(Binary.fromString((String) value)); break; } recordConsumer.endField(type.getName(), index); } private void writePrimitiveType(Type type, Column value, int index) { if (value == null || value.getRawData() == null) { return; } recordConsumer.startField(type.getName(), index); PrimitiveType primitiveType = type.asPrimitiveType(); switch (primitiveType.getPrimitiveTypeName()) { case BOOLEAN: recordConsumer.addBoolean(value.asBoolean()); break; case FLOAT: recordConsumer.addFloat(value.asDouble().floatValue()); break; case DOUBLE: recordConsumer.addDouble(value.asDouble()); break; case INT32: recordConsumer.addInteger(value.asLong().intValue()); break; case INT64: case INT96: recordConsumer.addLong(value.asLong()); break; case BINARY: String valueAsString2Write = null; if (Column.Type.DATE == value.getType() && null != this.dateParse) { valueAsString2Write = dateParse.format(value.asDate()); } else { valueAsString2Write = value.asString(); } recordConsumer.addBinary(Binary.fromString(valueAsString2Write)); break; default: recordConsumer.addBinary(Binary.fromString(value.asString())); break; } recordConsumer.endField(type.getName(), index); } } ================================================ FILE:
osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/util/HandlerUtil.java ================================================ package com.alibaba.datax.plugin.writer.osswriter.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.core.util.container.CoreConstant; import com.alibaba.datax.plugin.writer.osswriter.Key; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** * @Author: guxuan * @Date 2022-05-17 16:35 */ public class HandlerUtil { private static final Logger LOG = LoggerFactory.getLogger(HandlerUtil.class); /** * Converts the job configuration into an ODPS->OSS writer configuration. * * @param jobConfiguration */ public static void preHandler(Configuration jobConfiguration) { LOG.info("================ OssWriter Phase 1 preHandler starting... ================ "); Configuration writerOriginPluginConf = jobConfiguration.getConfiguration( CoreConstant.DATAX_JOB_CONTENT_WRITER_PARAMETER); Configuration writerOssPluginConf = writerOriginPluginConf.getConfiguration(Key.OSS_CONFIG); Configuration newWriterPluginConf = Configuration.newDefault(); jobConfiguration.remove(CoreConstant.DATAX_JOB_CONTENT_WRITER_PARAMETER); // Inject the original postgresqlwriter (pg) settings into postgresqlConfig for use by the later postHandler writerOriginPluginConf.remove(Key.OSS_CONFIG); newWriterPluginConf.set(Key.POSTGRESQL_CONFIG, writerOriginPluginConf); newWriterPluginConf.merge(writerOssPluginConf, true); // Set the writer name to osswriter jobConfiguration.set(CoreConstant.DATAX_JOB_CONTENT_WRITER_NAME, "osswriter"); jobConfiguration.set(CoreConstant.DATAX_JOB_CONTENT_WRITER_PARAMETER, newWriterPluginConf); LOG.info("================ OssWriter Phase 1 preHandler end...
================ "); } } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/util/HdfsParquetUtil.java ================================================ package com.alibaba.datax.plugin.writer.osswriter.util; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriter; import com.alibaba.datax.plugin.writer.osswriter.Key; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONObject; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem; import org.apache.hadoop.mapred.JobConf; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.util.HashMap; import java.util.Map; /** * @Author: guxuan * @Date 2022-05-17 16:35 */ public class HdfsParquetUtil { private static final Logger logger = LoggerFactory.getLogger(HdfsParquetUtil.class); public static boolean isUseHdfsWriterProxy( String fileFormat){ if("orc".equalsIgnoreCase(fileFormat) || "parquet".equalsIgnoreCase(fileFormat)){ return true; } return false; } /** * Adapts writerSliceConfig so that hdfswriter can write parquet files to OSS * https://help.aliyun.com/knowledge_detail/74344.html * @param hdfsWriterJob * @param writerSliceConfig */ public static void adaptConfiguration(HdfsWriter.Job hdfsWriterJob, Configuration writerSliceConfig){ String fileFormat = writerSliceConfig.getString( com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_FORMAT, com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.FILE_FORMAT_TEXT); String bucket = writerSliceConfig.getString(Key.BUCKET); String fs =String.format("oss://%s",bucket); writerSliceConfig.set(com.alibaba.datax.plugin.writer.hdfswriter.Key.DEFAULT_FS,fs); writerSliceConfig.set(com.alibaba.datax.plugin.writer.hdfswriter.Key.FILE_TYPE,
writerSliceConfig.getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_FORMAT)); /** * "writeMode", "compress", "encoding", path and fileName have the same meaning in both writers */ JSONObject hadoopConfig = new JSONObject(); hadoopConfig.put(Key.FS_OSS_ACCESSID,writerSliceConfig.getString(Key.ACCESSID)); hadoopConfig.put(Key.FS_OSS_ACCESSKEY,writerSliceConfig.getString(Key.ACCESSKEY)); hadoopConfig.put(Key.FS_OSS_ENDPOINT,writerSliceConfig.getString(Key.ENDPOINT)); writerSliceConfig.set(Key.HDOOP_CONFIG,Configuration.from(JSON.toJSONString(hadoopConfig))); String object = writerSliceConfig.getString(Key.OBJECT); String path = writerSliceConfig.getString(Key.PATH); String fileName = writerSliceConfig.getString(Key.FILE_NAME); if (StringUtils.isNotBlank(object) && (StringUtils.isNotBlank(path) || StringUtils.isNotBlank(fileName))) { logger.warn("You have configured both the \"object\" property and the \"path\" or \"fileName\" property; the configured \"path\"/\"fileName\" takes precedence over \"object\". " + "It is recommended to remove the deprecated \"path\" or \"fileName\" property."); } // Backward compatibility: if a legacy job already configures path, do not parse it from object if (StringUtils.isBlank(path)) { Validate.notBlank(object, "object can't be blank!"); writerSliceConfig.set(Key.PATH, getPathAndFileNameFromObject(object.trim()).get(Key.PATH)); } // Backward compatibility: if a legacy job already configures fileName, do not parse it from object if (StringUtils.isBlank(fileName)) { Validate.notBlank(object, "object can't be blank!"); writerSliceConfig.set(Key.FILE_NAME, getPathAndFileNameFromObject(object.trim()).get(Key.FILE_NAME)); } if (StringUtils.equalsIgnoreCase(fileFormat, "parquet")) { hdfsWriterJob.unitizeParquetConfig(writerSliceConfig); } } /** * Parses path and fileName out of object. * * Example 1: * /hello/aaa/bbb/ccc.txt * path: /hello/aaa/bbb * fileName: ccc.txt * * Example 2: * hello/aaa/bbb/ccc.txt * path: /hello/aaa/bbb * fileName: ccc.txt * * Example 3: * ccc.txt * path: / * fileName: ccc.txt * * Example 4: * /ccc.txt * path: / * fileName: ccc.txt * * @param object * @return */ public static Map<String, String>
getPathAndFileNameFromObject(String object) { Map<String, String> pathAndFileName = new HashMap<>(); boolean containsSlash = object.contains("/"); // If object contains no "/", path is "/" and fileName is object itself if (!containsSlash) { pathAndFileName.put(Key.PATH, "/"); pathAndFileName.put(Key.FILE_NAME, object); return pathAndFileName; } if (!object.startsWith("/")) { object = "/" + object; } int lastIndex = object.lastIndexOf("/"); String path = object.substring(0, lastIndex); String fileName = object.substring(lastIndex + 1); path = StringUtils.isNotBlank(path) ? path : "/"; logger.info("path: {}", path); logger.info("fileName: {}", fileName); pathAndFileName.put(Key.PATH, path); pathAndFileName.put(Key.FILE_NAME, fileName); return pathAndFileName; } } ================================================ FILE: osswriter/src/main/java/com/alibaba/datax/plugin/writer/osswriter/util/OssUtil.java ================================================ package com.alibaba.datax.plugin.writer.osswriter.util; import java.util.ArrayList; import java.util.List; import org.apache.commons.lang3.StringUtils; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.osswriter.Constant; import com.alibaba.datax.plugin.writer.osswriter.Key; import com.alibaba.datax.plugin.writer.osswriter.OssWriterErrorCode; import com.aliyun.oss.ClientConfiguration; import com.aliyun.oss.OSSClient; public class OssUtil { public static OSSClient initOssClient(Configuration conf) { String endpoint = conf.getString(Key.ENDPOINT); String accessId = conf.getString(Key.ACCESSID); String accessKey = conf.getString(Key.ACCESSKEY); ClientConfiguration ossConf = new ClientConfiguration(); ossConf.setSocketTimeout(Constant.SOCKETTIMEOUT); // The default endpoint domain is .aliyun.com; if yours differs (e.g. .aliyun.ga), configure it via cname String cname = conf.getString(Key.CNAME); if (StringUtils.isNotBlank(cname)) { List<String> cnameExcludeList = new ArrayList<String>(); cnameExcludeList.add(cname);
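ossConf.setCnameExcludeList(cnameExcludeList); } OSSClient client = null; try { client = new OSSClient(endpoint, accessId, accessKey, ossConf); } catch (IllegalArgumentException e) { throw DataXException.asDataXException( OssWriterErrorCode.ILLEGAL_VALUE, e.getMessage()); } return client; } } ================================================ FILE: osswriter/src/main/resources/plugin.json ================================================ { "name": "osswriter", "class": "com.alibaba.datax.plugin.writer.osswriter.OssWriter", "description": "", "developer": "alibaba" } ================================================ FILE: osswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "osswriter", "parameter": { "endpoint": "", "accessId": "", "accessKey": "", "bucket": "", "object": "", "encoding": "", "fieldDelimiter": "", "writeMode": "" } } ================================================ FILE: otsreader/doc/otsreader.md ================================================

# OTSReader Plugin Documentation

___

## 1 Quick Introduction

The OTSReader plugin reads data from OTS. Because users can specify the range of data to extract, incremental extraction is easy to implement. Three extraction modes are currently supported:

* Full-table extraction
* Range extraction
* Extraction by user-specified splits

This version of OTSReader adds support for reading multi-version data and remains compatible with configuration files written for the old version.

## 2 How It Works

In short, OTSReader connects to the OTS server through the official OTS Java SDK, reads the data, converts it into DataX fields according to the DataX protocol, and passes them to the downstream Writer.

OTSReader divides the table range into N equal Tasks, where N is the DataX concurrency. Each Task is executed by its own OTSReader thread.

## 3 Features

### 3.1 Configuration Examples

#### 3.1.1

* A reader that reads single-version data from an OTS table:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed in bytes/s. DataX tries to reach this speed without exceeding it.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Maximum number of error records; the job fails once this is exceeded.
                "record": 0,
                // Maximum percentage of error records: 1.0 means 100%, 0.02 means 2%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "otsreader-internal",
                    "parameter": {
                        "endpoint":"",
                        "accessId":"",
                        "accessKey":"",
                        "instanceName":"",
                        "table": "",
                        // newVersion defines whether the new plugin version is used. Values: false || true
                        "newVersion":"false",
                        // mode defines the format of the data to read (normal / multi-version). Values: normal || multiversion
                        "mode": "normal",
                        // Export range: [begin, end) is read, closed on the left and open on the right
                        // begin < end: read in forward order
                        // begin > end: read in reverse order
                        // begin and end must not be equal
                        // Supported types:
                        //   string, int, binary
                        //   binary values are passed in as Base64 strings
                        //   INF_MIN represents negative infinity
                        //   INF_MAX represents positive infinity
                        "range":{
                            // Optional; by default reading starts from negative infinity
                            // Accepts an empty array, a PK prefix or a full PK. Missing trailing PK columns are
                            // filled with INF_MIN for a forward read and INF_MAX for a reverse read.
                            // Example:
                            // If the table has 2 PK columns of types string and int, all 3 inputs below are valid:
                            // 1. [] --> read from the start of the table
                            // 2. [{"type":"string", "value":"a"}] --> equivalent to [{"type":"string", "value":"a"},{"type":"INF_MIN"}]
                            // 3. [{"type":"string", "value":"a"},{"type":"INF_MIN"}]
                            //
                            // Binary PK columns are special: JSON cannot hold raw binary, so binary input must be
                            // converted to a printable string with (Java) Base64.encodeBase64String and put into value.
                            // Example (Java):
                            //   byte[] bytes = "hello".getBytes();                    // binary data: the bytes of the string hello
                            //   String inputValue = Base64.encodeBase64String(bytes); // convert the binary to a printable string
                            // After running the code above, inputValue is "aGVsbG8="
                            // Final config: {"type":"binary","value" : "aGVsbG8="}
                            "begin":[{"type":"string", "value":"a"},{"type":"INF_MIN"}],
                            // Optional; by default reading ends at positive infinity
                            // Accepts an empty array, a PK prefix or a full PK. Missing trailing PK columns are
                            // filled with INF_MAX for a forward read and INF_MIN for a reverse read.
                            "end":[{"type":"string", "value":"g"},{"type":"INF_MAX"}],
                            // With large data volumes, enable concurrent export: split divides the range
                            // into multiple concurrent tasks at the given split points.
                            // Optional
                            // 1. split values may only refer to the first PK column (the partition key), and their type must match the PartitionKey type
                            // 2. values must lie between begin and end
                            // 3. values must increase or decrease, following the forward/reverse order of begin and end
                            "split":[{"type":"string", "value":"b"}, {"type":"string", "value":"c"}]
                        },
                        // Columns to export; regular and constant columns are supported
                        // Format:
                        //   regular column: {"name":"{your column name}"}
                        //   constant column: {"type":"", "value":""}; type can be string, int, binary, bool or double
                        //   binary values must be passed as Base64-encoded strings
                        // Note:
                        // 1. PK columns must also be listed explicitly below
                        "column": [
                            {"name":"pk1"},                  // regular column (same below)
                            {"name":"pk2"},
                            {"name":"attr1"},
                            {"type":"string","value" : ""},  // constant column (same below)
                            {"type":"int","value" : ""},
                            {"type":"double","value" : ""},
                            // Binary constant columns are special: JSON cannot hold raw binary, so binary input must be
                            // converted to a printable string with (Java) Base64.encodeBase64String and put into value.
                            // Example (Java):
                            //   byte[] bytes = "hello".getBytes();                    // binary data: the bytes of the string hello
                            //   String inputValue = Base64.encodeBase64String(bytes); // convert the binary to a printable string
                            // After running the code above, inputValue is "aGVsbG8="
                            // Final config: {"type":"binary","value" : "aGVsbG8="}
                            {"type":"binary","value" : "aGVsbG8="}
                        ]
                    }
                },
                "writer": {
                    // writer type
                    "name": "streamwriter",
                    // whether to print the content
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

#### 3.1.2

* A reader that reads multi-version data from an OTS table (only supported when newVersion == true):

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed in bytes/s. DataX tries to reach this speed without exceeding it.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Maximum number of error records; the job fails once this is exceeded.
                "record": 0,
                // Maximum percentage of error records: 1.0 means 100%, 0.02 means 2%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "otsreader-internal",
                    "parameter": {
                        "endpoint":"",
                        "accessId":"",
                        "accessKey":"",
                        "instanceName":"",
                        "table": "",
                        // newVersion defines whether the new plugin version is used. Values: false || true
                        "newVersion":"true",
                        // mode defines the format of the data to read (normal / multi-version). Values: normal || multiversion
                        "mode": "multiversion",
                        // Export range: [begin, end) is read, closed on the left and open on the right
                        // begin < end: read in forward order
                        // begin > end: read in reverse order
                        // begin and end must not be equal
                        // Supported types:
                        //   string, int, binary
                        //   binary values are passed in as Base64 strings
                        //   INF_MIN represents negative infinity
                        //   INF_MAX represents positive infinity
                        "range":{
                            // Optional; by default reading starts from negative infinity
                            // Accepts an empty array, a PK prefix or a full PK. Missing trailing PK columns are
                            // filled with INF_MIN for a forward read and INF_MAX for a reverse read.
                            // Example:
                            // If the table has 2 PK columns of types string and int, all 3 inputs below are valid:
                            // 1. [] --> read from the start of the table
                            // 2. [{"type":"string", "value":"a"}] --> equivalent to [{"type":"string", "value":"a"},{"type":"INF_MIN"}]
                            // 3. [{"type":"string", "value":"a"},{"type":"INF_MIN"}]
                            //
                            // Binary PK columns are special: JSON cannot hold raw binary, so binary input must be
                            // converted to a printable string with (Java) Base64.encodeBase64String and put into value.
                            // Example (Java):
                            //   byte[] bytes = "hello".getBytes();                    // binary data: the bytes of the string hello
                            //   String inputValue = Base64.encodeBase64String(bytes); // convert the binary to a printable string
                            // After running the code above, inputValue is "aGVsbG8="
                            // Final config: {"type":"binary","value" : "aGVsbG8="}
                            "begin":[{"type":"string", "value":"a"},{"type":"INF_MIN"}],
                            // Optional; by default reading ends at positive infinity
                            // Accepts an empty array, a PK prefix or a full PK. Missing trailing PK columns are
                            // filled with INF_MAX for a forward read and INF_MIN for a reverse read.
                            "end":[{"type":"string", "value":"g"},{"type":"INF_MAX"}],
                            // With large data volumes, enable concurrent export: split divides the range
                            // into multiple concurrent tasks at the given split points.
                            // Optional
                            // 1. split values may only refer to the first PK column (the partition key), and their type must match the PartitionKey type
                            // 2. values must lie between begin and end
                            // 3. values must increase or decrease, following the forward/reverse order of begin and end
                            "split":[{"type":"string", "value":"b"}, {"type":"string", "value":"c"}]
                        },
                        // Columns to export; only regular columns are supported in multi-version mode
                        // Format:
                        //   regular column: {"name":"{your column name}"}
                        // Optional; by default all versions of all columns are exported
                        // Note:
                        // 1. constant columns are not supported in multi-version mode
                        // 2. PK columns cannot be specified; every exported 4-tuple always includes the full PK
                        // 3. a column must not be listed more than once
                        "column": [
                            {"name":"attr1"}
                        ],
                        // Time range of the requested data: [begin, end) is read, closed on the left and open on the right
                        // Optional; by default all versions are read
                        // Note: begin must be less than end
                        "timeRange":{
                            // Optional, default 0
                            // Valid range: 0 ~ LONG_MAX
                            "begin":1400000000,
                            // Optional, default Long Max (9223372036854775807L)
                            // Valid range: 0 ~ LONG_MAX
                            "end" :1600000000
                        },
                        // Maximum number of versions to read
                        // Optional; by default all versions are read
                        // Valid range: 1 ~ INT32_MAX
                        "maxVersion":10
                    }
                },
                "writer": {
                    // writer type
                    "name": "streamwriter",
                    // whether to print the content
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

#### 3.1.3

* A reader that reads from an OTS **timeseries table** (only supported when newVersion == true):

```json
{
    "job": {
        "setting": {
            "speed": {
                // number of channels used to read the timeseries data
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "otsreader",
                    "parameter": {
                        "endpoint": "",
                        "accessId": "",
                        "accessKey": "",
                        "instanceName": "",
                        "table": "",
                        // mode must be normal when reading timeseries data
                        "mode": "normal",
                        // newVersion must be true when reading timeseries data
                        "newVersion": "true",
                        // mark the table as a timeseries table
                        "isTimeseriesTable":"true",
                        // measurementName of the timelines to read; optional
                        // if empty, the whole table is read
                        "measurementName":"measurement_5",
                        // column is an array; each element describes one column
                        // A constant column needs the following fields:
                        // 1. type : value type, required
                        //    supported types : string, int, double, bool, binary
                        // 2. value : the value, required
                        //
                        // A regular column needs the following fields:
                        // 1. name : column name, required
                        //    the timeline's measurement name is identified by _m_name, type String
                        //    the timeline's data source is identified by _data_source, type String
                        //    the timeline's tags are identified by _tags, type String
                        //    the timeline's timestamp is identified by _time, type Long
                        // 2. is_timeseries_tag : whether the column is a key inside the tags field; optional, default false
                        // 3. type : value type; optional, default string
                        //    supported types : string, int, double, bool, binary
                        "column": [
                            { "name": "_m_name" },
                            { "name": "tagA", "is_timeseries_tag":"true" },
                            { "name": "double_0", "type":"DOUBLE" },
                            { "name": "string_0", "type":"STRING" },
                            { "name": "long_0", "type":"int" },
                            { "name": "binary_0", "type":"BINARY" },
                            { "name": "bool_0", "type":"BOOL" },
                            { "type":"STRING", "value":"testString" }
                        ]
                    }
                },
                "writer": {
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **endpoint**
    * Description: the endpoint address of the OTS server, e.g. http://bazhen.cn-hangzhou.ots.aliyuncs.com.
    * Required: yes
    * Default: none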
* **accessId**
    * Description: the OTS accessId
    * Required: yes
    * Default: none
* **accessKey**
    * Description: the OTS accessKey
    * Required: yes
    * Default: none
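* **instanceName**
    * Description: the name of the OTS instance. An instance is the entity through which users use and manage the OTS service: after activating OTS, users create an instance in the management console and then create and manage tables inside it. The instance is the basic unit of OTS resource management; OTS performs access control and resource metering for applications at the instance level.
    * Required: yes
    * Default: none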
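* **table**
    * Description: the name of the table to extract. Exactly one table must be specified; OTS has no need for multi-table synchronization.
    * Required: yes
    * Default: none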
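* **newVersion**
    * Description: selects the OTS SDK version used by the plugin.
        * true: the new plugin version, using the com.alicloud.openservices.tablestore dependency (recommended)
        * false: the old plugin version, using the com.aliyun.openservices.ots dependency; **reading multi-version data is not supported**
    * Required: no
    * Default: false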
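* **mode**
    * Description: the format of the data to read. Two modes are currently available:
        * normal: regular (single-version) data
        * multiVersion: data is read in multi-version format; the configuration parameters differ in this mode, see 3.1.2
    * Required: no
    * Default: normal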
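* **column**
    * Description: the set of column names to sync from the configured table, described as a JSON array. Because OTS is a NoSQL system, OTSReader requires the field names to be specified explicitly when extracting data.

      Regular column reads are supported, e.g. {"name":"col1"}.

      Partial column reads are supported: columns that are not configured are not read.

      Constant columns are supported, e.g. {"type":"STRING", "value" : "DataX"}. type gives the constant type; currently STRING, INT, DOUBLE, BOOL, BINARY (filled in Base64-encoded), INF_MIN (the minimum value defined by OTS; value must not be set with this type, otherwise an error is raised) and INF_MAX (the maximum value defined by OTS; value must not be set with this type, otherwise an error is raised) are supported.

      Functions and custom expressions are not supported: OTS itself offers no SQL-like functions or expressions, so OTSReader cannot provide function or expression columns either.
    * Required: yes
    * Default: none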
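* **begin/end**
    * Description: these two options must be used as a pair to extract a range of an OTS table. begin/end describe an interval over the OTS **PrimaryKey**, and the interval must cover every PrimaryKey column: **the range of every PrimaryKey column of the table must be specified, and no PrimaryKey column may be omitted**. For unbounded intervals, use {"type":"INF_MIN"} and {"type":"INF_MAX"}. For example, for an extraction task on an OTS table whose primary key is [DeviceID, SellerID], begin/end can be configured as:

      ```json
      "range": {
          "begin": [
              {"type":"INF_MIN"},            // minimum DeviceID
              {"type":"INT", "value":"0"}    // minimum SellerID
          ],
          "end": [
              {"type":"INF_MAX"},            // maximum DeviceID
              {"type":"INT", "value":"9999"} // maximum SellerID
          ]
      }
      ```

      To extract the whole table above, use the following configuration:

      ```
      "range": {
          "begin": [
              {"type":"INF_MIN"},  // minimum DeviceID
              {"type":"INF_MIN"}   // minimum SellerID
          ],
          "end": [
              {"type":"INF_MAX"},  // maximum DeviceID
              {"type":"INF_MAX"}   // maximum SellerID
          ]
      }
      ```
    * Required: no
    * Default: read all data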
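* **split**
    * Description: an advanced option for user-defined split information; it is not recommended in normal cases. It is typically used when OTS storage has hot spots and OTSReader's automatic split strategy is ineffective, so that a user-defined split rule can be applied instead. split specifies split points within the begin/end interval, and only on the partitionKey: configure only the partitionKey in split, not the full PrimaryKey.

      For example, for an extraction task on an OTS table whose primary key is [DeviceID, SellerID], you can configure:

      ```json
      "range": {
          "begin": [
              {"type":"INF_MIN"},  // minimum DeviceID
              {"type":"INF_MIN"}   // minimum SellerID
          ],
          "end": [
              {"type":"INF_MAX"},  // maximum DeviceID
              {"type":"INF_MAX"}   // maximum SellerID
          ],
          // User-specified split points. If split points are given, the Job splits Tasks according to begin, end and split.
          // Only the Partition Key (the first PrimaryKey column) can be split.
          // Supports INF_MIN, INF_MAX, STRING, INT
          "split":[
              {"type":"STRING", "value":"1"},
              {"type":"STRING", "value":"2"},
              {"type":"STRING", "value":"3"},
              {"type":"STRING", "value":"4"},
              {"type":"STRING", "value":"5"}
          ]
      }
      ```
    * Required: no
    * Default: none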
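The following Java sketch illustrates how begin, the user-defined split points and end are turned into consecutive half-open task ranges [begin, s1), [s1, s2), ..., [sn, end). It mirrors the loop in `OtsReaderMasterProxy.getNormalConfigurationBySplit` and the ordering rules listed in 3.1.1. It is a simplification: partition-key values are modeled as plain `long`s, and INF_MIN/INF_MAX and the remaining PK columns (which the plugin pads with INF_MIN) are ignored. `SplitRangeSketch`, `Range` and `toTaskRanges` are hypothetical names, not part of the plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: partition-key values are modeled as longs; INF_MIN/INF_MAX
// and the remaining primary-key columns are deliberately left out.
public class SplitRangeSketch {

    /** One task's half-open range [begin, end). */
    static final class Range {
        final long begin;
        final long end;

        Range(long begin, long end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + ", " + end + ")";
        }
    }

    /**
     * Builds the task ranges for begin, the user split points and end.
     * begin < end means a forward read, begin > end a reverse read; split points
     * must lie strictly between begin and end and follow the same direction.
     */
    static List<Range> toTaskRanges(long begin, List<Long> splits, long end) {
        if (begin == end) {
            throw new IllegalArgumentException("begin and end must not be equal");
        }
        boolean forward = begin < end;

        // Boundary list: begin, s1, ..., sn, end
        List<Long> points = new ArrayList<Long>();
        points.add(begin);
        points.addAll(splits);
        points.add(end);

        // Every boundary must move strictly in the read direction
        for (int i = 1; i < points.size(); i++) {
            long prev = points.get(i - 1);
            long cur = points.get(i);
            boolean ordered = forward ? cur > prev : cur < prev;
            if (!ordered) {
                throw new IllegalArgumentException("split point out of order or out of range near: " + cur);
            }
        }

        // Each pair of consecutive boundaries becomes one task range
        List<Range> ranges = new ArrayList<Range>(points.size() - 1);
        for (int i = 0; i < points.size() - 1; i++) {
            ranges.add(new Range(points.get(i), points.get(i + 1)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Forward read with split points 10 and 20: prints [[0, 10), [10, 20), [20, 30)]
        System.out.println(toTaskRanges(0L, Arrays.asList(10L, 20L), 30L));
        // Reverse read with split points 20 and 10: prints [[30, 20), [20, 10), [10, 0)]
        System.out.println(toTaskRanges(30L, Arrays.asList(20L, 10L), 0L));
    }
}
```

With n split points the job produces n + 1 tasks, which matches the `primaryKeys.size() - 1` configurations built by the plugin's split loop.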
### 3.3 Type Conversion

OTSReader currently supports all OTS types. The table below lists how OTSReader converts OTS types:

| DataX internal type | OTS data type |
| -------- | ----- |
| Long | Integer |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Bytes | Binary |

* Note: OTS has no date type. Applications usually store time as a Unix timestamp in a Long.

## 4 Constraints

### 4.1 Consistency

OTS is a BigTable-like storage system. It guarantees transactional writes within a single row but offers no cross-row transactions, so OTSReader cannot provide a consistent snapshot of the whole table. For example, if an OTSReader sync task starts at 00:00, it will also pick up rows updated later while the table is being synced, so it cannot provide an exact, consistent view of the table as of 00:00.

### 4.2 Incremental Sync

OTS is essentially a KV store. Range queries are only possible on the PK, and extraction by ranges of attribute columns is not supported yet. Incremental extraction is therefore only possible when the PK encodes range information, such as an auto-increment ID or a timestamp.

Auto-increment ID: OTSReader can extract incrementally by recording the largest ID of the previous run and setting the Range accordingly. This requires the OTS PrimaryKey to contain an auto-increment column (the auto-increment key must be generated by the application that uses OTS).

Timestamp: OTSReader can filter on the timestamp in the PK and extract incrementally by setting the Range accordingly. This requires the OTS PrimaryKey to contain a time column (the time key must be generated by the application that uses OTS).

## 5 FAQ

================================================ FILE: otsreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT otsreader otsreader com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.aliyun.openservices ots-public 2.2.4 log4j-core org.apache.logging.log4j com.aliyun.openservices tablestore 5.13.13 log4j-core org.apache.logging.log4j com.google.code.gson gson 2.2.4 com.alibaba fastjson 1.2.83_noneautotype compile src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single org.apache.maven.plugins maven-surefire-plugin 2.5 **/unittest/*.java **/functiontest/*.java ================================================ FILE: otsreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/otsreader target/ otsreader-0.0.1-SNAPSHOT.jar plugin/reader/otsreader false plugin/reader/otsreader/libs runtime ================================================ FILE:
otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderMasterProxy.java ================================================ package com.alibaba.datax.plugin.reader.otsreader; import java.util.List; import com.alibaba.datax.common.util.Configuration; public interface IOtsReaderMasterProxy { public void init(Configuration param) throws Exception; public List<Configuration> split(int num) throws Exception; public void close(); } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/IOtsReaderSlaveProxy.java ================================================ package com.alibaba.datax.plugin.reader.otsreader; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.util.Configuration; /** * OTS Reader worker interface */ public interface IOtsReaderSlaveProxy { /** * Initializes the worker: parses the configuration and sets up resources */ public void init(Configuration configuration); /** * Closes the worker and releases resources */ public void close(); /** * Exports data * @param recordSender * @throws Exception */ public void startRead(RecordSender recordSender) throws Exception; } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReader.java ================================================ package com.alibaba.datax.plugin.reader.otsreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; import com.alibaba.datax.plugin.reader.otsreader.model.OTSMode; import com.alibaba.datax.plugin.reader.otsreader.utils.Constant; import com.alibaba.datax.plugin.reader.otsreader.utils.GsonParser; import com.alibaba.datax.plugin.reader.otsreader.utils.OtsReaderError; import com.alicloud.openservices.tablestore.TableStoreException; import com.aliyun.openservices.ots.ClientException; import
org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public class OtsReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); //private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OtsReader.class); private IOtsReaderMasterProxy proxy = null; @Override public void init() { LOG.info("init() begin ..."); proxy = new OtsReaderMasterProxy(); try { this.proxy.init(getPluginJobConf()); } catch (TableStoreException e) { LOG.error("OTSException: {}", e.toString(), e); throw DataXException.asDataXException(new OtsReaderError(e.getErrorCode(), "OTS ERROR"), e.toString(), e); } catch (ClientException e) { LOG.error("ClientException: {}", e.toString(), e); throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } catch (Exception e) { LOG.error("Exception. ErrorMsg:{}", e.toString(), e); throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } LOG.info("init() end ..."); } @Override public void destroy() { this.proxy.close(); } @Override public List<Configuration> split(int adviceNumber) { LOG.info("split() begin ..."); if (adviceNumber <= 0) { throw DataXException.asDataXException(OtsReaderError.ERROR, "Datax input adviceNumber <= 0."); } List<Configuration> confs = null; try { confs = this.proxy.split(adviceNumber); } catch (Exception e) { LOG.error("Exception.
ErrorMsg:{}", e.getMessage(), e); throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } LOG.info("split() end ..."); return confs; } } public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); //private static final MessageSource MESSAGE_SOURCE = MessageSource.loadResourceBundle(OtsReader.class); private IOtsReaderSlaveProxy proxy = null; @Override public void init() { OTSConf conf = GsonParser.jsonToConf((String) this.getPluginJobConf().get(Constant.ConfigKey.CONF)); // Whether to use the new (tablestore SDK) interface if(conf.isNewVersion()) { if (conf.getMode() == OTSMode.MULTI_VERSION) { LOG.info("init OtsReaderSlaveProxyMultiVersion"); proxy = new OtsReaderSlaveMultiVersionProxy(); } else { LOG.info("init OtsReaderSlaveProxyNormal"); proxy = new OtsReaderSlaveNormalProxy(); } } else{ String metaMode = conf.getMetaMode(); if (StringUtils.isNotBlank(metaMode) && !metaMode.equalsIgnoreCase("false")) { LOG.info("init OtsMetaReaderSlaveProxy"); proxy = new OtsReaderSlaveMetaProxy(); } else { LOG.info("init OtsReaderSlaveProxyOld"); proxy = new OtsReaderSlaveProxyOld(); } } proxy.init(this.getPluginJobConf()); } @Override public void destroy() { try { proxy.close(); } catch (Exception e) { LOG.error("Exception. ErrorMsg:{}", e.toString(), e); throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } } @Override public void startRead(RecordSender recordSender) { try { proxy.startRead(recordSender); } catch (Exception e) { LOG.error("Exception.
ErrorMsg:{}", e.toString(), e); throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e); } } } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderMasterProxy.java ================================================ package com.alibaba.datax.plugin.reader.otsreader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.otsreader.callable.GetFirstRowPrimaryKeyCallable; import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; import com.alibaba.datax.plugin.reader.otsreader.utils.*; import com.alicloud.openservices.tablestore.SyncClientInterface; import com.alicloud.openservices.tablestore.model.*; import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.lang.reflect.Field; import java.util.ArrayList; import java.util.List; public class OtsReaderMasterProxy implements IOtsReaderMasterProxy { private static final Logger LOG = LoggerFactory.getLogger(OtsReaderMasterProxy.class); private OTSConf conf = null; private TableMeta meta = null; private SyncClientInterface ots = null; private Direction direction = null; public OTSConf getConf() { return conf; } public TableMeta getMeta() { return meta; } public SyncClientInterface getOts() { return ots; } public void setOts(SyncClientInterface ots) { this.ots = ots; } /** * Parses the passed-in job configuration into the corresponding parameters. * * @param param * @throws Exception */ public void init(Configuration param) throws Exception { // Check that the parameters conform to the predefined JSON format of OTSConf conf = OTSConf.load(param); // Init ots ots = OtsHelper.getOTSInstance(conf); // Init for wide-column tables if (!conf.isTimeseriesTable()) { // Fetch the TableMeta meta = OtsHelper.getTableMeta( ots, conf.getTableName(), conf.getRetry(),
                    conf.getRetryPauseInMillisecond());

            // Validate the Conf against the table Meta
            ParamChecker.checkAndSetOTSConf(conf, meta);
            direction = ParamChecker.checkDirectionAndEnd(meta, conf.getRange().getBegin(), conf.getRange().getEnd());
        }
        // Timeseries table: check the Tablestore SDK version
        if (conf.isTimeseriesTable()) {
            Common.checkTableStoreSDKVersion();
        }
    }

    public List<Configuration> split(int mandatoryNumber) throws Exception {
        LOG.info("Expect split num : " + mandatoryNumber);

        List<Configuration> configurations = new ArrayList<Configuration>();

        if (conf.isTimeseriesTable()) {
            // Timeseries tables always use the default split strategy
            LOG.info("Begin timeseries table defaultRangeSplit");
            configurations = getTimeseriesConfigurationBySplit(mandatoryNumber);
            LOG.info("End timeseries table defaultRangeSplit");
        } else if (this.conf.getRange().getSplit().size() != 0) {
            // The user explicitly specified the split points
            LOG.info("Begin userDefinedRangeSplit");
            configurations = getNormalConfigurationBySplit();
            LOG.info("End userDefinedRangeSplit");
        } else {
            // Use the default split algorithm
            LOG.info("Begin defaultRangeSplit");
            configurations = getDefaultConfiguration(mandatoryNumber);
            LOG.info("End defaultRangeSplit");
        }

        LOG.info("Expect split num: " + mandatoryNumber + ", and final configuration list count : " + configurations.size());
        return configurations;
    }

    public void close() {
        ots.shutdown();
    }

    /**
     * Builds one Task configuration per timeseries split, using the split info
     * returned for the requested split count.
     */
    private List<Configuration> getTimeseriesConfigurationBySplit(int mandatoryNumber) throws Exception {
        List<TimeseriesScanSplitInfo> timeseriesScanSplitInfoList = OtsHelper.splitTimeseriesScan(
                ots,
                conf.getTableName(),
                conf.getMeasurementName(),
                mandatoryNumber,
                conf.getRetry(),
                conf.getRetryPauseInMillisecond());
        List<Configuration> configurations = new ArrayList<>();

        for (int i = 0; i < timeseriesScanSplitInfoList.size(); i++) {
            Configuration configuration = Configuration.newDefault();
            configuration.set(Constant.ConfigKey.CONF, GsonParser.confToJson(conf));
            configuration.set(Constant.ConfigKey.SPLIT_INFO, GsonParser.timeseriesScanSplitInfoToString(timeseriesScanSplitInfoList.get(i)));
            configurations.add(configuration);
        }
        return configurations;
    }

    /**
     * Uses the user-configured split points to turn the configuration into
     * multiple Task configurations, one per Range.
     */
    private List<Configuration> getNormalConfigurationBySplit() {
        List<List<PrimaryKeyColumn>> primaryKeys = new ArrayList<List<PrimaryKeyColumn>>();
        primaryKeys.add(conf.getRange().getBegin());
        for (PrimaryKeyColumn column : conf.getRange().getSplit()) {
            List<PrimaryKeyColumn> point = new ArrayList<PrimaryKeyColumn>();
            point.add(column);
            ParamChecker.fillPrimaryKey(this.meta.getPrimaryKeyList(), point, PrimaryKeyValue.INF_MIN);
            primaryKeys.add(point);
        }
        primaryKeys.add(conf.getRange().getEnd());

        List<Configuration> configurations = new ArrayList<Configuration>(primaryKeys.size() - 1);

        for (int i = 0; i < primaryKeys.size() - 1; i++) {
            OTSRange range = new OTSRange();
            range.setBegin(primaryKeys.get(i));
            range.setEnd(primaryKeys.get(i + 1));

            Configuration configuration = Configuration.newDefault();
            configuration.set(Constant.ConfigKey.CONF, GsonParser.confToJson(conf));
            configuration.set(Constant.ConfigKey.RANGE, GsonParser.rangeToJson(range));
            configuration.set(Constant.ConfigKey.META, GsonParser.metaToJson(meta));
            configurations.add(configuration);
        }
        return configurations;
    }

    private List<Configuration> getDefaultConfiguration(int num) throws Exception {
        if (num == 1) {
            List<OTSRange> ranges = new ArrayList<OTSRange>();
            OTSRange range = new OTSRange();
            range.setBegin(conf.getRange().getBegin());
            range.setEnd(conf.getRange().getEnd());
            ranges.add(range);

            return getConfigurationsFromRanges(ranges);
        }

        OTSRange reverseRange = new OTSRange();
        reverseRange.setBegin(conf.getRange().getEnd());
        reverseRange.setEnd(conf.getRange().getBegin());

        Direction reverseDirection = (direction == Direction.FORWARD ? Direction.BACKWARD : Direction.FORWARD);

        List<PrimaryKeyColumn> realBegin = getPKOfFirstRow(conf.getRange(), direction);
        List<PrimaryKeyColumn> realEnd = getPKOfFirstRow(reverseRange, reverseDirection);

        // If either lookup returns null, the range holds at most one row,
        // so do not split it further; use the user-defined range as-is
        if (realBegin == null || realEnd == null) {
            List<OTSRange> ranges = new ArrayList<OTSRange>();
            ranges.add(conf.getRange());
            return getConfigurationsFromRanges(ranges);
        }

        // If the direction from realBegin to realEnd does not match `direction`, return the range unchanged
        int cmp = Common.compareRangeBeginAndEnd(meta, realBegin, realEnd);
        Direction realDirection = cmp > 0 ? Direction.BACKWARD : Direction.FORWARD;
        if (realDirection != direction) {
            LOG.warn("Expect '" + direction + "', but direction of realBegin and realEnd is '" + realDirection + "'");
            List<OTSRange> ranges = new ArrayList<OTSRange>();
            ranges.add(conf.getRange());
            return getConfigurationsFromRanges(ranges);
        }

        List<OTSRange> ranges = RangeSplit.rangeSplitByCount(meta, realBegin, realEnd, num);

        if (ranges.isEmpty()) {
            // When the PartitionKeys are equal, the splitter does not split the Range
            ranges.add(conf.getRange());
        } else {
            // replace first and last
            OTSRange first = ranges.get(0);
            OTSRange last = ranges.get(ranges.size() - 1);

            first.setBegin(conf.getRange().getBegin());
            last.setEnd(conf.getRange().getEnd());
        }

        return getConfigurationsFromRanges(ranges);
    }

    private List<Configuration> getConfigurationsFromRanges(List<OTSRange> ranges) {
        List<Configuration> configurationList = new ArrayList<>();
        for (OTSRange range : ranges) {
            Configuration configuration = Configuration.newDefault();
            configuration.set(Constant.ConfigKey.CONF, GsonParser.confToJson(conf));
            configuration.set(Constant.ConfigKey.RANGE, GsonParser.rangeToJson(range));
            configuration.set(Constant.ConfigKey.META, GsonParser.metaToJson(meta));
            configurationList.add(configuration);
        }
        return configurationList;
    }

    private List<PrimaryKeyColumn> getPKOfFirstRow(OTSRange range, Direction direction) throws Exception {

        RangeRowQueryCriteria cur = new RangeRowQueryCriteria(this.conf.getTableName());
        cur.setInclusiveStartPrimaryKey(new PrimaryKey(range.getBegin()));
        cur.setExclusiveEndPrimaryKey(new PrimaryKey(range.getEnd()));
        cur.setLimit(1);
        cur.addColumnsToGet(Common.getPrimaryKeyNameList(meta));
        cur.setDirection(direction);
        cur.setMaxVersions(1);

        return RetryHelper.executeWithRetry(
                new GetFirstRowPrimaryKeyCallable(ots, meta, cur),
                conf.getRetry(),
                conf.getRetryPauseInMillisecond()
        );
    }
}

================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMetaProxy.java
================================================
package com.alibaba.datax.plugin.reader.otsreader;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange;
import com.alibaba.datax.plugin.reader.otsreader.utils.Constant;
import com.alibaba.datax.plugin.reader.otsreader.utils.Key;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsreader.utils.ParamCheckerOld;
import com.alibaba.datax.plugin.reader.otsreader.utils.ReaderModelParser;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn;
import com.alibaba.datax.plugin.reader.otsreader.utils.DefaultNoRetry;
import com.alibaba.datax.plugin.reader.otsreader.utils.GsonParser;
import com.alibaba.fastjson.JSON;
import com.aliyun.openservices.ots.OTSClient;
import com.aliyun.openservices.ots.OTSServiceConfiguration;
import com.aliyun.openservices.ots.model.DescribeTableRequest;
import com.aliyun.openservices.ots.model.DescribeTableResult;
import com.aliyun.openservices.ots.model.ListTableResult;
import com.aliyun.openservices.ots.model.PrimaryKeyType;
import com.aliyun.openservices.ots.model.ReservedThroughputDetails;
import com.aliyun.openservices.ots.model.TableMeta;

public class OtsReaderSlaveMetaProxy implements IOtsReaderSlaveProxy {

    private OTSClient ots = null;
    private OTSConf conf = null;
    private OTSRange range = null;
    private com.alicloud.openservices.tablestore.model.TableMeta meta = null;
    private Configuration configuration = null;
    private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveMetaProxy.class);

    @Override
    public void init(Configuration configuration) {
        OTSServiceConfiguration configure = new OTSServiceConfiguration();
        configure.setRetryStrategy(new DefaultNoRetry());

        this.configuration = configuration;
        conf = GsonParser.jsonToConf((String) configuration.get(Constant.ConfigKey.CONF));
        range = GsonParser.jsonToRange((String) configuration.get(Constant.ConfigKey.RANGE));
        meta = GsonParser.jsonToMeta((String) configuration.get(Constant.ConfigKey.META));

        String endpoint = conf.getEndpoint();
        String accessId = conf.getAccessId();
        String accessKey = conf.getAccessKey();
        String instanceName = conf.getInstanceName();

        ots = new OTSClient(endpoint, accessId, accessKey, instanceName, null, configure, null);
    }

    @Override
    public void close() {
        ots.shutdown();
    }

    @Override
    public void startRead(RecordSender recordSender) throws Exception {
        List<OTSColumn> columns = ReaderModelParser
                .parseOTSColumnList(ParamCheckerOld.checkListAndGet(configuration, Key.COLUMN, true));
        String metaMode = conf.getMetaMode(); // column

        ListTableResult listTableResult = null;
        try {
            listTableResult = ots.listTable();
            LOG.info(String.format("ots listTable requestId:%s, traceId:%s", listTableResult.getRequestID(),
                    listTableResult.getTraceId()));
            List<String> allTables = listTableResult.getTableNames();
            for (String eachTable : allTables) {
                DescribeTableRequest describeTableRequest = new DescribeTableRequest();
                describeTableRequest.setTableName(eachTable);
                DescribeTableResult describeTableResult = ots.describeTable(describeTableRequest);
                LOG.info(String.format("ots describeTable requestId:%s, traceId:%s",
                        describeTableResult.getRequestID(), describeTableResult.getTraceId()));

                TableMeta tableMeta = describeTableResult.getTableMeta();
                // table_name: first_table
                // table primary key: type, data type: STRING
                // table primary key: db_name, data type: STRING
                // table primary key: table_name, data type: STRING
                // Reserved throughput: read(0), write(0)
                // last increase time: 1502881295
                // last decrease time: None
                // number of decreases today: 0

                String tableName = tableMeta.getTableName();
                Map<String, PrimaryKeyType> primaryKey = tableMeta.getPrimaryKey();
                ReservedThroughputDetails reservedThroughputDetails = describeTableResult
                        .getReservedThroughputDetails();
                int reservedThroughputRead = reservedThroughputDetails.getCapacityUnit().getReadCapacityUnit();
                int reservedThroughputWrite = reservedThroughputDetails.getCapacityUnit().getWriteCapacityUnit();
                long lastIncreaseTime = reservedThroughputDetails.getLastIncreaseTime();
                long lastDecreaseTime = reservedThroughputDetails.getLastDecreaseTime();
                int numberOfDecreasesToday = reservedThroughputDetails.getNumberOfDecreasesToday();

                Map<String, String> allData = new HashMap<String, String>();
                allData.put("endpoint", conf.getEndpoint());
                allData.put("instanceName", conf.getInstanceName());
                allData.put("table", tableName);
                // allData.put("primaryKey", JSON.toJSONString(primaryKey));
                allData.put("reservedThroughputRead", reservedThroughputRead + "");
                allData.put("reservedThroughputWrite", reservedThroughputWrite + "");
                allData.put("lastIncreaseTime", lastIncreaseTime + "");
                allData.put("lastDecreaseTime", lastDecreaseTime + "");
                allData.put("numberOfDecreasesToday", numberOfDecreasesToday + "");

                // Extensible, configurable form
                if ("column".equalsIgnoreCase(metaMode)) {
                    // In column-metadata mode, when a configured column name refers to primaryKey,
                    // map the table to multiple DataX Records (one per primary key column)
                    List<Record> primaryKeyRecords = new ArrayList<Record>();
                    for (Entry<String, PrimaryKeyType> eachPk : primaryKey.entrySet()) {
                        Record line = recordSender.createRecord();
                        for (OTSColumn col : columns) {
                            if (col.getColumnType() == OTSColumn.OTSColumnType.CONST) {
                                line.addColumn(col.getValue());
                            } else if ("primaryKey.name".equalsIgnoreCase(col.getName())) {
                                line.addColumn(new StringColumn(eachPk.getKey()));
                            } else if ("primaryKey.type".equalsIgnoreCase(col.getName())) {
                                line.addColumn(new StringColumn(eachPk.getValue().name()));
                            } else {
                                String v = allData.get(col.getName());
                                line.addColumn(new StringColumn(v));
                            }
                        }
                        LOG.debug("Reader send record : {}", line.toString());
                        recordSender.sendToWriter(line);
                        primaryKeyRecords.add(line);
                    }
                } else {
                    Record line = recordSender.createRecord();
                    for (OTSColumn col : columns) {
                        if (col.getColumnType() == OTSColumn.OTSColumnType.CONST) {
                            line.addColumn(col.getValue());
                        } else {
                            String v = allData.get(col.getName());
                            line.addColumn(new StringColumn(v));
                        }
                    }
                    LOG.debug("Reader send record : {}", line.toString());
                    recordSender.sendToWriter(line);
                }
            }
        } catch (Exception e) {
            LOG.warn(JSON.toJSONString(listTableResult), e);
        }
    }
}

================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveMultiVersionProxy.java
================================================
package com.alibaba.datax.plugin.reader.otsreader;

import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange;
import com.alibaba.datax.plugin.reader.otsreader.utils.*;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OtsReaderSlaveMultiVersionProxy implements IOtsReaderSlaveProxy {
    private OTSConf conf = null;
    private OTSRange range = null;
    private TableMeta meta = null;
    private SyncClientInterface ots = null;

    private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveMultiVersionProxy.class);

    @Override
    public void init(Configuration configuration) {
        conf = GsonParser.jsonToConf((String) configuration.get(Constant.ConfigKey.CONF));
        range = GsonParser.jsonToRange((String) configuration.get(Constant.ConfigKey.RANGE));
        meta = GsonParser.jsonToMeta((String) configuration.get(Constant.ConfigKey.META));

        this.ots = OtsHelper.getOTSInstance(conf);
    }

    @Override
    public void close() {
        ots.shutdown();
    }

    private void sendToDatax(RecordSender recordSender, PrimaryKey pk, Column c) {
        Record line = recordSender.createRecord();
        //-------------------------
        // 4-tuple: pk, column name, timestamp, value
        //-------------------------

        // pk
        for (PrimaryKeyColumn pkc : pk.getPrimaryKeyColumns()) {
            line.addColumn(TranformHelper.otsPrimaryKeyColumnToDataxColumn(pkc));
        }
        // column name
        line.addColumn(new StringColumn(c.getName()));
        // Timestamp
        line.addColumn(new LongColumn(c.getTimestamp()));
        // Value
        line.addColumn(TranformHelper.otsColumnToDataxColumn(c));

        recordSender.sendToWriter(line);
    }

    private void sendToDatax(RecordSender recordSender, Row row) {
        PrimaryKey pk = row.getPrimaryKey();
        for (Column c : row.getColumns()) {
            sendToDatax(recordSender, pk, c);
        }
    }

    /**
     * Sends the fetched data to DataX as 4-tuples.
     * @param recordSender
     * @param result
     */
    private void sendToDatax(RecordSender recordSender, GetRangeResponse result) {
        LOG.debug("Per request get row count : " + result.getRows().size());
        for (Row row : result.getRows()) {
            sendToDatax(recordSender, row);
        }
    }

    @Override
    public void startRead(RecordSender recordSender) throws Exception {

        PrimaryKey inclusiveStartPrimaryKey = new PrimaryKey(range.getBegin());
        PrimaryKey exclusiveEndPrimaryKey = new PrimaryKey(range.getEnd());
        PrimaryKey next = inclusiveStartPrimaryKey;

        RangeRowQueryCriteria rangeRowQueryCriteria = new RangeRowQueryCriteria(conf.getTableName());
        rangeRowQueryCriteria.setExclusiveEndPrimaryKey(exclusiveEndPrimaryKey);
        rangeRowQueryCriteria.setDirection(Common.getDirection(range.getBegin(), range.getEnd()));
        rangeRowQueryCriteria.setTimeRange(conf.getMulti().getTimeRange());
        rangeRowQueryCriteria.setMaxVersions(conf.getMulti().getMaxVersion());
        rangeRowQueryCriteria.addColumnsToGet(Common.toColumnToGet(conf.getColumn(), meta));

        do {
            rangeRowQueryCriteria.setInclusiveStartPrimaryKey(next);
            GetRangeResponse result = OtsHelper.getRange(
                    ots,
                    rangeRowQueryCriteria,
                    conf.getRetry(),
                    conf.getRetryPauseInMillisecond());
            sendToDatax(recordSender, result);
            next = result.getNextStartPrimaryKey();
        } while (next != null);
    }
}

================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveNormalProxy.java
================================================
package com.alibaba.datax.plugin.reader.otsreader;

import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSCriticalException;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange;
import com.alibaba.datax.plugin.reader.otsreader.utils.*;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.core.utils.Pair;
import com.alicloud.openservices.tablestore.model.*;
import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataRequest;
import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse;
import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow;
import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class OtsReaderSlaveNormalProxy implements IOtsReaderSlaveProxy {
    private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveNormalProxy.class);
    private OTSConf conf = null;
    private OTSRange range = null;
    private TableMeta meta = null;
    private SyncClientInterface ots = null;
    private TimeseriesScanSplitInfo splitInfo = null;

    @Override
    public void init(Configuration configuration) {
        conf = GsonParser.jsonToConf((String) configuration.get(Constant.ConfigKey.CONF));
        if (!conf.isTimeseriesTable()) {
            range = GsonParser.jsonToRange((String) configuration.get(Constant.ConfigKey.RANGE));
            meta = GsonParser.jsonToMeta((String) configuration.get(Constant.ConfigKey.META));
        } else {
            splitInfo = GsonParser.stringToTimeseriesScanSplitInfo((String) configuration.get(Constant.ConfigKey.SPLIT_INFO));
            // Timeseries table: check the Tablestore SDK version
            try {
                Common.checkTableStoreSDKVersion();
            } catch (Exception e) {
                LOG.error("Exception. ErrorMsg:{}", e.getMessage(), e);
                throw DataXException.asDataXException(OtsReaderError.ERROR, e.toString(), e);
            }
        }

        this.ots = OtsHelper.getOTSInstance(conf);
    }

    @Override
    public void close() {
        ots.shutdown();
    }

    private void sendToDatax(RecordSender recordSender, Row row) {
        Record line = recordSender.createRecord();

        PrimaryKey pk = row.getPrimaryKey();
        for (OTSColumn column : conf.getColumn()) {
            if (column.getColumnType() == OTSColumn.OTSColumnType.NORMAL) {
                // Fetch the specified column
                PrimaryKeyColumn value = pk.getPrimaryKeyColumn(column.getName());
                if (value != null) {
                    line.addColumn(TranformHelper.otsPrimaryKeyColumnToDataxColumn(value));
                } else {
                    Column c = row.getLatestColumn(column.getName());
                    if (c != null) {
                        line.addColumn(TranformHelper.otsColumnToDataxColumn(c));
                    } else {
                        // Build the object with StringColumn's no-arg constructor instead of using null.
                        // The downstream writer should get the Column and check whether its data accessor
                        // returns null to decide whether the Column is null.
                        // Other DataX plugins follow the same convention: null is never injected directly
                        // into a record to represent an empty value.
                        line.addColumn(new StringColumn());
                    }
                }
            } else {
                line.addColumn(column.getValue());
            }
        }
        recordSender.sendToWriter(line);
    }

    private void sendToDatax(RecordSender recordSender, TimeseriesRow row) {
        Record line = recordSender.createRecord();
        // For each configured column
        for (int i = 0; i < conf.getColumn().size(); i++) {
            OTSColumn column = conf.getColumn().get(i);
            // Not a constant column
            if (column.getColumnType() == OTSColumn.OTSColumnType.NORMAL) {
                // A field inside tags
                if (conf.getColumn().get(i).getTimeseriesTag()) {
                    String s = row.getTimeseriesKey().getTags().get(column.getName());
                    line.addColumn(new StringColumn(s));
                }
                // The measurement field
                else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.MEASUREMENT_NAME)) {
                    String s = row.getTimeseriesKey().getMeasurementName();
                    line.addColumn(new StringColumn(s));
                }
                // The dataSource field
                else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.DATA_SOURCE)) {
                    String s = row.getTimeseriesKey().getDataSource();
                    line.addColumn(new StringColumn(s));
                }
                // The tags field
                else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.TAGS)) {
                    line.addColumn(new StringColumn(row.getTimeseriesKey().buildTagsString()));
                } else if (column.getName().equals(Constant.ConfigKey.TimeseriesPKColumn.TIME)) {
                    Long l = row.getTimeInUs();
                    line.addColumn(new LongColumn(l));
                }
                // Otherwise, a field inside fields
                else {
                    ColumnValue c = row.getFields().get(column.getName());
                    if (c == null) {
                        LOG.warn("Get column {} : type {} failed, use empty string instead", column.getName(), conf.getColumn().get(i).getValueType());
                        line.addColumn(new StringColumn());
                    } else if (c.getType() != conf.getColumn().get(i).getValueType()) {
                        LOG.warn("Get column {} failed, expected type: {}, actual type: {}. Sending actual type to writer.", column.getName(), conf.getColumn().get(i).getValueType(), c.getType());
                        line.addColumn(TranformHelper.otsColumnToDataxColumn(c));
                    } else {
                        line.addColumn(TranformHelper.otsColumnToDataxColumn(c));
                    }
                }
            }
            // A constant column
            else {
                line.addColumn(column.getValue());
            }
        }
        recordSender.sendToWriter(line);
    }

    /**
     * Sends the fetched data to DataX according to the user-configured columns.
     *
     * @param recordSender
     * @param result
     */
    private void sendToDatax(RecordSender recordSender, GetRangeResponse result) {
        for (Row row : result.getRows()) {
            sendToDatax(recordSender, row);
        }
    }

    private void sendToDatax(RecordSender recordSender, ScanTimeseriesDataResponse result) {
        for (TimeseriesRow row : result.getRows()) {
            sendToDatax(recordSender, row);
        }
    }

    @Override
    public void startRead(RecordSender recordSender) throws Exception {
        if (conf.isTimeseriesTable()) {
            readTimeseriesTable(recordSender);
        } else {
            readNormalTable(recordSender);
        }
    }

    public void readTimeseriesTable(RecordSender recordSender) throws Exception {

        List<String> timeseriesPkName = new ArrayList<>();
        timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.MEASUREMENT_NAME);
        timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.DATA_SOURCE);
        timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.TAGS);
        timeseriesPkName.add(Constant.ConfigKey.TimeseriesPKColumn.TIME);

        ScanTimeseriesDataRequest scanTimeseriesDataRequest = new ScanTimeseriesDataRequest(conf.getTableName());
        List<Pair<String, ColumnType>> fieldsToGet = new ArrayList<>();
        for (int i = 0; i < conf.getColumn().size(); i++) {
            /**
             * A configured column is a field that must be fetched if it
             * 1. is not a constant column (i.e. its name is not null),
             * 2. has a name not in timeseriesPkName (["measurementName","dataSource","tags"] plus the time column), and
             * 3. is not a field inside tags.
             */
            String fieldName = conf.getColumn().get(i).getName();
            if (fieldName != null && !timeseriesPkName.contains(fieldName) && !conf.getColumn().get(i).getTimeseriesTag()) {
                Pair<String, ColumnType> pair = new Pair<>(fieldName, conf.getColumn().get(i).getValueType());
                fieldsToGet.add(pair);
            }
        }
        scanTimeseriesDataRequest.setFieldsToGet(fieldsToGet);
        scanTimeseriesDataRequest.setSplitInfo(splitInfo);

        while (true) {
            ScanTimeseriesDataResponse response = OtsHelper.scanTimeseriesData(
                    ots,
                    scanTimeseriesDataRequest,
                    conf.getRetry(),
                    conf.getRetryPauseInMillisecond());
            sendToDatax(recordSender, response);
            if (response.getNextToken() == null) {
                break;
            }
            scanTimeseriesDataRequest.setNextToken(response.getNextToken());
        }
    }

    public void readNormalTable(RecordSender recordSender) throws Exception {
        PrimaryKey inclusiveStartPrimaryKey = new PrimaryKey(range.getBegin());
        PrimaryKey exclusiveEndPrimaryKey = new PrimaryKey(range.getEnd());
        PrimaryKey next = inclusiveStartPrimaryKey;

        RangeRowQueryCriteria rangeRowQueryCriteria = new RangeRowQueryCriteria(conf.getTableName());
        rangeRowQueryCriteria.setExclusiveEndPrimaryKey(exclusiveEndPrimaryKey);
        rangeRowQueryCriteria.setDirection(Common.getDirection(range.getBegin(), range.getEnd()));
        rangeRowQueryCriteria.setMaxVersions(1);
        rangeRowQueryCriteria.addColumnsToGet(Common.toColumnToGet(conf.getColumn(), meta));

        do {
            rangeRowQueryCriteria.setInclusiveStartPrimaryKey(next);
            GetRangeResponse result = OtsHelper.getRange(
                    ots,
                    rangeRowQueryCriteria,
                    conf.getRetry(),
                    conf.getRetryPauseInMillisecond());
            sendToDatax(recordSender, result);
            next = result.getNextStartPrimaryKey();
        } while (next != null);
    }

    public void setConf(OTSConf conf) {
        this.conf = conf;
    }

    public void setRange(OTSRange range) {
        this.range = range;
    }

    public void setMeta(TableMeta meta) {
        this.meta = meta;
    }

    public void setOts(SyncClientInterface ots) {
        this.ots = ots;
    }
}

================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/OtsReaderSlaveProxyOld.java
================================================
package com.alibaba.datax.plugin.reader.otsreader;

import java.util.List;

import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf;
import com.alibaba.datax.plugin.reader.otsreader.utils.*;
import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn;
import com.aliyun.openservices.ots.model.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsreader.callable.GetRangeCallableOld;
import com.aliyun.openservices.ots.OTSClientAsync;
import com.aliyun.openservices.ots.OTSServiceConfiguration;

public class OtsReaderSlaveProxyOld implements IOtsReaderSlaveProxy {

    private OTSClientAsync ots = null;
    private OTSConf conf = null;
    private OTSRange range = null;

    class RequestItem {
        private RangeRowQueryCriteria criteria;
        private OTSFuture<GetRangeResult> future;

        RequestItem(RangeRowQueryCriteria criteria, OTSFuture<GetRangeResult> future) {
            this.criteria = criteria;
            this.future = future;
        }

        public RangeRowQueryCriteria getCriteria() {
            return criteria;
        }

        public OTSFuture<GetRangeResult> getFuture() {
            return future;
        }
    }

    private static final Logger LOG = LoggerFactory.getLogger(OtsReaderSlaveProxyOld.class);

    private void rowsToSender(List<Row> rows, RecordSender sender, List<OTSColumn> columns) {
        for (Row row : rows) {
            Record line = sender.createRecord();
line = CommonOld.parseRowToLine(row, columns, line); LOG.debug("Reader send record : {}", line.toString()); sender.sendToWriter(line); } } private RangeRowQueryCriteria generateRangeRowQueryCriteria(String tableName, RowPrimaryKey begin, RowPrimaryKey end, Direction direction, List columns) { RangeRowQueryCriteria criteria = new RangeRowQueryCriteria(tableName); criteria.setInclusiveStartPrimaryKey(begin); criteria.setDirection(direction); criteria.setColumnsToGet(columns); criteria.setLimit(-1); criteria.setExclusiveEndPrimaryKey(end); return criteria; } private RequestItem generateRequestItem( OTSClientAsync ots, OTSConf conf, RowPrimaryKey begin, RowPrimaryKey end, Direction direction, List columns) throws Exception { RangeRowQueryCriteria criteria = generateRangeRowQueryCriteria(conf.getTableName(), begin, end, direction, columns); GetRangeRequest request = new GetRangeRequest(); request.setRangeRowQueryCriteria(criteria); OTSFuture future = ots.getRange(request); return new RequestItem(criteria, future); } @Override public void init(Configuration configuration) { conf = GsonParser.jsonToConf(configuration.getString(Constant.ConfigKey.CONF)); range = GsonParser.jsonToRange(configuration.getString(Constant.ConfigKey.RANGE)); OTSServiceConfiguration configure = new OTSServiceConfiguration(); configure.setRetryStrategy(new DefaultNoRetry()); ots = new OTSClientAsync( conf.getEndpoint(), conf.getAccessId(), conf.getAccessKey(), conf.getInstanceName(), null, configure, null); } @Override public void close() { ots.shutdown(); } @Override public void startRead(RecordSender recordSender) throws Exception { RowPrimaryKey token = pKColumnList2RowPrimaryKey(range.getBegin()); List columns = CommonOld.getNormalColumnNameList(conf.getColumn()); Direction direction = null; switch (Common.getDirection(range.getBegin(), range.getEnd())){ case FORWARD: direction = Direction.FORWARD; break; case BACKWARD: default: direction = Direction.BACKWARD; } RequestItem request = null; do 
{ LOG.debug("Next token : {}", GsonParser.rowPrimaryKeyToJson(token)); if (request == null) { request = generateRequestItem(ots, conf, token, pKColumnList2RowPrimaryKey(range.getEnd()), direction, columns); } else { RequestItem req = request; GetRangeResult result = RetryHelperOld.executeWithRetry( new GetRangeCallableOld(ots, req.getCriteria(), req.getFuture()), conf.getRetry(), // TODO 100 ); if ((token = result.getNextStartPrimaryKey()) != null) { request = generateRequestItem(ots, conf, token, pKColumnList2RowPrimaryKey(range.getEnd()), direction, columns); } rowsToSender(result.getRows(), recordSender, conf.getColumn()); } } while (token != null); } /** * 将 {@link com.alicloud.openservices.tablestore.model.PrimaryKeyColumn}的列表转为{@link com.aliyun.openservices.ots.model.RowPrimaryKey} * @param list * @return */ public RowPrimaryKey pKColumnList2RowPrimaryKey(List list){ RowPrimaryKey rowPrimaryKey = new RowPrimaryKey(); for(PrimaryKeyColumn pk : list){ PrimaryKeyValue v = null; if(pk.getValue() == com.alicloud.openservices.tablestore.model.PrimaryKeyValue.INF_MAX){ v = PrimaryKeyValue.INF_MAX; } else if (pk.getValue() == com.alicloud.openservices.tablestore.model.PrimaryKeyValue.INF_MIN) { v = PrimaryKeyValue.INF_MIN; } // 非INF_MAX 或 INF_MIN else{ switch (pk.getValue().getType()){ case STRING: v = PrimaryKeyValue.fromString(pk.getValue().asString()); break; case INTEGER: v = PrimaryKeyValue.fromLong(pk.getValue().asLong()); break; case BINARY: v = PrimaryKeyValue.fromBinary(pk.getValue().asBinary()); break; default: throw new IllegalArgumentException("the pKColumnList to RowPrimaryKey conversion failed"); } } rowPrimaryKey.addPrimaryKeyColumn(pk.getName(),v); } return rowPrimaryKey; } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/ColumnAdaptor.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.adaptor; import 
com.alibaba.datax.common.element.*; import com.google.gson.*; import org.apache.commons.codec.binary.Base64; import java.lang.reflect.Type; public class ColumnAdaptor implements JsonDeserializer, JsonSerializer{ private final static String TYPE = "type"; private final static String RAW = "rawData"; @Override public JsonElement serialize(Column obj, Type t, JsonSerializationContext c) { JsonObject json = new JsonObject(); String rawData = null; switch (obj.getType()){ case BOOL: rawData = String.valueOf(obj.getRawData()); break; case BYTES: rawData = Base64.encodeBase64String((byte[]) obj.getRawData()); break; case DOUBLE: rawData = String.valueOf(obj.getRawData());break; case LONG: rawData = String.valueOf(obj.getRawData());break; case STRING: rawData = String.valueOf(obj.getRawData());break; default: throw new IllegalArgumentException("Unsupport parse the column type:" + obj.getType().toString()); } json.add(TYPE, new JsonPrimitive(obj.getType().toString())); json.add(RAW, new JsonPrimitive(rawData)); return json; } @Override public Column deserialize(JsonElement ele, Type t, JsonDeserializationContext c) throws JsonParseException { JsonObject obj = ele.getAsJsonObject(); String strType = obj.getAsJsonPrimitive(TYPE).getAsString(); String strRaw = obj.getAsJsonPrimitive(RAW).getAsString(); Column.Type type = Column.Type.valueOf(strType); switch (type){ case BOOL: return new BoolColumn(strRaw); case BYTES: return new BytesColumn(Base64.decodeBase64(strRaw)); case DOUBLE: return new DoubleColumn(strRaw); case LONG: return new LongColumn(strRaw); case STRING: return new StringColumn(strRaw); default: throw new IllegalArgumentException("Unsupport parse the column type:" + type.toString()); } } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/adaptor/PrimaryKeyValueAdaptor.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.adaptor; import 
com.alicloud.openservices.tablestore.model.ColumnType;
import com.alicloud.openservices.tablestore.model.PrimaryKeyType;
import com.alicloud.openservices.tablestore.model.PrimaryKeyValue;
import com.google.gson.*;
import org.apache.commons.codec.binary.Base64;

import java.lang.reflect.Type;

/**
 * Gson adapter for {@link PrimaryKeyValue}. Serialized values look like:
 * {"type":"INF_MIN", "value":""}
 * {"type":"INF_MAX", "value":""}
 * {"type":"STRING", "value":"hello"}
 * {"type":"INTEGER", "value":"1222"}
 */
public class PrimaryKeyValueAdaptor implements JsonDeserializer<PrimaryKeyValue>, JsonSerializer<PrimaryKeyValue> {
    private final static String TYPE = "type";
    private final static String VALUE = "value";
    private final static String INF_MIN = "INF_MIN";
    private final static String INF_MAX = "INF_MAX";

    @Override
    public JsonElement serialize(PrimaryKeyValue obj, Type t, JsonSerializationContext c) {
        JsonObject json = new JsonObject();

        if (obj.isInfMin()) {
            json.add(TYPE, new JsonPrimitive(INF_MIN));
            return json;
        }

        if (obj.isInfMax()) {
            json.add(TYPE, new JsonPrimitive(INF_MAX));
            return json;
        }

        switch (obj.getType()) {
            case STRING:
                json.add(TYPE, new JsonPrimitive(ColumnType.STRING.toString()));
                json.add(VALUE, new JsonPrimitive(obj.asString()));
                break;
            case INTEGER:
                json.add(TYPE, new JsonPrimitive(ColumnType.INTEGER.toString()));
                json.add(VALUE, new JsonPrimitive(obj.asLong()));
                break;
            case BINARY:
                json.add(TYPE, new JsonPrimitive(ColumnType.BINARY.toString()));
                json.add(VALUE, new JsonPrimitive(Base64.encodeBase64String(obj.asBinary())));
                break;
            default:
                throw new IllegalArgumentException("Unsupported primary key type for serialization: " + obj.getType());
        }
        return json;
    }

    @Override
    public PrimaryKeyValue deserialize(JsonElement ele, Type t, JsonDeserializationContext c) throws JsonParseException {

        JsonObject obj = ele.getAsJsonObject();
        String strType = obj.getAsJsonPrimitive(TYPE).getAsString();

        if (strType.equalsIgnoreCase(INF_MIN)) {
            return PrimaryKeyValue.INF_MIN;
        }

        if (strType.equalsIgnoreCase(INF_MAX)) {
            return PrimaryKeyValue.INF_MAX;
        }

        JsonPrimitive jsonValue =
                obj.getAsJsonPrimitive(VALUE);
        PrimaryKeyValue value = null;
        PrimaryKeyType type = PrimaryKeyType.valueOf(strType);
        switch (type) {
            case STRING:
                value = PrimaryKeyValue.fromString(jsonValue.getAsString());
                break;
            case INTEGER:
                value = PrimaryKeyValue.fromLong(jsonValue.getAsLong());
                break;
            case BINARY:
                value = PrimaryKeyValue.fromBinary(Base64.decodeBase64(jsonValue.getAsString()));
                break;
            default:
                throw new IllegalArgumentException("Unsupported primary key type for deserialization: " + type);
        }

        return value;
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetFirstRowPrimaryKeyCallable.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.callable;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.*;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

public class GetFirstRowPrimaryKeyCallable implements Callable<List<PrimaryKeyColumn>> {

    private SyncClientInterface ots = null;
    private TableMeta meta = null;
    private RangeRowQueryCriteria criteria = null;

    public GetFirstRowPrimaryKeyCallable(SyncClientInterface ots, TableMeta meta, RangeRowQueryCriteria criteria) {
        this.ots = ots;
        this.meta = meta;
        this.criteria = criteria;
    }

    @Override
    public List<PrimaryKeyColumn> call() throws Exception {
        List<PrimaryKeyColumn> ret = new ArrayList<>();
        GetRangeRequest request = new GetRangeRequest();
        request.setRangeRowQueryCriteria(criteria);
        GetRangeResponse response = ots.getRange(request);
        List<Row> rows = response.getRows();
        if (rows.isEmpty()) {
            return null; // no data
        }
        Row row = rows.get(0);

        // Collect the first row's primary key columns in the order defined by the table schema.
        Map<String, PrimaryKeyType> pk = meta.getPrimaryKeyMap();

        for (String key : pk.keySet()) {
            PrimaryKeyColumn v = row.getPrimaryKey().getPrimaryKeyColumnsMap().get(key);
            ret.add(v);
        }
        return ret;
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallable.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.callable;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.GetRangeRequest;
import com.alicloud.openservices.tablestore.model.GetRangeResponse;
import com.alicloud.openservices.tablestore.model.RangeRowQueryCriteria;

import java.util.concurrent.Callable;

public class GetRangeCallable implements Callable<GetRangeResponse> {

    private SyncClientInterface ots;
    private RangeRowQueryCriteria criteria;

    public GetRangeCallable(SyncClientInterface ots, RangeRowQueryCriteria criteria) {
        this.ots = ots;
        this.criteria = criteria;
    }

    @Override
    public GetRangeResponse call() throws Exception {
        GetRangeRequest request = new GetRangeRequest();
        request.setRangeRowQueryCriteria(criteria);
        return ots.getRange(request);
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetRangeCallableOld.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.callable;

import java.util.concurrent.Callable;

import com.aliyun.openservices.ots.OTSClientAsync;
import com.aliyun.openservices.ots.model.GetRangeRequest;
import com.aliyun.openservices.ots.model.GetRangeResult;
import com.aliyun.openservices.ots.model.OTSFuture;
import com.aliyun.openservices.ots.model.RangeRowQueryCriteria;

public class GetRangeCallableOld implements Callable<GetRangeResult> {

    private OTSClientAsync ots;
    private RangeRowQueryCriteria criteria;
    private OTSFuture<GetRangeResult> future;

    public GetRangeCallableOld(OTSClientAsync ots, RangeRowQueryCriteria criteria, OTSFuture<GetRangeResult> future) {
        this.ots = ots;
        this.criteria = criteria;
        this.future = future;
    }

    @Override
    public GetRangeResult call() throws Exception {
        try {
            return future.get();
        } catch (Exception e) {
            GetRangeRequest request = new
                    GetRangeRequest();
            request.setRangeRowQueryCriteria(criteria);
            // Re-issue the asynchronous request so that the next retry waits on a fresh future, then rethrow.
            future = ots.getRange(request);
            throw e;
        }
    }

}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTableMetaCallable.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.callable;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.DescribeTableRequest;
import com.alicloud.openservices.tablestore.model.DescribeTableResponse;
import com.alicloud.openservices.tablestore.model.TableMeta;

import java.util.concurrent.Callable;

public class GetTableMetaCallable implements Callable<TableMeta> {

    private SyncClientInterface ots = null;
    private String tableName = null;

    public GetTableMetaCallable(SyncClientInterface ots, String tableName) {
        this.ots = ots;
        this.tableName = tableName;
    }

    @Override
    public TableMeta call() throws Exception {
        DescribeTableRequest describeTableRequest = new DescribeTableRequest();
        describeTableRequest.setTableName(tableName);
        DescribeTableResponse result = ots.describeTable(describeTableRequest);
        TableMeta tableMeta = result.getTableMeta();
        return tableMeta;
    }

}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/GetTimeseriesSplitCallable.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.callable;

import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.TimeseriesClient;
import com.alicloud.openservices.tablestore.model.timeseries.SplitTimeseriesScanTaskRequest;
import com.alicloud.openservices.tablestore.model.timeseries.SplitTimeseriesScanTaskResponse;
import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo;

import java.util.List;
import
java.util.concurrent.Callable;

public class GetTimeseriesSplitCallable implements Callable<List<TimeseriesScanSplitInfo>> {

    private TimeseriesClient client = null;
    private String timeseriesTableName = null;
    private String measurementName = null;
    private int splitCountHint = 1;


    public GetTimeseriesSplitCallable(SyncClientInterface ots, String timeseriesTableName, String measurementName, int splitCountHint) {
        this.client = ((SyncClient) ots).asTimeseriesClient();
        this.timeseriesTableName = timeseriesTableName;
        this.measurementName = measurementName;
        this.splitCountHint = splitCountHint;
    }

    @Override
    public List<TimeseriesScanSplitInfo> call() throws Exception {
        SplitTimeseriesScanTaskRequest request = new SplitTimeseriesScanTaskRequest(timeseriesTableName, splitCountHint);
        // Restrict the split to a single measurement only when a measurement name is configured.
        if (measurementName.length() != 0) {
            request.setMeasurementName(measurementName);
        }

        SplitTimeseriesScanTaskResponse response = client.splitTimeseriesScanTask(request);
        return response.getSplitInfos();
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/callable/ScanTimeseriesDataCallable.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.callable;

import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.TimeseriesClient;
import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataRequest;
import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse;
import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo;

import java.util.List;
import java.util.concurrent.Callable;

public class ScanTimeseriesDataCallable implements Callable<ScanTimeseriesDataResponse> {

    private TimeseriesClient client = null;
    private ScanTimeseriesDataRequest request = null;

    public ScanTimeseriesDataCallable(SyncClientInterface ots, ScanTimeseriesDataRequest scanTimeseriesDataRequest) {
        this.client =
                ((SyncClient) ots).asTimeseriesClient();
        this.request = scanTimeseriesDataRequest;
    }

    @Override
    public ScanTimeseriesDataResponse call() throws Exception {
        return client.scanTimeseriesData(request);
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/DefaultNoRetry.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

import com.alicloud.openservices.tablestore.model.DefaultRetryStrategy;
import com.alicloud.openservices.tablestore.model.RetryStrategy;

public class DefaultNoRetry extends DefaultRetryStrategy {

    public DefaultNoRetry() {
        super();
    }

    @Override
    public RetryStrategy clone() {
        return super.clone();
    }

    @Override
    public int getRetries() {
        return super.getRetries();
    }

    @Override
    public boolean shouldRetry(String action, Exception ex) {
        // Disable SDK-level retries.
        return false;
    }

    @Override
    public long nextPause(String action, Exception ex) {
        return super.nextPause(action, ex);
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSColumn.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

import com.alibaba.datax.common.element.*;
import com.alicloud.openservices.tablestore.model.ColumnType;

public class OTSColumn {
    private String name;
    private Column value;

    private OTSColumnType columnType;

    // column settings for time-series data
    private ColumnType valueType;
    private Boolean isTimeseriesTag;

    public static enum OTSColumnType {
        NORMAL, // normal column
        CONST   // constant column
    }

    private OTSColumn(String name) {
        this.name = name;
        this.columnType = OTSColumnType.NORMAL;
    }

    private OTSColumn(Column value) {
        this.value = value;
        this.columnType = OTSColumnType.CONST;
    }

    public static OTSColumn fromNormalColumn(String name) {
        if (name.isEmpty()) {
            throw new IllegalArgumentException("The column name is empty.");
        }

        return new OTSColumn(name);
    }

    public static OTSColumn
            fromConstStringColumn(String value) {
        return new OTSColumn(new StringColumn(value));
    }

    public static OTSColumn fromConstIntegerColumn(long value) {
        return new OTSColumn(new LongColumn(value));
    }

    public static OTSColumn fromConstDoubleColumn(double value) {
        return new OTSColumn(new DoubleColumn(value));
    }

    public static OTSColumn fromConstBoolColumn(boolean value) {
        return new OTSColumn(new BoolColumn(value));
    }

    public static OTSColumn fromConstBytesColumn(byte[] value) {
        return new OTSColumn(new BytesColumn(value));
    }

    public Column getValue() {
        return value;
    }

    public OTSColumnType getColumnType() {
        return columnType;
    }

    public String getName() {
        return name;
    }

    public ColumnType getValueType() {
        return valueType;
    }

    public void setValueType(ColumnType valueType) {
        this.valueType = valueType;
    }

    public Boolean getTimeseriesTag() {
        return isTimeseriesTag;
    }

    public void setTimeseriesTag(Boolean timeseriesTag) {
        isTimeseriesTag = timeseriesTag;
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSConf.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsreader.utils.Constant;
import com.alibaba.datax.plugin.reader.otsreader.utils.Key;
import com.alibaba.datax.plugin.reader.otsreader.utils.ParamChecker;
import com.alicloud.openservices.tablestore.model.ColumnType;

import java.util.List;

public class OTSConf {
    private String endpoint = null;
    private String accessId = null;
    private String accessKey = null;
    private String instanceName = null;
    private String tableName = null;
    private OTSRange range = null;
    private List<OTSColumn> column = null;
    private OTSMode mode = null;

    @Deprecated
    private String metaMode = "";

    private boolean newVersion = false;
    /**
     * The following settings are used only when reading time-series data.
     */
    private boolean isTimeseriesTable
            = false;
    private String measurementName = null;
    /**
     * The settings above are used only when reading time-series data.
     */

    private OTSMultiVersionConf multi = null;

    private int retry = Constant.ConfigDefaultValue.RETRY;
    private int retryPauseInMillisecond = Constant.ConfigDefaultValue.RETRY_PAUSE_IN_MILLISECOND;
    private int ioThreadCount = Constant.ConfigDefaultValue.IO_THREAD_COUNT;
    private int maxConnectionCount = Constant.ConfigDefaultValue.MAX_CONNECTION_COUNT;
    private int socketTimeoutInMillisecond = Constant.ConfigDefaultValue.SOCKET_TIMEOUT_IN_MILLISECOND;
    private int connectTimeoutInMillisecond = Constant.ConfigDefaultValue.CONNECT_TIMEOUT_IN_MILLISECOND;

    public int getIoThreadCount() {
        return ioThreadCount;
    }

    public void setIoThreadCount(int ioThreadCount) {
        this.ioThreadCount = ioThreadCount;
    }

    public int getMaxConnectCount() {
        return maxConnectionCount;
    }

    public void setMaxConnectCount(int maxConnectCount) {
        this.maxConnectionCount = maxConnectCount;
    }

    public int getSocketTimeoutInMillisecond() {
        return socketTimeoutInMillisecond;
    }

    public void setSocketTimeoutInMillisecond(int socketTimeoutInMillisecond) {
        this.socketTimeoutInMillisecond = socketTimeoutInMillisecond;
    }

    public int getConnectTimeoutInMillisecond() {
        return connectTimeoutInMillisecond;
    }

    public void setConnectTimeoutInMillisecond(int connectTimeoutInMillisecond) {
        this.connectTimeoutInMillisecond = connectTimeoutInMillisecond;
    }

    public int getRetry() {
        return retry;
    }

    public void setRetry(int retry) {
        this.retry = retry;
    }

    public int getRetryPauseInMillisecond() {
        return retryPauseInMillisecond;
    }

    public void setRetryPauseInMillisecond(int sleepInMillisecond) {
        this.retryPauseInMillisecond = sleepInMillisecond;
    }

    public String getEndpoint() {
        return endpoint;
    }

    public void setEndpoint(String endpoint) {
        this.endpoint = endpoint;
    }

    public String getAccessId() {
        return accessId;
    }

    public void setAccessId(String accessId) {
        this.accessId = accessId;
    }

    public String getAccessKey() {
        return accessKey;
    }

    public void setAccessKey(String
            accessKey) {
        this.accessKey = accessKey;
    }

    public String getInstanceName() {
        return instanceName;
    }

    public void setInstanceName(String instanceName) {
        this.instanceName = instanceName;
    }

    public String getTableName() {
        return tableName;
    }

    public void setTableName(String tableName) {
        this.tableName = tableName;
    }

    public OTSRange getRange() {
        return range;
    }

    public void setRange(OTSRange range) {
        this.range = range;
    }

    public OTSMode getMode() {
        return mode;
    }

    public void setMode(OTSMode mode) {
        this.mode = mode;
    }

    public OTSMultiVersionConf getMulti() {
        return multi;
    }

    public void setMulti(OTSMultiVersionConf multi) {
        this.multi = multi;
    }

    public List<OTSColumn> getColumn() {
        return column;
    }

    public void setColumn(List<OTSColumn> column) {
        this.column = column;
    }

    public boolean isNewVersion() {
        return newVersion;
    }

    public void setNewVersion(boolean newVersion) {
        this.newVersion = newVersion;
    }

    @Deprecated
    public String getMetaMode() {
        return metaMode;
    }

    @Deprecated
    public void setMetaMode(String metaMode) {
        this.metaMode = metaMode;
    }

    public boolean isTimeseriesTable() {
        return isTimeseriesTable;
    }

    public void setTimeseriesTable(boolean timeseriesTable) {
        isTimeseriesTable = timeseriesTable;
    }

    public String getMeasurementName() {
        return measurementName;
    }

    public void setMeasurementName(String measurementName) {
        this.measurementName = measurementName;
    }

    public static OTSConf load(Configuration param) throws OTSCriticalException {
        OTSConf c = new OTSConf();

        // account
        c.setEndpoint(ParamChecker.checkStringAndGet(param, Key.OTS_ENDPOINT, true));
        c.setAccessId(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSID, true));
        c.setAccessKey(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSKEY, true));
        c.setInstanceName(ParamChecker.checkStringAndGet(param, Key.OTS_INSTANCE_NAME, true));
        c.setTableName(ParamChecker.checkStringAndGet(param, Key.TABLE_NAME, true));

        c.setRetry(param.getInt(Constant.ConfigKey.RETRY, Constant.ConfigDefaultValue.RETRY));
        c.setRetryPauseInMillisecond(param.getInt(Constant.ConfigKey.RETRY_PAUSE_IN_MILLISECOND, Constant.ConfigDefaultValue.RETRY_PAUSE_IN_MILLISECOND));
        c.setIoThreadCount(param.getInt(Constant.ConfigKey.IO_THREAD_COUNT, Constant.ConfigDefaultValue.IO_THREAD_COUNT));
        c.setMaxConnectCount(param.getInt(Constant.ConfigKey.MAX_CONNECTION_COUNT, Constant.ConfigDefaultValue.MAX_CONNECTION_COUNT));
        c.setSocketTimeoutInMillisecond(param.getInt(Constant.ConfigKey.SOCKET_TIMEOUTIN_MILLISECOND, Constant.ConfigDefaultValue.SOCKET_TIMEOUT_IN_MILLISECOND));
        c.setConnectTimeoutInMillisecond(param.getInt(Constant.ConfigKey.CONNECT_TIMEOUT_IN_MILLISECOND, Constant.ConfigDefaultValue.CONNECT_TIMEOUT_IN_MILLISECOND));

        // range
        c.setRange(ParamChecker.checkRangeAndGet(param));

        // mode (optional)
        c.setMode(ParamChecker.checkModeAndGet(param));
        // isNewVersion (optional)
        c.setNewVersion(param.getBool(Key.NEW_VERSION, false));
        // metaMode (legacy setting)
        c.setMetaMode(param.getString(Key.META_MODE, ""));

        // whether the source is a time-series table
        c.setTimeseriesTable(param.getBool(Key.IS_TIMESERIES_TABLE, false));

        // column
        if (!c.isTimeseriesTable()) {
            // non-time-series table
            c.setColumn(ParamChecker.checkOTSColumnAndGet(param, c.getMode()));
        } else {
            // time-series table
            c.setMeasurementName(param.getString(Key.MEASUREMENT_NAME, ""));
            c.setColumn(ParamChecker.checkTimeseriesColumnAndGet(param));
            ParamChecker.checkTimeseriesMode(c.getMode(), c.isNewVersion());
        }

        if (c.getMode() == OTSMode.MULTI_VERSION) {
            c.setMulti(OTSMultiVersionConf.load(param));
        }
        return c;
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSConst.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

public class OTSConst {
    // Types supported by the reader
    public final static String TYPE_STRING = "STRING";
    public final static String TYPE_INTEGER = "INT";
    public final static String TYPE_DOUBLE = "DOUBLE";
    public final static String TYPE_BOOLEAN = "BOOL";
    public final static String
            TYPE_BINARY = "BINARY";
    public final static String TYPE_INF_MIN = "INF_MIN";
    public final static String TYPE_INF_MAX = "INF_MAX";

    // Column
    public final static String NAME = "name";
    public final static String TYPE = "type";
    public final static String VALUE = "value";

    public final static String OTS_CONF = "OTS_CONF";
    public final static String OTS_RANGE = "OTS_RANGE";
    public final static String OTS_DIRECTION = "OTS_DIRECTION";

    // options
    public final static String RETRY = "maxRetryTime";
    public final static String SLEEP_IN_MILLI_SECOND = "retrySleepInMillionSecond";

}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSCriticalException.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

/**
 * Critical plugin error, used mainly to signal that the plugin exited abnormally.
 * @author redchen
 */
public class OTSCriticalException extends Exception {

    private static final long serialVersionUID = 5820460098894295722L;

    public OTSCriticalException() {}

    public OTSCriticalException(String message) {
        super(message);
    }

    public OTSCriticalException(Throwable a) {
        super(a);
    }

    public OTSCriticalException(String message, Throwable a) {
        super(message, a);
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSErrorCode.java
================================================
/**
 * Copyright (C) Alibaba Cloud Computing
 * All rights reserved.
 *
 * All rights reserved (C) Alibaba Cloud Computing Co., Ltd.
 */

package com.alibaba.datax.plugin.reader.otsreader.model;

/**
 * Error codes returned by the Open Table Service (OTS).
 *
 */
public class OTSErrorCode {
    /**
     * User authentication failed.
     */
    public static final String AUTHORIZATION_FAILURE = "OTSAuthFailed";

    /**
     * Internal server error.
     */
    public static final String INTERNAL_SERVER_ERROR = "OTSInternalServerError";

    /**
     * Invalid parameter.
     */
    public static final String INVALID_PARAMETER = "OTSParameterInvalid";

    /**
     * The request body is too large.
     */
    public static final String REQUEST_TOO_LARGE = "OTSRequestBodyTooLarge";

    /**
     * The client request timed out.
     */
    public static final String REQUEST_TIMEOUT = "OTSRequestTimeout";

    /**
     * The user's quota has been exhausted.
     */
    public static final String QUOTA_EXHAUSTED = "OTSQuotaExhausted";

    /**
     * An internal server failover made some partitions of the table unavailable.
     */
    public static final String PARTITION_UNAVAILABLE = "OTSPartitionUnavailable";

    /**
     * The table was just created and cannot serve requests yet.
     */
    public static final String TABLE_NOT_READY = "OTSTableNotReady";

    /**
     * The requested table does not exist.
     */
    public static final String OBJECT_NOT_EXIST = "OTSObjectNotExist";

    /**
     * The table to be created already exists.
     */
    public static final String OBJECT_ALREADY_EXIST = "OTSObjectAlreadyExist";

    /**
     * Multiple concurrent requests wrote the same row, causing a conflict.
     */
    public static final String ROW_OPEARTION_CONFLICT = "OTSRowOperationConflict";

    /**
     * Primary key mismatch.
     */
    public static final String INVALID_PK = "OTSInvalidPK";

    /**
     * The reserved read/write throughput was adjusted too frequently.
     */
    public static final String TOO_FREQUENT_RESERVED_THROUGHPUT_ADJUSTMENT = "OTSTooFrequentReservedThroughputAdjustment";

    /**
     * The total number of columns in the row exceeds the limit.
     */
    public static final String OUT_OF_COLUMN_COUNT_LIMIT = "OTSOutOfColumnCountLimit";

    /**
     * The total size of all column data in the row exceeds the limit.
     */
    public static final String OUT_OF_ROW_SIZE_LIMIT = "OTSOutOfRowSizeLimit";

    /**
     * The remaining reserved read/write capacity is insufficient.
     */
    public static final String NOT_ENOUGH_CAPACITY_UNIT = "OTSNotEnoughCapacityUnit";

    /**
     * The condition check failed.
     */
    public static final String CONDITION_CHECK_FAIL = "OTSConditionCheckFail";

    /**
     * An operation inside OTS timed out.
     */
    public static final String STORAGE_TIMEOUT = "OTSTimeout";

    /**
     * A server inside OTS is unreachable.
     */
    public static final String SERVER_UNAVAILABLE = "OTSServerUnavailable";

    /**
     * An OTS internal server is busy.
     */
    public static final String SERVER_BUSY = "OTSServerBusy";

}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMode.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

public enum OTSMode {
    NORMAL,
    MULTI_VERSION
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSMultiVersionConf.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsreader.utils.Constant;
import com.alibaba.datax.plugin.reader.otsreader.utils.ParamChecker;
import com.alicloud.openservices.tablestore.model.TimeRange;

public class OTSMultiVersionConf {

    private TimeRange timeRange = null;
    private int maxVersion = -1;

    public TimeRange getTimeRange() {
        return timeRange;
    }

    public void setTimeRange(TimeRange timeRange) {
        this.timeRange = timeRange;
    }

    public int getMaxVersion() {
        return maxVersion;
    }

    public void setMaxVersion(int maxVersion) {
        this.maxVersion = maxVersion;
    }

    public static OTSMultiVersionConf load(Configuration param) throws OTSCriticalException {
        OTSMultiVersionConf conf = new OTSMultiVersionConf();
        conf.setTimeRange(ParamChecker.checkTimeRangeAndGet(param));
        conf.setMaxVersion(param.getInt(Constant.ConfigKey.MAX_VERSION, Constant.ConfigDefaultValue.MAX_VERSION));
        return conf;
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSPrimaryKeyColumn.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

import com.aliyun.openservices.ots.model.PrimaryKeyType;

public class OTSPrimaryKeyColumn {
    private String name;
    private PrimaryKeyType type;

    public String getName() {
        return name;
    }

    public void
            setName(String name) {
        this.name = name;
    }

    public PrimaryKeyType getType() {
        return type;
    }

    // newVersion is not read; it only distinguishes this overload, which returns the new-SDK type.
    public com.alicloud.openservices.tablestore.model.PrimaryKeyType getType(Boolean newVersion) {
        com.alicloud.openservices.tablestore.model.PrimaryKeyType res = null;
        switch (this.type) {
            case BINARY:
                res = com.alicloud.openservices.tablestore.model.PrimaryKeyType.BINARY;
                break;
            case INTEGER:
                res = com.alicloud.openservices.tablestore.model.PrimaryKeyType.INTEGER;
                break;
            case STRING:
            default:
                res = com.alicloud.openservices.tablestore.model.PrimaryKeyType.STRING;
                break;
        }
        return res;
    }

    public void setType(PrimaryKeyType type) {
        this.type = type;
    }

    public void setType(com.alicloud.openservices.tablestore.model.PrimaryKeyType type) {
        switch (type) {
            case BINARY:
                this.type = PrimaryKeyType.BINARY;
                break;
            case INTEGER:
                this.type = PrimaryKeyType.INTEGER;
                break;
            case STRING:
            default:
                this.type = PrimaryKeyType.STRING;
                break;
        }
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/model/OTSRange.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.model;

import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn;

import java.util.List;


public class OTSRange {
    private List<PrimaryKeyColumn> begin = null;
    private List<PrimaryKeyColumn> end = null;
    private List<PrimaryKeyColumn> split = null;

    public List<PrimaryKeyColumn> getBegin() {
        return begin;
    }

    public void setBegin(List<PrimaryKeyColumn> begin) {
        this.begin = begin;
    }

    public List<PrimaryKeyColumn> getEnd() {
        return end;
    }

    public void setEnd(List<PrimaryKeyColumn> end) {
        this.end = end;
    }

    public List<PrimaryKeyColumn> getSplit() {
        return split;
    }

    public void setSplit(List<PrimaryKeyColumn> split) {
        this.split = split;
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Common.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.utils;

import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn;
import
com.alibaba.datax.plugin.reader.otsreader.model.OTSCriticalException;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn;
import com.alicloud.openservices.tablestore.model.*;
import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse;

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Common {
    // Names of the NORMAL columns to fetch, excluding primary key columns.
    public static List<String> toColumnToGet(List<OTSColumn> columns, TableMeta meta) {
        Map<String, PrimaryKeyType> pk = meta.getPrimaryKeyMap();
        List<String> names = new ArrayList<String>();
        for (OTSColumn c : columns) {
            if (c.getColumnType() == OTSColumn.OTSColumnType.NORMAL && !pk.containsKey(c.getName())) {
                names.add(c.getName());
            }
        }
        return names;
    }

    public static List<String> getPrimaryKeyNameList(TableMeta meta) {
        List<String> names = new ArrayList<String>();
        names.addAll(meta.getPrimaryKeyMap().keySet());
        return names;
    }

    public static OTSPrimaryKeyColumn getPartitionKey(TableMeta meta) {
        List<String> keys = new ArrayList<String>();
        keys.addAll(meta.getPrimaryKeyMap().keySet());

        // The first primary key column is the partition key.
        String key = keys.get(0);

        OTSPrimaryKeyColumn col = new OTSPrimaryKeyColumn();
        col.setName(key);
        col.setType(meta.getPrimaryKeyMap().get(key));
        return col;
    }

    public static Direction getDirection(List<PrimaryKeyColumn> begin, List<PrimaryKeyColumn> end) throws OTSCriticalException {
        int cmp = CompareHelper.comparePrimaryKeyColumnList(begin, end);
        if (cmp < 0) {
            return Direction.FORWARD;
        } else if (cmp > 0) {
            return Direction.BACKWARD;
        } else {
            throw new OTSCriticalException("Unexpected branch: the begin of the range equals the end of the range.");
        }
    }

    public static int compareRangeBeginAndEnd(TableMeta meta, List<PrimaryKeyColumn> begin, List<PrimaryKeyColumn> end) {
        if (begin.size() != end.size()) {
            throw new IllegalArgumentException("The size of begin does not equal the size of end, begin size: " + begin.size() + ", end size: " + end.size() + ".");
        }

        Map<String, PrimaryKeyValue> beginMap = new HashMap<>();
        Map<String, PrimaryKeyValue> endMap = new HashMap<>();

        for (PrimaryKeyColumn primaryKeyColumn : begin) {
            beginMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue());
        }
        for (PrimaryKeyColumn primaryKeyColumn : end) {
            endMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue());
        }

        // Compare column by column in primary key order; the first difference decides.
        for (String key : meta.getPrimaryKeyMap().keySet()) {
            PrimaryKeyValue v1 = beginMap.get(key);
            PrimaryKeyValue v2 = endMap.get(key);
            int cmp = primaryKeyValueCmp(v1, v2);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    public static int primaryKeyValueCmp(PrimaryKeyValue v1, PrimaryKeyValue v2) {
        if (v1.getType() != null && v2.getType() != null) {
            if (v1.getType() != v2.getType()) {
                throw new IllegalArgumentException(
                        "Column types differ, column1: " + v1.getType() + ", column2: " + v2.getType());
            }
            switch (v1.getType()) {
                case INTEGER:
                    Long l1 = Long.valueOf(v1.asLong());
                    Long l2 = Long.valueOf(v2.asLong());
                    return l1.compareTo(l2);
                case STRING:
                    return v1.asString().compareTo(v2.asString());
                default:
                    throw new IllegalArgumentException("Unsupported type for comparison: " + v1.getType() + ".");
            }
        } else {
            if (v1 == v2) {
                return 0;
            } else {
                if (v1 == PrimaryKeyValue.INF_MIN) {
                    return -1;
                } else if (v1 == PrimaryKeyValue.INF_MAX) {
                    return 1;
                }

                if (v2 == PrimaryKeyValue.INF_MAX) {
                    return -1;
                } else if (v2 == PrimaryKeyValue.INF_MIN) {
                    return 1;
                }
            }
        }
        return 0;
    }

    public static void checkTableStoreSDKVersion() throws OTSCriticalException {
        Field[] fields = ScanTimeseriesDataResponse.class.getFields();
        String sdkVersion = null;
        for (Field f : fields) {
            if (f.getName().equals("_VERSION_")) {
                sdkVersion = ScanTimeseriesDataResponse._VERSION_;
                break;
            }
        }
        if (sdkVersion == null) {
            throw new OTSCriticalException("Failed to check the Tablestore Java SDK version. Please check the version of the tablestore Maven dependency.");
        } else if (Integer.parseInt(sdkVersion) < 20230111) {
            throw new OTSCriticalException("Tablestore Java SDK version check failed. The version must be at least 20230111, actual version: " + sdkVersion + ".");
        }
    }
}
================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CommonOld.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.utils;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.alibaba.datax.common.element.BoolColumn;
import com.alibaba.datax.common.element.BytesColumn;
import com.alibaba.datax.common.element.DoubleColumn;
import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn;
import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn;
import com.aliyun.openservices.ots.ClientException;
import com.aliyun.openservices.ots.OTSException;
import com.aliyun.openservices.ots.model.ColumnValue;
import com.aliyun.openservices.ots.model.PrimaryKeyValue;
import com.aliyun.openservices.ots.model.Row;
import com.aliyun.openservices.ots.model.RowPrimaryKey;
import com.aliyun.openservices.ots.model.TableMeta;

public class CommonOld {
    public static int primaryKeyValueCmp(PrimaryKeyValue v1, PrimaryKeyValue v2) {
        if (v1.getType() != null && v2.getType() != null) {
            if (v1.getType() != v2.getType()) {
                throw new IllegalArgumentException(
                        "Column types differ, column1: " + v1.getType() + ", column2: " + v2.getType());
            }
            switch (v1.getType()) {
                case INTEGER:
                    Long l1 = Long.valueOf(v1.asLong());
                    Long l2 = Long.valueOf(v2.asLong());
                    return l1.compareTo(l2);
                case STRING:
                    return v1.asString().compareTo(v2.asString());
                default:
                    throw new IllegalArgumentException("Unsupported type for comparison: " + v1.getType() + ".");
            }
        } else {
            if (v1 == v2) {
                return 0;
            } else {
                if (v1 == PrimaryKeyValue.INF_MIN) {
                    return -1;
                } else if (v1 == PrimaryKeyValue.INF_MAX) {
                    return 1;
                }
if (v2 == PrimaryKeyValue.INF_MAX) { return -1; } else if (v2 == PrimaryKeyValue.INF_MIN) { return 1; } } } return 0; } public static List<String> getNormalColumnNameList(List<OTSColumn> columns) { List<String> normalColumns = new ArrayList<String>(); for (OTSColumn col : columns) { if (col.getColumnType() == OTSColumn.OTSColumnType.NORMAL) { normalColumns.add(col.getName()); } } return normalColumns; } public static Record parseRowToLine(Row row, List<OTSColumn> columns, Record line) { Map<String, ColumnValue> values = row.getColumns(); for (OTSColumn col : columns) { if (col.getColumnType() == OTSColumn.OTSColumnType.CONST) { line.addColumn(col.getValue()); } else { ColumnValue v = values.get(col.getName()); if (v == null) { line.addColumn(new StringColumn(null)); } else { switch(v.getType()) { case STRING: line.addColumn(new StringColumn(v.asString())); break; case INTEGER: line.addColumn(new LongColumn(v.asLong())); break; case DOUBLE: line.addColumn(new DoubleColumn(v.asDouble())); break; case BOOLEAN: line.addColumn(new BoolColumn(v.asBoolean())); break; case BINARY: line.addColumn(new BytesColumn(v.asBinary())); break; default: throw new IllegalArgumentException("Unsupported column type for transformation: " + v.getType() + "."); } } } } return line; } public static long getDelaySendMillinSeconds(int hadRetryTimes, int initSleepInMilliSecond) { if (hadRetryTimes <= 0) { return 0; } int sleepTime = initSleepInMilliSecond; for (int i = 1; i < hadRetryTimes; i++) { sleepTime += sleepTime; if (sleepTime > 30000) { sleepTime = 30000; break; } } return sleepTime; } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/CompareHelper.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; import java.util.List; public class CompareHelper { /** * Compares two PrimaryKeyColumn lists element by element. * Returns: * a negative value if before is less than after * 0 if before equals after * a positive value if before is greater than after * 
* @param before the first primary key column list * @param after the second primary key column list * @return the comparison result; if one list is a prefix of the other, the shorter list is considered smaller */ public static int comparePrimaryKeyColumnList(List<PrimaryKeyColumn> before, List<PrimaryKeyColumn> after) { int size = before.size() < after.size() ? before.size() : after.size(); for (int i = 0; i < size; i++) { int cmp = before.get(i).compareTo(after.get(i)); if (cmp != 0) { return cmp; } } if (before.size() < after.size() ) { return -1; } else if (before.size() > after.size() ) { return 1; } return 0; } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Constant.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; public class Constant { /** * Key names used in the JSON configuration */ public class ConfigKey { public static final String CONF = "conf"; public static final String RANGE = "range"; public static final String META = "meta"; public static final String SPLIT_INFO = "splitInfo"; public static final String TIME_RANGE = "timeRange"; public static final String MAX_VERSION = "maxVersion"; public static final String RETRY = "maxRetryTime"; public static final String RETRY_PAUSE_IN_MILLISECOND = "retryPauseInMillisecond"; public static final String IO_THREAD_COUNT = "ioThreadCount"; public static final String MAX_CONNECTION_COUNT = "maxConnectionCount"; public static final String SOCKET_TIMEOUTIN_MILLISECOND = "socketTimeoutInMillisecond"; public static final String CONNECT_TIMEOUT_IN_MILLISECOND = "connectTimeoutInMillisecond"; public class Range { public static final String BEGIN = "begin"; public static final String END = "end"; public static final String SPLIT = "split"; }; public class PrimaryKeyColumn { public static final String TYPE = "type"; public static final String VALUE = "value"; }; public class TimeseriesPKColumn { public static final String MEASUREMENT_NAME = "_m_name"; public static final String DATA_SOURCE = "_data_source"; public static final String TAGS = "_tags"; public static final String TIME = "_time"; } public class 
Column { public static final String NAME = "name"; public static final String TYPE = "type"; public static final String VALUE = "value"; public static final String IS_TAG = "is_timeseries_tag"; }; public class TimeRange { public static final String BEGIN = "begin"; public static final String END = "end"; } }; /** * Allowed values of 'type' in the configuration file */ public class ValueType { public static final String INF_MIN = "INF_MIN"; public static final String INF_MAX = "INF_MAX"; public static final String STRING = "string"; public static final String INTEGER = "int"; public static final String BINARY = "binary"; public static final String DOUBLE = "double"; public static final String BOOLEAN = "bool"; }; /** * Global default values */ public class ConfigDefaultValue { public static final int RETRY = 18; public static final int RETRY_PAUSE_IN_MILLISECOND = 100; public static final int IO_THREAD_COUNT = 1; public static final int MAX_CONNECTION_COUNT = 1; public static final int SOCKET_TIMEOUT_IN_MILLISECOND = 10000; public static final int CONNECT_TIMEOUT_IN_MILLISECOND = 10000; public static final int MAX_VERSION = Integer.MAX_VALUE; public static final String DEFAULT_NAME = "DEFAULT_NAME"; public class Mode { public static final String NORMAL = "normal"; public static final String MULTI_VERSION = "multiVersion"; } public class TimeRange { public static final long MIN = 0; public static final long MAX = Long.MAX_VALUE; } } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/DefaultNoRetry.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.aliyun.openservices.ots.internal.OTSDefaultRetryStrategy; public class DefaultNoRetry extends OTSDefaultRetryStrategy { @Override public boolean shouldRetry(String action, Exception ex, int retries) { return false; } } ================================================ FILE: 
otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/GsonParser.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.plugin.reader.otsreader.adaptor.ColumnAdaptor; import com.alibaba.datax.plugin.reader.otsreader.adaptor.PrimaryKeyValueAdaptor; import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; import com.alicloud.openservices.tablestore.model.TableMeta; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; import com.aliyun.openservices.ots.model.Direction; import com.aliyun.openservices.ots.model.RowPrimaryKey; import com.google.gson.Gson; import com.google.gson.GsonBuilder; import java.util.Map; public class GsonParser { private static Gson gsonBuilder() { return new GsonBuilder() .registerTypeAdapter(PrimaryKeyValue.class, new PrimaryKeyValueAdaptor()) .registerTypeAdapter(Column.class, new ColumnAdaptor()) .create(); } public static String rangeToJson (OTSRange range) { Gson g = gsonBuilder(); return g.toJson(range); } public static OTSRange jsonToRange (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, OTSRange.class); } public static String confToJson (OTSConf conf) { Gson g = gsonBuilder(); return g.toJson(conf); } public static OTSConf jsonToConf (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, OTSConf.class); } public static String metaToJson (TableMeta meta) { Gson g = gsonBuilder(); return g.toJson(meta); } public static TableMeta jsonToMeta (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, TableMeta.class); } public static String timeseriesScanSplitInfoToString(TimeseriesScanSplitInfo timeseriesScanSplitInfo){ Gson g = gsonBuilder(); return 
g.toJson(timeseriesScanSplitInfo); } public static TimeseriesScanSplitInfo stringToTimeseriesScanSplitInfo(String jsonStr){ Gson g = gsonBuilder(); return g.fromJson(jsonStr, TimeseriesScanSplitInfo.class); } public static Direction jsonToDirection (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, Direction.class); } public static String rowPrimaryKeyToJson (RowPrimaryKey row) { Gson g = gsonBuilder(); return g.toJson(row); } public static String mapToJson (Map map) { Gson g = gsonBuilder(); return g.toJson(map); } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/Key.java ================================================ /** * (C) 2010-2014 Alibaba Group Holding Limited. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package com.alibaba.datax.plugin.reader.otsreader.utils; public final class Key { /* ots account configuration */ public final static String OTS_ENDPOINT = "endpoint"; public final static String OTS_ACCESSID = "accessId"; public final static String OTS_ACCESSKEY = "accessKey"; public final static String OTS_INSTANCE_NAME = "instanceName"; public final static String TABLE_NAME = "table"; public final static String COLUMN = "column"; //====================================================== // Note: if range-begin is greater than range-end, the system exports all data in reverse order //====================================================== // Format of 'range': // "range":{ // "begin":[], // "end":[], // "split":[] // } public final static String RANGE = "range"; public final static String RANGE_BEGIN = "begin"; public final static String RANGE_END = "end"; public final static String RANGE_SPLIT = "split"; public final static String META_MODE = "metaMode"; public final static String MODE = "mode"; public final static String NEW_VERSION = "newVersion"; public final static String IS_TIMESERIES_TABLE = "isTimeseriesTable"; public final static String MEASUREMENT_NAME = "measurementName"; } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsHelper.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alibaba.datax.plugin.reader.otsreader.callable.GetRangeCallable; import com.alibaba.datax.plugin.reader.otsreader.callable.GetTableMetaCallable; import com.alibaba.datax.plugin.reader.otsreader.callable.GetTimeseriesSplitCallable; import com.alibaba.datax.plugin.reader.otsreader.callable.ScanTimeseriesDataCallable; import com.alibaba.datax.plugin.reader.otsreader.model.DefaultNoRetry; import com.alibaba.datax.plugin.reader.otsreader.model.OTSConf; import com.alicloud.openservices.tablestore.ClientConfiguration; import com.alicloud.openservices.tablestore.SyncClient; import 
com.alicloud.openservices.tablestore.SyncClientInterface; import com.alicloud.openservices.tablestore.core.utils.Pair; import com.alicloud.openservices.tablestore.model.ColumnType; import com.alicloud.openservices.tablestore.model.GetRangeResponse; import com.alicloud.openservices.tablestore.model.RangeRowQueryCriteria; import com.alicloud.openservices.tablestore.model.TableMeta; import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataRequest; import com.alicloud.openservices.tablestore.model.timeseries.ScanTimeseriesDataResponse; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesScanSplitInfo; import java.util.HashMap; import java.util.List; import java.util.Map; public class OtsHelper { public static SyncClientInterface getOTSInstance(OTSConf conf) { ClientConfiguration clientConfigure = new ClientConfiguration(); clientConfigure.setIoThreadCount(conf.getIoThreadCount()); clientConfigure.setMaxConnections(conf.getMaxConnectCount()); clientConfigure.setSocketTimeoutInMillisecond(conf.getSocketTimeoutInMillisecond()); clientConfigure.setConnectionTimeoutInMillisecond(conf.getConnectTimeoutInMillisecond()); clientConfigure.setRetryStrategy(new DefaultNoRetry()); SyncClient ots = new SyncClient( conf.getEndpoint(), conf.getAccessId(), conf.getAccessKey(), conf.getInstanceName(), clientConfigure); Map<String, String> extraHeaders = new HashMap<String, String>(); extraHeaders.put("x-ots-sdk-type", "public"); extraHeaders.put("x-ots-request-source", "datax-otsreader"); ots.setExtraHeaders(extraHeaders); return ots; } public static TableMeta getTableMeta(SyncClientInterface ots, String tableName, int retry, int sleepInMillisecond) throws Exception { return RetryHelper.executeWithRetry( new GetTableMetaCallable(ots, tableName), retry, sleepInMillisecond ); } public static GetRangeResponse getRange(SyncClientInterface ots, RangeRowQueryCriteria rangeRowQueryCriteria, int retry, int sleepInMillisecond) throws Exception { return RetryHelper.executeWithRetry( new 
GetRangeCallable(ots, rangeRowQueryCriteria), retry, sleepInMillisecond ); } public static List<TimeseriesScanSplitInfo> splitTimeseriesScan(SyncClientInterface ots, String tableName, String measurementName, int splitCountHint, int retry, int sleepInMillisecond) throws Exception { return RetryHelper.executeWithRetry( new GetTimeseriesSplitCallable(ots, tableName, measurementName, splitCountHint), retry, sleepInMillisecond ); } public static ScanTimeseriesDataResponse scanTimeseriesData(SyncClientInterface ots, ScanTimeseriesDataRequest scanTimeseriesDataRequest, int retry, int sleepInMillisecond) throws Exception { return RetryHelper.executeWithRetry( new ScanTimeseriesDataCallable(ots, scanTimeseriesDataRequest), retry, sleepInMillisecond ); } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/OtsReaderError.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alibaba.datax.common.spi.ErrorCode; public class OtsReaderError implements ErrorCode { private String code; private String description; // TODO // DataX should define the error categories uniformly, and OTS refines them based on those categories. // For now only two basic error codes are defined; all other errors use OTS's own error codes and messages. public final static OtsReaderError ERROR = new OtsReaderError( "OtsReaderError", "This error represents an internal error of the otsreader plugin that the system did not handle."); public final static OtsReaderError INVALID_PARAM = new OtsReaderError( "OtsReaderInvalidParameter", "This error represents a parameter error, indicating that the user entered a parameter in the wrong format."); public OtsReaderError (String code) { this.code = code; this.description = code; } public OtsReaderError (String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } } ================================================ 
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamChecker.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.otsreader.model.*; import com.alicloud.openservices.tablestore.model.*; import java.util.*; public class ParamChecker { private static void throwNotExistException() { throw new IllegalArgumentException("the key is missing."); } private static void throwStringLengthZeroException() { throw new IllegalArgumentException("the value of the key is an empty string."); } public static String checkStringAndGet(Configuration param, String key, boolean isTrim) throws OTSCriticalException { try { String value = param.getString(key); if (isTrim) { value = value != null ? value.trim() : null; } if (null == value) { throwNotExistException(); } else if (value.length() == 0) { throwStringLengthZeroException(); } return value; } catch(RuntimeException e) { throw new OTSCriticalException("Parse '"+ key +"' failed, " + e.getMessage(), e); } } public static Direction checkDirectionAndEnd(TableMeta meta, List begin, List end) { Direction direction = null; int cmp = Common.compareRangeBeginAndEnd(meta, begin, end) ; if (cmp > 0) { direction = Direction.BACKWARD; } else if (cmp < 0) { direction = Direction.FORWARD; } else { throw new IllegalArgumentException("Value of 'range-begin' is equal to value of 'range-end'."); } return direction; } public static List<PrimaryKeyColumn> checkInputPrimaryKeyAndGet(TableMeta meta, List<PrimaryKeyValue> range) { if (meta.getPrimaryKeyMap().size() != range.size()) { throw new IllegalArgumentException(String.format( "The number of input values does not equal the number of primary key columns. 
input size:%d, primary key size:%d .", range.size(), meta.getPrimaryKeyMap().size())); } List<PrimaryKeyColumn> pk = new ArrayList<>(); int i = 0; for (Map.Entry<String, PrimaryKeyType> e: meta.getPrimaryKeyMap().entrySet()) { PrimaryKeyValue value = range.get(i); if (e.getValue() != value.getType() && value != PrimaryKeyValue.INF_MIN && value != PrimaryKeyValue.INF_MAX) { throw new IllegalArgumentException( "Input range type does not match primary key. Input type:" + value.getType() + ", Primary Key Type:"+ e.getValue() +", Index:" + i ); } else { pk.add(new PrimaryKeyColumn(e.getKey(), value)); } i++; } return pk; } public static OTSRange checkRangeAndGet(Configuration param) throws OTSCriticalException { try { OTSRange range = new OTSRange(); Map<String, Object> value = param.getMap(Key.RANGE); // 'range' is optional; if it is not configured, the whole table is exported if (value == null) { return range; } /** * Range format: { * "begin":[], * "end":[], * "split":[] * } */ // begin // if absent, read from the start of the table Object arrayObj = value.get(Constant.ConfigKey.Range.BEGIN); if (arrayObj != null) { range.setBegin(ParamParser.parsePrimaryKeyColumnArray(arrayObj)); } // end // if absent, read to the end of the table arrayObj = value.get(Constant.ConfigKey.Range.END); if (arrayObj != null) { range.setEnd(ParamParser.parsePrimaryKeyColumnArray(arrayObj)); } // split // if absent, the range is not split arrayObj = value.get(Constant.ConfigKey.Range.SPLIT); if (arrayObj != null) { range.setSplit(ParamParser.parsePrimaryKeyColumnArray(arrayObj)); } return range; } catch (RuntimeException e) { throw new OTSCriticalException("Parse 'range' failed, " + e.getMessage(), e); } } public static TimeRange checkTimeRangeAndGet(Configuration param) throws OTSCriticalException { try { long begin = Constant.ConfigDefaultValue.TimeRange.MIN; long end = Constant.ConfigDefaultValue.TimeRange.MAX; Map<String, Object> value = param.getMap(Constant.ConfigKey.TIME_RANGE); // 'timeRange' is optional; if it is not configured, the whole table is exported over the full time range if (value == null) { return new TimeRange(begin, end); } /** * TimeRange format: { * "begin":, * "end": * } */ // begin // if absent, read from the start of the table Object obj = value.get(Constant.ConfigKey.TimeRange.BEGIN); 
if (obj != null) { begin = ParamParser.parseTimeRangeItem(obj, Constant.ConfigKey.TimeRange.BEGIN); } // end // if absent, read to the end of the table obj = value.get(Constant.ConfigKey.TimeRange.END); if (obj != null) { end = ParamParser.parseTimeRangeItem(obj, Constant.ConfigKey.TimeRange.END); } TimeRange range = new TimeRange(begin, end); return range; } catch (RuntimeException e) { throw new OTSCriticalException("Parse 'timeRange' failed, " + e.getMessage(), e); } } private static void checkColumnByMode(List<OTSColumn> columns , OTSMode mode) { if (mode == OTSMode.MULTI_VERSION) { for (OTSColumn c : columns) { if (c.getColumnType() != OTSColumn.OTSColumnType.NORMAL) { throw new IllegalArgumentException("in mode:'multiVersion', the 'column' only supports column names, not const columns."); } } } else { if (columns.isEmpty()) { throw new IllegalArgumentException("in mode:'normal', the 'column' must specify at least one column_name or const column."); } } } public static List<OTSColumn> checkOTSColumnAndGet(Configuration param, OTSMode mode) throws OTSCriticalException { try { List<Object> value = param.getList(Key.COLUMN); // 'column' is optional if (value == null) { value = Collections.emptyList(); } /** * Column format: [ * {"name":"pk1"}, * {"type":"Binary","value" : "base64()"} * ] */ List<OTSColumn> columns = ParamParser.parseOTSColumnArray(value); checkColumnByMode(columns, mode); return columns; } catch (RuntimeException e) { throw new OTSCriticalException("Parse 'column' failed, " + e.getMessage(), e); } } public static List<OTSColumn> checkTimeseriesColumnAndGet(Configuration param) throws OTSCriticalException { try { List<Object> value = param.getList(Key.COLUMN); List<OTSColumn> columns = ParamParser.parseOTSColumnArray(value); List<ColumnType> columnTypes = checkColumnTypeAndGet(param); List<Boolean> isTags = checkColumnIsTagAndGet(param); for (int i = 0; i < columns.size(); i++) { columns.get(i).setValueType(columnTypes.get(i)); columns.get(i).setTimeseriesTag(isTags.get(i)); } checkColumnByMode(columns, OTSMode.NORMAL); return columns; } catch (RuntimeException e) { 
throw new OTSCriticalException("Parse 'column' failed, " + e.getMessage(), e); } } public static List<ColumnType> checkColumnTypeAndGet(Configuration param) throws OTSCriticalException { try { List<Object> value = param.getList(Key.COLUMN); List<ColumnType> columnTypes = ParamParser.parseColumnTypeArray(value); return columnTypes; } catch (RuntimeException e) { throw new OTSCriticalException("Parse 'type of column' failed, " + e.getMessage(), e); } } public static List<Boolean> checkColumnIsTagAndGet(Configuration param) throws OTSCriticalException { try { List<Object> value = param.getList(Key.COLUMN); List<Boolean> columnIsTag = ParamParser.parseColumnIsTagArray(value); return columnIsTag; } catch (RuntimeException e) { throw new OTSCriticalException("Parse 'isTag of column' failed, " + e.getMessage(), e); } } public static OTSMode checkModeAndGet(Configuration param) throws OTSCriticalException { try { String modeValue = param.getString(Key.MODE, "normal"); if (modeValue.equalsIgnoreCase(Constant.ConfigDefaultValue.Mode.NORMAL)) { return OTSMode.NORMAL; } else if (modeValue.equalsIgnoreCase(Constant.ConfigDefaultValue.Mode.MULTI_VERSION)) { return OTSMode.MULTI_VERSION; } else { throw new IllegalArgumentException("the 'mode' only supports 'normal' and 'multiVersion', not '"+ modeValue +"'."); } } catch(RuntimeException e) { throw new OTSCriticalException("Parse 'mode' failed, " + e.getMessage(), e); } } public static void checkTimeseriesMode(OTSMode mode, Boolean isNewVersion) throws OTSCriticalException { if (mode == OTSMode.MULTI_VERSION){ throw new OTSCriticalException("Timeseries tables do not support mode: multiVersion." ); } else if (!isNewVersion){ throw new OTSCriticalException("Timeseries tables are only supported in newVersion; please set \"newVersion\": \"true\"." 
); } } public static List<PrimaryKeyColumn> checkAndGetPrimaryKey( List<PrimaryKeyColumn> pk, List<PrimaryKeySchema> pkSchema, String jsonKey){ List<PrimaryKeyColumn> result = new ArrayList<PrimaryKeyColumn>(); if(pk != null) { if (pk.size() > pkSchema.size()) { throw new IllegalArgumentException("The '"+ jsonKey +"' has more input primary key columns than the table meta, input size: "+ pk.size() +", meta pk size:" + pkSchema.size()); } else { // type check for (int i = 0; i < pk.size(); i++) { PrimaryKeyValue pkc = pk.get(i).getValue(); PrimaryKeySchema pkcs = pkSchema.get(i); if (!pkc.isInfMin() && !pkc.isInfMax() ) { if (pkc.getType() != pkcs.getType()) { throw new IllegalArgumentException( "The '"+ jsonKey +"', input primary key column type does not match the table meta, input type:"+ pkc.getType() +", meta pk type:"+ pkcs.getType() +", index:" + i); } } result.add(new PrimaryKeyColumn(pkcs.getName(), pkc)); } } return result; } else { return new ArrayList<PrimaryKeyColumn>(); } } /** * Check that the type of each split point matches the partition key * @param points the split points * @param pkSchema the primary key schema of the table */ private static List<PrimaryKeyColumn> checkAndGetSplit( List<PrimaryKeyColumn> points, List<PrimaryKeySchema> pkSchema){ List<PrimaryKeyColumn> result = new ArrayList<PrimaryKeyColumn>(); if (points == null) { return result; } // only need to check that the type matches the partition key PrimaryKeySchema partitionKeySchema = pkSchema.get(0); for (int i = 0 ; i < points.size(); i++) { PrimaryKeyColumn p = points.get(i); if (!p.getValue().isInfMin() && !p.getValue().isInfMax()) { if (p.getValue().getType() != partitionKeySchema.getType()) { throw new IllegalArgumentException("The 'split', input primary key column type does not match the partition key, input type: "+ p.getValue().getType().toString() +", partition key type:" + partitionKeySchema.getType().toString() +", index:" + i); } } result.add(new PrimaryKeyColumn(partitionKeySchema.getName(), p.getValue())); } return result; } public static void fillPrimaryKey(List<PrimaryKeySchema> pkSchema, List<PrimaryKeyColumn> pk, PrimaryKeyValue fillValue) { for(int i = pk.size(); i < pkSchema.size(); i++) { pk.add(new PrimaryKeyColumn(pkSchema.get(i).getName(), fillValue)); } } private static void fillBeginAndEnd( List<PrimaryKeyColumn> begin, List<PrimaryKeyColumn> end, List<PrimaryKeySchema> pkSchema) { if 
(begin.isEmpty()) { fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MIN); } if (end.isEmpty()) { fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MAX); } int cmp = CompareHelper.comparePrimaryKeyColumnList(begin, end); if (cmp == 0) { // In theory begin.size() and end.size() must be equal here, but for clarity the condition begin.size() == end.size() is stated explicitly if (begin.size() == end.size() && begin.size() < pkSchema.size()) { fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MIN); fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MAX); } else { throw new IllegalArgumentException("The 'begin' cannot be equal to 'end'."); } } else if (cmp < 0) { // ascending fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MIN); fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MAX); } else { // descending fillPrimaryKey(pkSchema, begin, PrimaryKeyValue.INF_MAX); fillPrimaryKey(pkSchema, end, PrimaryKeyValue.INF_MIN); } } private static void checkBeginAndEndAndSplit( List<PrimaryKeyColumn> begin, List<PrimaryKeyColumn> end, List<PrimaryKeyColumn> split) { int cmp = CompareHelper.comparePrimaryKeyColumnList(begin, end); if (!split.isEmpty()) { if (cmp < 0) { // ascending // check that split is strictly ascending for (int i = 0 ; i < split.size() - 1; i++) { PrimaryKeyColumn before = split.get(i); PrimaryKeyColumn after = split.get(i + 1); if (before.compareTo(after) >=0) { // not strictly ascending throw new IllegalArgumentException("In 'split', the item value is not strictly increasing, index: " + i); } } if (begin.get(0).compareTo(split.get(0)) >= 0) { throw new IllegalArgumentException("The 'begin' must be less than head of 'split'."); } if (split.get(split.size() - 1).compareTo(end.get(0)) >= 0) { throw new IllegalArgumentException("tail of 'split' must be less than 'end'."); } } else if (cmp > 0) {// descending // check that split is strictly descending for (int i = 0 ; i < split.size() - 1; i++) { PrimaryKeyColumn before = split.get(i); PrimaryKeyColumn after = split.get(i + 1); if (before.compareTo(after) <= 0) { // not strictly descending throw new IllegalArgumentException("In 'split', the item value is not strictly descending, index: " + i); } } if (begin.get(0).compareTo(split.get(0)) <= 0) { throw new 
IllegalArgumentException("The 'begin' must be greater than head of 'split'."); } if (split.get(split.size() - 1).compareTo(end.get(0)) <= 0) { throw new IllegalArgumentException("tail of 'split' must be greater than 'end'."); } } else { throw new IllegalArgumentException("The 'begin' cannot be equal to 'end'."); } } } /** * Fill in incomplete primary keys, then * check that begin, end and split are ordered as expected * @param begin the range begin * @param end the range end * @param split the split points * @param pkSchema the primary key schema of the table */ private static void fillAndcheckBeginAndEndAndSplit( List<PrimaryKeyColumn> begin, List<PrimaryKeyColumn> end, List<PrimaryKeyColumn> split, List<PrimaryKeySchema> pkSchema ) { fillBeginAndEnd(begin, end, pkSchema); checkBeginAndEndAndSplit(begin, end, split); } public static void checkAndSetOTSRange(OTSRange range, TableMeta meta) throws OTSCriticalException { try { List<PrimaryKeySchema> pkSchema = meta.getPrimaryKeyList(); // check that the types of begin and end match the primary key range.setBegin(checkAndGetPrimaryKey(range.getBegin(), pkSchema, Constant.ConfigKey.Range.BEGIN)); range.setEnd(checkAndGetPrimaryKey(range.getEnd(), pkSchema, Constant.ConfigKey.Range.END)); range.setSplit(checkAndGetSplit(range.getSplit(), pkSchema)); // 1. fill in begin and end // 2. check that begin, end and split are in the correct order fillAndcheckBeginAndEndAndSplit(range.getBegin(), range.getEnd(), range.getSplit(), pkSchema); } catch(RuntimeException e) { throw new OTSCriticalException("Parse 'range' failed, " + e.getMessage(), e); } } public static void checkAndSetColumn(List<OTSColumn> columns, TableMeta meta, OTSMode mode) throws OTSCriticalException { try { if (mode == OTSMode.MULTI_VERSION) { Set<String> uniqueColumn = new HashSet<String>(); Map<String, PrimaryKeyType> pk = meta.getPrimaryKeyMap(); for (OTSColumn c : columns) { // must not include a primary key column if (pk.get(c.getName()) != null) { throw new IllegalArgumentException("in mode:'multiVersion', the 'column' cannot include a primary key column, input:"+ c.getName() +"."); } // must not contain duplicate columns if (uniqueColumn.contains(c.getName())) { throw new IllegalArgumentException("in mode:'multiVersion', the 'column' cannot include duplicate columns, input:"+ c.getName() +"."); } else { uniqueColumn.add(c.getName()); } } } } catch(RuntimeException e) { throw new 
OTSCriticalException("Parse 'column' failed, " + e.getMessage(), e); } } public static void normalCheck(OTSConf conf) { // the old version does not support multiVersion mode if(!conf.isNewVersion() && conf.getMode() == OTSMode.MULTI_VERSION){ throw new IllegalArgumentException("in mode:'multiVersion' :The old version does not support multiVersion mode. Please add the following to the otsreader config: \"newVersion\":\"true\" ."); } } public static void checkAndSetOTSConf(OTSConf conf, TableMeta meta) throws OTSCriticalException { normalCheck(conf); checkAndSetOTSRange(conf.getRange(), meta); checkAndSetColumn(conf.getColumn(), meta, conf.getMode()); } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamCheckerOld.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alibaba.datax.common.util.Configuration; import java.util.List; public class ParamCheckerOld { private static void throwNotExistException(String key) { throw new IllegalArgumentException("The param '" + key + "' does not exist."); } private static void throwEmptyException(String key) { throw new IllegalArgumentException("The param '" + key + "' is empty."); } private static void throwNotListException(String key) { throw new IllegalArgumentException("The param '" + key + "' is not a json array."); } public static List<Object> checkListAndGet(Configuration param, String key, boolean isCheckEmpty) { List<Object> value = null; try { value = param.getList(key); } catch (ClassCastException e) { throwNotListException(key); } if (null == value) { throwNotExistException(key); } else if (isCheckEmpty && value.isEmpty()) { throwEmptyException(key); } return value; } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ParamParser.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import 
com.alibaba.datax.common.element.Column; import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; import com.alibaba.datax.plugin.reader.otsreader.model.OTSCriticalException; import com.alicloud.openservices.tablestore.model.ColumnType; import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; import org.apache.commons.codec.binary.Base64; import java.util.ArrayList; import java.util.List; import java.util.Map; public class ParamParser { // ------------------------------------------------------------------------ // Range parsing logic // ------------------------------------------------------------------------ private static PrimaryKeyValue parsePrimaryKeyValue(String type) { return parsePrimaryKeyValue(type, null); } private static PrimaryKeyValue parsePrimaryKeyValue(String type, String value) { if (type.equalsIgnoreCase(Constant.ValueType.INF_MIN)) { return PrimaryKeyValue.INF_MIN; } else if (type.equalsIgnoreCase(Constant.ValueType.INF_MAX)) { return PrimaryKeyValue.INF_MAX; } else { if (value != null) { if (type.equalsIgnoreCase(Constant.ValueType.STRING)) { return PrimaryKeyValue.fromString(value); } else if (type.equalsIgnoreCase(Constant.ValueType.INTEGER)) { return PrimaryKeyValue.fromLong(Long.valueOf(value)); } else if (type.equalsIgnoreCase(Constant.ValueType.BINARY)) { return PrimaryKeyValue.fromBinary(Base64.decodeBase64(value)); } else { throw new IllegalArgumentException("the column type only supports: ['INF_MIN', 'INF_MAX', 'string', 'int', 'binary']"); } } else { throw new IllegalArgumentException("the column is missing the field 'value', input 'type':" + type); } } } private static PrimaryKeyColumn parsePrimaryKeyColumn(Map<String, Object> item) { Object typeObj = item.get(Constant.ConfigKey.PrimaryKeyColumn.TYPE); Object valueObj = item.get(Constant.ConfigKey.PrimaryKeyColumn.VALUE); if (typeObj != null && valueObj != null) { if (typeObj instanceof String && valueObj instanceof String) 
{ return new PrimaryKeyColumn( Constant.ConfigDefaultValue.DEFAULT_NAME, parsePrimaryKeyValue((String)typeObj, (String)valueObj) ); } else { throw new IllegalArgumentException( "the column's 'type' and 'value' must be string value, " + "but type of 'type' is :" + typeObj.getClass() + ", type of 'value' is :" + valueObj.getClass() ); } } else if (typeObj != null) { if (typeObj instanceof String) { return new PrimaryKeyColumn( Constant.ConfigDefaultValue.DEFAULT_NAME, parsePrimaryKeyValue((String)typeObj) ); } else { throw new IllegalArgumentException( "the column's 'type' must be string value, " + "but type of 'type' is :" + typeObj.getClass() ); } } else { throw new IllegalArgumentException("the column must include 'type' and 'value'."); } } @SuppressWarnings("unchecked") public static List parsePrimaryKeyColumnArray(Object arrayObj) throws OTSCriticalException { try { List columns = new ArrayList(); if (arrayObj instanceof List) { List array = (List) arrayObj; for (Object o : array) { if (o instanceof Map) { Map column = (Map) o; columns.add(parsePrimaryKeyColumn(column)); } else { throw new IllegalArgumentException("input primary key column must be map object, but input type:" + o.getClass()); } } } else { throw new IllegalArgumentException("input 'begin','end','split' must be list object, but input type:" + arrayObj.getClass()); } return columns; } catch (RuntimeException e) { // 因为基础模块本身可能抛出一些错误,为了方便定位具体的出错位置,在此把Range加入到Error Message中 throw new OTSCriticalException("Parse 'range' fail, " + e.getMessage(), e); } } // ------------------------------------------------------------------------ // Column解析相关的逻辑 // ------------------------------------------------------------------------ private static OTSColumn parseOTSColumn(Object obj) { if (obj instanceof String) { return OTSColumn.fromNormalColumn((String)obj); } else { throw new IllegalArgumentException("the 'name' must be string, but input:" + obj.getClass()); } } private static OTSColumn parseOTSColumn(Object 
typeObj, Object valueObj) { if (typeObj instanceof String && valueObj instanceof String) { String type = (String)typeObj; String value = (String)valueObj; if (type.equalsIgnoreCase(Constant.ValueType.STRING)) { return OTSColumn.fromConstStringColumn(value); } else if (type.equalsIgnoreCase(Constant.ValueType.INTEGER)) { return OTSColumn.fromConstIntegerColumn(Long.valueOf(value)); } else if (type.equalsIgnoreCase(Constant.ValueType.DOUBLE)) { return OTSColumn.fromConstDoubleColumn(Double.valueOf(value)); } else if (type.equalsIgnoreCase(Constant.ValueType.BOOLEAN)) { return OTSColumn.fromConstBoolColumn(Boolean.valueOf(value)); } else if (type.equalsIgnoreCase(Constant.ValueType.BINARY)) { return OTSColumn.fromConstBytesColumn(Base64.decodeBase64(value)); } else { throw new IllegalArgumentException("the const column type only support :['string', 'int', 'double', 'bool', 'binary']"); } } else { throw new IllegalArgumentException("the 'type' and 'value' must be string, but 'type''s type:" + typeObj.getClass() + " 'value''s type:" + valueObj.getClass()); } } private static OTSColumn parseOTSColumn(Map column) { Object typeObj = column.get(Constant.ConfigKey.Column.TYPE); Object valueObj = column.get(Constant.ConfigKey.Column.VALUE); Object nameObj = column.get(Constant.ConfigKey.Column.NAME); if (nameObj != null) { return parseOTSColumn(nameObj); } else if (typeObj != null && valueObj != null) { return parseOTSColumn(typeObj, valueObj); } else { throw new IllegalArgumentException("the item of column format support '{\"name\":\"\"}' or '{\"type\":\"\", \"value\":\"\"}'."); } } @SuppressWarnings("unchecked") public static List parseOTSColumnArray(List value) throws OTSCriticalException { try { List result = new ArrayList(); for (Object item:value) { if (item instanceof Map){ Map column = (Map) item; result.add(ParamParser.parseOTSColumn(column)); } else { throw new IllegalArgumentException("the item of column must be map object, but input: " + item.getClass()); } } 
return result; } catch (RuntimeException e) { // 因为基础模块本身可能抛出一些错误,为了方便定位具体的出错位置,在此把Column加入到Error Message中 throw new OTSCriticalException("Parse 'column' fail. " + e.getMessage(), e); } } private static ColumnType parseTimeseriesColumnType(Map column) { Object typeObj = column.getOrDefault(Constant.ConfigKey.Column.TYPE, ""); if (typeObj instanceof String) { String type = (String)typeObj; if (type.equalsIgnoreCase(Constant.ValueType.STRING)) { return ColumnType.STRING; } else if (type.equalsIgnoreCase(Constant.ValueType.INTEGER)) { return ColumnType.INTEGER; } else if (type.equalsIgnoreCase(Constant.ValueType.DOUBLE)) { return ColumnType.DOUBLE; } else if (type.equalsIgnoreCase(Constant.ValueType.BOOLEAN)) { return ColumnType.BOOLEAN; } else if (type.equalsIgnoreCase(Constant.ValueType.BINARY)) { return ColumnType.BINARY; } else if (type.length() == 0){ return ColumnType.STRING; }else { throw new IllegalArgumentException("the timeseries column type only support :['string', 'int', 'double', 'bool', 'binary']"); } } else { throw new IllegalArgumentException("the 'type' must be string, but 'type''s type:" + typeObj.getClass()); } } public static List parseColumnTypeArray(List value) throws OTSCriticalException { try { List result = new ArrayList(); for (Object item:value) { if (item instanceof Map){ Map column = (Map) item; result.add(ParamParser.parseTimeseriesColumnType(column)); } else { throw new IllegalArgumentException("the item of column must be map object, but input: " + item.getClass()); } } return result; } catch (RuntimeException e) { throw new OTSCriticalException("Parse 'timeseries column type' fail. 
" + e.getMessage(), e); } } private static Boolean parseTimeseriesColumnIsTag(Map column) { Object isTagParameter = column.getOrDefault(Constant.ConfigKey.Column.IS_TAG, ""); if (isTagParameter instanceof String) { String isTag = (String)isTagParameter; return Boolean.valueOf(isTag); } else { throw new IllegalArgumentException("the 'isTag' must be string, but 'isTag''s type:" + isTagParameter.getClass()); } } public static List parseColumnIsTagArray(List value) throws OTSCriticalException { try { List result = new ArrayList(); for (Object item:value) { if (item instanceof Map){ Map column = (Map) item; result.add(ParamParser.parseTimeseriesColumnIsTag(column)); } else { throw new IllegalArgumentException("the item of column must be map object, but input: " + item.getClass()); } } return result; } catch (RuntimeException e) { throw new OTSCriticalException("Parse 'timeseries column isTag' fail. " + e.getMessage(), e); } } // ------------------------------------------------------------------------ // TimeRange解析相关的逻辑 // ------------------------------------------------------------------------ public static long parseTimeRangeItem(Object obj, String key) { if (obj instanceof Integer) { return (Integer)obj; } else if (obj instanceof Long) { return (Long)obj; } else { throw new IllegalArgumentException("the '"+ key +"' must be int, but input:" + obj.getClass()); } } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RangeSplit.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alibaba.datax.plugin.reader.otsreader.model.OTSPrimaryKeyColumn; import com.alibaba.datax.plugin.reader.otsreader.model.OTSRange; import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn; import com.alicloud.openservices.tablestore.model.PrimaryKeyType; import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; import 
com.alicloud.openservices.tablestore.model.TableMeta; import java.math.BigInteger; import java.util.*; /** * 主要提供对范围的解析 */ public class RangeSplit { private static String bigIntegerToString(BigInteger baseValue, BigInteger bValue, BigInteger multi, int lenOfString) { BigInteger tmp = bValue; StringBuilder sb = new StringBuilder(); for (int tmpLength = 0; tmpLength < lenOfString; tmpLength++) { sb.insert(0, (char) (baseValue.add(tmp.remainder(multi)).intValue())); tmp = tmp.divide(multi); } return sb.toString(); } /** * 切分String的Unicode Unit * * 注意:该方法只支持begin小于end * * @param begin * @param end * @param count * @return */ private static List splitCodePoint(int begin, int end, int count) { if (begin >= end) { throw new IllegalArgumentException("Only support begin < end."); } List results = new ArrayList(); BigInteger beginBig = BigInteger.valueOf(begin); BigInteger endBig = BigInteger.valueOf(end); BigInteger countBig = BigInteger.valueOf(count); BigInteger multi = endBig.subtract(beginBig).add(BigInteger.ONE); BigInteger range = endBig.subtract(beginBig); BigInteger interval = BigInteger.ZERO; int length = 1; BigInteger tmpBegin = BigInteger.ZERO; BigInteger tmpEnd = endBig.subtract(beginBig); // 扩大之后的数值 BigInteger realBegin = tmpBegin; BigInteger realEnd = tmpEnd; while (range.compareTo(countBig) < 0) { // 不够切分 realEnd = realEnd.multiply(multi).add(tmpEnd); range = realEnd.subtract(realBegin); length++; } interval = range.divide(countBig); BigInteger cur = realBegin; for (int i = 0; i < (count - 1); i++) { results.add(bigIntegerToString(beginBig, cur, multi, length)); cur = cur.add(interval); } results.add(bigIntegerToString(beginBig, realEnd, multi, length)); return results; } /** * 注意: 当begin和end相等时,函数将返回空的List * * @param begin * @param end * @param count * @return */ public static List splitStringRange(String begin, String end, int count) { if (count <= 1) { throw new IllegalArgumentException("Input count <= 1 ."); } List results = new ArrayList(); int 
beginValue = 0; if (!begin.isEmpty()) { beginValue = begin.codePointAt(0); } int endValue = 0; if (!end.isEmpty()) { endValue = end.codePointAt(0); } int cmp = beginValue - endValue; if (cmp == 0) { return results; } results.add(begin); Comparator comparator = new Comparator(){ public int compare(String arg0, String arg1) { return arg0.compareTo(arg1); } }; List tmp = null; if (cmp > 0) { // 如果是逆序,则 reverse Comparator comparator = Collections.reverseOrder(comparator); tmp = splitCodePoint(endValue, beginValue, count); } else { // 正序 tmp = splitCodePoint(beginValue, endValue, count); } Collections.sort(tmp, comparator); for (String value : tmp) { if (comparator.compare(value, begin) > 0 && comparator.compare(value, end) < 0) { results.add(value); } } results.add(end); return results; } /** * begin 一定要小于 end * @param bigBegin * @param bigEnd * @param bigCount * @return */ private static List splitIntegerRange(BigInteger bigBegin, BigInteger bigEnd, BigInteger bigCount) { List is = new ArrayList(); BigInteger interval = (bigEnd.subtract(bigBegin)).divide(bigCount); BigInteger cur = bigBegin; BigInteger i = BigInteger.ZERO; while (cur.compareTo(bigEnd) < 0 && i.compareTo(bigCount) < 0) { is.add(cur.longValue()); cur = cur.add(interval); i = i.add(BigInteger.ONE); } is.add(bigEnd.longValue()); return is; } /** * 切分数值类型 注意: 当begin和end相等时,函数将返回空的List * * @param begin * @param end * @param count * @return */ public static List splitIntegerRange(long begin, long end, int count) { if (count <= 1) { throw new IllegalArgumentException("Input count <= 1 ."); } List is = new ArrayList(); BigInteger bigBegin = BigInteger.valueOf(begin); BigInteger bigEnd = BigInteger.valueOf(end); BigInteger bigCount = BigInteger.valueOf(count); BigInteger abs = (bigEnd.subtract(bigBegin)).abs(); if (abs.compareTo(BigInteger.ZERO) == 0) { // partition key 相等的情况 return is; } if (bigCount.compareTo(abs) > 0) { bigCount = abs; } if (bigEnd.subtract(bigBegin).compareTo(BigInteger.ZERO) > 0) { // 正向 
return splitIntegerRange(bigBegin, bigEnd, bigCount); } else { // 逆向 List tmp = splitIntegerRange(bigEnd, bigBegin, bigCount); Comparator comparator = new Comparator(){ public int compare(Long arg0, Long arg1) { return arg0.compareTo(arg1); } }; Collections.sort(tmp,Collections.reverseOrder(comparator)); return tmp; } } public static List splitRangeByPrimaryKeyType( PrimaryKeyType type, PrimaryKeyValue begin, PrimaryKeyValue end, int count) { List result = new ArrayList(); if (type == PrimaryKeyType.STRING) { List points = splitStringRange(begin.asString(), end.asString(), count); for (String s : points) { result.add(PrimaryKeyValue.fromString(s)); } } else { List points = splitIntegerRange(begin.asLong(), end.asLong(), count); for (Long l : points) { result.add(PrimaryKeyValue.fromLong(l)); } } return result; } public static List rangeSplitByCount(TableMeta meta, List begin, List end, int count) { List results = new ArrayList(); OTSPrimaryKeyColumn partitionKey = Common.getPartitionKey(meta); Map beginMap = new HashMap<>(); Map endMap = new HashMap<>(); for(PrimaryKeyColumn primaryKeyColumn : begin){ beginMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue()); } for(PrimaryKeyColumn primaryKeyColumn : end){ endMap.put(primaryKeyColumn.getName(), primaryKeyColumn.getValue()); } PrimaryKeyValue beginPartitionKey = beginMap.get( partitionKey.getName()); PrimaryKeyValue endPartitionKey = endMap.get( partitionKey.getName()); // 第一,先对PartitionKey列进行拆分 List ranges = RangeSplit.splitRangeByPrimaryKeyType( partitionKey.getType(true), beginPartitionKey, endPartitionKey, count); if (ranges.isEmpty()) { return results; } int size = ranges.size(); for (int i = 0; i < size - 1; i++) { List bPk = new ArrayList<>(); List ePk = new ArrayList<>(); bPk.add(new PrimaryKeyColumn(partitionKey.getName(), ranges.get(i))); ePk.add(new PrimaryKeyColumn(partitionKey.getName(), ranges.get(i + 1))); OTSRange range = new OTSRange(); range.setBegin(bPk); range.setEnd(ePk); 
results.add(range); } // 第二,填充非PartitionKey的ParimaryKey列 // 注意:在填充过程中,需要使用用户给定的Begin和End来替换切分出来的第一个Range // 的Begin和最后一个Range的End List keys = new ArrayList(meta.getPrimaryKeyMap().size()); keys.addAll(meta.getPrimaryKeyMap().keySet()); for (int i = 0; i < results.size(); i++) { for (int j = 1; j < keys.size(); j++) { OTSRange c = results.get(i); List beginPK = c.getBegin(); List endPK = c.getEnd(); String key = keys.get(j); if (i == 0) { // 第一行 beginPK.add(new PrimaryKeyColumn(key, beginMap.get(key))); endPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); } else if (i == results.size() - 1) {// 最后一行 beginPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); endPK.add(new PrimaryKeyColumn(key, endMap.get(key))); } else { beginPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); endPK.add(new PrimaryKeyColumn(key, PrimaryKeyValue.INF_MIN)); } } } return results; } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/ReaderModelParser.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import java.util.ArrayList; import java.util.List; import java.util.Map; import org.apache.commons.codec.binary.Base64; import com.alibaba.datax.plugin.reader.otsreader.model.OTSColumn; import com.alibaba.datax.plugin.reader.otsreader.model.OTSConst; import com.aliyun.openservices.ots.model.PrimaryKeyValue; /** * 主要对OTS PrimaryKey,OTSColumn的解析 */ public class ReaderModelParser { private static long getLongValue(String value) { try { return Long.parseLong(value); } catch (NumberFormatException e) { throw new IllegalArgumentException("Can not parse the value '"+ value +"' to Int."); } } private static double getDoubleValue(String value) { try { return Double.parseDouble(value); } catch (NumberFormatException e) { throw new IllegalArgumentException("Can not parse the value '"+ value +"' to Double."); } } private static boolean 
getBoolValue(String value) { if (!(value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false"))) { throw new IllegalArgumentException("Can not parse the value '"+ value +"' to Bool."); } return Boolean.parseBoolean(value); } public static OTSColumn parseConstColumn(String type, String value) { if (type.equalsIgnoreCase(OTSConst.TYPE_STRING)) { return OTSColumn.fromConstStringColumn(value); } else if (type.equalsIgnoreCase(OTSConst.TYPE_INTEGER)) { return OTSColumn.fromConstIntegerColumn(getLongValue(value)); } else if (type.equalsIgnoreCase(OTSConst.TYPE_DOUBLE)) { return OTSColumn.fromConstDoubleColumn(getDoubleValue(value)); } else if (type.equalsIgnoreCase(OTSConst.TYPE_BOOLEAN)) { return OTSColumn.fromConstBoolColumn(getBoolValue(value)); } else if (type.equalsIgnoreCase(OTSConst.TYPE_BINARY)) { return OTSColumn.fromConstBytesColumn(Base64.decodeBase64(value)); } else { throw new IllegalArgumentException("Invalid 'column', Can not parse map to 'OTSColumn', input type:" + type + ", value:" + value + "."); } } public static OTSColumn parseOTSColumn(Map item) { if (item.containsKey(OTSConst.NAME)) { Object name = item.get(OTSConst.NAME); if (name instanceof String) { String nameStr = (String) name; return OTSColumn.fromNormalColumn(nameStr); } else { throw new IllegalArgumentException("Invalid 'column', Can not parse map to 'OTSColumn', the value is not a string."); } } else if (item.containsKey(OTSConst.TYPE) && item.containsKey(OTSConst.VALUE) && item.size() == 2) { Object type = item.get(OTSConst.TYPE); Object value = item.get(OTSConst.VALUE); if (type instanceof String && value instanceof String) { String typeStr = (String) type; String valueStr = (String) value; return parseConstColumn(typeStr, valueStr); } else { throw new IllegalArgumentException("Invalid 'column', Can not parse map to 'OTSColumn', the value is not a string."); } } else { throw new IllegalArgumentException( "Invalid 'column', Can not parse map to 'OTSColumn', valid format: 
'{\"name\":\"\"}' or '{\"type\":\"\", \"value\":\"\"}'."); } } private static void checkIsAllConstColumn(List columns) { for (OTSColumn c : columns) { if (c.getColumnType() == OTSColumn.OTSColumnType.NORMAL) { return ; } } throw new IllegalArgumentException("Invalid 'column', 'column' should include at least one or more Normal Column."); } public static List parseOTSColumnList(List input) { if (input.isEmpty()) { throw new IllegalArgumentException("Input count of 'column' is zero."); } List columns = new ArrayList(input.size()); for (Object item:input) { if (item instanceof Map){ @SuppressWarnings("unchecked") Map column = (Map) item; columns.add(parseOTSColumn(column)); } else { throw new IllegalArgumentException("Invalid 'column', Can not parse Object to 'OTSColumn', item of list is not a map."); } } checkIsAllConstColumn(columns); return columns; } public static PrimaryKeyValue parsePrimaryKeyValue(String type, String value) { if (type.equalsIgnoreCase(OTSConst.TYPE_STRING)) { return PrimaryKeyValue.fromString(value); } else if (type.equalsIgnoreCase(OTSConst.TYPE_INTEGER)) { return PrimaryKeyValue.fromLong(getLongValue(value)); } else if (type.equalsIgnoreCase(OTSConst.TYPE_INF_MIN)) { throw new IllegalArgumentException("Format error, the " + OTSConst.TYPE_INF_MIN + " only support {\"type\":\"" + OTSConst.TYPE_INF_MIN + "\"}."); } else if (type.equalsIgnoreCase(OTSConst.TYPE_INF_MAX)) { throw new IllegalArgumentException("Format error, the " + OTSConst.TYPE_INF_MAX + " only support {\"type\":\"" + OTSConst.TYPE_INF_MAX + "\"}."); } else { throw new IllegalArgumentException("Not supprot parsing type: "+ type +" for PrimaryKeyValue."); } } public static PrimaryKeyValue parsePrimaryKeyValue(String type) { if (type.equalsIgnoreCase(OTSConst.TYPE_INF_MIN)) { return PrimaryKeyValue.INF_MIN; } else if (type.equalsIgnoreCase(OTSConst.TYPE_INF_MAX)) { return PrimaryKeyValue.INF_MAX; } else { throw new IllegalArgumentException("Not supprot parsing type: "+ type +" for 
PrimaryKeyValue."); } } public static PrimaryKeyValue parsePrimaryKeyValue(Map item) { if (item.containsKey(OTSConst.TYPE) && item.containsKey(OTSConst.VALUE) && item.size() == 2) { Object type = item.get(OTSConst.TYPE); Object value = item.get(OTSConst.VALUE); if (type instanceof String && value instanceof String) { String typeStr = (String) type; String valueStr = (String) value; return parsePrimaryKeyValue(typeStr, valueStr); } else { throw new IllegalArgumentException("The 'type' and 'value‘ only support string."); } } else if (item.containsKey(OTSConst.TYPE) && item.size() == 1) { Object type = item.get(OTSConst.TYPE); if (type instanceof String) { String typeStr = (String) type; return parsePrimaryKeyValue(typeStr); } else { throw new IllegalArgumentException("The 'type' only support string."); } } else { throw new IllegalArgumentException("The map must consist of 'type' and 'value'."); } } public static List parsePrimaryKey(List input) { if (null == input) { return null; } List columns = new ArrayList(input.size()); for (Object item:input) { if (item instanceof Map) { @SuppressWarnings("unchecked") Map column = (Map) item; columns.add(parsePrimaryKeyValue(column)); } else { throw new IllegalArgumentException("Can not parse Object to 'PrimaryKeyValue', item of list is not a map."); } } return columns; } } ================================================ FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelper.java ================================================ package com.alibaba.datax.plugin.reader.otsreader.utils; import com.alibaba.datax.plugin.reader.otsreader.model.OTSErrorCode; import com.alicloud.openservices.tablestore.ClientException; import com.alicloud.openservices.tablestore.TableStoreException; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.HashSet; import java.util.Set; import java.util.concurrent.Callable; public class RetryHelper { private static final Logger LOG = 
            LoggerFactory.getLogger(RetryHelper.class);
    private static final Set<String> noRetryErrorCode = prepareNoRetryErrorCode();

    public static <V> V executeWithRetry(Callable<V> callable, int maxRetryTimes, int sleepInMilliSecond) throws Exception {
        int retryTimes = 0;
        while (true) {
            Thread.sleep(getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond));
            try {
                return callable.call();
            } catch (Exception e) {
                LOG.warn("Call callable fail, {}", e.getMessage());
                if (!canRetry(e)) {
                    LOG.error("Can not retry for Exception.", e);
                    throw e;
                } else if (retryTimes >= maxRetryTimes) {
                    LOG.error("Retry times more than limit. maxRetryTimes : {}", maxRetryTimes);
                    throw e;
                }
                retryTimes++;
                LOG.warn("Retry time : {}", retryTimes);
            }
        }
    }

    private static Set<String> prepareNoRetryErrorCode() {
        Set<String> pool = new HashSet<String>();
        pool.add(OTSErrorCode.AUTHORIZATION_FAILURE);
        pool.add(OTSErrorCode.INVALID_PARAMETER);
        pool.add(OTSErrorCode.REQUEST_TOO_LARGE);
        pool.add(OTSErrorCode.OBJECT_NOT_EXIST);
        pool.add(OTSErrorCode.OBJECT_ALREADY_EXIST);
        pool.add(OTSErrorCode.INVALID_PK);
        pool.add(OTSErrorCode.OUT_OF_COLUMN_COUNT_LIMIT);
        pool.add(OTSErrorCode.OUT_OF_ROW_SIZE_LIMIT);
        pool.add(OTSErrorCode.CONDITION_CHECK_FAIL);
        return pool;
    }

    public static boolean canRetry(String otsErrorCode) {
        if (noRetryErrorCode.contains(otsErrorCode)) {
            return false;
        } else {
            return true;
        }
    }

    public static boolean canRetry(Exception exception) {
        TableStoreException e = null;
        if (exception instanceof TableStoreException) {
            e = (TableStoreException) exception;
            LOG.warn(
                    "OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}",
                    new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()}
            );
            return canRetry(e.getErrorCode());
        } else if (exception instanceof ClientException) {
            ClientException ce = (ClientException) exception;
            LOG.warn(
                    "ClientException:{}",
                    new Object[]{ce.getMessage()}
            );
            return true;
        } else {
            return false;
        }
    }

    /**
     * Exponential backoff: 0 ms before the first attempt, then the initial
     * sleep doubled on each further retry, capped at 30000 ms.
     */
    public static long getDelaySendMillinSeconds(int hadRetryTimes, int initSleepInMilliSecond) {
        if (hadRetryTimes <= 0) {
            return 0;
        }

        int sleepTime = initSleepInMilliSecond;
        for (int i = 1; i < hadRetryTimes; i++) {
            sleepTime += sleepTime;
            if (sleepTime > 30000) {
                sleepTime = 30000;
                break;
            }
        }
        return sleepTime;
    }
}

================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/RetryHelperOld.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.utils;

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.aliyun.openservices.ots.ClientException;
import com.aliyun.openservices.ots.OTSErrorCode;
import com.aliyun.openservices.ots.OTSException;

public class RetryHelperOld {

    private static final Logger LOG = LoggerFactory.getLogger(RetryHelperOld.class);
    private static final Set<String> noRetryErrorCode = prepareNoRetryErrorCode();

    public static <V> V executeWithRetry(Callable<V> callable, int maxRetryTimes, int sleepInMilliSecond) throws Exception {
        int retryTimes = 0;
        while (true) {
            Thread.sleep(CommonOld.getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond));
            try {
                return callable.call();
            } catch (Exception e) {
                LOG.warn("Call callable fail, {}", e.getMessage());
                if (!canRetry(e)) {
                    LOG.error("Can not retry for Exception.", e);
                    throw e;
                } else if (retryTimes >= maxRetryTimes) {
                    LOG.error("Retry times more than limit. maxRetryTimes : {}", maxRetryTimes);
                    throw e;
                }
                retryTimes++;
                LOG.warn("Retry time : {}", retryTimes);
            }
        }
    }

    private static Set<String> prepareNoRetryErrorCode() {
        Set<String> pool = new HashSet<String>();
        pool.add(OTSErrorCode.AUTHORIZATION_FAILURE);
        pool.add(OTSErrorCode.INVALID_PARAMETER);
        pool.add(OTSErrorCode.REQUEST_TOO_LARGE);
        pool.add(OTSErrorCode.OBJECT_NOT_EXIST);
        pool.add(OTSErrorCode.OBJECT_ALREADY_EXIST);
        pool.add(OTSErrorCode.INVALID_PK);
        pool.add(OTSErrorCode.OUT_OF_COLUMN_COUNT_LIMIT);
        pool.add(OTSErrorCode.OUT_OF_ROW_SIZE_LIMIT);
        pool.add(OTSErrorCode.CONDITION_CHECK_FAIL);
        return pool;
    }

    public static boolean canRetry(String otsErrorCode) {
        if (noRetryErrorCode.contains(otsErrorCode)) {
            return false;
        } else {
            return true;
        }
    }

    public static boolean canRetry(Exception exception) {
        OTSException e = null;
        if (exception instanceof OTSException) {
            e = (OTSException) exception;
            LOG.warn(
                    "OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}",
                    new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()}
            );
            return canRetry(e.getErrorCode());
        } else if (exception instanceof ClientException) {
            ClientException ce = (ClientException) exception;
            LOG.warn(
                    "ClientException:{}, ErrorMsg:{}",
                    new Object[]{ce.getErrorCode(), ce.getMessage()}
            );
            return true;
        } else {
            return false;
        }
    }
}

================================================
FILE: otsreader/src/main/java/com/alibaba/datax/plugin/reader/otsreader/utils/TranformHelper.java
================================================
package com.alibaba.datax.plugin.reader.otsreader.utils;

import com.alibaba.datax.common.element.*;
import com.alicloud.openservices.tablestore.model.PrimaryKeyColumn;

public class TranformHelper {

    public static Column otsPrimaryKeyColumnToDataxColumn(PrimaryKeyColumn pkc) {
        switch (pkc.getValue().getType()) {
            case STRING: return new StringColumn(pkc.getValue().asString());
            case INTEGER: return new LongColumn(pkc.getValue().asLong());
            case BINARY: return new BytesColumn(pkc.getValue().asBinary());
            default: throw
new IllegalArgumentException("PrimaryKey unsuporrt tranform the type: " + pkc.getValue().getType() + "."); } } public static Column otsColumnToDataxColumn(com.alicloud.openservices.tablestore.model.Column c) { switch (c.getValue().getType()) { case STRING:return new StringColumn(c.getValue().asString()); case INTEGER:return new LongColumn(c.getValue().asLong()); case BINARY:return new BytesColumn(c.getValue().asBinary()); case BOOLEAN:return new BoolColumn(c.getValue().asBoolean()); case DOUBLE:return new DoubleColumn(c.getValue().asDouble()); default: throw new IllegalArgumentException("Column unsuporrt tranform the type: " + c.getValue().getType() + "."); } } public static Column otsColumnToDataxColumn(com.alicloud.openservices.tablestore.model.ColumnValue c) { switch (c.getType()) { case STRING:return new StringColumn(c.asString()); case INTEGER:return new LongColumn(c.asLong()); case BINARY:return new BytesColumn(c.asBinary()); case BOOLEAN:return new BoolColumn(c.asBoolean()); case DOUBLE:return new DoubleColumn(c.asDouble()); default: throw new IllegalArgumentException("Column unsuporrt tranform the type: " + c.getType() + "."); } } } ================================================ FILE: otsreader/src/main/resources/plugin.json ================================================ { "name": "otsreader", "class": "com.alibaba.datax.plugin.reader.otsreader.OtsReader", "description": "", "developer": "alibaba" } ================================================ FILE: otsreader/src/main/resources/plugin_job_template.json ================================================ { "name": "otsreader", "parameter": { "endpoint":"", "accessId":"", "accessKey":"", "instanceName":"", "column":[], "range":{ "begin":[], "end":[] } } } ================================================ FILE: otsstreamreader/README.md ================================================ ## TableStore增量数据导出通道:TableStoreStreamReader 本文为您介绍OTSStream Reader支持的数据类型、读取方式、字段映射和数据源等参数及配置示例。 ## 列模式 ### 背景信息 OTSStream 
Reader插件主要用于导出Table Store的增量数据。您可以将增量数据看作操作日志,除数据本身外还附有操作信息。 与全量导出插件不同,增量导出插件只有多版本模式,且不支持指定列。使用插件前,您必须确保表上已经开启Stream功能。您可以在建表时指定开启,也可以使用SDK的UpdateTable接口开启。 开启Stream的方法,如下所示。 ```java SyncClient client = new SyncClient("", "", "", ""); #建表的时候开启: CreateTableRequest createTableRequest = new CreateTableRequest(tableMeta); createTableRequest.setStreamSpecification(new StreamSpecification(true, 24)); // 24代表增量数据保留24小时。 client.createTable(createTableRequest); #如果建表时未开启,您可以通过UpdateTable开启: UpdateTableRequest updateTableRequest = new UpdateTableRequest("tableName"); updateTableRequest.setStreamSpecification(new StreamSpecification(true, 24)); client.updateTable(updateTableRequest); ``` 您使用SDK的UpdateTable功能,指定开启Stream并设置过期时间,即开启了Table Store增量数据导出功能。开启后,Table Store服务端就会将您的操作日志额外保存起来,每个分区有一个有序的操作日志队列,每条操作日志会在一定时间后被垃圾回收,该时间即为您指定的过期时间。 Table Store的SDK提供了几个Stream相关的API用于读取这部分的操作日志,增量插件也是通过Table Store SDK的接口获取到增量数据,默认情况下会将增量数据转化为多个6元组的形式(pk、colName、version、colValue、opType和sequenceInfo)导入至MaxCompute中。 ### 列模式 在Table Store多版本模型下,表中的数据组织为行>列>版本三级的模式, 一行可以有任意列,列名并不是固定的,每一列可以含有多个版本,每个版本都有一个特定的时间戳(版本号)。 您可以通过Table Store的API进行一系列读写操作,Table Store通过记录您最近对表的一系列写操作(或数据更改操作)来实现记录增量数据的目的,所以您也可以把增量数据看作一批操作记录。 Table Store支持**PutRow**、**UpdateRow**和**DeleteRow**操作: - **PutRow**:写入一行,如果该行已存在即覆盖该行。 - **UpdateRow**:更新一行,不更改原行的其它数据。更新包括新增或覆盖(如果对应列的对应版本已存在)一些列值、删除某一列的全部版本、删除某一列的某个版本。 - **DeleteRow**:删除一行。 Table Store会根据每种操作生成对应的增量数据记录,Reader插件会读出这些记录,并导出为数据集成的数据格式。 同时,由于Table Store具有动态列、多版本的特性,所以Reader插件导出的一行不对应Table Store中的一行,而是对应Table Store中的一列的一个版本。即Table Store中的一行可能会导出很多行,每行包含主键值、该列的列名、该列下该版本的时间戳(版本号)、该版本的值、操作类型。如果设置isExportSequenceInfo为true,还会包括时序信息。 转换为数据集成的数据格式后,定义了以下四种操作类型: - **U(UPDATE)**:写入一列的一个版本。 - **DO(DELETE_ONE_VERSION)**:删除某一列的某个版本。 - **DA(DELETE_ALL_VERSION)**:删除某一列的全部版本,此时需要根据主键和列名,删除对应列的全部版本。 - **DR(DELETE_ROW)**:删除某一行,此时需要根据主键,删除该行数据。 假设该表有两个主键列,主键列名分别为pkName1, pkName2,示例如下。 | **pkName1** | **pkName2** | **columnName** | **timestamp** | **columnValue** | **opType** | | --- | --- | 
--- | --- | --- | --- | | pk1_V1 | pk2_V1 | col_a | 1441803688001 | col_val1 | U | | pk1_V1 | pk2_V1 | col_a | 1441803688002 | col_val2 | U | | pk1_V1 | pk2_V1 | col_b | 1441803688003 | col_val3 | U | | pk1_V2 | pk2_V2 | col_a | 1441803688000 | — | DO | | pk1_V2 | pk2_V2 | col_b | — | — | DA | | pk1_V3 | pk2_V3 | — | — | — | DR | | pk1_V3 | pk2_V3 | col_a | 1441803688005 | col_val1 | U | 假设导出的数据如上,共7行,对应Table Store表内的3行,主键分别是(pk1_V1,pk2_V1),(pk1_V2, pk2_V2),(pk1_V3, pk2_V3): - 对于主键为(pk1_V1,pk2_V1)的一行,包括写入col_a列的两个版本和col_b列的一个版本等操作。 - 对于主键为(pk1_V2,pk2_V2)的一行,包括删除col_a列的一个版本和删除col_b列的全部版本等操作。 - 对于主键为(pk1_V3,pk2_V3)的一行,包括删除整行和写入col_a列的一个版本等操作。 ### 行模式 #### 宽行表 您可以通过行模式导出数据,该模式将用户每次更新的记录,抽取成行的形式导出,需要设置mode属性并配置列名。 ```json "parameter": { #parameter中配置下面三项配置(例如datasource、table等其它配置项照常配置)。 "mode": "single_version_and_update_only", # 配置导出模式。 "column":[ #按照需求添加需要导出TableStore中的列,您可以自定义设置配置个数。 { "name": "uid" #列名示例,可以是主键或属性列。 }, { "name": "name" #列名示例,可以是主键或属性列。 }, ], "isExportSequenceInfo": false, #single_version_and_update_only模式下只能是false。 } ``` #### 时序表 `otsstreamreader`支持导出时序表中的增量数据,当表为时序表时,需要配置的信息如下: ```json "parameter": { #parameter中配置下面四项配置(例如datasource、table等其它配置项照常配置)。 "mode": "single_version_and_update_only", # 配置导出模式。 "isTimeseriesTable":"true", # 配置导出为时序表。 "column":[ #按照需求添加需要导出TableStore中的列,您可以自定义设置配置个数。 { "name": "_m_name" #度量名称字段。 }, { "name": "_data_source" #数据源字段。 }, { "name": "_tags" #标签字段,将tags转换为string类型。 }, { "name": "tag1_1", #标签内部字段键名称。 "is_timeseries_tag":"true" #表明改字段为tags内部字段。 }, { "name": "time" #时间戳字段。 }, { "name": "name" #属性列名称。 }, ], "isExportSequenceInfo": false, #single_version_and_update_only模式下只能是false。 } ``` 行模式导出的数据更接近于原始的行,易于后续处理,但需要注意以下问题: - 每次导出的行是从用户每次更新的记录中抽取,每一行数据与用户的写入或更新操作一一对应。如果用户存在单独更新某些列的行为,则会出现有一些记录只有被更新的部分列,其它列为空的情况。 - 行模式不会导出数据的版本号(即每列的时间戳),也无法进行删除操作。 ### 数据类型转换列表 目前OTSStream Reader支持所有的Table Store类型,其针对Table Store类型的转换列表,如下所示。 | **类型分类** | **OTSStream数据类型** | | --- | --- | | 整数类 | INTEGER | | 浮点类 | DOUBLE | | 字符串类 | STRING | 
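| Boolean | BOOLEAN |
| Binary | BINARY |

### Parameters

| **Parameter** | **Description** | **Required** | **Default** |
| --- | --- | --- | --- |
| **dataSource** | Data source name. Script mode supports adding a data source, and this value must match the name of the added data source. | Yes | None |
| **dataTable** | Name of the table to export incremental data from. Stream must be enabled on this table, either when the table is created or later through the UpdateTable API. | Yes | None |
| **statusTable** | Name of the table the Reader plugin uses to record its state. The state reduces scanning of data outside the target range and so speeds up the export:<br>• You do not need to create the table; just provide a name. If the table does not exist, the Reader plugin tries to create it under your instance. If it already exists, the plugin checks whether its meta matches the expected meta and throws an exception if it does not.<br>• After an export completes, you do not need to delete the table; its state can be reused by the next export task.<br>• TTL is enabled on the table, so its data expires automatically and its data volume is expected to stay small.<br>• Reader configurations for different dataTables under the same instance can share one statusTable, and the states recorded for each do not interfere. A name such as **TableStoreStreamReaderStatusTable** works; make sure it does not clash with a business table. | Yes | None |
| **startTimestampMillis** | Left boundary of the time range of incremental data (left-closed, right-open), in milliseconds:<br>• The Reader plugin looks up the checkpoint for **startTimestampMillis** in statusTable and starts exporting from that point.<br>• If no matching checkpoint is found in statusTable, it reads from the first incremental record retained by the system and skips data whose write time is earlier than **startTimestampMillis**. | No | None |
| **endTimestampMillis** | Right boundary of the time range of incremental data (left-closed, right-open), in milliseconds:<br>• After starting from **startTimestampMillis**, the Reader plugin ends the export when it encounters the first record whose timestamp is greater than or equal to **endTimestampMillis**.<br>• Once all currently available incremental data has been read, reading ends even if **endTimestampMillis** has not been reached. | No | None |
| **date** | Date in **yyyyMMdd** format, for example 20151111, meaning that the data of that day is exported. If **date** is not specified, you must specify **startTimestampMillis** and **endTimestampMillis**, or **startTimeString** and **endTimeString**. Conversely, if those are specified, **date** is omitted. This option exists because some schedulers, such as Caiyunjian, support only day-level scheduling. It works similarly to the time-range options above. | No | None |
| **isExportSequenceInfo** | Whether to export sequence information, which includes the write time of the data. Defaults to false (not exported). | No | false |
| **maxRetries** | Maximum number of retries per request when reading incremental data from TableStore. Retries are separated by a backoff interval, and 30 retries take about 5 minutes in total, so this rarely needs to change. | No | 30 |
| **startTimeString** | Start time of the task, that is, the left boundary of the time range of incremental data (left-closed, right-open), in **yyyymmddhh24miss** format with second precision. | No | None |
| **endTimeString** | End time of the task, that is, the right boundary of the time range of incremental data (left-closed, right-open), in **yyyymmddhh24miss** format with second precision. | No | None |
| **enableSeekIterator** | The Reader plugin must locate the incremental start position before pulling data. For a task that runs regularly, the plugin locates it from previously scanned checkpoints. If the plugin has not run before, it scans from the start of the retained incremental data (retained for 7 days by default, that is, from 7 days ago). As a result, there can be an initial period with no exported data until the scan reaches data after the configured start time. Adding **"enableSeekIterator": true** to the reader parameters speeds up locating the position. | No | false |
| **mode** | Export mode. Set it to **single_version_and_update_only** for row mode; if it is not set, column mode is used. | No | None |
| **isTimeseriesTable** | Whether the table is a time series table. Takes effect only in row mode, that is, when **mode** is **single_version_and_update_only**. | No | false |

================================================
FILE: otsstreamreader/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT com.alibaba.datax otsstreamreader 0.0.1-SNAPSHOT com.aliyun.openservices tablestore-streamclient 1.0.0 com.aliyun.openservices tablestore com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba fastjson 1.2.83_noneautotype compile com.aliyun.openservices tablestore 5.13.12 log4j-core org.apache.logging.log4j com.google.code.gson gson 2.2.4 src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single org.apache.maven.plugins maven-surefire-plugin 2.5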
**/unittest/*.java **/functiontest/*.java

================================================
FILE: otsstreamreader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin/reader/otsstreamreader target/ otsstreamreader-0.0.1-SNAPSHOT.jar plugin/reader/otsstreamreader false plugin/reader/otsstreamreader/libs runtime

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_en_US.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_ja_JP.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_CN.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_HK.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/LocalStrings_zh_TW.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSReaderError.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal;

import com.alibaba.datax.common.spi.ErrorCode;

public class OTSReaderError implements ErrorCode {

    private String code;

    private String description;

    public final static OTSReaderError ERROR = new OTSReaderError("OTSStreamReaderError", "OTS Stream Reader Error");

    public final static OTSReaderError INVALID_PARAM = new OTSReaderError(
            "OTSStreamReaderInvalidParameter", "OTS Stream Reader Invalid Parameter");

    public OTSReaderError(String code, String description) {
        this.code = code;
        this.description = description;
    }

    public String getCode() {
        return this.code;
    }

    public String getDescription() {
        return this.description;
    }

    public String toString() {
        return "[ code:" + this.code + ", message:" + this.description + "]";
    }
}

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReader.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.RetryUtil;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.core.CheckpointTimeTracker;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.OTSStreamJobShard;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.GsonParser;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.OTSHelper;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.OTSStreamJobShardUtil;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.TableStoreException;
import com.alicloud.openservices.tablestore.model.StreamShard;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentSkipListSet;

import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.*;

public class OTSStreamReader {

    public static class Job extends Reader.Job {

        private OTSStreamReaderMasterProxy proxy = new OTSStreamReaderMasterProxy();

        @Override
        public List<Configuration> split(int adviceNumber) {
            return proxy.split(adviceNumber);
        }

        public void init() {
            try {
                OTSStreamReaderConfig config = OTSStreamReaderConfig.load(getPluginJobConf());
                proxy.init(config);
            } catch (TableStoreException ex) {
                throw DataXException.asDataXException(new OTSReaderError(ex.getErrorCode(), "OTS ERROR"), ex.toString(), ex);
            } catch (Exception ex) {
                throw DataXException.asDataXException(OTSReaderError.ERROR, ex.toString(), ex);
            }
        }

        public void destroy() {
            this.proxy.close();
        }
    }

    public static class Task extends Reader.Task {

        private OTSStreamReaderSlaveProxy proxy = new OTSStreamReaderSlaveProxy();

        @Override
        public void init() {
            try {
                OTSStreamReaderConfig config = GsonParser.jsonToConfig(
                        (String) this.getPluginJobConf().get(OTSStreamReaderConstants.CONF));
                List<String> ownedShards = GsonParser.jsonToList(
                        (String) this.getPluginJobConf().get(OTSStreamReaderConstants.OWNED_SHARDS));

                boolean confSimplifyEnable = this.getPluginJobConf().getBool(CONF_SIMPLIFY_ENABLE,
                        DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE);
                StreamJob streamJob;
                List<StreamShard> allShards;

                if (confSimplifyEnable) {
                    // Do not read these from conf, so that the Configs produced by Job split do not grow too large in distributed mode.
                    String version = this.getPluginJobConf().getString(OTSStreamReaderConstants.VERSION);
                    OTSStreamJobShard otsStreamJobShard = OTSStreamJobShardUtil.getOTSStreamJobShard(config, version);

                    streamJob = otsStreamJobShard.getStreamJob();
                    allShards = otsStreamJobShard.getAllShards();
                } else {
                    streamJob = StreamJob.fromJson(
                            (String) this.getPluginJobConf().get(OTSStreamReaderConstants.STREAM_JOB));
                    allShards = GsonParser.fromJson(
                            (String) this.getPluginJobConf().get(OTSStreamReaderConstants.ALL_SHARDS));
                }

                proxy.init(config, streamJob, allShards, new HashSet<String>(ownedShards));
            } catch (TableStoreException ex) {
                throw DataXException.asDataXException(new OTSReaderError(ex.getErrorCode(), "OTS ERROR"), ex.toString(), ex);
            } catch (Exception ex) {
                throw DataXException.asDataXException(OTSReaderError.ERROR, ex.toString(), ex);
            }
        }

        @Override
        public void startRead(RecordSender recordSender) {
            proxy.startRead(recordSender);
        }

        public void destroy() {
            proxy.close();
        }
    }
}

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderException.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal;

public class OTSStreamReaderException extends RuntimeException {

    public OTSStreamReaderException(String message) {
        super(message);
    }

    public OTSStreamReaderException(String message, Exception cause) {
        super(message, cause);
    }
}

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderMasterProxy.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.core.CheckpointTimeTracker;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.core.OTSStreamReaderChecker;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.GsonParser;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.OTSHelper;
import com.alicloud.openservices.tablestore.*;
import com.alicloud.openservices.tablestore.model.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.*;

import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.CONF_SIMPLIFY_ENABLE;

public class OTSStreamReaderMasterProxy {

    private OTSStreamReaderConfig conf = null;
    private SyncClientInterface ots = null;

    private StreamJob streamJob;
    private List<StreamShard> allShards;
    private String version;

    private static final Logger LOG = LoggerFactory.getLogger(OTSStreamReaderMasterProxy.class);

    public void init(OTSStreamReaderConfig config) throws Exception {
        this.conf = config;

        // Init ots
        ots = OTSHelper.getOTSInstance(conf);

        // Create the checker.
        OTSStreamReaderChecker checker = new OTSStreamReaderChecker(ots, conf);

        // Check that Stream is enabled and that the selected time range can be exported.
        checker.checkStreamEnabledAndTimeRangeOK();

        // Check whether the StatusTable exists, and create it if it does not.
        checker.checkAndCreateStatusTableIfNotExist();

        // Delete the checkpoints recorded in the StatusTable for this EndTime, so that this job is not affected by earlier export jobs.
        String streamId = OTSHelper.getStreamResponse(ots, config.getDataTable(), config.isTimeseriesTable()).getStreamId();
        CheckpointTimeTracker checkpointInfoTracker = new CheckpointTimeTracker(ots, config.getStatusTable(), streamId);
        checkpointInfoTracker.clearAllCheckpoints(config.getEndTimestampMillis());

        // Reuse the client created above instead of opening a second client that is never shut down.
        allShards = OTSHelper.getOrderedShardList(ots, streamId, conf.isTimeseriesTable());
        List<String> shardIds = new ArrayList<String>();
        for (StreamShard shard : allShards) {
            shardIds.add(shard.getShardId());
        }

        this.version = "" + System.currentTimeMillis() + "-" + UUID.randomUUID();
        LOG.info("version is: {}", this.version);

        streamJob = new StreamJob(conf.getDataTable(), streamId, version, new HashSet<String>(shardIds),
                conf.getStartTimestampMillis(), conf.getEndTimestampMillis());
        checkpointInfoTracker.writeStreamJob(streamJob);

        LOG.info("Start stream job: {}.", streamJob.toJson());
    }

    /**
     * For testing purpose.
     *
     * @param streamJob
     */
    void setStreamJob(StreamJob streamJob) {
        this.streamJob = streamJob;
    }

    public StreamJob getStreamJob() {
        return streamJob;
    }

    public List<Configuration> split(int adviceNumber) {
        int shardCount = streamJob.getShardIds().size();
        int splitNumber = Math.min(adviceNumber, shardCount);
        int splitSize = shardCount / splitNumber;
        List<Configuration> configurations = new ArrayList<Configuration>();

        List<String> shardIds = new ArrayList<String>(streamJob.getShardIds());
        Collections.shuffle(shardIds);
        int start = 0;
        int end = 0;
        int remain = shardCount % splitNumber;
        for (int i = 0; i < splitNumber; i++) {
            start = end;
            end = start + splitSize;

            if (remain > 0) {
                end += 1;
                remain -= 1;
            }

            Configuration configuration = Configuration.newDefault();
            configuration.set(OTSStreamReaderConstants.CONF, GsonParser.configToJson(conf));

            // Fix #39430646 [offline sync, distributed] Slim down the DataX OTSStreamReader plugin in distributed mode
            if (conf.isConfSimplifyEnable()) {
                configuration.set(OTSStreamReaderConstants.VERSION, this.version);
                configuration.set(CONF_SIMPLIFY_ENABLE, true);
            } else {
                configuration.set(OTSStreamReaderConstants.STREAM_JOB, streamJob.toJson());
                configuration.set(OTSStreamReaderConstants.ALL_SHARDS, GsonParser.toJson(allShards));
            }

            configuration.set(OTSStreamReaderConstants.OWNED_SHARDS, GsonParser.listToJson(shardIds.subList(start, end)));
            configurations.add(configuration);
        }
        LOG.info("Master split to {} slave, with advice number {}.", configurations.size(), adviceNumber);
        return configurations;
    }

    public void close() {
        ots.shutdown();
    }
}

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/OTSStreamReaderSlaveProxy.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.core.*;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.ShardCheckpoint;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.OTSHelper;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.TimeUtils;
import com.alicloud.openservices.tablestore.*;
import com.alicloud.openservices.tablestore.model.*;
import com.aliyun.openservices.ots.internal.streamclient.model.CheckpointPosition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class OTSStreamReaderSlaveProxy {
    private static final Logger LOG = LoggerFactory.getLogger(OTSStreamReaderSlaveProxy.class);
    private static AtomicInteger slaveNumber = new AtomicInteger(0);

    private OTSStreamReaderConfig config;
    private SyncClientInterface ots;
    private Map<String, ShardCheckpoint> shardToCheckpointMap = new ConcurrentHashMap<String, ShardCheckpoint>();
    private CheckpointTimeTracker checkpointInfoTracker;
    private OTSStreamReaderChecker checker;
    private StreamJob streamJob;
    private Map<String, StreamShard> allShardsMap; // all shards from job master
    private Map<String, StreamShard> ownedShards; // shards to read arranged by job master
    private boolean findCheckpoints; // whether find checkpoint for last job, if so, we should read from checkpoint and skip nothing.
    private String slaveId = UUID.randomUUID().toString();
    private StreamDetails streamDetails;
    private boolean enableSeekIteratorByTimestamp;

    public void init(final OTSStreamReaderConfig otsStreamReaderConfig, StreamJob streamJob, List<StreamShard> allShards, Set<String> ownedShardIds) {
        slaveNumber.getAndIncrement();
        this.config = otsStreamReaderConfig;
        this.ots = OTSHelper.getOTSInstance(config);
        this.streamJob = streamJob;
        this.streamDetails = OTSHelper.getStreamDetails(ots, this.streamJob.getTableName(), config.isTimeseriesTable());
        this.checkpointInfoTracker = new CheckpointTimeTracker(ots, config.getStatusTable(), this.streamJob.getStreamId());
        this.checker = new OTSStreamReaderChecker(ots, config);
        this.allShardsMap = OTSHelper.toShardMap(allShards);
        this.enableSeekIteratorByTimestamp = otsStreamReaderConfig.getEnableSeekIteratorByTimestamp();

        LOG.info("SlaveId: {}, ShardIds: {}, OwnedShards: {}.", slaveId, allShards, ownedShardIds);
        this.ownedShards = new HashMap<String, StreamShard>();
        for (String ownedShardId : ownedShardIds) {
            ownedShards.put(ownedShardId, allShardsMap.get(ownedShardId));
        }

        for (String shardId : this.streamJob.getShardIds()) {
            shardToCheckpointMap.put(shardId, new ShardCheckpoint(shardId, this.streamJob.getVersion(), CheckpointPosition.TRIM_HORIZON, 0));
        }

        findCheckpoints = checker.checkAndSetCheckpoints(checkpointInfoTracker, allShardsMap, streamJob, shardToCheckpointMap);
        if (!findCheckpoints && !enableSeekIteratorByTimestamp) {
            LOG.info("Checkpoint for stream '{}' in timestamp '{}' is not found. EnableSeekIteratorByTimestamp: {}",
                    streamJob.getStreamId(), streamJob.getStartTimeInMillis(), this.enableSeekIteratorByTimestamp);
            setWithNearestCheckpoint();
        }

        LOG.info("Find checkpoints: {}, EnableSeekIteratorByTimestamp: {}", findCheckpoints, enableSeekIteratorByTimestamp);
        for (Map.Entry<String, StreamShard> shard : ownedShards.entrySet()) {
            LOG.info("Shard to process, ShardInfo: [{}], StartCheckpoint: [{}].", shard.getValue(), shardToCheckpointMap.get(shard.getKey()));
        }
        LOG.info("Count of owned shards: {}. ShardIds: {}.", ownedShardIds.size(), ownedShardIds);
    }

    public boolean isFindCheckpoints() {
        return findCheckpoints;
    }

    public Map<String, StreamShard> getAllShardsMap() {
        return allShardsMap;
    }

    public Map<String, StreamShard> getOwnedShards() {
        return ownedShards;
    }

    public Map<String, ShardCheckpoint> getShardToCheckpointMap() {
        return shardToCheckpointMap;
    }

    /**
     * No checkpoint from the previous job was found, so reading has to start over from the beginning.
     * To reduce the amount of data scanned, try to find the checkpoint closest to startTime.
     */
    private void setWithNearestCheckpoint() {
        long expirationTime = (streamDetails.getExpirationTime() - 1) * TimeUtils.HOUR_IN_MILLIS;
        long timeRangeBegin = System.currentTimeMillis() - expirationTime;
        long timeRangeEnd = this.config.getStartTimestampMillis() - 1;
        if (timeRangeBegin < timeRangeEnd) {
            for (String shardId : ownedShards.keySet()) {
                LOG.info("Try find nearest checkpoint for shard {}, startTime: {}.", shardId, config.getStartTimestampMillis());
                String checkpoint = this.checkpointInfoTracker.getShardLargestCheckpointInTimeRange(shardId, timeRangeBegin, timeRangeEnd);
                if (checkpoint != null) {
                    LOG.info("Found checkpoint for shard {}, checkpoint: {}.", shardId, checkpoint);
                    shardToCheckpointMap.put(shardId, new ShardCheckpoint(shardId, streamJob.getVersion(), checkpoint, 0));
                }
            }
        }
    }

    private int calcThreadPoolSize() {
        int threadNum = 0;
        // If threadNum is configured, divide it evenly among the slaves; otherwise use 4 threads per available processor, also divided among the slaves.
        if (config.getThreadNum() > 0) {
            threadNum = config.getThreadNum() / slaveNumber.get();
        } else {
            threadNum = Runtime.getRuntime().availableProcessors() * 4 / slaveNumber.get();
        }

        if (threadNum == 0) {
            threadNum = 1;
        }
        LOG.info("ThreadNum: {}.", threadNum);
        return threadNum;
    }

    private Map<String, StreamShard> filterShardsReachEnd(Map<String, StreamShard> ownedShards, Map<String, ShardCheckpoint> allCheckpoints) {
        Map<String, StreamShard> allShardToProcess = new HashMap<String, StreamShard>();
        for (Map.Entry<String, StreamShard> shard : ownedShards.entrySet()) {
            String shardId = shard.getKey();
            if (allCheckpoints.get(shardId).getCheckpoint().equals(CheckpointPosition.SHARD_END)) {
                LOG.info("Shard has reach end, no need to process. ShardId: {}.", shardId);

                // but we need to set checkpoint for this job
                checkpointInfoTracker.writeCheckpoint(streamJob.getEndTimeInMillis(),
                        new ShardCheckpoint(shardId, streamJob.getVersion(), CheckpointPosition.SHARD_END, 0), 0);
            } else {
                allShardToProcess.put(shard.getKey(), shard.getValue());
            }
        }
        return allShardToProcess;
    }

    public void startRead(RecordSender recordSender) {
        int threadPoolSize = calcThreadPoolSize();
        ExecutorService executorService = new ThreadPoolExecutor(
                0, threadPoolSize, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(ownedShards.size()));
        LOG.info("Start thread pool with size: {}, ShardsCount: {}, SlaveCount: {}.", threadPoolSize, ownedShards.size(), slaveNumber.get());
        try {
            Map<String, StreamShard> allShardToProcess = filterShardsReachEnd(ownedShards, shardToCheckpointMap);
            Map<String, ShardStatusChecker.ProcessState> shardProcessingState = new HashMap<String, ShardStatusChecker.ProcessState>();
            for (String shardId : allShardToProcess.keySet()) {
                shardProcessingState.put(shardId, ShardStatusChecker.ProcessState.BLOCK);
            }

            List<RecordProcessor> processors = new ArrayList<RecordProcessor>();

            // Fetch the current checkpoint state of all shards and run the corresponding tasks for the owned shards.
            long lastLogTime = System.currentTimeMillis();
            while (!allShardToProcess.isEmpty()) {
                Map<String, ShardCheckpoint> checkpointMap = checkpointInfoTracker.getAllCheckpoints(streamJob.getEndTimeInMillis());

                // Check this job's checkpoints to detect entries from other jobs or unknown shards.
                checkCheckpoint(checkpointMap, streamJob);

                // Find the shards to process and the shards that need no processing.
                List<StreamShard> shardToProcess = new ArrayList<StreamShard>();
                List<StreamShard> shardNoNeedProcess = new ArrayList<StreamShard>();
                List<StreamShard> shardBlocked = new ArrayList<StreamShard>();
                ShardStatusChecker.findShardToProcess(allShardToProcess, allShardsMap, checkpointMap, shardToProcess, shardNoNeedProcess, shardBlocked);

                // For shards that need no processing, write a TRIM_HORIZON checkpoint to mark this round as finished.
                for (StreamShard shard : shardNoNeedProcess) {
                    LOG.info("Skip shard: {}.", shard.getShardId());
                    ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), streamJob.getVersion(), CheckpointPosition.TRIM_HORIZON, 0);
                    checkpointInfoTracker.writeCheckpoint(config.getEndTimestampMillis(), checkpoint, 0);
                    shardProcessingState.put(shard.getShardId(), ShardStatusChecker.ProcessState.SKIP);
                }

                for (StreamShard shard : shardToProcess) {
                    RecordProcessor processor = new RecordProcessor(ots, config, streamJob, shard,
                            shardToCheckpointMap.get(shard.getShardId()), !findCheckpoints, checkpointInfoTracker, recordSender);
                    processor.initialize();
                    executorService.submit(processor);
                    processors.add(processor);
                    shardProcessingState.put(shard.getShardId(), ShardStatusChecker.ProcessState.READY);
                }

                // Check the status of every task to detect failures, hangs, or long periods without data.
                checkProcessorRunningStatus(processors);

                if (!allShardToProcess.isEmpty()) {
                    TimeUtils.sleepMillis(config.getSlaveLoopInterval());
                }

                long now = System.currentTimeMillis();
                if (now - lastLogTime > config.getSlaveLoggingStatusInterval()) {
                    logShardProcessingState(shardProcessingState);
                    LOG.info("AllCheckpoints: {}", checkpointMap);
                    lastLogTime = now;
                }
            }

            LOG.info("All shard is processing.");
            logShardProcessingState(shardProcessingState);

            // Wait for the read tasks of the assigned shards to finish, then exit.
            while (true) {
                boolean finished = true;
                checkProcessorRunningStatus(processors);
                for (RecordProcessor processor : processors) {
                    RecordProcessor.State state = processor.getState();
                    if (state != RecordProcessor.State.SUCCEED) {
                        LOG.info("Shard is processing, shardId: {}, status: {}.", processor.getShard().getShardId(), state);
                        finished = false;
                    }
                }

                if (finished) {
                    LOG.info("All record processor finished.");
                    break;
                }

                TimeUtils.sleepMillis(config.getSlaveLoopInterval());
            }
        } catch (TableStoreException ex) {
            throw DataXException.asDataXException(new OTSReaderError(ex.getErrorCode(), "SyncClientInterface Error"), ex.toString(), ex);
        } catch (OTSStreamReaderException ex) {
            LOG.error("SlaveId: {}, OwnedShards: {}.", slaveId, ownedShards, ex);
            throw DataXException.asDataXException(OTSReaderError.ERROR, ex.toString(), ex);
        } catch (Exception ex) {
            LOG.error("SlaveId: {}, OwnedShards: {}.", slaveId, ownedShards, ex);
            throw DataXException.asDataXException(OTSReaderError.ERROR, ex.toString(), ex);
        } finally {
            try {
                executorService.shutdownNow();
                executorService.awaitTermination(1, TimeUnit.MINUTES);
            } catch (Exception e) {
                LOG.error("Shutdown encounter exception.", e);
            }
        }
    }

    private void logShardProcessingState(Map<String, ShardStatusChecker.ProcessState> shardProcessingState) {
        StringBuilder sb = new StringBuilder();
        sb.append("Shard running status: \n");
        for (Map.Entry<String, ShardStatusChecker.ProcessState> entry : shardProcessingState.entrySet()) {
            sb.append("ShardId:").append(entry.getKey())
                    .append(", ProcessingState: ").append(entry.getValue()).append("\n");
        }
        LOG.info("Version: {}, Reader status: {}", streamJob.getVersion(), sb.toString());
    }

    private void checkProcessorRunningStatus(List<RecordProcessor> processors) {
        long now = System.currentTimeMillis();
        for (RecordProcessor processor : processors) {
            RecordProcessor.State state = processor.getState();
            StreamShard shard = processor.getShard();
            if (state == RecordProcessor.State.READY || state == RecordProcessor.State.SUCCEED) {
                continue;
            } else if (state == RecordProcessor.State.INTERRUPTED || state == RecordProcessor.State.FAILED) {
                throw new OTSStreamReaderException("Read task for shard '" + shard.getShardId() + "' has failed.");
            } else { // status = RUNNING
                long lastProcessTime = processor.getLastProcessTime();
                if (now - lastProcessTime > OTSStreamReaderConstants.MAX_ONCE_PROCESS_TIME_MILLIS) {
                    throw new OTSStreamReaderException("Process shard timeout, ShardId:" + shard.getShardId() + ", LastProcessTime:"
                            + lastProcessTime + ", MaxProcessTime:" + OTSStreamReaderConstants.MAX_ONCE_PROCESS_TIME_MILLIS + ", Now:" + now + ".");
                }
            }
        }
    }

    void checkCheckpoint(Map<String, ShardCheckpoint> checkpointMap, StreamJob streamJob) {
        for (Map.Entry<String, ShardCheckpoint> entry : checkpointMap.entrySet()) {
            String shardId = entry.getKey();
            String version = entry.getValue().getVersion();
            if (!streamJob.getShardIds().contains(shardId)) {
                LOG.info("Shard '{}' is not found in job. Job: {}.", entry.getKey(), streamJob.getShardIds());
                throw DataXException.asDataXException(OTSReaderError.ERROR, "Some shard from checkpoint is not belong to this job: " + shardId);
            }

            if (!version.equals(streamJob.getVersion())) {
                LOG.info("Version of shard '{}' in checkpoint is not equal with version of this job. "
                        + "Checkpoint version: {}, job version: {}.", shardId, version, streamJob.getVersion());
                throw DataXException.asDataXException(OTSReaderError.ERROR, "Version of checkpoint is not equal with version of this job.");
            }
        }
    }

    public void close() {
        ots.shutdown();
    }
}

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_en_US.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_ja_JP.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_CN.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_HK.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/LocalStrings_zh_TW.properties
================================================

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/Mode.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal.config;

public enum Mode {
    MULTI_VERSION,
    SINGLE_VERSION_AND_UPDATE_ONLY
}

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSRetryStrategyForStreamReader.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal.config;

import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.OTSErrorCode;
import com.alicloud.openservices.tablestore.*;
import com.alicloud.openservices.tablestore.model.RetryStrategy;

import java.util.Arrays;
import java.util.List;

public class OTSRetryStrategyForStreamReader implements RetryStrategy {

    private int maxRetries = 30;
    private static long retryPauseScaleTimeMillis = 100;
    private static long maxPauseTimeMillis = 10 * 1000;
    private int retries = 0;

    private static List<String> noRetryErrorCode = Arrays.asList(
            OTSErrorCode.AUTHORIZATION_FAILURE,
            OTSErrorCode.CONDITION_CHECK_FAIL,
            OTSErrorCode.INVALID_PARAMETER,
            OTSErrorCode.INVALID_PK,
            OTSErrorCode.OBJECT_ALREADY_EXIST,
            OTSErrorCode.OBJECT_NOT_EXIST,
            OTSErrorCode.OUT_OF_COLUMN_COUNT_LIMIT,
            OTSErrorCode.OUT_OF_ROW_SIZE_LIMIT,
            OTSErrorCode.REQUEST_TOO_LARGE,
            OTSErrorCode.TRIMMED_DATA_ACCESS
    );

    private boolean canRetry(Exception ex) {
        if (ex instanceof TableStoreException) {
            if (noRetryErrorCode.contains(((TableStoreException) ex).getErrorCode())) {
                return false;
            }
            return true;
        } else if (ex instanceof ClientException) {
            return true;
        } else {
            return false;
        }
    }

    public boolean shouldRetry(String action, Exception ex, int retries) {
        if (retries > maxRetries) {
            return false;
        }
        if (canRetry(ex)) {
            return true;
        }
        return false;
    }

    public void setMaxRetries(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    public int getMaxRetries() {
        return this.maxRetries;
    }

    @Override
    public RetryStrategy clone() {
        // Carry over the configured retry limit; a bare new instance would silently reset it to the default of 30.
        OTSRetryStrategyForStreamReader copy = new OTSRetryStrategyForStreamReader();
        copy.setMaxRetries(this.maxRetries);
        return copy;
    }

    @Override
    public int getRetries() {
        return retries;
    }

    @Override
    public long nextPause(String action, Exception ex) {
        if (!shouldRetry(action, ex, retries)) {
            return 0;
        }

        // Exponential backoff: 100 ms, 200 ms, 400 ms, ..., capped at 10 s.
        long pause = Math.min((int) Math.pow(2, retries) * retryPauseScaleTimeMillis, maxPauseTimeMillis);
        ++retries;
        return pause;
    }
}

================================================
FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConfig.java
================================================
package com.alibaba.datax.plugin.reader.otsstreamreader.internal.config;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSStreamReaderException;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.ParamChecker;
import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.TimeUtils;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.CONF_SIMPLIFY_ENABLE;
import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE;

public class OTSStreamReaderConfig {

    private static final Logger LOG = LoggerFactory.getLogger(OTSStreamReaderConfig.class);

    private static final String KEY_OTS_ENDPOINT = "endpoint";
    private static final String KEY_OTS_ACCESSID = "accessId";
    private static final String KEY_OTS_ACCESSKEY = "accessKey";
    private static final String KEY_OTS_INSTANCE_NAME = "instanceName";
    private static final String KEY_DATA_TABLE_NAME = "dataTable";
    private static final String KEY_STATUS_TABLE_NAME = "statusTable";
    private static final String KEY_START_TIMESTAMP_MILLIS = "startTimestampMillis";
    private static final String KEY_END_TIMESTAMP_MILLIS = "endTimestampMillis";
    private static final String KEY_START_TIME_STRING = "startTimeString";
    private static final String KEY_END_TIME_STRING = "endTimeString";
    private static final String KEY_IS_EXPORT_SEQUENCE_INFO = "isExportSequenceInfo";
    private static final String KEY_DATE = "date";
    private static final String KEY_MAX_RETRIES = "maxRetries";
    private static final String KEY_MODE = "mode";
    private static final String KEY_COLUMN = "column";
    private static final String KEY_THREAD_NUM = "threadNum";
    private static final String KEY_ENABLE_TABLE_GROUP_SUPPORT = "enableTableGroupSupport";

    private static final String ENABLE_SEEK_SHARD_ITERATOR = "enableSeekIterator";

    private static final String IS_TIMESERIES_TABLE = "isTimeseriesTable";

    private static final int DEFAULT_MAX_RETRIES = 30;
    private static final long DEFAULT_SLAVE_LOOP_INTERVAL = 10 * TimeUtils.SECOND_IN_MILLIS;
    private static final long DEFAULT_SLAVE_LOGGING_STATUS_INTERVAL = 60 * TimeUtils.SECOND_IN_MILLIS;

    private String endpoint;
    private String accessId;
    private String accessKey;
    private String instanceName;
    private String dataTable;
    private String statusTable;
    private long startTimestampMillis;
    private long endTimestampMillis;
    private boolean isExportSequenceInfo;
    private int maxRetries = DEFAULT_MAX_RETRIES;
    private int threadNum = 32;
    private long slaveLoopInterval = DEFAULT_SLAVE_LOOP_INTERVAL;
    private long slaveLoggingStatusInterval = DEFAULT_SLAVE_LOGGING_STATUS_INTERVAL;
    private boolean enableSeekIteratorByTimestamp;
    private boolean enableTableGroupSupport;

    private Mode mode;
    private List columns;
private List columnsIsTimeseriesTags; private transient SyncClientInterface otsForTest; private boolean confSimplifyEnable; private boolean isTimeseriesTable; public String getEndpoint() { return endpoint; } public void setEndpoint(String endpoint) { this.endpoint = endpoint; } public String getAccessId() { return accessId; } public void setAccessId(String accessId) { this.accessId = accessId; } public String getAccessKey() { return accessKey; } public void setAccessKey(String accessKey) { this.accessKey = accessKey; } public String getInstanceName() { return instanceName; } public void setInstanceName(String instanceName) { this.instanceName = instanceName; } public String getDataTable() { return dataTable; } public void setDataTable(String dataTable) { this.dataTable = dataTable; } public String getStatusTable() { return statusTable; } public void setStatusTable(String statusTable) { this.statusTable = statusTable; } public long getStartTimestampMillis() { return startTimestampMillis; } public void setStartTimestampMillis(long startTimestampMillis) { this.startTimestampMillis = startTimestampMillis; } public long getEndTimestampMillis() { return endTimestampMillis; } public void setEndTimestampMillis(long endTimestampMillis) { this.endTimestampMillis = endTimestampMillis; } public boolean isExportSequenceInfo() { return isExportSequenceInfo; } public void setIsExportSequenceInfo(boolean isExportSequenceInfo) { this.isExportSequenceInfo = isExportSequenceInfo; } public boolean isEnableTableGroupSupport() { return enableTableGroupSupport; } public void setEnableTableGroupSupport(boolean enableTableGroupSupport) { this.enableTableGroupSupport = enableTableGroupSupport; } public boolean getEnableSeekIteratorByTimestamp() { return enableSeekIteratorByTimestamp; } public void setEnableSeekIteratorByTimestamp(boolean enableSeekIteratorByTimestamp) { this.enableSeekIteratorByTimestamp = enableSeekIteratorByTimestamp; } public Mode getMode() { return mode; } public void 
setMode(Mode mode) { this.mode = mode; } public List getColumns() { return columns; } public void setColumns(List columns) { this.columns = columns; } public List getColumnsIsTimeseriesTags() { return columnsIsTimeseriesTags; } public void setColumnsIsTimeseriesTags(List columnsIsTimeseriesTags) { this.columnsIsTimeseriesTags = columnsIsTimeseriesTags; } public boolean isTimeseriesTable() { return isTimeseriesTable; } public void setTimeseriesTable(boolean timeseriesTable) { isTimeseriesTable = timeseriesTable; } private static void parseConfigForSingleVersionAndUpdateOnlyMode(OTSStreamReaderConfig config, Configuration param) { try { Boolean isTimeseriesTable = param.getBool(IS_TIMESERIES_TABLE); if (isTimeseriesTable != null) { config.setTimeseriesTable(isTimeseriesTable); } else { config.setTimeseriesTable(false); } } catch (RuntimeException ex) { throw new OTSStreamReaderException("Parse timeseries stream settings fail, please check your config.", ex); } try { List values = param.getList(KEY_COLUMN); if (values == null) { config.setColumns(new ArrayList()); config.setColumnsIsTimeseriesTags(new ArrayList()); return; } List columns = new ArrayList(); List columnsIsTimeseriesTags = new ArrayList(); Boolean isTimeseriesTable = config.isTimeseriesTable(); for (Object item : values) { if (item instanceof Map) { String columnName = (String) ((Map) item).get("name"); columns.add(columnName); boolean columnsIsTimeseriesTag = false; if (isTimeseriesTable && Boolean.parseBoolean((String) ((Map) item).getOrDefault("is_timeseries_tag", "false"))) { columnsIsTimeseriesTag = true; } columnsIsTimeseriesTags.add(columnsIsTimeseriesTag); } else { throw new IllegalArgumentException("The item of column must be map object, please check your input."); } } config.setColumns(columns); config.setColumnsIsTimeseriesTags(columnsIsTimeseriesTags); } catch (RuntimeException ex) { throw new OTSStreamReaderException("Parse column fail, please check your config.", ex); } } public static 
OTSStreamReaderConfig load(Configuration param) { OTSStreamReaderConfig config = new OTSStreamReaderConfig(); config.setEndpoint(ParamChecker.checkStringAndGet(param, KEY_OTS_ENDPOINT, true)); config.setAccessId(ParamChecker.checkStringAndGet(param, KEY_OTS_ACCESSID, true)); config.setAccessKey(ParamChecker.checkStringAndGet(param, KEY_OTS_ACCESSKEY, true)); config.setInstanceName(ParamChecker.checkStringAndGet(param, KEY_OTS_INSTANCE_NAME, true)); config.setDataTable(ParamChecker.checkStringAndGet(param, KEY_DATA_TABLE_NAME, true)); config.setStatusTable(ParamChecker.checkStringAndGet(param, KEY_STATUS_TABLE_NAME, true)); config.setIsExportSequenceInfo(param.getBool(KEY_IS_EXPORT_SEQUENCE_INFO, false)); config.setEnableSeekIteratorByTimestamp(param.getBool(ENABLE_SEEK_SHARD_ITERATOR, false)); config.setConfSimplifyEnable(param.getBool(CONF_SIMPLIFY_ENABLE, DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE)); config.setEnableTableGroupSupport(param.getBool(KEY_ENABLE_TABLE_GROUP_SUPPORT, false)); if (param.getInt(KEY_THREAD_NUM) != null) { config.setThreadNum(param.getInt(KEY_THREAD_NUM)); } if (param.getString(KEY_DATE) == null && (param.getLong(KEY_START_TIMESTAMP_MILLIS) == null || param.getLong(KEY_END_TIMESTAMP_MILLIS) == null) && (param.getLong(KEY_START_TIME_STRING) == null || param.getLong(KEY_END_TIME_STRING) == null)) { throw new OTSStreamReaderException("Must set date or time range millis or time range string, please check your config."); } if (param.get(KEY_DATE) != null && (param.getLong(KEY_START_TIMESTAMP_MILLIS) != null || param.getLong(KEY_END_TIMESTAMP_MILLIS) != null) && (param.getLong(KEY_START_TIME_STRING) != null || param.getLong(KEY_END_TIME_STRING) != null)) { throw new OTSStreamReaderException("Can't set date and time range millis and time range string, please check your config."); } if (param.get(KEY_DATE) != null && (param.getLong(KEY_START_TIMESTAMP_MILLIS) != null || param.getLong(KEY_END_TIMESTAMP_MILLIS) != null)) { throw new 
OTSStreamReaderException("Can't set date and time range both, please check your config."); } if (param.get(KEY_DATE) != null && (param.getLong(KEY_START_TIME_STRING) != null || param.getLong(KEY_END_TIME_STRING) != null)) { throw new OTSStreamReaderException("Can't set date and time range string both, please check your config."); } if ((param.getLong(KEY_START_TIMESTAMP_MILLIS) != null || param.getLong(KEY_END_TIMESTAMP_MILLIS) != null) && (param.getLong(KEY_START_TIME_STRING) != null || param.getLong(KEY_END_TIME_STRING) != null)) { throw new OTSStreamReaderException("Can't set time range millis and time range string both, expect timestamp like '1516010400000'."); } if (param.getString(KEY_START_TIME_STRING) != null && param.getString(KEY_END_TIME_STRING) != null) { String startTime = ParamChecker.checkStringAndGet(param, KEY_START_TIME_STRING, true); String endTime = ParamChecker.checkStringAndGet(param, KEY_END_TIME_STRING, true); try { long startTimestampMillis = TimeUtils.parseTimeStringToTimestampMillis(startTime); config.setStartTimestampMillis(startTimestampMillis); } catch (Exception ex) { throw new OTSStreamReaderException("Can't parse startTimeString: " + startTime + ", expect format date like '201801151612'."); } try { long endTimestampMillis = TimeUtils.parseTimeStringToTimestampMillis(endTime); config.setEndTimestampMillis(endTimestampMillis); } catch (Exception ex) { throw new OTSStreamReaderException("Can't parse endTimeString: " + endTime + ", expect format date like '201801151612'."); } } else if (param.getString(KEY_DATE) == null) { config.setStartTimestampMillis(param.getLong(KEY_START_TIMESTAMP_MILLIS)); config.setEndTimestampMillis(param.getLong(KEY_END_TIMESTAMP_MILLIS)); } else { String date = ParamChecker.checkStringAndGet(param, KEY_DATE, true); try { long startTimestampMillis = TimeUtils.parseDateToTimestampMillis(date); config.setStartTimestampMillis(startTimestampMillis); config.setEndTimestampMillis(startTimestampMillis + 
TimeUtils.DAY_IN_MILLIS); } catch (ParseException ex) { throw new OTSStreamReaderException("Can't parse date: " + date); } } if (config.getStartTimestampMillis() >= config.getEndTimestampMillis()) { throw new OTSStreamReaderException("EndTimestamp must be larger than startTimestamp."); } config.setMaxRetries(param.getInt(KEY_MAX_RETRIES, DEFAULT_MAX_RETRIES)); String mode = param.getString(KEY_MODE); if (mode != null) { if (mode.equalsIgnoreCase(Mode.SINGLE_VERSION_AND_UPDATE_ONLY.name())) { config.setMode(Mode.SINGLE_VERSION_AND_UPDATE_ONLY); parseConfigForSingleVersionAndUpdateOnlyMode(config, param); } else { throw new OTSStreamReaderException("Unsupported Mode: " + mode + ", please check your config."); } } else { config.setMode(Mode.MULTI_VERSION); List values = param.getList(KEY_COLUMN); if (values != null) { LOG.warn("The multi version mode doesn't support setting columns, column config will ignore."); } Boolean isTimeseriesTable = param.getBool(IS_TIMESERIES_TABLE); if (isTimeseriesTable != null) { LOG.warn("The multi version mode doesn't support setting Timeseries stream, stream config will ignore."); } } LOG.info("endpoint: {}, accessKeyId: {}, accessKeySecret: {}, instanceName: {}, dataTableName: {}, statusTableName: {}," + " isExportSequenceInfo: {}, startTimestampMillis: {}, endTimestampMillis:{}, maxRetries:{}, enableSeekIteratorByTimestamp: {}, " + "confSimplifyEnable: {}, isTimeseriesTable: {}.", config.getEndpoint(), config.getAccessId(), config.getAccessKey(), config.getInstanceName(), config.getDataTable(), config.getStatusTable(), config.isExportSequenceInfo(), config.getStartTimestampMillis(), config.getEndTimestampMillis(), config.getMaxRetries(), config.getEnableSeekIteratorByTimestamp(), config.isConfSimplifyEnable(), config.isTimeseriesTable()); return config; } /** * test use * @return */ public SyncClientInterface getOtsForTest() { return otsForTest; } /** * test use * @param otsForTest */ public void setOtsForTest(SyncClientInterface 
otsForTest) { this.otsForTest = otsForTest; } public int getMaxRetries() { return maxRetries; } public void setMaxRetries(int maxRetries) { this.maxRetries = maxRetries; } public int getThreadNum() { return threadNum; } public void setSlaveLoopInterval(long slaveLoopInterval) { this.slaveLoopInterval = slaveLoopInterval; } public void setSlaveLoggingStatusInterval(long slaveLoggingStatusInterval) { this.slaveLoggingStatusInterval = slaveLoggingStatusInterval; } public long getSlaveLoopInterval() { return slaveLoopInterval; } public long getSlaveLoggingStatusInterval() { return slaveLoggingStatusInterval; } public void setThreadNum(int threadNum) { this.threadNum = threadNum; } public boolean isConfSimplifyEnable() { return confSimplifyEnable; } public void setConfSimplifyEnable(boolean confSimplifyEnable) { this.confSimplifyEnable = confSimplifyEnable; } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/OTSStreamReaderConstants.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.config; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.TimeUtils; public class OTSStreamReaderConstants { public static long BEFORE_OFFSET_TIME_MILLIS = 10 * TimeUtils.MINUTE_IN_MILLIS; public static long AFTER_OFFSET_TIME_MILLIS = 5 * TimeUtils.MINUTE_IN_MILLIS; public static final int STATUS_TABLE_TTL = 30 * TimeUtils.DAY_IN_SEC; public static final long MAX_WAIT_TABLE_READY_TIME_MILLIS = 2 * TimeUtils.MINUTE_IN_MILLIS; public static final long MAX_OTS_UNAVAILABLE_TIME = 30 * TimeUtils.MINUTE_IN_MILLIS; public static final long MAX_ONCE_PROCESS_TIME_MILLIS = MAX_OTS_UNAVAILABLE_TIME; public static final String CONF = "conf"; public static final String STREAM_JOB = "STREAM_JOB"; public static final String OWNED_SHARDS = "OWNED_SHARDS"; public static final String ALL_SHARDS = "ALL_SHARDS"; public static 
final String VERSION = "STREAM_VERSION"; /** * Whether to enable the OTS distributed-mode startup optimization, which reduces the size of the task conf produced in the job split phase. * This parameter exists to keep DataX gray (canary) releases safe: it avoids job failures caused by a distributed OTS job running some subprocesses on the old version and others on the new version. * Set it to "true" only after every DataX cluster has been fully upgraded to the new version; the default is "false". */ public static final String CONF_SIMPLIFY_ENABLE = "confSimplifyEnable"; public static final Integer RETRY_TIMES = 3; public static final Long DEFAULT_SLEEP_TIME_IN_MILLS = 500L; public static final boolean DEFAULT_CONF_SIMPLIFY_ENABLE_VALUE = false; static { String beforeOffsetMillis = System.getProperty("BEFORE_OFFSET_TIME_MILLIS"); if (beforeOffsetMillis != null) { BEFORE_OFFSET_TIME_MILLIS = Long.valueOf(beforeOffsetMillis); } String afterOffsetMillis = System.getProperty("AFTER_OFFSET_TIME_MILLIS"); if (afterOffsetMillis != null) { AFTER_OFFSET_TIME_MILLIS = Long.valueOf(afterOffsetMillis); } } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/config/StatusTableConstants.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.config; import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; import com.alicloud.openservices.tablestore.model.PrimaryKeyType; import java.util.Arrays; import java.util.List; public class StatusTableConstants { // status table's schema public static String PK1_STREAM_ID = "StreamId"; public static String PK2_STATUS_TYPE = "StatusType"; public static String PK3_STATUS_VALUE = "StatusValue"; public static List STATUS_TABLE_PK_SCHEMA = Arrays.asList( new PrimaryKeySchema(PK1_STREAM_ID, PrimaryKeyType.STRING), new PrimaryKeySchema(PK2_STATUS_TYPE, PrimaryKeyType.STRING), new 
PrimaryKeySchema(PK3_STATUS_VALUE, PrimaryKeyType.STRING)); /** * Records the checkpoints of all shards at a given point in time. * Format: * * PK1 : StreamId : "dataTable_131231" * PK2 : StatusType : "CheckpointForDataxReader" * * Checkpoint row: * PK3 : StatusValue : "1444357620415 shard1" (Time + \t + ShardId) * Column :
Checkpoint : "checkpoint" * ShardCount row: * PK3 : StatusValue : "1444357620415" (Time) * Column : ShardCount : 3 * */ public static String STATUS_TYPE_CHECKPOINT = "CheckpointForDataxReader"; // Records the run information of each DataX job, including the shard list, StreamId, version, etc. public static String STATUS_TYPE_JOB_DESC = "DataxJobDesc"; /** * Records the checkpoint of a single shard at a given time. * PK1: StreamId : "dataTable_131231" * PK2: StatusType: "ShardTimeCheckpointForDataxReader" * PK3: StatusValue: "shard1 1444357620415" (ShardId + \t + Time) * Column: Checkpoint : "checkpoint" */ public static String STATUS_TYPE_SHARD_CHECKPOINT = "ShardTimeCheckpointForDataxReader"; public static String TIME_SHARD_SEPARATOR = "\t"; public static String LARGEST_SHARD_ID = String.valueOf((char)127); // Used as the upper bound of the GetRange scan. // Attribute columns of checkpoint rows public static String CHECKPOINT_COLUMN_NAME = "Checkpoint"; public static String VERSION_COLUMN_NAME = "Version"; public static String SKIP_COUNT_COLUMN_NAME = "SkipCount"; public static String SHARDCOUNT_COLUMN_NAME = "ShardCount"; // Attribute columns of job-description rows public static final int COLUMN_MAX_SIZE = 64 * 1024; public static final String JOB_SHARD_LIST_PREFIX_COLUMN_NAME = "ShardIds_"; public static final String JOB_VERSION_COLUMN_NAME = "Version"; public static final String JOB_TABLE_NAME_COLUMN_NAME = "TableName"; public static final String JOB_STREAM_ID_COLUMN_NAME = "JobStreamId"; public static final String JOB_START_TIME_COLUMN_NAME = "StartTime"; public static final String JOB_END_TIME_COLUMN_NAME = "EndTime"; } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/CheckpointTimeTracker.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; import 
com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.StatusTableConstants; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.ShardCheckpoint; import
com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.GsonParser; import com.alicloud.openservices.tablestore.*; import com.alicloud.openservices.tablestore.core.protocol.OtsInternalApi; import com.alicloud.openservices.tablestore.model.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; import java.util.concurrent.ConcurrentHashMap; public class CheckpointTimeTracker { private static final Logger LOG = LoggerFactory.getLogger(CheckpointTimeTracker.class); private final SyncClientInterface client; private final String statusTable; private final String streamId; public CheckpointTimeTracker(SyncClientInterface client, String statusTable, String streamId) { this.client = client; this.statusTable = statusTable; this.streamId = streamId; } /** * Returns the number of shards that recorded a checkpoint at the given timestamp; used to check whether the checkpoints are complete. * * @param timestamp * @return the shard count, or -1 if the status table has no shardCount record for this timestamp */ public int getShardCountForCheck(long timestamp) { PrimaryKey primaryKey = getPrimaryKeyForShardCount(timestamp); GetRowRequest getRowRequest = getOTSRequestForGet(primaryKey); Row row = client.getRow(getRowRequest).getRow(); if (row == null) { return -1; } int shardCount = (int) row.getColumn(StatusTableConstants.SHARDCOUNT_COLUMN_NAME).get(0).getValue().asLong(); LOG.info("GetShardCount: timestamp: {}, shardCount: {}.", timestamp, shardCount); return shardCount; } /** * Reads all checkpoints recorded at the given timestamp from the status table. 
* * @param timestamp * @return a map from shard ID to its checkpoint */ public Map getAllCheckpoints(long timestamp) { Iterator rowIter = getRangeIteratorForGetAllCheckpoints(client, timestamp); List rows = readAllRows(rowIter); Map checkpointMap = new HashMap(); for (Row row : rows) { String pk3 = row.getPrimaryKey().getPrimaryKeyColumn(StatusTableConstants.PK3_STATUS_VALUE).getValue().asString(); String shardId = pk3.split(StatusTableConstants.TIME_SHARD_SEPARATOR)[1]; ShardCheckpoint checkpoint = ShardCheckpoint.fromRow(shardId, row);
checkpointMap.put(shardId, checkpoint); } if (LOG.isDebugEnabled()) { StringBuilder stringBuilder = new StringBuilder(); stringBuilder.append("GetAllCheckpoints: size: " + checkpointMap.size()); for (String shardId : checkpointMap.keySet()) { stringBuilder.append(", [shardId: "); stringBuilder.append(shardId); stringBuilder.append(", checkpoint: "); stringBuilder.append(checkpointMap.get(shardId)); stringBuilder.append("]"); } LOG.debug(stringBuilder.toString()); } return checkpointMap; } private List readAllRows(Iterator rowIter) { List rows = new ArrayList(); while (rowIter.hasNext()) { rows.add(rowIter.next()); } return rows; } /** * Records the checkpoint of a shard at a given time. These records are later used to find a recent (larger) checkpoint of the shard within a time range, which reduces the amount of data to scan. * * @param shardId * @param timestamp * @param checkpointValue */ public void setShardTimeCheckpoint(String shardId, long timestamp, String checkpointValue) { PutRowRequest putRowRequest = getOTSRequestForSetShardTimeCheckpoint(shardId, timestamp, checkpointValue); client.putRow(putRowRequest); LOG.info("SetShardTimeCheckpoint: timestamp: {}, shardId: {}, checkpointValue: {}.", timestamp, shardId, checkpointValue); } /** * Returns the largest checkpoint of a shard within a time range. It is used to find a recent (larger) checkpoint of the shard within that range, which reduces the amount of data to scan.
* The query range is left-open and right-closed, i.e. (startTimestamp, endTimestamp]. * * @param shardId * @param startTimestamp * @param endTimestamp * @return the checkpoint value, or null if none is found */ public String getShardLargestCheckpointInTimeRange(String shardId, long startTimestamp, long endTimestamp) { PrimaryKey startPk = getPrimaryKeyForShardTimeCheckpoint(shardId, endTimestamp); PrimaryKey endPk = getPrimaryKeyForShardTimeCheckpoint(shardId, startTimestamp); RangeRowQueryCriteria rangeRowQueryCriteria = new RangeRowQueryCriteria(statusTable); rangeRowQueryCriteria.setMaxVersions(1); rangeRowQueryCriteria.setDirection(Direction.BACKWARD); rangeRowQueryCriteria.setLimit(1); rangeRowQueryCriteria.setInclusiveStartPrimaryKey(startPk); rangeRowQueryCriteria.setExclusiveEndPrimaryKey(endPk); GetRangeRequest getRangeRequest = new GetRangeRequest(rangeRowQueryCriteria); GetRangeResponse result = client.getRange(getRangeRequest); if (result.getRows().isEmpty()) { return null; } else { try { String checkpoint = result.getRows().get(0).getLatestColumn(StatusTableConstants.CHECKPOINT_COLUMN_NAME).getValue().asString(); String time = result.getRows().get(0).getPrimaryKey().getPrimaryKeyColumn(2).getValue().asString().split(StatusTableConstants.TIME_SHARD_SEPARATOR)[1]; LOG.info("find checkpoint for shard {} in time {}.", shardId, time); return checkpoint; } catch (Exception ex) { LOG.error("Error when get shard time checkpoint.", ex); return null; } } } public void clearAllCheckpoints(long timestamp) { Iterator rowIter = getRangeIteratorForGetAllCheckpoints(client, timestamp); List rows = readAllRows(rowIter); for (Row row : rows) { DeleteRowRequest deleteRowRequest = getOTSRequestForDelete(row.getPrimaryKey()); client.deleteRow(deleteRowRequest); } LOG.info("ClearAllCheckpoints: timestamp: {}.", timestamp); } private PrimaryKey getPrimaryKeyForCheckpoint(long timestamp, String shardId) { String statusValue = String.format("%16d", timestamp) + 
StatusTableConstants.TIME_SHARD_SEPARATOR + shardId; List pkCols = new ArrayList(); pkCols.add(new
PrimaryKeyColumn(StatusTableConstants.PK1_STREAM_ID, PrimaryKeyValue.fromString(streamId))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK2_STATUS_TYPE, PrimaryKeyValue.fromString(StatusTableConstants.STATUS_TYPE_CHECKPOINT))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK3_STATUS_VALUE, PrimaryKeyValue.fromString(statusValue))); PrimaryKey primaryKey = new PrimaryKey(pkCols); return primaryKey; } private PrimaryKey getPrimaryKeyForJobDesc(long timestamp) { String statusValue = String.format("%16d", timestamp); List pkCols = new ArrayList(); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK1_STREAM_ID, PrimaryKeyValue.fromString(streamId))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK2_STATUS_TYPE, PrimaryKeyValue.fromString(StatusTableConstants.STATUS_TYPE_JOB_DESC))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK3_STATUS_VALUE, PrimaryKeyValue.fromString(statusValue))); PrimaryKey primaryKey = new PrimaryKey(pkCols); return primaryKey; } public PrimaryKey getPrimaryKeyForShardCount(long timestamp) { String statusValue = String.format("%16d", timestamp); List pkCols = new ArrayList(); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK1_STREAM_ID, PrimaryKeyValue.fromString(streamId))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK2_STATUS_TYPE, PrimaryKeyValue.fromString(StatusTableConstants.STATUS_TYPE_CHECKPOINT))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK3_STATUS_VALUE, PrimaryKeyValue.fromString(statusValue))); PrimaryKey primaryKey = new PrimaryKey(pkCols); return primaryKey; } private PrimaryKey getPrimaryKeyForShardTimeCheckpoint(String shardId, long timestamp) { String statusValue = shardId + StatusTableConstants.TIME_SHARD_SEPARATOR + String.format("%16d", timestamp); List pkCols = new ArrayList(); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK1_STREAM_ID, PrimaryKeyValue.fromString(streamId))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK2_STATUS_TYPE, 
PrimaryKeyValue.fromString(StatusTableConstants.STATUS_TYPE_SHARD_CHECKPOINT))); pkCols.add(new PrimaryKeyColumn(StatusTableConstants.PK3_STATUS_VALUE, PrimaryKeyValue.fromString(statusValue))); PrimaryKey primaryKey = new PrimaryKey(pkCols); return primaryKey; } private PutRowRequest getOTSRequestForSetShardTimeCheckpoint(String shardId, long timestamp, String checkpointValue) { PrimaryKey primaryKey = getPrimaryKeyForShardTimeCheckpoint(shardId, timestamp); RowPutChange rowPutChange = new RowPutChange(statusTable, primaryKey); rowPutChange.addColumn(StatusTableConstants.CHECKPOINT_COLUMN_NAME, ColumnValue.fromString(checkpointValue)); PutRowRequest putRowRequest = new PutRowRequest(rowPutChange); return putRowRequest; } private GetRowRequest getOTSRequestForGet(PrimaryKey primaryKey) { SingleRowQueryCriteria rowQueryCriteria = new SingleRowQueryCriteria(statusTable, primaryKey); rowQueryCriteria.setMaxVersions(1); GetRowRequest getRowRequest = new GetRowRequest(rowQueryCriteria); return getRowRequest; } private Iterator getRangeIteratorForGetAllCheckpoints(SyncClientInterface client, long timestamp) { RangeIteratorParameter param = new RangeIteratorParameter(statusTable); PrimaryKey startPk = getPrimaryKeyForCheckpoint(timestamp, ""); PrimaryKey endPk = getPrimaryKeyForCheckpoint(timestamp, StatusTableConstants.LARGEST_SHARD_ID); param.setMaxVersions(1); param.setInclusiveStartPrimaryKey(startPk); param.setExclusiveEndPrimaryKey(endPk); return client.createRangeIterator(param); } private DeleteRowRequest getOTSRequestForDelete(PrimaryKey primaryKey) { RowDeleteChange rowDeleteChange = new RowDeleteChange(statusTable, primaryKey); DeleteRowRequest deleteRowRequest = new DeleteRowRequest(rowDeleteChange); return deleteRowRequest; } public void writeCheckpoint(long timestamp, ShardCheckpoint checkpoint) { writeCheckpoint(timestamp, checkpoint, 0); } public void writeCheckpoint(long timestamp, ShardCheckpoint checkpoint, long sendRecordCount) { LOG.info("Write 
checkpoint of time '{}' of shard '{}'.", timestamp, checkpoint.getShardId()); PrimaryKey primaryKey = getPrimaryKeyForCheckpoint(timestamp, checkpoint.getShardId()); RowPutChange rowChange = new RowPutChange(statusTable, primaryKey); checkpoint.serializeColumn(rowChange); if (sendRecordCount > 0) { rowChange.addColumn("SendRecordCount", ColumnValue.fromLong(sendRecordCount)); } PutRowRequest request = new PutRowRequest(); request.setRowChange(rowChange); client.putRow(request); } public ShardCheckpoint readCheckpoint(String shardId, long timestamp) { PrimaryKey primaryKey = getPrimaryKeyForCheckpoint(timestamp, shardId); GetRowRequest getRowRequest = getOTSRequestForGet(primaryKey); Row row = client.getRow(getRowRequest).getRow(); if (row == null) { return null; } return ShardCheckpoint.fromRow(shardId, row); } public void writeStreamJob(StreamJob streamJob) { PrimaryKey primaryKey = getPrimaryKeyForJobDesc(streamJob.getEndTimeInMillis()); RowPutChange rowChange = new RowPutChange(statusTable); rowChange.setPrimaryKey(primaryKey); streamJob.serializeColumn(rowChange); PutRowRequest request = new PutRowRequest(); request.setRowChange(rowChange); client.putRow(request); } public StreamJob readStreamJob(long timestamp) { PrimaryKey primaryKey = getPrimaryKeyForJobDesc(timestamp); GetRowRequest request = getOTSRequestForGet(primaryKey); GetRowResponse response = client.getRow(request); return StreamJob.fromRow(response.getRow()); } /** * Fetches the checkpoints of the job for the given timestamp and checks that they are complete. * For a job written by an older version, only the shard count is checked. * For a job written by a newer version, the shard ID list must match exactly, and each shard's checkpoint version must match the version in the job description. 
* * @param timestamp * @param streamId * @param allCheckpoints output map; cleared first and filled with the checkpoints only on success * @return true if the complete checkpoints of the previous job were retrieved, otherwise false */ public boolean getAndCheckAllCheckpoints(long timestamp, String streamId, Map allCheckpoints) { allCheckpoints.clear(); Map allCheckpointsInTable = getAllCheckpoints(timestamp); long shardCount = -1; boolean checkShardCountOnly = false; StreamJob streamJob =
readStreamJob(timestamp); if (streamJob == null) { LOG.info("Stream job does not exist, timestamp: {}.", timestamp); // If the stream job does not exist, this may be a job from an older version; try reading shardCount instead. shardCount = getShardCountForCheck(timestamp); if (shardCount == -1) { LOG.info("Shard count not found, timestamp: {}.", timestamp); return false; } checkShardCountOnly = true; } if (checkShardCountOnly) { if (shardCount != allCheckpointsInTable.size()) { LOG.info("Shard count not equal, shardCount: {}, checkpointCount: {}.", shardCount, allCheckpointsInTable.size()); return false; } } else { // Check that the stream job information matches the checkpoints. if (!streamJob.getStreamId().equals(streamId)) { LOG.info("Stream id of the checkpoint does not match the current job. StreamIdInCheckpoint: {}, StreamId: {}.", streamJob.getStreamId(), streamId); return false; } if (streamJob.getShardIds().size() != allCheckpointsInTable.size()) { LOG.info( "Shard count in stream job does not match checkpoint count. " + "StreamJob shard count: {}, checkpoint count: {}.", streamJob.getShardIds().size(), allCheckpointsInTable.size()); return false; } for (String shardId : streamJob.getShardIds()) { ShardCheckpoint checkpoint = allCheckpointsInTable.get(shardId); if (checkpoint == null) { LOG.info("Checkpoint of shard in job is not found. ShardId: {}.", shardId); return false; } if (!checkpoint.getVersion().equals(streamJob.getVersion())) { LOG.info("Version is different.
Checkpoint: {}, StreamJob: {}.", checkpoint, streamJob); return false; } } } for (Map.Entry entry : allCheckpointsInTable.entrySet()) { allCheckpoints.put(entry.getKey(), entry.getValue()); } return true; } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/IStreamRecordSender.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; import com.alicloud.openservices.tablestore.model.StreamRecord; public interface IStreamRecordSender { void sendToDatax(StreamRecord streamRecord); } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_en_US.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_ja_JP.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_CN.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_HK.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/LocalStrings_zh_TW.properties 
================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/MultiVerModeRecordSender.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.ColumnValueTransformHelper; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSStreamReaderException; import com.alicloud.openservices.tablestore.model.*; /** * Outputs the complete incremental change information; each output row is the change record of a single cell. Sample output: * | pk1 | pk2 | column_name | timestamp | column_value | op_type | seq_id | * | --- | --- | ----------- | --------- | ------------ | ------- | ------ | * | a | b | col1 | 10928121 | null | DO | 001 | delete a specific version of a column * | a | b | col2 | null | null | DA | 002 | delete all versions of a column * | a | b | null | null | null | DR | 003 | delete the entire row * | a | b | col1 | 1928821 | abc | U | 004 | insert a column * */ public class MultiVerModeRecordSender implements IStreamRecordSender { enum OpType { U, // update DO, // delete one version DA, // delete all versions DR // delete row } private final RecordSender dataxRecordSender; private String shardId; private final boolean isExportSequenceInfo; public MultiVerModeRecordSender(RecordSender dataxRecordSender, String shardId, boolean isExportSequenceInfo) { this.dataxRecordSender = dataxRecordSender; this.shardId = shardId; this.isExportSequenceInfo = isExportSequenceInfo; } @Override public void sendToDatax(StreamRecord streamRecord) { int colIdx = 0; switch (streamRecord.getRecordType()) { case PUT: sendToDatax(streamRecord.getPrimaryKey(), OpType.DR, 
null, getSequenceInfo(streamRecord, colIdx++)); for (RecordColumn recordColumn : 
streamRecord.getColumns()) { String sequenceInfo = getSequenceInfo(streamRecord, colIdx++); sendToDatax(streamRecord.getPrimaryKey(), recordColumn, sequenceInfo); } break; case UPDATE: for (RecordColumn recordColumn : streamRecord.getColumns()) { String sequenceInfo = getSequenceInfo(streamRecord, colIdx++); sendToDatax(streamRecord.getPrimaryKey(), recordColumn, sequenceInfo); } break; case DELETE: sendToDatax(streamRecord.getPrimaryKey(), OpType.DR, null, getSequenceInfo(streamRecord, colIdx++)); break; default: throw new OTSStreamReaderException("Unknown stream record type: " + streamRecord.getRecordType() + "."); } } private void sendToDatax(PrimaryKey primaryKey, RecordColumn column, String sequenceInfo) { switch (column.getColumnType()) { case PUT: sendToDatax(primaryKey, OpType.U, column.getColumn(), sequenceInfo); break; case DELETE_ONE_VERSION: sendToDatax(primaryKey, OpType.DO, column.getColumn(), sequenceInfo); break; case DELETE_ALL_VERSION: sendToDatax(primaryKey, OpType.DA, column.getColumn(), sequenceInfo); break; default: throw new OTSStreamReaderException("Unknown record column type: " + column.getColumnType() + "."); } } private void sendToDatax(PrimaryKey primaryKey, OpType opType, Column column, String sequenceInfo) { Record line = dataxRecordSender.createRecord(); for (PrimaryKeyColumn pkCol : primaryKey.getPrimaryKeyColumns()) { line.addColumn(ColumnValueTransformHelper.otsPrimaryKeyValueToDataxColumn(pkCol.getValue())); } switch (opType) { case U: line.addColumn(new StringColumn(column.getName())); line.addColumn(new LongColumn(column.getTimestamp())); line.addColumn(ColumnValueTransformHelper.otsColumnValueToDataxColumn(column.getValue())); line.addColumn(new StringColumn("" + opType)); if (isExportSequenceInfo) { line.addColumn(new StringColumn(sequenceInfo)); } break; case DO: line.addColumn(new StringColumn(column.getName())); line.addColumn(new LongColumn(column.getTimestamp())); line.addColumn(new StringColumn(null)); line.addColumn(new 
StringColumn("" + opType)); if (isExportSequenceInfo) { line.addColumn(new StringColumn(sequenceInfo)); } break; case DA: line.addColumn(new StringColumn(column.getName())); line.addColumn(new StringColumn(null)); line.addColumn(new StringColumn(null)); line.addColumn(new StringColumn("" + opType)); if (isExportSequenceInfo) { line.addColumn(new StringColumn(sequenceInfo)); } break; case DR: line.addColumn(new StringColumn(null)); line.addColumn(new StringColumn(null)); line.addColumn(new StringColumn(null)); line.addColumn(new StringColumn("" + OpType.DR)); if (isExportSequenceInfo) { line.addColumn(new StringColumn(sequenceInfo)); } break; default: throw new OTSStreamReaderException("Unknown operation type: " + opType + "."); } synchronized (dataxRecordSender) { dataxRecordSender.sendToWriter(line); } } private String getSequenceInfo(StreamRecord streamRecord, int colIdx) { int epoch = streamRecord.getSequenceInfo().getEpoch(); long timestamp = streamRecord.getSequenceInfo().getTimestamp(); int rowIdx = streamRecord.getSequenceInfo().getRowIndex(); String sequenceId = String.format("%010d_%020d_%010d_%s:%010d", epoch, timestamp, rowIdx, shardId, colIdx); return sequenceId; } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/OTSStreamReaderChecker.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSStreamReaderException; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.StatusTableConstants; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.ShardCheckpoint; import 
com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.OTSHelper; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.TimeUtils; import com.alicloud.openservices.tablestore.*; import com.alicloud.openservices.tablestore.model.*; import com.aliyun.openservices.ots.internal.streamclient.Worker; import com.aliyun.openservices.ots.internal.streamclient.model.CheckpointPosition; import com.aliyun.openservices.ots.internal.streamclient.model.WorkerStatus; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.Date; import java.util.HashMap; import java.util.List; import java.util.Map; public class OTSStreamReaderChecker { private static final Logger LOG = LoggerFactory.getLogger(OTSStreamReaderChecker.class); private final SyncClientInterface ots; private final OTSStreamReaderConfig config; public OTSStreamReaderChecker(SyncClientInterface ots, OTSStreamReaderConfig config) { this.ots = ots; this.config = config; } /** * 1. Check whether stream is enabled on the dataTable. * 2. 
Check whether the time range to export is valid: * The maximum exportable time range is: [now - expirationTime, now] * To tolerate clock errors between machines, the allowed export range is: [now - expirationTime + beforeOffset, now - afterOffset] */ public void checkStreamEnabledAndTimeRangeOK() { boolean exists = OTSHelper.checkTableExists(ots, config.getDataTable(), config.isTimeseriesTable()); if (!exists) { throw new OTSStreamReaderException("The data table does not exist."); } StreamDetails streamDetails = OTSHelper.getStreamDetails(ots, config.getDataTable(), config.isTimeseriesTable()); if (streamDetails == null || !streamDetails.isEnableStream()) { throw new OTSStreamReaderException("The stream of data table is not enabled."); } long now = System.currentTimeMillis(); long startTime = config.getStartTimestampMillis(); long endTime = config.getEndTimestampMillis(); long beforeOffset = OTSStreamReaderConstants.BEFORE_OFFSET_TIME_MILLIS; long afterOffset = OTSStreamReaderConstants.AFTER_OFFSET_TIME_MILLIS; long expirationTime = streamDetails.getExpirationTime() * TimeUtils.HOUR_IN_MILLIS; if (startTime < now - expirationTime + beforeOffset) { throw new OTSStreamReaderException("As expiration time is " + expirationTime + ", so the start timestamp must be greater than " + TimeUtils.getTimeInISO8601(new Date(now - expirationTime + beforeOffset)) + "(" + (now - expirationTime + beforeOffset )+ ")"); } if (endTime > now - afterOffset) { throw new OTSStreamReaderException("To avoid timing error between different machines, the end timestamp must be smaller" + " than " + TimeUtils.getTimeInISO8601(new Date(now - afterOffset)) + "(" + (now - afterOffset) + ")"); } } /** * Check the tableMeta of statusTable. * @param tableMeta */ private void checkTableMetaOfStatusTable(TableMeta tableMeta) { List pkSchema = tableMeta.getPrimaryKeyList(); if (!pkSchema.equals(StatusTableConstants.STATUS_TABLE_PK_SCHEMA)) { throw new OTSStreamReaderException("Unexpected table meta in status table, please check your config."); } } 
/** * Check whether statusTable exists; if it does not, create it and wait until the table is ready. */ public void 
checkAndCreateStatusTableIfNotExist() { boolean tableExist = OTSHelper.checkTableExists(ots, config.getStatusTable(), false); if (tableExist) { DescribeTableResponse describeTableResult = OTSHelper.describeTable(ots, config.getStatusTable()); checkTableMetaOfStatusTable(describeTableResult.getTableMeta()); } else { TableMeta tableMeta = new TableMeta(config.getStatusTable()); tableMeta.addPrimaryKeyColumns(StatusTableConstants.STATUS_TABLE_PK_SCHEMA); TableOptions tableOptions = new TableOptions(OTSStreamReaderConstants.STATUS_TABLE_TTL, 1); OTSHelper.createTable(ots, tableMeta, tableOptions); boolean tableReady = OTSHelper.waitUntilTableReady(ots, config.getStatusTable(), OTSStreamReaderConstants.MAX_WAIT_TABLE_READY_TIME_MILLIS); if (!tableReady) { throw new OTSStreamReaderException("Check table ready timeout, MaxWaitTableReadyTimeMillis:" + OTSStreamReaderConstants.MAX_WAIT_TABLE_READY_TIME_MILLIS + "."); } } } /** * Try to restore, from the status table, the checkpoints saved when the previous job finished. * Returns true if the restore succeeds, false otherwise. * * @param checkpointTimeTracker * @param allShardsMap * @param streamJob * @param currentShardCheckpointMap * @return */ public boolean checkAndSetCheckpoints( CheckpointTimeTracker checkpointTimeTracker, Map allShardsMap, StreamJob streamJob, Map currentShardCheckpointMap) { long timestamp = config.getStartTimestampMillis(); Map allCheckpoints = new HashMap(); boolean gotCheckpoint = checkpointTimeTracker.getAndCheckAllCheckpoints(timestamp, streamJob.getStreamId(), allCheckpoints); if (!gotCheckpoint) { return false; } for (Map.Entry entry : allCheckpoints.entrySet()) { String shardId = entry.getKey(); ShardCheckpoint checkpoint = entry.getValue(); if (!currentShardCheckpointMap.containsKey(shardId)) { // Found a shard that was not fully read and is not in this job's shard list if (!checkpoint.getCheckpoint().equals(CheckpointPosition.SHARD_END)) { throw new OTSStreamReaderException("Shard does not exist now, ShardId:" 
+ shardId + ", Checkpoint:" + checkpoint); } } else { currentShardCheckpointMap.put(shardId, new 
ShardCheckpoint(shardId, streamJob.getVersion(), checkpoint.getCheckpoint(), checkpoint.getSkipCount())); } } return true; } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/RecordProcessor.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.Mode; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSStreamReaderException; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.ShardCheckpoint; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.TimeUtils; import com.alicloud.openservices.tablestore.*; import com.alicloud.openservices.tablestore.model.*; import com.aliyun.openservices.ots.internal.streamclient.model.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.atomic.AtomicLong; public class RecordProcessor implements Runnable { private static final Logger LOG = LoggerFactory.getLogger(RecordProcessor.class); private static final long RECORD_CHECKPOINT_INTERVAL = 10 * TimeUtils.MINUTE_IN_MILLIS; private final SyncClientInterface ots; private final long startTimestampMillis; private final long endTimestampMillis; private final OTSStreamReaderConfig readerConfig; private boolean shouldSkip; private final CheckpointTimeTracker checkpointTimeTracker; private final RecordSender recordSender; private final boolean isExportSequenceInfo; private IStreamRecordSender otsStreamRecordSender; private 
long lastRecordCheckpointTime; private StreamJob stream; private StreamShard shard; private ShardCheckpoint startCheckpoint; // read state private String lastShardIterator; private String nextShardIterator; private long skipCount; // running state private long startTime; private long lastProcessTime; private AtomicBoolean stop; private AtomicLong sendRecordCount; //enable seek shardIterator by timestamp private boolean enableSeekShardIteratorByTimestamp; public enum State { READY, // initialized but not start RUNNING, // start to read and process records SUCCEED, // succeed to process all records FAILED, // encounter exception and failed INTERRUPTED // not finish but been interrupted } private State state; public RecordProcessor(SyncClientInterface ots, OTSStreamReaderConfig config, StreamJob stream, StreamShard shardToProcess, ShardCheckpoint startCheckpoint, boolean shouldSkip, CheckpointTimeTracker checkpointTimeTracker, RecordSender recordSender) { this.ots = ots; this.readerConfig = config; this.stream = stream; this.shard = shardToProcess; this.startCheckpoint = startCheckpoint; this.startTimestampMillis = stream.getStartTimeInMillis(); this.endTimestampMillis = stream.getEndTimeInMillis(); this.shouldSkip = shouldSkip; this.checkpointTimeTracker = checkpointTimeTracker; this.recordSender = recordSender; this.isExportSequenceInfo = config.isExportSequenceInfo(); this.lastRecordCheckpointTime = 0; this.enableSeekShardIteratorByTimestamp = config.getEnableSeekIteratorByTimestamp(); // set init state startTime = 0; lastProcessTime = 0; state = State.READY; stop = new AtomicBoolean(true); sendRecordCount = new AtomicLong(0); } public StreamShard getShard() { return shard; } public State getState() { return state; } public long getStartTime() { return startTime; } public long getLastProcessTime() { return lastProcessTime; } public void initialize() { if (readerConfig.getMode().equals(Mode.MULTI_VERSION)) { this.otsStreamRecordSender = new 
MultiVerModeRecordSender(recordSender, shard.getShardId(), isExportSequenceInfo); } else if (readerConfig.getMode().equals(Mode.SINGLE_VERSION_AND_UPDATE_ONLY)) { this.otsStreamRecordSender = new SingleVerAndUpOnlyModeRecordSender(recordSender, shard.getShardId(), isExportSequenceInfo, readerConfig.getColumns(), readerConfig.getColumnsIsTimeseriesTags()); } else { throw new OTSStreamReaderException("Internal Error. Unhandled Mode: " + readerConfig.getMode()); } if (startCheckpoint.getCheckpoint().equals(CheckpointPosition.TRIM_HORIZON)) { lastShardIterator = null; if (enableSeekShardIteratorByTimestamp) { long beginTimeStamp = startTimestampMillis - 10 * 60 * 1000; if (beginTimeStamp > 0) { nextShardIterator = getShardIteratorWithBeginTime((startTimestampMillis - 10 * 60 * 1000) * 1000); } else { nextShardIterator = ots.getShardIterator(new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId())).getShardIterator(); } } else { nextShardIterator = ots.getShardIterator(new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId())).getShardIterator(); } skipCount = startCheckpoint.getSkipCount(); } else { lastShardIterator = null; nextShardIterator = startCheckpoint.getCheckpoint(); skipCount = startCheckpoint.getSkipCount(); } LOG.info("Initialize record processor. 
Mode: {}, StartCheckpoint: [{}], ShardId: {}, ShardIterator: {}, SkipCount: {}, enableSeekShardIteratorByTimestamp: {}, startTimestamp: {}.", readerConfig.getMode(), startCheckpoint, shard.getShardId(), nextShardIterator, skipCount, enableSeekShardIteratorByTimestamp, startTimestampMillis); } private long getTimestamp(StreamRecord record) { return record.getSequenceInfo().getTimestamp() / 1000; } void sendRecord(StreamRecord record) { sendRecordCount.incrementAndGet(); otsStreamRecordSender.sendToDatax(record); } @Override public void run() { LOG.info("Start processing records with startTime: {}, endTime: {}, nextShardIterator: {}, skipCount: {}.", startTimestampMillis, endTimestampMillis, nextShardIterator, skipCount); try { startTime = System.currentTimeMillis(); lastProcessTime = startTime; boolean finished = false; stop.set(false); state = State.RUNNING; while (!stop.get()) { finished = readAndProcessRecords(); lastProcessTime = System.currentTimeMillis(); if (finished) { break; } if (Thread.currentThread().isInterrupted()) { state = State.INTERRUPTED; break; } } if (finished) { state = State.SUCCEED; } else { state = State.INTERRUPTED; } } catch (Exception e) { LOG.error("Some fatal error has happened, shardId: {}, LastShardIterator: {}, NextShardIterator: {}.", shard.getShardId(), lastShardIterator, nextShardIterator, e); state = State.FAILED; } LOG.info("Finished processing records. 
ShardId: {}, RecordSent: {}.", shard.getShardId(), sendRecordCount.get()); } public void stop() { stop.set(true); } /** * Process all records. * Returns true once all data within the configured time range has been obtained, false otherwise. * * @param records * @param nextShardIterator * @param mayMoreRecord * @return */ boolean process(List records, String nextShardIterator, Boolean mayMoreRecord) { if (records.isEmpty() && nextShardIterator != null) { // No more data was read if (!readerConfig.isEnableTableGroupSupport()) { LOG.info("ProcessFinished: No more data in shard, shardId: {}.", shard.getShardId()); ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), nextShardIterator, 0); checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); checkpointTimeTracker.setShardTimeCheckpoint(shard.getShardId(), endTimestampMillis, nextShardIterator); return true; } else { if (mayMoreRecord == null) { LOG.error("mayMoreRecord can not be null when tablegroup is true"); throw DataXException.asDataXException("mayMoreRecord can not be null when tablegroup is true"); } else if (mayMoreRecord) { return false; } else { LOG.info("ProcessFinished: No more data in shard, shardId: {}.", shard.getShardId()); ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), nextShardIterator, 0); checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); checkpointTimeTracker.setShardTimeCheckpoint(shard.getShardId(), endTimestampMillis, nextShardIterator); return true; } } } int size = records.size(); // Record only the first record of each iterator as a checkpoint, because a checkpoint stores only the shardIterator, not the skipCount. if (!records.isEmpty()) { long firstRecordTimestamp = getTimestamp(records.get(0)); if (firstRecordTimestamp >= lastRecordCheckpointTime + RECORD_CHECKPOINT_INTERVAL) { lastRecordCheckpointTime = firstRecordTimestamp; 
checkpointTimeTracker.setShardTimeCheckpoint(shard.getShardId(), firstRecordTimestamp, lastShardIterator); } } for (int i = 0; i < size; 
i++) { long timestamp = getTimestamp(records.get(i)); LOG.debug("Process record with timestamp: {}.", timestamp); if (timestamp < endTimestampMillis) { if (shouldSkip && (timestamp < startTimestampMillis)) { LOG.debug("Skip record out of start time: {}, startTime: {}.", timestamp, startTimestampMillis); continue; } shouldSkip = false; LOG.debug("Send record. Timestamp: {}.", timestamp); sendRecord(records.get(i)); } else { LOG.info("ProcessFinished: Record in shard reach boundary of endTime, shardId: {}. Timestamp: {}, EndTime: {}", shard.getShardId(), timestamp, endTimestampMillis); String newIterator = lastShardIterator; if (i > 0) { newIterator = GetStreamRecordWithLimitRowCount(lastShardIterator, i); } ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), newIterator, 0); checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); return true; } } if (nextShardIterator == null) { LOG.info("ProcessFinished: Shard has reach to end, shardId: {}.", shard.getShardId()); ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), CheckpointPosition.SHARD_END, 0); checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); return true; } return false; } private boolean readAndProcessRecords() { LOG.debug("Read and process records. 
ShardId: {}, ShardIterator: {}.", shard.getShardId(), nextShardIterator); if (enableSeekShardIteratorByTimestamp && nextShardIterator == null) { LOG.info("ProcessFinished: Shard has reach to end, shardId: {}.", shard.getShardId()); ShardCheckpoint checkpoint = new ShardCheckpoint(shard.getShardId(), stream.getVersion(), CheckpointPosition.SHARD_END, 0); checkpointTimeTracker.writeCheckpoint(endTimestampMillis, checkpoint, sendRecordCount.get()); return true; } GetStreamRecordRequest request = new GetStreamRecordRequest(nextShardIterator); if (readerConfig.isEnableTableGroupSupport()) { request.setTableName(stream.getTableName()); } if (readerConfig.isTimeseriesTable()){ request.setParseInTimeseriesDataFormat(true); } GetStreamRecordResponse response = ots.getStreamRecord(request); lastShardIterator = nextShardIterator; nextShardIterator = response.getNextShardIterator(); return processRecords(response.getRecords(), nextShardIterator, response.getMayMoreRecord()); } private String GetStreamRecordWithLimitRowCount(String beginIterator, int expectedRowCount) { LOG.debug("Read and process records. 
ShardId: {}, ShardIterator: {}, expectedRowCount: {}..", shard.getShardId(), beginIterator, expectedRowCount); GetStreamRecordRequest request = new GetStreamRecordRequest(beginIterator); request.setLimit(expectedRowCount); GetStreamRecordResponse response = ots.getStreamRecord(request); return response.getNextShardIterator(); } public boolean processRecords(List records, String nextShardIterator, Boolean mayMoreRecord) { long startTime = System.currentTimeMillis(); if (records.isEmpty()) { LOG.info("StartProcessRecords: size: {}.", records.size()); } else { LOG.debug("StartProcessRecords: size: {}, recordTime: {}.", records.size(), getTimestamp(records.get(0))); } if (process(records, nextShardIterator, mayMoreRecord)) { return true; } LOG.debug("ProcessRecords, ProcessShard:{}, ProcessTime: {}, Size:{}, NextShardIterator:{}", shard.getShardId(), System.currentTimeMillis() - startTime, records.size(), nextShardIterator); return false; } private String getShardIteratorWithBeginTime(long timestamp){ LOG.info("Begin to seek shard iterator with timestamp, shardId: {}, timestamp: {}.", shard.getShardId(), timestamp); GetShardIteratorRequest getShardIteratorRequest = new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId()); getShardIteratorRequest.setTimestamp(timestamp); GetShardIteratorResponse response = ots.getShardIterator(getShardIteratorRequest); String nextToken = response.getNextToken(); if (nextToken == null) { return response.getShardIterator(); } while (nextToken != null) { getShardIteratorRequest = new GetShardIteratorRequest(stream.getStreamId(), shard.getShardId()); getShardIteratorRequest.setTimestamp(timestamp); getShardIteratorRequest.setToken(nextToken); response = ots.getShardIterator(getShardIteratorRequest); nextToken = response.getNextToken(); } return response.getShardIterator(); } } ================================================ FILE: 
otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/ShardStatusChecker.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSReaderError; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.ShardCheckpoint; import com.alicloud.openservices.tablestore.model.StreamShard; import com.aliyun.openservices.ots.internal.streamclient.model.CheckpointPosition; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; public class ShardStatusChecker { private static final Logger LOG = LoggerFactory.getLogger(ShardStatusChecker.class); public enum ProcessState { READY, // shard is ready to process records and is not done DONE_NOT_END, // shard is done but has not reached the end of the shard DONE_REACH_END, // shard is done and has reached the end of the shard BLOCK, // shard is blocked on its parents SKIP // shard is skipped } /** * 1. If a shard has no parent shard, or all of its parent shards have reached END, the shard needs to be processed. * 2. If a shard has a parent shard that has already been processed and whose checkpoint is not END, the shard does not need to be processed. *

All shards confirmed as either needing or not needing processing are removed from allShardToProcess. * * @param allShardToProcess * @param allShardsMap * @param checkpointMap */ public static void findShardToProcess( Map allShardToProcess, Map allShardsMap, Map checkpointMap, List shardToProcess, List shardNoNeedToProcess, List shardBlocked) { Map shardStates = new HashMap(); for (Map.Entry entry : allShardToProcess.entrySet()) { determineShardState(entry.getValue().getShardId(), allShardsMap, checkpointMap, shardStates); } for (Map.Entry entry : shardStates.entrySet()) { String shardId = entry.getKey(); if (allShardToProcess.containsKey(shardId)) { StreamShard shard = allShardToProcess.get(shardId); switch (entry.getValue()) { case READY: shardToProcess.add(shard); allShardToProcess.remove(shardId); break; case BLOCK: shardBlocked.add(shard); break; case SKIP: shardNoNeedToProcess.add(shard); allShardToProcess.remove(shardId); break; default: LOG.error("Unexpected state '{}' for shard '{}'.", entry.getValue(), shard); throw DataXException.asDataXException(OTSReaderError.ERROR, "Unexpected state '" + entry.getValue() + "' for shard '" + shard + "'."); } } } } public static ProcessState determineShardState( String shardId, Map allShards, Map allCheckpoints, Map shardStates) { StreamShard shard = allShards.get(shardId); if (shard == null) { // If the shard no longer exists, we consider it already processed. // This judgment rests on the following premises: // If this job continues from the previous job's checkpoint, the shard's checkpoint must have reached SHARD_END in the previous job (this is checked when the slave initializes). // If this job does not continue a previous job, then for a brand-new job a nonexistent shard can be treated as already processed, i.e. it needs no processing. 
LOG.warn("Shard is not found: {}.", shardId); return ProcessState.DONE_REACH_END; } if (shardStates.containsKey(shardId)) { return shardStates.get(shardId); } ProcessState finalState; if (allCheckpoints.containsKey(shardId)) { ShardCheckpoint checkpoint = allCheckpoints.get(shardId); if (checkpoint == null || checkpoint.getCheckpoint() == null) { finalState = ProcessState.READY; } else if (checkpoint.getCheckpoint().equals(CheckpointPosition.SHARD_END)){ finalState = 
ProcessState.DONE_REACH_END; } else { finalState = ProcessState.DONE_NOT_END; } } else { ProcessState stateOfParent = ProcessState.DONE_REACH_END; String parentId = shard.getParentId(); if (parentId != null) { stateOfParent = determineShardState(parentId, allShards, allCheckpoints, shardStates); } ProcessState stateOfParentSibling = ProcessState.DONE_REACH_END; String parentSiblingId = shard.getParentSiblingId(); if (parentSiblingId != null) { stateOfParentSibling = determineShardState(parentSiblingId, allShards, allCheckpoints, shardStates); } if (stateOfParent == ProcessState.SKIP || stateOfParentSibling == ProcessState.SKIP) { finalState = ProcessState.SKIP; } else if (stateOfParent == ProcessState.DONE_NOT_END || stateOfParentSibling == ProcessState.DONE_NOT_END) { finalState = ProcessState.SKIP; } else if (stateOfParent == ProcessState.BLOCK || stateOfParentSibling == ProcessState.BLOCK) { finalState = ProcessState.BLOCK; } else if (stateOfParent == ProcessState.READY || stateOfParentSibling == ProcessState.READY){ finalState = ProcessState.BLOCK; } else { // stateOfParent == ProcessState.DONE_REACH_END && stateOfParentSibling == ProcessState.DONE_REACH_END finalState = ProcessState.READY; } } shardStates.put(shard.getShardId(), finalState); return finalState; } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/core/SingleVerAndUpOnlyModeRecordSender.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.core; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSStreamReaderException; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.ColumnValueTransformHelper; import 
com.alicloud.openservices.tablestore.core.protocol.timeseries.TimeseriesResponseFactory; import com.alicloud.openservices.tablestore.model.*; import java.util.HashMap; import java.util.List; import java.util.Map; /** * This output mode assumes the user performs only Put and Update operations on the data, no Delete operations, and does not use multiple versions. * In this mode, data is output as whole rows, and the user must specify the names of the columns to export. Sample output: * | pk1 | pk2 | col1 | col2 | col3 | sequence id | * | --- | --- | ---- | ---- | ---- | ----------- | * | a | b | c1 | null | null | 001 | *

Note: incremental changes that delete an entire row or delete a column (one version or all versions) are ignored. */ public class SingleVerAndUpOnlyModeRecordSender implements IStreamRecordSender { private final RecordSender dataxRecordSender; private final boolean isExportSequenceInfo; private String shardId; private List columnNames; private List columnsIsTimeseriesTags; public SingleVerAndUpOnlyModeRecordSender(RecordSender dataxRecordSender, String shardId, boolean isExportSequenceInfo, List columnNames, List columnsIsTimeseriesTags) { this.dataxRecordSender = dataxRecordSender; this.shardId = shardId; this.isExportSequenceInfo = isExportSequenceInfo; this.columnNames = columnNames; this.columnsIsTimeseriesTags = columnsIsTimeseriesTags; } @Override public void sendToDatax(StreamRecord streamRecord) { String sequenceInfo = getSequenceInfo(streamRecord); switch (streamRecord.getRecordType()) { case PUT: case UPDATE: sendToDatax(streamRecord.getPrimaryKey(), streamRecord.getColumns(), sequenceInfo); break; case DELETE: break; default: throw new OTSStreamReaderException("Unknown stream record type: " + streamRecord.getRecordType() + "."); } } private void sendToDatax(PrimaryKey primaryKey, List columns, String sequenceInfo) { Record line = dataxRecordSender.createRecord(); Map map = new HashMap(); for (PrimaryKeyColumn pkCol : primaryKey.getPrimaryKeyColumns()) { map.put(pkCol.getName(), pkCol.getValue()); } /** * Parse the string in the tags field of time-series data into a Map */ Map tagsMap = new HashMap<>(); if (columnsIsTimeseriesTags != null && columnsIsTimeseriesTags.contains(true)) { try{ tagsMap = TimeseriesResponseFactory.parseTagsOrAttrs(String.valueOf(map.get("_tags"))); } catch (Exception ex){ throw new OTSStreamReaderException("Failed to parse \"_tags\", please check your config.", ex); } } for (RecordColumn recordColumn : columns) { if (recordColumn.getColumnType().equals(RecordColumn.ColumnType.PUT)) { map.put(recordColumn.getColumn().getName(), recordColumn.getColumn().getValue()); } } 
boolean findColumn = false; for (int i = 0; i < columnNames.size(); i++) { if 
(columnsIsTimeseriesTags != null && columnsIsTimeseriesTags.get(i)) { String value = tagsMap.get(columnNames.get(i)); if (value != null) { findColumn = true; line.addColumn(new StringColumn(value)); } else { line.addColumn(new StringColumn(null)); } } else { Object value = map.get(columnNames.get(i)); if (value != null) { findColumn = true; if (value instanceof ColumnValue) { line.addColumn(ColumnValueTransformHelper.otsColumnValueToDataxColumn((ColumnValue) value)); } else { line.addColumn(ColumnValueTransformHelper.otsPrimaryKeyValueToDataxColumn((PrimaryKeyValue) value)); } } else { line.addColumn(new StringColumn(null)); } } } if (!findColumn) { return; } if (isExportSequenceInfo) { line.addColumn(new StringColumn(sequenceInfo)); } synchronized (dataxRecordSender) { dataxRecordSender.sendToWriter(line); } } private String getSequenceInfo(StreamRecord streamRecord) { int epoch = streamRecord.getSequenceInfo().getEpoch(); long timestamp = streamRecord.getSequenceInfo().getTimestamp(); int rowIdx = streamRecord.getSequenceInfo().getRowIndex(); String sequenceId = String.format("%010d_%020d_%010d_%s", epoch, timestamp, rowIdx, shardId); return sequenceId; } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_en_US.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_ja_JP.properties ================================================ ================================================ FILE: 
otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_CN.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_HK.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/LocalStrings_zh_TW.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/OTSErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.model; /** * Error codes returned by the Open Table Service (OTS). * */ public class OTSErrorCode { /** * User authentication failed. */ public static final String AUTHORIZATION_FAILURE = "OTSAuthFailed"; /** * Internal server error. */ public static final String INTERNAL_SERVER_ERROR = "OTSInternalServerError"; /** * Invalid parameter. */ public static final String INVALID_PARAMETER = "OTSParameterInvalid"; /** * The request as a whole is too large. */ public static final String REQUEST_TOO_LARGE = "OTSRequestBodyTooLarge"; /** * The client request timed out. */ public static final String REQUEST_TIMEOUT = "OTSRequestTimeout"; /** * The user's quota has been exhausted. */ public static final String QUOTA_EXHAUSTED = "OTSQuotaExhausted"; /** * An internal server failover occurred, so some partitions of the table are unavailable. */ public static final String PARTITION_UNAVAILABLE = "OTSPartitionUnavailable"; /** * The table was just created and cannot serve requests yet. */ public static final String TABLE_NOT_READY = "OTSTableNotReady"; /** * The requested table does not exist. */ public static final String OBJECT_NOT_EXIST = "OTSObjectNotExist"; /** * The table requested for creation already exists. */ public static final String OBJECT_ALREADY_EXIST = "OTSObjectAlreadyExist"; /** * Multiple concurrent requests wrote to the same row, causing a conflict. */ public static final String ROW_OPEARTION_CONFLICT =
"OTSRowOperationConflict"; /** * Primary key mismatch. */ public static final String INVALID_PK = "OTSInvalidPK"; /** * Reserved read/write throughput was adjusted too frequently. */ public static final String TOO_FREQUENT_RESERVED_THROUGHPUT_ADJUSTMENT = "OTSTooFrequentReservedThroughputAdjustment"; /** * The total number of columns in the row exceeds the limit. */ public static final String OUT_OF_COLUMN_COUNT_LIMIT = "OTSOutOfColumnCountLimit"; /** * The total data size of all columns in the row exceeds the limit. */ public static final String OUT_OF_ROW_SIZE_LIMIT = "OTSOutOfRowSizeLimit"; /** * Insufficient remaining reserved read/write capacity. */ public static final String NOT_ENOUGH_CAPACITY_UNIT = "OTSNotEnoughCapacityUnit"; /** * The condition check failed. */ public static final String CONDITION_CHECK_FAIL = "OTSConditionCheckFail"; /** * An operation timed out inside OTS. */ public static final String STORAGE_TIMEOUT = "OTSTimeout"; /** * A server inside OTS is unreachable. */ public static final String SERVER_UNAVAILABLE = "OTSServerUnavailable"; /** * An OTS internal server is busy. */ public static final String SERVER_BUSY = "OTSServerBusy"; /** * The stream data has expired. */ public static final String TRIMMED_DATA_ACCESS = "OTSTrimmedDataAccess"; } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/OTSStreamJobShard.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.model; import com.alicloud.openservices.tablestore.model.StreamShard; import java.util.List; /** * OTS streamJob & allShards model * * @author mingya.wmy (云时) */ public class OTSStreamJobShard { private StreamJob streamJob; private List<StreamShard> allShards; public OTSStreamJobShard() { } public OTSStreamJobShard(StreamJob streamJob, List<StreamShard> allShards) { this.streamJob = streamJob; this.allShards = allShards; } public StreamJob getStreamJob() { return streamJob; } public void setStreamJob(StreamJob streamJob) { this.streamJob = streamJob; } public List<StreamShard> getAllShards() { return allShards; } public void setAllShards(List<StreamShard> allShards) { this.allShards = allShards; } } ================================================ FILE: 
otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/ShardCheckpoint.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.model; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.StatusTableConstants; import com.alicloud.openservices.tablestore.model.ColumnValue; import com.alicloud.openservices.tablestore.model.Row; import com.alicloud.openservices.tablestore.model.RowPutChange; public class ShardCheckpoint { private String shardId; private String version; private String checkpoint; private long skipCount; public ShardCheckpoint(String shardId, String version, String shardIterator, long skipCount) { this.shardId = shardId; this.version = version; this.checkpoint = shardIterator; this.skipCount = skipCount; } public String getShardId() { return shardId; } public void setShardId(String shardId) { this.shardId = shardId; } public String getVersion() { return version; } public void setVersion(String version) { this.version = version; } public String getCheckpoint() { return checkpoint; } public void setCheckpoint(String checkpoint) { this.checkpoint = checkpoint; } public long getSkipCount() { return skipCount; } public void setSkipCount(long skipCount) { this.skipCount = skipCount; } public static ShardCheckpoint fromRow(String shardId, Row row) { String shardIterator = row.getLatestColumn(StatusTableConstants.CHECKPOINT_COLUMN_NAME).getValue().asString(); long skipCount = 0; // compatible with old stream reader if (row.contains(StatusTableConstants.SKIP_COUNT_COLUMN_NAME)) { skipCount = row.getLatestColumn(StatusTableConstants.SKIP_COUNT_COLUMN_NAME).getValue().asLong(); } // compatible with old stream reader String version = ""; if (row.contains(StatusTableConstants.VERSION_COLUMN_NAME)) { version = row.getLatestColumn(StatusTableConstants.VERSION_COLUMN_NAME).getValue().asString(); } return new ShardCheckpoint(shardId, version, 
shardIterator, skipCount); } public void serializeColumn(RowPutChange rowChange) { rowChange.addColumn(StatusTableConstants.VERSION_COLUMN_NAME, ColumnValue.fromString(version)); rowChange.addColumn(StatusTableConstants.CHECKPOINT_COLUMN_NAME, ColumnValue.fromString(checkpoint)); rowChange.addColumn(StatusTableConstants.SKIP_COUNT_COLUMN_NAME, ColumnValue.fromLong(skipCount)); } @Override public int hashCode() { int result = 31; result = result ^ this.shardId.hashCode(); result = result ^ this.version.hashCode(); result = result ^ this.checkpoint.hashCode(); result = result ^ (int)this.skipCount; return result; } @Override public boolean equals(Object obj) { if (this == obj) { return true; } if (obj == null) { return false; } if (!(obj instanceof ShardCheckpoint)) { return false; } ShardCheckpoint other = (ShardCheckpoint)obj; return this.shardId.equals(other.shardId) && this.version.equals(other.version) && this.checkpoint.equals(other.checkpoint) && this.skipCount == other.skipCount; } @Override public String toString() { StringBuilder sb = new StringBuilder(); sb.append("ShardId: ").append(shardId) .append(", Version: ").append(version) .append(", Checkpoint: ").append(checkpoint) .append(", SkipCount: ").append(skipCount); return sb.toString(); } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/model/StreamJob.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.model; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSReaderError; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.StatusTableConstants; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils.GsonParser; import com.alicloud.openservices.tablestore.core.utils.CompressUtil; import com.alicloud.openservices.tablestore.model.Column; 
import com.alicloud.openservices.tablestore.model.ColumnValue; import com.alicloud.openservices.tablestore.model.Row; import com.alicloud.openservices.tablestore.model.RowPutChange; import com.google.gson.Gson; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import java.io.IOException; import java.io.UnsupportedEncodingException; import java.util.*; import java.util.zip.DataFormatException; import java.util.zip.Deflater; import java.util.zip.Inflater; public class StreamJob { private String tableName; private String streamId; private String version; private Set shardIds; private long startTimeInMillis; private long endTimeInMillis; public StreamJob(String tableName, String streamId, String version, Set shardIds, long startTimestampMillis, long endTimestampMillis) { this.tableName = tableName; this.streamId = streamId; this.version = version; this.shardIds = shardIds; this.startTimeInMillis = startTimestampMillis; this.endTimeInMillis = endTimestampMillis; } public String getTableName() { return tableName; } public void setTableName(String tableName) { this.tableName = tableName; } public String getStreamId() { return streamId; } public void setStreamId(String streamId) { this.streamId = streamId; } public String getVersion() { return version; } public void setVersion(String version) { this.version = version; } public Set getShardIds() { return shardIds; } public void setShardIds(Set shardIds) { this.shardIds = shardIds; } public long getStartTimeInMillis() { return startTimeInMillis; } public void setStartTimeInMillis(long startTimeInMillis) { this.startTimeInMillis = startTimeInMillis; } public long getEndTimeInMillis() { return endTimeInMillis; } public void setEndTimeInMillis(long endTimeInMillis) { this.endTimeInMillis = endTimeInMillis; } public void serializeShardIdList(RowPutChange rowChange, Set shardIds) { try { String json = GsonParser.listToJson(new ArrayList(shardIds)); byte[] content = CompressUtil.compress(new 
ByteArrayInputStream(json.getBytes("utf-8")), new Deflater()); List<ColumnValue> columns = new ArrayList<ColumnValue>(); int index = 0; while (index < content.length) { int endIndex = index + StatusTableConstants.COLUMN_MAX_SIZE; if (endIndex > content.length) { endIndex = content.length; } columns.add(ColumnValue.fromBinary(Arrays.copyOfRange(content, index, endIndex))); index = endIndex; } for (int id = 0; id < columns.size(); id++) { rowChange.addColumn(StatusTableConstants.JOB_SHARD_LIST_PREFIX_COLUMN_NAME + id, columns.get(id)); } } catch (UnsupportedEncodingException e) { throw DataXException.asDataXException(OTSReaderError.ERROR, e); } catch (IOException e) { throw DataXException.asDataXException(OTSReaderError.ERROR, e); } } public static Set<String> deserializeShardIdList(Row row) { ByteArrayOutputStream output = new ByteArrayOutputStream(); try { int id = 0; while (true) { String columnName = StatusTableConstants.JOB_SHARD_LIST_PREFIX_COLUMN_NAME + id; Column column = row.getLatestColumn(columnName); if (column != null) { output.write(column.getValue().asBinary()); id++; } else { break; } } byte[] content = output.toByteArray(); byte[] realContent = CompressUtil.decompress(new ByteArrayInputStream(content), 1024, new Inflater()); String json = new String(realContent, "utf-8"); return new HashSet<String>(GsonParser.jsonToList(json)); } catch (UnsupportedEncodingException e) { throw DataXException.asDataXException(OTSReaderError.ERROR, e); } catch (IOException e) { throw DataXException.asDataXException(OTSReaderError.ERROR, e); } catch (DataFormatException e) { throw DataXException.asDataXException(OTSReaderError.ERROR, e); } } public void serializeColumn(RowPutChange rowChange) { serializeShardIdList(rowChange, shardIds); rowChange.addColumn(StatusTableConstants.JOB_VERSION_COLUMN_NAME, ColumnValue.fromString(version)); rowChange.addColumn(StatusTableConstants.JOB_TABLE_NAME_COLUMN_NAME, ColumnValue.fromString(tableName)); rowChange.addColumn(StatusTableConstants.JOB_STREAM_ID_COLUMN_NAME,
ColumnValue.fromString(streamId)); rowChange.addColumn(StatusTableConstants.JOB_START_TIME_COLUMN_NAME, ColumnValue.fromLong(startTimeInMillis)); rowChange.addColumn(StatusTableConstants.JOB_END_TIME_COLUMN_NAME, ColumnValue.fromLong(endTimeInMillis)); } public String toJson() { Gson gson = new Gson(); return gson.toJson(this); } @Override public String toString() { return toJson(); } public static StreamJob fromJson(String json) { Gson gson = new Gson(); return gson.fromJson(json, StreamJob.class); } public static StreamJob fromRow(Row row) { if (row == null) { return null; } Set shardIds = deserializeShardIdList(row); String version = row.getLatestColumn(StatusTableConstants.JOB_VERSION_COLUMN_NAME).getValue().asString(); String tableName = row.getLatestColumn(StatusTableConstants.JOB_TABLE_NAME_COLUMN_NAME).getValue().asString(); String streamId = row.getLatestColumn(StatusTableConstants.JOB_STREAM_ID_COLUMN_NAME).getValue().asString(); long startTime = row.getLatestColumn(StatusTableConstants.JOB_START_TIME_COLUMN_NAME).getValue().asLong(); long endTime = row.getLatestColumn(StatusTableConstants.JOB_END_TIME_COLUMN_NAME).getValue().asLong(); return new StreamJob(tableName, streamId, version, shardIds, startTime, endTime); } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/ColumnValueTransformHelper.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.element.Column; import com.alicloud.openservices.tablestore.model.*; public class ColumnValueTransformHelper { public static Column otsPrimaryKeyValueToDataxColumn(PrimaryKeyValue pkValue) { switch (pkValue.getType()) { case STRING:return new StringColumn(pkValue.asString()); case INTEGER:return new LongColumn(pkValue.asLong()); case BINARY:return new 
BytesColumn(pkValue.asBinary()); default: throw new IllegalArgumentException("Unknown primary key type: " + pkValue.getType() + "."); } } public static Column otsColumnValueToDataxColumn(ColumnValue columnValue) { switch (columnValue.getType()) { case STRING:return new StringColumn(columnValue.asString()); case INTEGER:return new LongColumn(columnValue.asLong()); case BINARY:return new BytesColumn(columnValue.asBinary()); case BOOLEAN:return new BoolColumn(columnValue.asBoolean()); case DOUBLE:return new DoubleColumn(columnValue.asDouble()); default: throw new IllegalArgumentException("Unknown column type: " + columnValue.getType() + "."); } } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/GsonParser.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; import com.alicloud.openservices.tablestore.model.StreamShard; import com.google.gson.GsonBuilder; import com.google.gson.reflect.TypeToken; import java.lang.reflect.Type; import java.util.ArrayList; import java.util.List; public class GsonParser { public static String configToJson(OTSStreamReaderConfig config) { return new GsonBuilder().create().toJson(config); } public static OTSStreamReaderConfig jsonToConfig(String jsonStr) { return new GsonBuilder().create().fromJson(jsonStr, OTSStreamReaderConfig.class); } public static String listToJson(List<String> list) { return new GsonBuilder().create().toJson(list); } public static List<String> jsonToList(String jsonStr) { return new GsonBuilder().create().fromJson(jsonStr, new TypeToken<List<String>>(){}.getType()); } public static Object toJson(List<StreamShard> allShards) { return new GsonBuilder().create().toJson(allShards); } public static List<StreamShard> fromJson(String jsonStr) { return new GsonBuilder().create().fromJson(jsonStr, new
TypeToken<List<StreamShard>>(){}.getType()); } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_en_US.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_ja_JP.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_CN.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_HK.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/LocalStrings_zh_TW.properties ================================================ ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSHelper.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSRetryStrategyForStreamReader; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; import com.alicloud.openservices.tablestore.ClientConfiguration; import com.alicloud.openservices.tablestore.SyncClient; import
com.alicloud.openservices.tablestore.SyncClientInterface; import com.alicloud.openservices.tablestore.TableStoreException; import com.alicloud.openservices.tablestore.model.*; import com.alicloud.openservices.tablestore.model.timeseries.DescribeTimeseriesTableRequest; import com.alicloud.openservices.tablestore.model.timeseries.DescribeTimeseriesTableResponse; import com.aliyun.openservices.ots.internal.streamclient.utils.TimeUtils; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; public class OTSHelper { private static final String TABLE_NOT_READY = "OTSTableNotReady"; private static final String OTS_PARTITION_UNAVAILABLE = "OTSPartitionUnavailable"; private static final String OBJECT_NOT_EXIST = "OTSObjectNotExist"; private static final int CREATE_TABLE_READ_CU = 0; private static final int CREATE_TABLE_WRITE_CU = 0; private static final long CHECK_TABLE_READY_INTERNAL_MILLIS = 100; public static SyncClientInterface getOTSInstance(OTSStreamReaderConfig config) { if (config.getOtsForTest() != null) { return config.getOtsForTest(); // for test } ClientConfiguration clientConfig = new ClientConfiguration(); OTSRetryStrategyForStreamReader retryStrategy = new OTSRetryStrategyForStreamReader(); retryStrategy.setMaxRetries(config.getMaxRetries()); clientConfig.setRetryStrategy(retryStrategy); clientConfig.setConnectionTimeoutInMillisecond(50 * 1000); clientConfig.setSocketTimeoutInMillisecond(50 * 1000); clientConfig.setIoThreadCount(4); clientConfig.setMaxConnections(30); SyncClientInterface ots = new SyncClient(config.getEndpoint(), config.getAccessId(), config.getAccessKey(), config.getInstanceName(), clientConfig); return ots; } public static DescribeStreamResponse getStreamResponse(SyncClientInterface ots, String tableName, boolean isTimeseriesTable) { /** * For timeseries tables, two calls (listStream and describeStream) are needed to obtain the streamID and expirationTime. */ ListStreamRequest request = new ListStreamRequest(tableName); ListStreamResponse response
= ots.listStream(request); String streamID = null; for (Stream stream : response.getStreams()) { if (stream.getTableName().equals(tableName)) { streamID = stream.getStreamId(); break; } } if (streamID == null) { throw new RuntimeException(String.format("Did not get any stream from table : (\"%s\") .", tableName)); } DescribeStreamRequest describeStreamRequest = new DescribeStreamRequest(streamID); if (isTimeseriesTable) { describeStreamRequest.setSupportTimeseriesTable(true); } DescribeStreamResponse result = ots.describeStream(describeStreamRequest); if(isTimeseriesTable && !result.isTimeseriesDataTable()){ throw new RuntimeException(String.format("The table [%s] is not timeseries data table, please remove the config: {isTimeseriesTable : true}.", tableName)); } return result; } public static StreamDetails getStreamDetails(SyncClientInterface ots, String tableName) { DescribeTableRequest describeTableRequest = new DescribeTableRequest(tableName); DescribeTableResponse result = ots.describeTable(describeTableRequest); return result.getStreamDetails(); } public static StreamDetails getStreamDetails(SyncClientInterface ots, String tableName, boolean isTimeseriesTable) { if (!isTimeseriesTable) { return getStreamDetails(ots, tableName); } else { DescribeStreamResponse result = getStreamResponse(ots, tableName, isTimeseriesTable); //TODO: StreamDetails cannot be obtained directly for a timeseries table, so it is built manually. // The lastEnableTime field cannot be retrieved for now. return new StreamDetails(true, result.getStreamId(), result.getExpirationTime(), 0); } } public static List<StreamShard> getOrderedShardList(SyncClientInterface ots, String streamId, boolean isTimeseriesTable) { DescribeStreamRequest describeStreamRequest = new DescribeStreamRequest(streamId); if (isTimeseriesTable) { describeStreamRequest.setSupportTimeseriesTable(true); } DescribeStreamResponse describeStreamResult = ots.describeStream(describeStreamRequest); List<StreamShard> shardList = new ArrayList<StreamShard>(); shardList.addAll(describeStreamResult.getShards()); while (describeStreamResult.getNextShardId()
!= null) { describeStreamRequest.setInclusiveStartShardId(describeStreamResult.getNextShardId()); describeStreamResult = ots.describeStream(describeStreamRequest); shardList.addAll(describeStreamResult.getShards()); } return shardList; } public static boolean checkTableExists(SyncClientInterface ots, String tableName, boolean isTimeseriesTable) { boolean exist = false; try { if (isTimeseriesTable) { describeTimeseriesTable(ots, tableName); } else { describeTable(ots, tableName); } exist = true; } catch (TableStoreException ex) { if (!ex.getErrorCode().equals(OBJECT_NOT_EXIST)) { throw ex; } } return exist; } public static DescribeTableResponse describeTable(SyncClientInterface ots, String tableName) { return ots.describeTable(new DescribeTableRequest(tableName)); } public static DescribeTimeseriesTableResponse describeTimeseriesTable(SyncClientInterface ots, String tableName) { return ((SyncClient) ots).asTimeseriesClient().describeTimeseriesTable(new DescribeTimeseriesTableRequest(tableName)); } public static void createTable(SyncClientInterface ots, TableMeta tableMeta, TableOptions tableOptions) { CreateTableRequest request = new CreateTableRequest(tableMeta, tableOptions, new ReservedThroughput(CREATE_TABLE_READ_CU, CREATE_TABLE_WRITE_CU)); ots.createTable(request); } public static boolean waitUntilTableReady(SyncClientInterface ots, String tableName, long maxWaitTimeMillis) { TableMeta tableMeta = describeTable(ots, tableName).getTableMeta(); List startPkCols = new ArrayList(); List endPkCols = new ArrayList(); for (PrimaryKeySchema pkSchema : tableMeta.getPrimaryKeyList()) { startPkCols.add(new PrimaryKeyColumn(pkSchema.getName(), PrimaryKeyValue.INF_MIN)); endPkCols.add(new PrimaryKeyColumn(pkSchema.getName(), PrimaryKeyValue.INF_MAX)); } RangeRowQueryCriteria rangeRowQueryCriteria = new RangeRowQueryCriteria(tableName); rangeRowQueryCriteria.setInclusiveStartPrimaryKey(new PrimaryKey(startPkCols)); rangeRowQueryCriteria.setExclusiveEndPrimaryKey(new 
PrimaryKey(endPkCols)); rangeRowQueryCriteria.setLimit(1); rangeRowQueryCriteria.setMaxVersions(1); long startTime = System.currentTimeMillis(); while (System.currentTimeMillis() - startTime < maxWaitTimeMillis) { try { GetRangeRequest getRangeRequest = new GetRangeRequest(rangeRowQueryCriteria); ots.getRange(getRangeRequest); return true; } catch (TableStoreException ex) { if (!ex.getErrorCode().equals(OTS_PARTITION_UNAVAILABLE) && !ex.getErrorCode().equals(TABLE_NOT_READY)) { throw ex; } } TimeUtils.sleepMillis(CHECK_TABLE_READY_INTERNAL_MILLIS); } return false; } public static Map<String, StreamShard> toShardMap(List<StreamShard> orderedShardList) { Map<String, StreamShard> shardsMap = new HashMap<String, StreamShard>(); for (StreamShard shard : orderedShardList) { shardsMap.put(shard.getShardId(), shard); } return shardsMap; } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/OTSStreamJobShardUtil.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConfig; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.core.CheckpointTimeTracker; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.OTSStreamJobShard; import com.alibaba.datax.plugin.reader.otsstreamreader.internal.model.StreamJob; import com.alibaba.fastjson.JSON; import com.alicloud.openservices.tablestore.SyncClientInterface; import com.alicloud.openservices.tablestore.model.StreamShard; import org.apache.commons.lang3.StringUtils; import java.util.List; import java.util.Set; import java.util.concurrent.Callable; import java.util.stream.Collectors; import static com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.DEFAULT_SLEEP_TIME_IN_MILLS; import static
com.alibaba.datax.plugin.reader.otsstreamreader.internal.config.OTSStreamReaderConstants.RETRY_TIMES; /** * @author mingya.wmy (云时) */ public class OTSStreamJobShardUtil { private static volatile OTSStreamJobShard otsStreamJobShard = null; /** * Get the global OTS StreamJob and allShards. Uses a lazily initialized singleton to reduce the number of calls to the OTS API. * Note: config and version are the same for every TASK. * * @param config * @param version * @return * @throws Exception */ public static OTSStreamJobShard getOTSStreamJobShard(OTSStreamReaderConfig config, String version) throws Exception { if (otsStreamJobShard == null) { synchronized (OTSHelper.class) { if (otsStreamJobShard == null) { otsStreamJobShard = RetryUtil.executeWithRetry(new Callable<OTSStreamJobShard>() { @Override public OTSStreamJobShard call() throws Exception { return getOTSStreamJobShardByOtsClient(config, version); } }, RETRY_TIMES, DEFAULT_SLEEP_TIME_IN_MILLS, true); } } } return otsStreamJobShard; } /** * Get the OTS StreamJob and allShards * * @param config OTS CONF * @param version OTS STREAM VERSION * @return */ private static OTSStreamJobShard getOTSStreamJobShardByOtsClient(OTSStreamReaderConfig config, String version) { // Init ots; in the Task phase, fetch allShards and streamJob from OTS SyncClientInterface ots = null; try { ots = OTSHelper.getOTSInstance(config); String streamId = OTSHelper.getStreamResponse(ots, config.getDataTable(), config.isTimeseriesTable()).getStreamId(); List<StreamShard> allShards = OTSHelper.getOrderedShardList(ots, streamId, config.isTimeseriesTable()); CheckpointTimeTracker checkpointInfoTracker = new CheckpointTimeTracker(ots, config.getStatusTable(), streamId); StreamJob streamJobFromCPT = checkpointInfoTracker.readStreamJob(config.getEndTimestampMillis()); if (!StringUtils.equals(streamJobFromCPT.getVersion(), version)) { throw new RuntimeException(String.format("streamJob version (\"%s\") is not equal to \"%s\", streamJob: %s", streamJobFromCPT.getVersion(), version, JSON.toJSONString(streamJobFromCPT))); } Set<String> shardIdSetsFromTracker = streamJobFromCPT.getShardIds(); if (shardIdSetsFromTracker == null ||
shardIdSetsFromTracker.isEmpty()) { throw new RuntimeException(String.format("StreamJob [statusTable=%s, streamId=%s] shardIds can't be null!", config.getStatusTable(), streamId)); } Set currentAllStreamShardIdSets = allShards.stream().map(streamShard -> streamShard.getShardId()).collect(Collectors.toSet()); for (String shardId: shardIdSetsFromTracker) { if (!currentAllStreamShardIdSets.contains(shardId)) { allShards.add(new StreamShard(shardId)); } } StreamJob streamJob = new StreamJob(config.getDataTable(), streamId, version, shardIdSetsFromTracker, config.getStartTimestampMillis(), config.getEndTimestampMillis()); return new OTSStreamJobShard(streamJob, allShards); } catch (Throwable e) { throw new DataXException(String.format("Get ots shards error: %s", e.getMessage())); } finally { if (ots != null) { ots.shutdown(); } } } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/ParamChecker.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; import com.alibaba.datax.common.util.Configuration; public class ParamChecker { private static void throwNotExistException() { throw new IllegalArgumentException("missing the key."); } private static void throwStringLengthZeroException() { throw new IllegalArgumentException("input the key is empty string."); } public static String checkStringAndGet(Configuration param, String key, boolean isTrim) { try { String value = param.getString(key); if (isTrim) { value = value != null ? 
value.trim() : null; } if (null == value) { throwNotExistException(); } else if (value.length() == 0) { throwStringLengthZeroException(); } return value; } catch(RuntimeException e) { throw e; } } } ================================================ FILE: otsstreamreader/src/main/java/com/alibaba/datax/plugin/reader/otsstreamreader/internal/utils/TimeUtils.java ================================================ package com.alibaba.datax.plugin.reader.otsstreamreader.internal.utils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.DateFormat; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; import java.util.TimeZone; public class TimeUtils { public static final long SECOND_IN_MILLIS = 1000; public static final long MINUTE_IN_MILLIS = 60 * 1000; public static final int DAY_IN_SEC = 24 * 60 * 60; public static final long DAY_IN_MILLIS = DAY_IN_SEC * 1000; public static final long HOUR_IN_MILLIS = 60 * MINUTE_IN_MILLIS; private static final Logger LOG = LoggerFactory.getLogger(TimeUtils.class); public static long sleepMillis(long timeToSleepMillis) { if(timeToSleepMillis <= 0L) { return 0L; } else { long startTime = System.currentTimeMillis(); try { Thread.sleep(timeToSleepMillis); } catch (InterruptedException var5) { Thread.interrupted(); LOG.warn("Interrupted while sleeping"); } return System.currentTimeMillis() - startTime; } } public static long parseDateToTimestampMillis(String dateStr) throws ParseException { SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd"); Date date = format.parse(dateStr); return date.getTime(); } public static long parseTimeStringToTimestampMillis(String dateStr) throws ParseException { SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmss"); Date date = format.parse(dateStr); return date.getTime(); } public static String getTimeInISO8601(Date date) { TimeZone tz = TimeZone.getTimeZone("UTC"); DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm'Z'"); 
df.setTimeZone(tz); String nowAsISO = df.format(date); return nowAsISO; } } ================================================ FILE: otsstreamreader/src/main/resources/log4j2.xml ================================================ %d %p %c{1.} [%t] %m%n ================================================ FILE: otsstreamreader/src/main/resources/plugin.json ================================================ { "name": "otsstreamreader", "class": "com.alibaba.datax.plugin.reader.otsstreamreader.internal.OTSStreamReader", "description": "", "developer": "alibaba" } ================================================ FILE: otsstreamreader/tools/config.json ================================================ { "endpoint" : "", "accessId" : "", "accessKey" : "", "instanceName" : "", "statusTable" : "" } ================================================ FILE: otsstreamreader/tools/tablestore_streamreader_console.py ================================================ #!/usr/bin/env python #-*- coding: utf-8 -*- from optparse import OptionParser import sys import json import tabulate import time import zlib from ots2 import * class ConsoleConfig: def __init__(self, config_file): f = open(config_file, 'r') config = json.loads(f.read()) self.endpoint = str(config['endpoint']) self.accessid = str(config['accessId']) self.accesskey = str(config['accessKey']) self.instance_name = str(config['instanceName']) self.status_table = str(config['statusTable']) self.ots = OTSClient(self.endpoint, self.accessid, self.accesskey, self.instance_name) def describe_job(config, options): ''' 1. get job's description 2. get all job's checkpoints and check if it is done ''' if not options.stream_id: print "Error: Should set the stream id using '-s' or '--streamid'." sys.exit(-1) if not options.timestamp: print "Error: Should set the timestamp using '-t' or '--timestamp'."
sys.exit(-1) pk = [('StreamId', options.stream_id), ('StatusType', 'DataxJobDesc'), ('StatusValue', '%16d' % int(options.timestamp))] consumed, pk, attrs, next_token = config.ots.get_row(config.status_table, pk, [], None, 1) if not attrs: print 'Stream job is not found.' sys.exit(-1) job_detail = parse_job_detail(attrs) print '----------JobDescriptions----------' print json.dumps(job_detail, indent=2) print '-----------------------------------' stream_checkpoints = _list_checkpoints(config, options.stream_id, int(options.timestamp)) cps_headers = ['ShardId', 'SendRecordCount', 'Checkpoint', 'SkipCount', 'Version'] table_content = [] for cp in stream_checkpoints: table_content.append([cp['ShardId'], cp['SendRecordCount'], cp['Checkpoint'], cp['SkipCount'], cp['Version']]) print tabulate.tabulate(table_content, headers=cps_headers) # check if stream job has finished finished = True if len(job_detail['ShardIds']) != len(stream_checkpoints): finished = False for cp in stream_checkpoints: if cp['Version'] != job_detail['Version']: finished = False print '----------JobSummary----------' print 'ShardsCount:', len(job_detail['ShardIds']) print 'CheckPointsCount:', len(stream_checkpoints) print 'JobStatus:', 'Finished' if finished else 'NotFinished' print '------------------------------' def _list_checkpoints(config, stream_id, timestamp): start_pk = [('StreamId', stream_id), ('StatusType', 'CheckpointForDataxReader'), ('StatusValue', '%16d' % timestamp)] end_pk = [('StreamId', stream_id), ('StatusType', 'CheckpointForDataxReader'), ('StatusValue', '%16d' % (timestamp + 1))] consumed_counter = CapacityUnit(0, 0) columns_to_get = [] checkpoints = [] range_iter = config.ots.xget_range( config.status_table, Direction.FORWARD, start_pk, end_pk, consumed_counter, columns_to_get, 100, column_filter=None, max_version=1 ) rows = [] for (primary_key, attrs) in range_iter: checkpoint = {} for attr in attrs: checkpoint[attr[0]] = attr[1] if not checkpoint.has_key('SendRecordCount'): 
checkpoint['SendRecordCount'] = 0 checkpoint['ShardId'] = primary_key[2][1].split('\t')[1] checkpoints.append(checkpoint) return checkpoints def list_job(config, options): ''' Two options: 1. list all jobs of stream 2. list all jobs and all streams ''' consumed_counter = CapacityUnit(0, 0) if options.stream_id: start_pk = [('StreamId', options.stream_id), ('StatusType', INF_MIN), ('StatusValue', INF_MIN)] end_pk = [('StreamId', options.stream_id), ('StatusType', INF_MAX), ('StatusValue', INF_MAX)] else: start_pk = [('StreamId', INF_MIN), ('StatusType', INF_MIN), ('StatusValue', INF_MIN)] end_pk = [('StreamId', INF_MAX), ('StatusType', INF_MAX), ('StatusValue', INF_MAX)] columns_to_get = [] range_iter = config.ots.xget_range( config.status_table, Direction.FORWARD, start_pk, end_pk, consumed_counter, columns_to_get, None, column_filter=None, max_version=1 ) rows = [] for (primary_key, attrs) in range_iter: if primary_key[1][1] == 'DataxJobDesc': job_detail = parse_job_detail(attrs) rows.append([job_detail['TableName'], job_detail['JobStreamId'], job_detail['EndTime'], job_detail['StartTime'], job_detail['EndTime'], job_detail['Version']]) headers = ['TableName', 'JobStreamId', 'Timestamp', 'StartTime', 'EndTime', 'Version'] print tabulate.tabulate(rows, headers=headers) def parse_job_detail(attrs): job_details = {} shard_ids_content = '' for attr in attrs: if attr[0].startswith('ShardIds_'): shard_ids_content += attr[1] else: job_details[attr[0]] = attr[1] shard_ids = json.loads(zlib.decompress(shard_ids_content)) if not job_details.has_key('Version'): job_details['Version'] = '' if not job_details.has_key('SkipCount'): job_details['SkipCount'] = 0 job_details['ShardIds'] = shard_ids return job_details def parse_time(value): try: return int(value) except Exception,e: return int(time.mktime(time.strptime(value, '%Y-%m-%d %H:%M:%S'))) if __name__ == '__main__': parser = OptionParser() parser.add_option('-c', '--config', dest='config_file', help='path of config file', 
metavar='tablestore_streamreader_config.json') parser.add_option('-a', '--action', dest='action', help='the action to do', choices = ['describe_job', 'list_job'], metavar='') parser.add_option('-t', '--timestamp', dest='timestamp', help='the timestamp', metavar='') parser.add_option('-s', '--streamid', dest='stream_id', help='the id of stream', metavar='') parser.add_option('-d', '--shardid', dest='shard_id', help='the id of shard', metavar='') options, args = parser.parse_args() if not options.config_file: print "Error: Should set the path of config file using '-c' or '--config'." sys.exit(-1) if not options.action: print "Error: Should set the action using '-a' or '--action'." sys.exit(-1) console_config = ConsoleConfig(options.config_file) if options.action == 'list_job': list_job(console_config, options) elif options.action == 'describe_job': describe_job(console_config, options) ================================================ FILE: otsstreamreader/tools/tabulate.py ================================================ # -*- coding: utf-8 -*- """Pretty-print tabular data.""" from __future__ import print_function from __future__ import unicode_literals from collections import namedtuple try: from collections.abc import Iterable except ImportError: from collections import Iterable from platform import python_version_tuple import re if python_version_tuple()[0] < "3": from itertools import izip_longest from functools import partial _none_type = type(None) _int_type = int _long_type = long _float_type = float _text_type = unicode _binary_type = str def _is_file(f): return isinstance(f, file) else: from itertools import zip_longest as izip_longest from functools import reduce, partial _none_type = type(None) _int_type = int _long_type = int _float_type = float _text_type = str _binary_type = bytes import io def _is_file(f): return isinstance(f, io.IOBase) try: import wcwidth # optional wide-character (CJK) support except ImportError: wcwidth = None __all__ = ["tabulate", "tabulate_formats", "simple_separated_format"] __version__ = "0.7.6-dev" # minimum 
extra space in headers MIN_PADDING = 2 # if True, enable wide-character (CJK) support WIDE_CHARS_MODE = wcwidth is not None Line = namedtuple("Line", ["begin", "hline", "sep", "end"]) DataRow = namedtuple("DataRow", ["begin", "sep", "end"]) # A table structure is supposed to be: # # --- lineabove --------- # headerrow # --- linebelowheader --- # datarow # --- linebetweenrows --- # ... (more datarows) ... # --- linebetweenrows --- # last datarow # --- linebelow --------- # # TableFormat's line* elements can be # # - either None, if the element is not used, # - or a Line tuple, # - or a function: [col_widths], [col_alignments] -> string. # # TableFormat's *row elements can be # # - either None, if the element is not used, # - or a DataRow tuple, # - or a function: [cell_values], [col_widths], [col_alignments] -> string. # # padding (an integer) is the amount of white space around data values. # # with_header_hide: # # - either None, to display all table elements unconditionally, # - or a list of elements not to be displayed if the table has column headers.
# TableFormat = namedtuple("TableFormat", ["lineabove", "linebelowheader", "linebetweenrows", "linebelow", "headerrow", "datarow", "padding", "with_header_hide"]) def _pipe_segment_with_colons(align, colwidth): """Return a segment of a horizontal line with optional colons which indicate column's alignment (as in `pipe` output format).""" w = colwidth if align in ["right", "decimal"]: return ('-' * (w - 1)) + ":" elif align == "center": return ":" + ('-' * (w - 2)) + ":" elif align == "left": return ":" + ('-' * (w - 1)) else: return '-' * w def _pipe_line_with_colons(colwidths, colaligns): """Return a horizontal line with optional colons to indicate column's alignment (as in `pipe` output format).""" segments = [_pipe_segment_with_colons(a, w) for a, w in zip(colaligns, colwidths)] return "|" + "|".join(segments) + "|" def _mediawiki_row_with_attrs(separator, cell_values, colwidths, colaligns): alignment = { "left": '', "right": 'align="right"| ', "center": 'align="center"| ', "decimal": 'align="right"| ' } # hard-coded padding _around_ align attribute and value together # rather than padding parameter which affects only the value values_with_attrs = [' ' + alignment.get(a, '') + c + ' ' for c, a in zip(cell_values, colaligns)] colsep = separator*2 return (separator + colsep.join(values_with_attrs)).rstrip() def _textile_row_with_attrs(cell_values, colwidths, colaligns): cell_values[0] += ' ' alignment = { "left": "<.", "right": ">.", "center": "=.", "decimal": ">." } values = (alignment.get(a, '') + v for a, v in zip(colaligns, cell_values)) return '|' + '|'.join(values) + '|' def _html_begin_table_without_header(colwidths_ignore, colaligns_ignore): # this table header will be suppressed if there is a header row return "\n".join(["<table>", "<tbody>"])
def _html_row_with_attrs(celltag, cell_values, colwidths, colaligns): alignment = { "left": '', "right": ' style="text-align: right;"', "center": ' style="text-align: center;"', "decimal": ' style="text-align: right;"' } values_with_attrs = ["<{0}{1}>{2}</{0}>".format(celltag, alignment.get(a, ''), c) for c, a in zip(cell_values, colaligns)] rowhtml = "<tr>" + "".join(values_with_attrs).rstrip() + "</tr>" if celltag == "th": # it's a header row, create a new table header rowhtml = "\n".join(["<table>", "<thead>", rowhtml, "</thead>", "<tbody>"]) return rowhtml
def _moin_row_with_attrs(celltag, cell_values, colwidths, colaligns, header=''): alignment = { "left": '', "right": '<style="text-align: right;">', "center": '<style="text-align: center;">', "decimal": '<style="text-align: right;">' } values_with_attrs = ["{0}{1} {2} ".format(celltag, alignment.get(a, ''), header+c+header) for c, a in zip(cell_values, colaligns)] return "".join(values_with_attrs)+"||" def _latex_line_begin_tabular(colwidths, colaligns, booktabs=False): alignment = { "left": "l", "right": "r", "center": "c", "decimal": "r" } tabular_columns_fmt = "".join([alignment.get(a, "l") for a in colaligns]) return "\n".join(["\\begin{tabular}{" + tabular_columns_fmt + "}", "\\toprule" if booktabs else "\\hline"]) LATEX_ESCAPE_RULES = {r"&": r"\&", r"%": r"\%", r"$": r"\$", r"#": r"\#", r"_": r"\_", r"^": r"\^{}", r"{": r"\{", r"}": r"\}", r"~": r"\textasciitilde{}", "\\": r"\textbackslash{}", r"<": r"\ensuremath{<}", r">": r"\ensuremath{>}"} def _latex_row(cell_values, colwidths, colaligns): def escape_char(c): return LATEX_ESCAPE_RULES.get(c, c) escaped_values = ["".join(map(escape_char, cell)) for cell in cell_values] rowfmt = DataRow("", "&", "\\\\") return _build_simple_row(escaped_values, rowfmt) _table_formats = {"simple": TableFormat(lineabove=Line("", "-", " ", ""), linebelowheader=Line("", "-", " ", ""), linebetweenrows=None, linebelow=Line("", "-", " ", ""), headerrow=DataRow("", " ", ""), datarow=DataRow("", " ", ""), padding=0, with_header_hide=["lineabove", "linebelow"]), "plain": TableFormat(lineabove=None, linebelowheader=None, linebetweenrows=None, linebelow=None, headerrow=DataRow("", " ", ""), datarow=DataRow("", " ", ""), padding=0, with_header_hide=None), "grid": TableFormat(lineabove=Line("+", "-", "+", "+"), linebelowheader=Line("+", "=", "+", "+"), linebetweenrows=Line("+", "-", "+", "+"), linebelow=Line("+", "-", "+", "+"), headerrow=DataRow("|", "|", "|"), datarow=DataRow("|", "|", "|"), padding=1, with_header_hide=None), "fancy_grid": TableFormat(lineabove=Line("╒", "═", 
"╤", "╕"), linebelowheader=Line("╞", "═", "╪", "╡"), linebetweenrows=Line("├", "─", "┼", "┤"), linebelow=Line("╘", "═", "╧", "╛"), headerrow=DataRow("│", "│", "│"), datarow=DataRow("│", "│", "│"), padding=1, with_header_hide=None), "pipe": TableFormat(lineabove=_pipe_line_with_colons, linebelowheader=_pipe_line_with_colons, linebetweenrows=None, linebelow=None, headerrow=DataRow("|", "|", "|"), datarow=DataRow("|", "|", "|"), padding=1, with_header_hide=["lineabove"]), "orgtbl": TableFormat(lineabove=None, linebelowheader=Line("|", "-", "+", "|"), linebetweenrows=None, linebelow=None, headerrow=DataRow("|", "|", "|"), datarow=DataRow("|", "|", "|"), padding=1, with_header_hide=None), "jira": TableFormat(lineabove=None, linebelowheader=None, linebetweenrows=None, linebelow=None, headerrow=DataRow("||", "||", "||"), datarow=DataRow("|", "|", "|"), padding=1, with_header_hide=None), "psql": TableFormat(lineabove=Line("+", "-", "+", "+"), linebelowheader=Line("|", "-", "+", "|"), linebetweenrows=None, linebelow=Line("+", "-", "+", "+"), headerrow=DataRow("|", "|", "|"), datarow=DataRow("|", "|", "|"), padding=1, with_header_hide=None), "rst": TableFormat(lineabove=Line("", "=", " ", ""), linebelowheader=Line("", "=", " ", ""), linebetweenrows=None, linebelow=Line("", "=", " ", ""), headerrow=DataRow("", " ", ""), datarow=DataRow("", " ", ""), padding=0, with_header_hide=None), "mediawiki": TableFormat(lineabove=Line("{| class=\"wikitable\" style=\"text-align: left;\"", "", "", "\n|+ \n|-"), linebelowheader=Line("|-", "", "", ""), linebetweenrows=Line("|-", "", "", ""), linebelow=Line("|}", "", "", ""), headerrow=partial(_mediawiki_row_with_attrs, "!"), datarow=partial(_mediawiki_row_with_attrs, "|"), padding=0, with_header_hide=None), "moinmoin": TableFormat(lineabove=None, linebelowheader=None, linebetweenrows=None, linebelow=None, headerrow=partial(_moin_row_with_attrs,"||",header="'''"), datarow=partial(_moin_row_with_attrs,"||"), padding=1, with_header_hide=None), 
"html": TableFormat(lineabove=_html_begin_table_without_header, linebelowheader="", linebetweenrows=None, linebelow=Line("</tbody>\n</table>", "", "", ""), headerrow=partial(_html_row_with_attrs, "th"), datarow=partial(_html_row_with_attrs, "td"), padding=0, with_header_hide=["lineabove"]), "latex": TableFormat(lineabove=_latex_line_begin_tabular, linebelowheader=Line("\\hline", "", "", ""), linebetweenrows=None, linebelow=Line("\\hline\n\\end{tabular}", "", "", ""), headerrow=_latex_row, datarow=_latex_row, padding=1, with_header_hide=None), "latex_booktabs": TableFormat(lineabove=partial(_latex_line_begin_tabular, booktabs=True), linebelowheader=Line("\\midrule", "", "", ""), linebetweenrows=None, linebelow=Line("\\bottomrule\n\\end{tabular}", "", "", ""), headerrow=_latex_row, datarow=_latex_row, padding=1, with_header_hide=None), "tsv": TableFormat(lineabove=None, linebelowheader=None, linebetweenrows=None, linebelow=None, headerrow=DataRow("", "\t", ""), datarow=DataRow("", "\t", ""), padding=0, with_header_hide=None), "textile": TableFormat(lineabove=None, linebelowheader=None, linebetweenrows=None, linebelow=None, headerrow=DataRow("|_. ", "|_.", "|"), datarow=_textile_row_with_attrs, padding=1, with_header_hide=None)} tabulate_formats = list(sorted(_table_formats.keys())) _invisible_codes = re.compile(r"\x1b\[\d*m|\x1b\[\d*\;\d*\;\d*m") # ANSI color codes _invisible_codes_bytes = re.compile(br"\x1b\[\d*m|\x1b\[\d*\;\d*\;\d*m") # ANSI color codes def simple_separated_format(separator): """Construct a simple TableFormat with columns separated by a separator.
>>> tsv = simple_separated_format("\\t") ; \ tabulate([["foo", 1], ["spam", 23]], tablefmt=tsv) == 'foo \\t 1\\nspam\\t23' True """ return TableFormat(None, None, None, None, headerrow=DataRow('', separator, ''), datarow=DataRow('', separator, ''), padding=0, with_header_hide=None) def _isconvertible(conv, string): try: n = conv(string) return True except (ValueError, TypeError): return False def _isnumber(string): """ >>> _isnumber("123.45") True >>> _isnumber("123") True >>> _isnumber("spam") False """ return _isconvertible(float, string) def _isint(string, inttype=int): """ >>> _isint("123") True >>> _isint("123.45") False """ return type(string) is inttype or\ (isinstance(string, _binary_type) or isinstance(string, _text_type))\ and\ _isconvertible(inttype, string) def _type(string, has_invisible=True): """The least generic type (type(None), int, float, str, unicode). >>> _type(None) is type(None) True >>> _type("foo") is type("") True >>> _type("1") is type(1) True >>> _type('\x1b[31m42\x1b[0m') is type(42) True >>> _type('\x1b[31m42\x1b[0m') is type(42) True """ if has_invisible and \ (isinstance(string, _text_type) or isinstance(string, _binary_type)): string = _strip_invisible(string) if string is None: return _none_type elif hasattr(string, "isoformat"): # datetime.datetime, date, and time return _text_type elif _isint(string): return int elif _isint(string, _long_type): return int elif _isnumber(string): return float elif isinstance(string, _binary_type): return _binary_type else: return _text_type def _afterpoint(string): """Symbols after a decimal point, -1 if the string lacks the decimal point. 
>>> _afterpoint("123.45") 2 >>> _afterpoint("1001") -1 >>> _afterpoint("eggs") -1 >>> _afterpoint("123e45") 2 """ if _isnumber(string): if _isint(string): return -1 else: pos = string.rfind(".") pos = string.lower().rfind("e") if pos < 0 else pos if pos >= 0: return len(string) - pos - 1 else: return -1 # no point else: return -1 # not a number def _padleft(width, s): """Flush right. >>> _padleft(6, '\u044f\u0439\u0446\u0430') == ' \u044f\u0439\u0446\u0430' True """ fmt = "{0:>%ds}" % width return fmt.format(s) def _padright(width, s): """Flush left. >>> _padright(6, '\u044f\u0439\u0446\u0430') == '\u044f\u0439\u0446\u0430 ' True """ fmt = "{0:<%ds}" % width return fmt.format(s) def _padboth(width, s): """Center string. >>> _padboth(6, '\u044f\u0439\u0446\u0430') == ' \u044f\u0439\u0446\u0430 ' True """ fmt = "{0:^%ds}" % width return fmt.format(s) def _strip_invisible(s): "Remove invisible ANSI color codes." if isinstance(s, _text_type): return re.sub(_invisible_codes, "", s) else: # a bytestring return re.sub(_invisible_codes_bytes, "", s) def _visible_width(s): """Visible width of a printed string. ANSI color codes are removed. 
>>> _visible_width('\x1b[31mhello\x1b[0m'), _visible_width("world") (5, 5) """ # optional wide-character support if wcwidth is not None and WIDE_CHARS_MODE: len_fn = wcwidth.wcswidth else: len_fn = len if isinstance(s, _text_type) or isinstance(s, _binary_type): return len_fn(_strip_invisible(s)) else: return len_fn(_text_type(s)) def _align_column(strings, alignment, minwidth=0, has_invisible=True): """[string] -> [padded_string] >>> list(map(str,_align_column(["12.345", "-1234.5", "1.23", "1234.5", "1e+234", "1.0e234"], "decimal"))) [' 12.345 ', '-1234.5 ', ' 1.23 ', ' 1234.5 ', ' 1e+234 ', ' 1.0e234'] >>> list(map(str,_align_column(['123.4', '56.7890'], None))) ['123.4', '56.7890'] """ if alignment == "right": strings = [s.strip() for s in strings] padfn = _padleft elif alignment == "center": strings = [s.strip() for s in strings] padfn = _padboth elif alignment == "decimal": if has_invisible: decimals = [_afterpoint(_strip_invisible(s)) for s in strings] else: decimals = [_afterpoint(s) for s in strings] maxdecimals = max(decimals) strings = [s + (maxdecimals - decs) * " " for s, decs in zip(strings, decimals)] padfn = _padleft elif not alignment: return strings else: strings = [s.strip() for s in strings] padfn = _padright enable_widechars = wcwidth is not None and WIDE_CHARS_MODE if has_invisible: width_fn = _visible_width elif enable_widechars: # optional wide-character support if available width_fn = wcwidth.wcswidth else: width_fn = len s_lens = list(map(len, strings)) s_widths = list(map(width_fn, strings)) maxwidth = max(max(s_widths), minwidth) if not enable_widechars and not has_invisible: padded_strings = [padfn(maxwidth, s) for s in strings] else: # enable wide-character width corrections visible_widths = [maxwidth - (w - l) for w, l in zip(s_widths, s_lens)] # wcswidth and _visible_width don't count invisible characters; # padfn doesn't need to apply another correction padded_strings = [padfn(w, s) for s, w in zip(strings, visible_widths)] return 
padded_strings def _more_generic(type1, type2): types = { _none_type: 0, int: 1, float: 2, _binary_type: 3, _text_type: 4 } invtypes = { 4: _text_type, 3: _binary_type, 2: float, 1: int, 0: _none_type } moregeneric = max(types.get(type1, 4), types.get(type2, 4)) return invtypes[moregeneric] def _column_type(strings, has_invisible=True): """The least generic type all column values are convertible to. >>> _column_type(["1", "2"]) is _int_type True >>> _column_type(["1", "2.3"]) is _float_type True >>> _column_type(["1", "2.3", "four"]) is _text_type True >>> _column_type(["four", '\u043f\u044f\u0442\u044c']) is _text_type True >>> _column_type([None, "brux"]) is _text_type True >>> _column_type([1, 2, None]) is _int_type True >>> import datetime as dt >>> _column_type([dt.datetime(1991,2,19), dt.time(17,35)]) is _text_type True """ types = [_type(s, has_invisible) for s in strings ] return reduce(_more_generic, types, int) def _format(val, valtype, floatfmt, missingval="", has_invisible=True): """Format a value according to its type.
Unicode is supported: >>> hrow = ['\u0431\u0443\u043a\u0432\u0430', '\u0446\u0438\u0444\u0440\u0430'] ; \ tbl = [['\u0430\u0437', 2], ['\u0431\u0443\u043a\u0438', 4]] ; \ good_result = '\\u0431\\u0443\\u043a\\u0432\\u0430 \\u0446\\u0438\\u0444\\u0440\\u0430\\n------- -------\\n\\u0430\\u0437 2\\n\\u0431\\u0443\\u043a\\u0438 4' ; \ tabulate(tbl, headers=hrow) == good_result True """ if val is None: return missingval if valtype in [int, _text_type]: return "{0}".format(val) elif valtype is _binary_type: try: return _text_type(val, "ascii") except TypeError: return _text_type(val) elif valtype is float: is_a_colored_number = has_invisible and isinstance(val, (_text_type, _binary_type)) if is_a_colored_number: raw_val = _strip_invisible(val) formatted_val = format(float(raw_val), floatfmt) return val.replace(raw_val, formatted_val) else: return format(float(val), floatfmt) else: return "{0}".format(val) def _align_header(header, alignment, width, visible_width): "Pad string header to width chars given known visible_width of the header." width += len(header) - visible_width if alignment == "left": return _padright(width, header) elif alignment == "center": return _padboth(width, header) elif not alignment: return "{0}".format(header) else: return _padleft(width, header) def _prepend_row_index(rows, index): """Add a left-most index column.""" if index is None or index is False: return rows if len(index) != len(rows): print('index=', index) print('rows=', rows) raise ValueError('index must be as long as the number of data rows') rows = [[v]+list(row) for v,row in zip(index, rows)] return rows def _bool(val): "A wrapper around standard bool() which doesn't throw on NumPy arrays" try: return bool(val) except ValueError: # val is likely to be a numpy array with many elements return False def _normalize_tabular_data(tabular_data, headers, showindex="default"): """Transform a supported data type to a list of lists, and a list of headers. 
Supported tabular data types: * list-of-lists or another iterable of iterables * list of named tuples (usually used with headers="keys") * list of dicts (usually used with headers="keys") * list of OrderedDicts (usually used with headers="keys") * 2D NumPy arrays * NumPy record arrays (usually used with headers="keys") * dict of iterables (usually used with headers="keys") * pandas.DataFrame (usually used with headers="keys") The first row can be used as headers if headers="firstrow", column indices can be used as headers if headers="keys". If showindex="default", show row indices of the pandas.DataFrame. If showindex="always", show row indices for all types of data. If showindex="never", don't show row indices for all types of data. If showindex is an iterable, show its values as row indices. """ try: bool(headers) is_headers2bool_broken = False except ValueError: # numpy.ndarray, pandas.core.index.Index, ... is_headers2bool_broken = True headers = list(headers) index = None if hasattr(tabular_data, "keys") and hasattr(tabular_data, "values"): # dict-like and pandas.DataFrame? 
if hasattr(tabular_data.values, "__call__"): # likely a conventional dict keys = tabular_data.keys() rows = list(izip_longest(*tabular_data.values())) # columns have to be transposed elif hasattr(tabular_data, "index"): # values is a property, has .index => it's likely a pandas.DataFrame (pandas 0.11.0) keys = tabular_data.keys() vals = tabular_data.values # values matrix doesn't need to be transposed # for DataFrames add an index per default index = list(tabular_data.index) rows = [list(row) for row in vals] else: raise ValueError("tabular data doesn't appear to be a dict or a DataFrame") if headers == "keys": headers = list(map(_text_type,keys)) # headers should be strings else: # it's a usual iterable of iterables, or a NumPy array rows = list(tabular_data) if (headers == "keys" and hasattr(tabular_data, "dtype") and getattr(tabular_data.dtype, "names")): # numpy record array headers = tabular_data.dtype.names elif (headers == "keys" and len(rows) > 0 and isinstance(rows[0], tuple) and hasattr(rows[0], "_fields")): # namedtuple headers = list(map(_text_type, rows[0]._fields)) elif (len(rows) > 0 and isinstance(rows[0], dict)): # dict or OrderedDict uniq_keys = set() # implements hashed lookup keys = [] # storage for set if headers == "firstrow": firstdict = rows[0] if len(rows) > 0 else {} keys.extend(firstdict.keys()) uniq_keys.update(keys) rows = rows[1:] for row in rows: for k in row.keys(): #Save unique items in input order if k not in uniq_keys: keys.append(k) uniq_keys.add(k) if headers == 'keys': headers = keys elif isinstance(headers, dict): # a dict of headers for a list of dicts headers = [headers.get(k, k) for k in keys] headers = list(map(_text_type, headers)) elif headers == "firstrow": if len(rows) > 0: headers = [firstdict.get(k, k) for k in keys] headers = list(map(_text_type, headers)) else: headers = [] elif headers: raise ValueError('headers for a list of dicts is not a dict or a keyword') rows = [[row.get(k) for k in keys] for row in rows] 
elif headers == "keys" and len(rows) > 0: # keys are column indices headers = list(map(_text_type, range(len(rows[0])))) # take headers from the first row if necessary if headers == "firstrow" and len(rows) > 0: if index is not None: headers = [index[0]] + list(rows[0]) index = index[1:] else: headers = rows[0] headers = list(map(_text_type, headers)) # headers should be strings rows = rows[1:] headers = list(map(_text_type,headers)) rows = list(map(list,rows)) # add or remove an index column showindex_is_a_str = type(showindex) in [_text_type, _binary_type] if showindex == "default" and index is not None: rows = _prepend_row_index(rows, index) elif isinstance(showindex, Iterable) and not showindex_is_a_str: rows = _prepend_row_index(rows, list(showindex)) elif showindex == "always" or (_bool(showindex) and not showindex_is_a_str): if index is None: index = list(range(len(rows))) rows = _prepend_row_index(rows, index) elif showindex == "never" or (not _bool(showindex) and not showindex_is_a_str): pass # pad with empty headers for initial columns if necessary if headers and len(rows) > 0: nhs = len(headers) ncols = len(rows[0]) if nhs < ncols: headers = [""]*(ncols - nhs) + headers return rows, headers def tabulate(tabular_data, headers=(), tablefmt="simple", floatfmt="g", numalign="decimal", stralign="left", missingval="", showindex="default"): """Format a fixed width table for pretty printing. >>> print(tabulate([[1, 2.34], [-56, "8.999"], ["2", "10001"]])) --- --------- 1 2.34 -56 8.999 2 10001 --- --------- The first required argument (`tabular_data`) can be a list-of-lists (or another iterable of iterables), a list of named tuples, a dictionary of iterables, an iterable of dictionaries, a two-dimensional NumPy array, NumPy record array, or a Pandas' dataframe. 
Table headers ------------- To print nice column headers, supply the second argument (`headers`): - `headers` can be an explicit list of column headers - if `headers="firstrow"`, then the first row of data is used - if `headers="keys"`, then dictionary keys or column indices are used Otherwise a headerless table is produced. If the number of headers is less than the number of columns, they are supposed to be names of the last columns. This is consistent with the plain-text format of R and Pandas' dataframes. >>> print(tabulate([["sex","age"],["Alice","F",24],["Bob","M",19]], ... headers="firstrow")) sex age ----- ----- ----- Alice F 24 Bob M 19 By default, pandas.DataFrame data have an additional column called row index. To add a similar column to all other types of data, use `showindex="always"` or `showindex=True`. To suppress row indices for all types of data, pass `showindex="never"` or `showindex=False`. To add a custom row index column, pass `showindex=some_iterable`. >>> print(tabulate([["F",24],["M",19]], showindex="always")) - - -- 0 F 24 1 M 19 - - -- Column alignment ---------------- `tabulate` tries to detect column types automatically, and aligns the values properly. By default it aligns decimal points of the numbers (or flushes integer numbers to the right), and flushes everything else to the left. Possible column alignments (`numalign`, `stralign`) are: "right", "center", "left", "decimal" (only for `numalign`), and None (to disable alignment). Table formats ------------- `floatfmt` is a format specification used for columns which contain numeric data with a decimal point. `None` values are replaced with a `missingval` string: >>> print(tabulate([["spam", 1, None], ... ["eggs", 42, 3.14], ... ["other", None, 2.7]], missingval="?")) ----- -- ---- spam 1 ? eggs 42 3.14 other ? 2.7 ----- -- ---- Various plain-text table formats (`tablefmt`) are supported, including 'plain', 'simple', 'grid', 'pipe', 'orgtbl', 'rst', 'mediawiki', 'latex', and 'latex_booktabs'.
Variable `tabulate_formats` contains the list of currently supported formats. "plain" format doesn't use any pseudographics to draw tables, it separates columns with a double space: >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], ... ["strings", "numbers"], "plain")) strings numbers spam 41.9999 eggs 451 >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="plain")) spam 41.9999 eggs 451 "simple" format is like Pandoc simple_tables: >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], ... ["strings", "numbers"], "simple")) strings numbers --------- --------- spam 41.9999 eggs 451 >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="simple")) ---- -------- spam 41.9999 eggs 451 ---- -------- "grid" is similar to tables produced by Emacs table.el package or Pandoc grid_tables: >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], ... ["strings", "numbers"], "grid")) +-----------+-----------+ | strings | numbers | +===========+===========+ | spam | 41.9999 | +-----------+-----------+ | eggs | 451 | +-----------+-----------+ >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="grid")) +------+----------+ | spam | 41.9999 | +------+----------+ | eggs | 451 | +------+----------+ "fancy_grid" draws a grid using box-drawing characters: >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], ... ["strings", "numbers"], "fancy_grid")) ╒═══════════╤═══════════╕ │ strings │ numbers │ ╞═══════════╪═══════════╡ │ spam │ 41.9999 │ ├───────────┼───────────┤ │ eggs │ 451 │ ╘═══════════╧═══════════╛ "pipe" is like tables in PHP Markdown Extra extension or Pandoc pipe_tables: >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], ... 
["strings", "numbers"], "pipe")) | strings | numbers | |:----------|----------:| | spam | 41.9999 | | eggs | 451 | >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="pipe")) |:-----|---------:| | spam | 41.9999 | | eggs | 451 | "orgtbl" is like tables in Emacs org-mode and orgtbl-mode. They are slightly different from "pipe" format by not using colons to define column alignment, and using a "+" sign to indicate line intersections: >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], ... ["strings", "numbers"], "orgtbl")) | strings | numbers | |-----------+-----------| | spam | 41.9999 | | eggs | 451 | >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="orgtbl")) | spam | 41.9999 | | eggs | 451 | "rst" is like a simple table format from reStructuredText; please note that reStructuredText accepts also "grid" tables: >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], ... ["strings", "numbers"], "rst")) ========= ========= strings numbers ========= ========= spam 41.9999 eggs 451 ========= ========= >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="rst")) ==== ======== spam 41.9999 eggs 451 ==== ======== "mediawiki" produces a table markup used in Wikipedia and on other MediaWiki-based sites: >>> print(tabulate([["strings", "numbers"], ["spam", 41.9999], ["eggs", "451.0"]], ... headers="firstrow", tablefmt="mediawiki")) {| class="wikitable" style="text-align: left;" |+ |- ! strings !! align="right"| numbers |- | spam || align="right"| 41.9999 |- | eggs || align="right"| 451 |} "html" produces HTML markup: >>> print(tabulate([["strings", "numbers"], ["spam", 41.9999], ["eggs", "451.0"]], ... headers="firstrow", tablefmt="html"))
    strings numbers
    spam 41.9999
    eggs 451

    "latex" produces a tabular environment of LaTeX document markup:

    >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="latex"))
    \\begin{tabular}{lr}
    \\hline
     spam &  41.9999 \\\\
     eggs & 451      \\\\
    \\hline
    \\end{tabular}

    "latex_booktabs" produces a tabular environment of LaTeX document markup
    using the booktabs.sty package:

    >>> print(tabulate([["spam", 41.9999], ["eggs", "451.0"]], tablefmt="latex_booktabs"))
    \\begin{tabular}{lr}
    \\toprule
     spam &  41.9999 \\\\
     eggs & 451      \\\\
    \\bottomrule
    \\end{tabular}
    """
    if tabular_data is None:
        tabular_data = []
    list_of_lists, headers = _normalize_tabular_data(
            tabular_data, headers, showindex=showindex)

    # optimization: look for ANSI control codes once,
    # enable smart width functions only if a control code is found
    plain_text = '\n'.join(['\t'.join(map(_text_type, headers))] + \
                            ['\t'.join(map(_text_type, row)) for row in list_of_lists])

    has_invisible = re.search(_invisible_codes, plain_text)
    enable_widechars = wcwidth is not None and WIDE_CHARS_MODE
    if has_invisible:
        width_fn = _visible_width
    elif enable_widechars:  # optional wide-character support if available
        width_fn = wcwidth.wcswidth
    else:
        width_fn = len

    # format rows and columns, convert numeric values to strings
    cols = list(zip(*list_of_lists))
    coltypes = list(map(_column_type, cols))
    cols = [[_format(v, ct, floatfmt, missingval, has_invisible) for v in c]
             for c, ct in zip(cols, coltypes)]

    # align columns
    aligns = [numalign if ct in [int, float] else stralign for ct in coltypes]
    minwidths = [width_fn(h) + MIN_PADDING for h in headers] if headers else [0]*len(cols)
    cols = [_align_column(c, a, minw, has_invisible)
            for c, a, minw in zip(cols, aligns, minwidths)]

    if headers:
        # align headers and add headers
        t_cols = cols or [['']] * len(headers)
        t_aligns = aligns or [stralign] * len(headers)
        minwidths = [max(minw, width_fn(c[0])) for minw, c in zip(minwidths, t_cols)]
        headers = [_align_header(h, a, minw, width_fn(h))
                   for h, a, minw in zip(headers, t_aligns, minwidths)]
        rows = list(zip(*cols))
    else:
        minwidths = [width_fn(c[0]) for c in cols]
        rows = list(zip(*cols))

    if not isinstance(tablefmt, TableFormat):
        tablefmt = _table_formats.get(tablefmt, _table_formats["simple"])

    return _format_table(tablefmt, headers, rows, minwidths, aligns)


def _build_simple_row(padded_cells, rowfmt):
    "Format row according to DataRow format without padding."
    begin, sep, end = rowfmt
    return (begin + sep.join(padded_cells) + end).rstrip()


def _build_row(padded_cells, colwidths, colaligns, rowfmt):
    "Return a string which represents a row of data cells."
    if not rowfmt:
        return None
    if hasattr(rowfmt, "__call__"):
        return rowfmt(padded_cells, colwidths, colaligns)
    else:
        return _build_simple_row(padded_cells, rowfmt)


def _build_line(colwidths, colaligns, linefmt):
    "Return a string which represents a horizontal line."
    if not linefmt:
        return None
    if hasattr(linefmt, "__call__"):
        return linefmt(colwidths, colaligns)
    else:
        begin, fill, sep, end = linefmt
        cells = [fill*w for w in colwidths]
        return _build_simple_row(cells, (begin, sep, end))


def _pad_row(cells, padding):
    if cells:
        pad = " "*padding
        padded_cells = [pad + cell + pad for cell in cells]
        return padded_cells
    else:
        return cells


def _format_table(fmt, headers, rows, colwidths, colaligns):
    """Produce a plain-text representation of the table."""
    lines = []
    hidden = fmt.with_header_hide if (headers and fmt.with_header_hide) else []
    pad = fmt.padding
    headerrow = fmt.headerrow

    padded_widths = [(w + 2*pad) for w in colwidths]
    padded_headers = _pad_row(headers, pad)
    padded_rows = [_pad_row(row, pad) for row in rows]

    if fmt.lineabove and "lineabove" not in hidden:
        lines.append(_build_line(padded_widths, colaligns, fmt.lineabove))

    if padded_headers:
        lines.append(_build_row(padded_headers, padded_widths, colaligns, headerrow))
        if fmt.linebelowheader and "linebelowheader" not in hidden:
            lines.append(_build_line(padded_widths, colaligns, fmt.linebelowheader))

    if padded_rows and fmt.linebetweenrows and "linebetweenrows" not in hidden:
        # initial rows with a line below
        for row in padded_rows[:-1]:
            lines.append(_build_row(row, padded_widths, colaligns, fmt.datarow))
            lines.append(_build_line(padded_widths, colaligns, fmt.linebetweenrows))
        # the last row without a line below
        lines.append(_build_row(padded_rows[-1], padded_widths, colaligns, fmt.datarow))
    else:
        for row in padded_rows:
            lines.append(_build_row(row, padded_widths, colaligns, fmt.datarow))

    if fmt.linebelow and "linebelow" not in hidden:
        lines.append(_build_line(padded_widths, colaligns, fmt.linebelow))

    return "\n".join(lines)


def _main():
    """\
    Usage: tabulate [options] [FILE ...]

    Pretty-print tabular data.
    See also https://bitbucket.org/astanin/python-tabulate

    FILE                      a filename of the file with tabular data;
                              if "-" or missing, read data from stdin.

    Options:

    -h, --help                show this message
    -1, --header              use the first row of data as a table header
    -o FILE, --output FILE    print table to FILE (default: stdout)
    -s REGEXP, --sep REGEXP   use a custom column separator (default: whitespace)
    -F FPFMT, --float FPFMT   floating point number format (default: g)
    -f FMT, --format FMT      set output table format; supported formats:
                              plain, simple, grid, fancy_grid, pipe, orgtbl,
                              rst, mediawiki, html, latex, latex_booktabs, tsv
                              (default: simple)
    """
    import getopt
    import sys
    import textwrap
    usage = textwrap.dedent(_main.__doc__)
    try:
        # "output=" takes an argument, like "sep=", "float=" and "format="
        opts, args = getopt.getopt(sys.argv[1:],
                     "h1o:s:F:f:",
                     ["help", "header", "output=", "sep=", "float=", "format="])
    except getopt.GetoptError as e:
        print(e)
        print(usage)
        sys.exit(2)
    headers = []
    floatfmt = "g"
    tablefmt = "simple"
    sep = r"\s+"
    outfile = "-"
    for opt, value in opts:
        if opt in ["-1", "--header"]:
            headers = "firstrow"
        elif opt in ["-o", "--output"]:
            outfile = value
        elif opt in ["-F", "--float"]:
            floatfmt = value
        elif opt in ["-f", "--format"]:
            if value not in tabulate_formats:
                print("%s is not a supported table format" % value)
                print(usage)
                sys.exit(3)
            tablefmt = value
        elif opt in ["-s", "--sep"]:
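            sep = value
        elif opt in ["-h", "--help"]:
            print(usage)
            sys.exit(0)
    files = [sys.stdin] if not args else args
    with (sys.stdout if outfile == "-" else open(outfile, "w")) as out:
        for f in files:
            if f == "-":
                f = sys.stdin
            if _is_file(f):
                _pprint_file(f, headers=headers, tablefmt=tablefmt,
                             sep=sep, floatfmt=floatfmt, file=out)
            else:
                with open(f) as fobj:
                    _pprint_file(fobj, headers=headers, tablefmt=tablefmt,
                                 sep=sep, floatfmt=floatfmt, file=out)


def _pprint_file(fobject, headers, tablefmt, sep, floatfmt, file):
    rows = fobject.readlines()
    table = [re.split(sep, r.rstrip()) for r in rows if r.strip()]
    print(tabulate(table, headers, tablefmt, floatfmt=floatfmt), file=file)


if __name__ == "__main__":
    _main()

================================================
FILE: otswriter/doc/otswriter.md
================================================
# OTSWriter Plugin Documentation

___

## 1 Quick Introduction

The OTSWriter plugin writes data into OTS. It currently supports writing multi-version data, writing to tables with an auto-increment primary key column, and more.

OTS is a NoSQL database service built on Alibaba Cloud's Apsara distributed system that provides storage of, and real-time access to, massive amounts of structured data. OTS organizes data as instances and tables, and scales out seamlessly through data sharding and load balancing.

## 2 How It Works

In short, OTSWriter connects to the OTS server through the official OTS Java SDK and writes data through the SDK. OTSWriter adds several optimizations to the write path, including retry on write timeouts, retry on write exceptions, and batched commits.

## 3 Features

### 3.1 Sample Configuration

* A job that writes to OTS in `normal` mode:

```
{
    "job": {
        "setting": {
        },
        "content": [
            {
                "reader": {},
                "writer": {
                    "name": "otswriter",
                    "parameter": {
                        "endpoint":"",
                        "accessId":"",
                        "accessKey":"",
                        "instanceName":"",
                        "table":"",

                        // multiVersion || normal; optional, defaults to normal
                        "mode":"normal",

                        // newVersion selects whether the new plugin implementation is used; valid values: true || false
                        "newVersion":"true",

                        // whether to allow writing to an OTS table that contains an auto-increment primary key column
                        // not compatible with the multi-version mode (mode: multiVersion)
                        "enableAutoIncrement":"true",

                        // names of the PK columns to import, case-sensitive
                        // supported types: STRING, INT, BINARY
                        // required
                        // 1. type conversion is supported; beware of precision loss during conversion
                        // 2. the order does not need to match the table meta
                        // 3. names must be globally unique
                        "primaryKey":[
                            "userid",
                            "groupid"
                        ],

                        // names of the columns to import, case-sensitive
                        // supported types: STRING, INT, DOUBLE, BOOL and BINARY
                        // required
                        // 1. names must be globally unique
                        "column":[
                            {"name":"addr", "type":"string"},
                            {"name":"height", "type":"int"}
                        ],

                        // if a timestamp is configured, it is used; otherwise the OTS system timestamp is used
                        // optional
                        "defaultTimestampInMillisecond": 142722431,

                        // how data is written to OTS
                        // PutRow: equivalent to the PutRow operation of the OTS API, with the row condition set to IGNORE
                        // UpdateRow: equivalent to the UpdateRow operation of the OTS API, with the row condition set to IGNORE
                        "writeMode":"PutRow"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **endpoint**

    * Description: the endpoint (service address) of the OTS server, for example http://bazhen.cn-hangzhou.ots.aliyuncs.com.

    * Required: yes

    * Default: none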
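* **accessId**

    * Description: the accessId for OTS

    * Required: yes

    * Default: none

* **accessKey**

    * Description: the accessKey for OTS

    * Required: yes

    * Default: none

* **instanceName**

    * Description: the name of the OTS instance. An instance is the entity through which users use and manage the OTS service: after activating OTS, users create an instance in the management console and then create and manage tables inside it. The instance is the basic unit of OTS resource management; OTS performs access control and resource metering for applications at the instance level.

    * Required: yes

    * Default: none

* **table**

    * Description: the name of the table to write to. Exactly one table must be specified; OTS has no need for multi-table synchronization.

    * Required: yes

    * Default: none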
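* **newVersion**

    * Description: selects the OTS SDK version the plugin uses.
        * true: the new plugin implementation, which depends on com.alicloud.openservices.tablestore (recommended)
        * false: the old plugin implementation, which depends on com.aliyun.openservices.ots; **multi-version data is not supported**

    * Required: no

    * Default: false

* **mode**

    * Description: whether the data is multi-version. Two modes are currently available:
        * normal: ordinary data
        * multiVersion: the data to write is in multi-version format. The configuration differs in this mode; see Section 3.4.

    * Required: no

    * Default: normal

* **enableAutoIncrement**

    * Description: whether to allow writing to an OTS table that contains an auto-increment primary key column.
        * true: the plugin scans the table for its auto-increment column and fills that column in automatically when writing
        * false: writing to a table that contains an auto-increment primary key column fails with an error

    * Required: no

    * Default: false

* **isTimeseriesTable**

    * Description: whether the target table is a time-series table. Takes effect only when mode=normal.
        * true: the target is a time-series table
        * false: the target is an ordinary wide table

    * Required: no

    * Default: false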
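    * When writing to a time-series table, `primaryKey` does not need to be configured; only `column` is required. Sample configuration:

      ```json
      "column": [
          {
              "name": "_m_name"          // the measurement name field
          },
          {
              "name": "_data_source"     // the data source field
          },
          {
              "name": "_tags"            // the tags field, parsed as a Map
          },
          {
              "name": "_time"            // the timestamp field, parsed as a long value
          },
          {
              "name": "tag_a",
              "isTag": "true"            // a field inside the tags; it is parsed into the tag map
          },
          {
              "name": "column1",         // attribute column name
              "type": "string"           // attribute column type; supports bool, string, int, double, binary
          },
          {
              "name": "column2",
              "type": "int"
          }
      ],
      ```

* **primaryKey**

    * Description: the OTS primary key, described as a JSON array of column names. Because OTS is a NoSQL system, OTSWriter needs the field names to be specified explicitly when importing data. OTS primary keys support only the STRING and INT types, so OTSWriter is restricted to these two types as well. DataX supports type conversion, so OTSWriter converts source data that is not String/Int. Example:

      ```json
      "primaryKey":[
          "userid",
          "groupid"
      ]
      ```

    * Required: yes

    * Default: none

* **column**

    * Description: the set of table columns to synchronize, described as a JSON array. Each entry has the form:

      ```json
      {"name":"col2", "type":"INT"},
      ```

      where `name` is the OTS column to write to and `type` is the type to write. OTS supports the STRING, INT, DOUBLE, BOOL and BINARY types. Constants, functions, and custom expressions are not supported when writing.

    * Required: yes

    * Default: none

* **writeMode**

    * Description: the write mode. Two modes are currently supported:
        * PutRow: corresponds to the OTS API PutRow. Inserts data into the specified row: if the row does not exist, a new row is added; if it exists, the existing row is overwritten.
        * UpdateRow: corresponds to the OTS API UpdateRow. Updates the specified row: if the row does not exist, a new row is added; if it exists, the specified columns of the row are added, modified, or deleted according to the request.

    * Required: yes (read only in normal mode on wide tables; multiVersion mode always uses UpdateRow and time-series tables always use PutRow)

    * Default: none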
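The practical difference between the two `writeMode` values is what happens to columns that a write does not mention. The following is a minimal sketch using an in-memory model, not the OTS SDK: the table is assumed to be a plain Python dict keyed by the primary-key tuple, the helpers `put_row` and `update_row` are hypothetical, and passing `None` to mean "delete this column" is a convention of this sketch only.

```python
from typing import Dict, Hashable, Optional, Tuple

# Hypothetical in-memory stand-in for an OTS table: primary-key tuple -> attribute columns.
PrimaryKey = Tuple[Hashable, ...]
Table = Dict[PrimaryKey, Dict[str, object]]


def put_row(table: Table, pk: PrimaryKey, columns: Dict[str, object]) -> None:
    """PutRow: create the row if it is missing, otherwise replace the whole row."""
    table[pk] = dict(columns)


def update_row(table: Table, pk: PrimaryKey, columns: Dict[str, Optional[object]]) -> None:
    """UpdateRow: create the row if it is missing, then add, modify or delete
    only the listed columns; None marks a column for deletion (sketch convention)."""
    row = table.setdefault(pk, {})
    for name, value in columns.items():
        if value is None:
            row.pop(name, None)   # delete the column if present
        else:
            row[name] = value     # add or overwrite the column


table: Table = {}
put_row(table, ("u1", 1), {"addr": "hz", "height": 170})
put_row(table, ("u1", 1), {"height": 180})       # PutRow drops "addr"
print(table[("u1", 1)])                          # {'height': 180}

update_row(table, ("u1", 1), {"addr": "sh"})     # UpdateRow keeps "height"
update_row(table, ("u1", 1), {"height": None})   # and can delete a column
print(table[("u1", 1)])                          # {'addr': 'sh'}
```

In both modes a missing row is created, as the parameter description states; the modes differ only in whether unmentioned columns of an existing row survive the write.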
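### 3.3 Type Conversion

OTSWriter currently supports all OTS types. The conversions it performs are listed below:

| DataX internal type | OTS data type |
| -------- | ----- |
| Long | Integer |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Bytes | Binary |

* Note: OTS itself has no date type. Applications usually store dates as a Unix timestamp in a Long.

### 3.4 multiVersion Mode

#### 3.4.1 Overview

multiVersion mode handles importing multi-version data into OTS and supports full migration of HBase data to OTS.

* Note: the data format in this mode is special: this writer requires the reader to output version information as well
* Currently only hbase reader and ots reader provide this mode; keep this in mind when using it

#### 3.4.2 Sample Configuration

```
{
    "job": {
        "setting": {
        },
        "content": [
            {
                "reader": {},
                "writer": {
                    "name": "otswriter",
                    "parameter": {
                        "endpoint":"",
                        "accessId":"",
                        "accessKey":"",
                        "instanceName":"",
                        "table":"",

                        // multi-version mode: the plugin parses all configuration according to the multi-version mode
                        "mode":"multiVersion",

                        "newVersion":"true",

                        // PK configuration
                        // To keep configuration simple, the position of the PKs within the Record (line) is not configured;
                        // instead the Record format is fixed: the PKs always come first, followed by columnName:
                        // e.g. {pk0,pk1,pk2,pk3}, {columnName}, {timestamp}, {value}
                        "primaryKey":[
                            "userid",
                            "groupid"
                        ],

                        // column name prefix filter
                        // Description: for data imported from HBase, cf and qualifier together form columnName;
                        // OTS does not support cf, so the cf needs to be filtered out
                        // Notes:
                        // 1. This parameter is optional; if it is absent or an empty string, column names are not filtered.
                        // 2. If a record's columnName passed in by DataX does not start with the prefix, the Record is sent to the dirty-data collector
                        "columnNamePrefixFilter":"cf:"
                    }
                }
            }
        ]
    }
}
```

## 4 Constraints and Limitations

### 4.1 Write Idempotence

Writes to OTS are idempotent: writing the same record through the OTS SDK once or several times leads to the same result. Therefore, OTSWriter writing the same record multiple times is equivalent to writing it once.

### 4.2 Single-Task Failover

Because OTS writes are idempotent, single-task failover is supported: once a write fails, DataX restarts the affected subtask and retries it.

## 5 FAQ

* 1. In multi-version mode, how is a null value interpreted?
    * It deletes the specified version.
* 2. What happens if the ts column is empty?
    * The plugin records the record as dirty data.
* 3. What happens if the number of columns in a Record does not match the expected count?
    * The plugin terminates with an error.
* 4. In normal mode, when writing with UpdateRow and no TS is specified, how are multiple records for the same row written to OTS?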
    * Later records overwrite earlier ones.

================================================
FILE: otswriter/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT otswriter otswriter com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.aliyun.openservices ots-public 2.2.6 log4j-core org.apache.logging.log4j com.aliyun.openservices tablestore 5.13.10 log4j-core org.apache.logging.log4j com.google.code.gson gson 2.2.4 src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single org.apache.maven.plugins maven-surefire-plugin 2.5 **/unittest/*.java **/functiontest/*.java

================================================
FILE: otswriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/otswriter target/ otswriter-0.0.1-SNAPSHOT.jar plugin/writer/otswriter false plugin/writer/otswriter/libs runtime

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterMasterProxy.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.util.Configuration;

import java.util.List;

public interface IOtsWriterMasterProxy {

    public void init(Configuration param) throws Exception;

    public void close();

    public List<Configuration> split(int mandatoryNumber);
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/IOtsWriterSlaveProxy.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;

public interface IOtsWriterSlaveProxy {

    /**
     * Initialize the slave and create the resources it uses.
     */
    public void init(Configuration configuration);

    /**
     * Release all resources held by the slave.
     */
    public void close() throws OTSCriticalException;

    /**
     * The slave's executor: writes DataX records into OTS.
     * @param recordReceiver
     * @throws OTSCriticalException
     */
    public void write(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) throws OTSCriticalException;
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/Key.java
================================================
/**
 * (C) 2010-2014 Alibaba Group Holding Limited.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.alibaba.datax.plugin.writer.otswriter;

public final class Key {

    public final static String OTS_ENDPOINT = "endpoint";

    public final static String OTS_ACCESSID = "accessId";

    public final static String OTS_ACCESSKEY = "accessKey";

    public final static String OTS_INSTANCE_NAME = "instanceName";

    public final static String ENABLE_AUTO_INCREMENT = "enableAutoIncrement";

    public final static String IS_TIMESERIES_TABLE = "isTimeseriesTable";

    public final static String TIMEUNIT_FORMAT = "timeunit";

    public final static String TABLE_NAME = "table";

    public final static String PRIMARY_KEY = "primaryKey";

    public final static String COLUMN = "column";

    public final static String WRITE_MODE = "writeMode";

    public final static String MODE = "mode";

    public final static String NEW_VERISON = "newVersion";

    public final static String DEFAULT_TIMESTAMP = "defaultTimestampInMillisecond";

    public final static String COLUMN_NAME_PREFIX_FILTER = "columnNamePrefixFilter";
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSCriticalException.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

/**
 * Critical plugin exception, used mainly to signal that the plugin is exiting abnormally.
 * @author redchen
 */
public class OTSCriticalException extends Exception {

    private static final long serialVersionUID = 5820460098894295722L;

    public OTSCriticalException() {}

    public OTSCriticalException(String message) {
        super(message);
    }

    public OTSCriticalException(Throwable a) {
        super(a);
    }

    public OTSCriticalException(String message, Throwable a) {
        super(message, a);
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OTSErrorCode.java
================================================
/**
 * Copyright (C) Alibaba Cloud Computing
 * All rights reserved.
 *
 * Copyright (C) Alibaba Cloud Computing Co., Ltd. All rights reserved.
 */

package com.alibaba.datax.plugin.writer.otswriter;

/**
 * Error codes returned by the Open Table Service (OTS).
 */
public interface OTSErrorCode {
    /**
     * User authentication failed.
     */
    static final String AUTHORIZATION_FAILURE = "OTSAuthFailed";

    /**
     * Internal server error.
     */
    static final String INTERNAL_SERVER_ERROR = "OTSInternalServerError";

    /**
     * Invalid parameter.
     */
    static final String INVALID_PARAMETER = "OTSParameterInvalid";

    /**
     * The request as a whole is too large.
     */
    static final String REQUEST_TOO_LARGE = "OTSRequestBodyTooLarge";

    /**
     * The client request timed out.
     */
    static final String REQUEST_TIMEOUT = "OTSRequestTimeout";

    /**
     * The user's quota has been exhausted.
     */
    static final String QUOTA_EXHAUSTED = "OTSQuotaExhausted";

    /**
     * A failover occurred on an internal server, making some partitions of the table unavailable.
     */
    static final String PARTITION_UNAVAILABLE = "OTSPartitionUnavailable";

    /**
     * The table was just created and cannot serve requests yet.
     */
    static final String TABLE_NOT_READY = "OTSTableNotReady";

    /**
     * The requested table does not exist.
     */
    static final String OBJECT_NOT_EXIST = "OTSObjectNotExist";

    /**
     * The table requested for creation already exists.
     */
    static final String OBJECT_ALREADY_EXIST = "OTSObjectAlreadyExist";

    /**
     * Multiple concurrent requests wrote the same row, causing a conflict.
     */
    static final String ROW_OPEARTION_CONFLICT = "OTSRowOperationConflict";

    /**
     * Primary key mismatch.
     */
    static final String INVALID_PK = "OTSInvalidPK";

    /**
     * Reserved read/write throughput was adjusted too frequently.
     */
    static final String TOO_FREQUENT_RESERVED_THROUGHPUT_ADJUSTMENT = "OTSTooFrequentReservedThroughputAdjustment";

    /**
     * The total number of columns in the row exceeds the limit.
     */
    static final String OUT_OF_COLUMN_COUNT_LIMIT = "OTSOutOfColumnCountLimit";

    /**
     * The total size of all column data in the row exceeds the limit.
     */
    static final String OUT_OF_ROW_SIZE_LIMIT = "OTSOutOfRowSizeLimit";

    /**
     * Not enough remaining reserved read/write capacity.
     */
    static final String NOT_ENOUGH_CAPACITY_UNIT = "OTSNotEnoughCapacityUnit";

    /**
     * The precondition check failed.
     */
    static final String CONDITION_CHECK_FAIL = "OTSConditionCheckFail";

    /**
     * An operation inside OTS timed out.
     */
    static final String STORAGE_TIMEOUT = "OTSTimeout";

    /**
     * A server inside OTS is unreachable.
     */
    static final String SERVER_UNAVAILABLE = "OTSServerUnavailable";

    /**
     * An OTS internal server is busy.
     */
    static final String SERVER_BUSY = "OTSServerBusy";

}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriter.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSConst;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSMode;
import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser;
import com.alicloud.openservices.tablestore.ClientException;
import com.alicloud.openservices.tablestore.TableStoreException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

public class OtsWriter {

    public static class Job extends Writer.Job {
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);
        private IOtsWriterMasterProxy proxy;

        @Override
        public void init() {
            LOG.info("init() begin ...");
            proxy = new OtsWriterMasterProxy();
            try {
                this.proxy.init(getPluginJobConf());
            } catch (TableStoreException e) {
                LOG.error("OTSException: {}", e.toString(), e);
                throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS Client Error"), e.toString(), e);
            } catch (ClientException e) {
                LOG.error("ClientException: {}", e.toString(), e);
                throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e);
            } catch (Exception e) {
                LOG.error("Exception. ErrorMsg:{}", e.toString(), e);
                throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e);
            }

            LOG.info("init() end ...");
        }

        @Override
        public void destroy() {
            this.proxy.close();
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            try {
                return this.proxy.split(mandatoryNumber);
            } catch (Exception e) {
                LOG.error("Exception. ErrorMsg:{}", e.getMessage(), e);
                throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e);
            }
        }
    }

    public static class Task extends Writer.Task {
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);
        private IOtsWriterSlaveProxy proxy = null;

        /**
         * Build the matching worker proxy from the configuration.
         */
        @Override
        public void init() {
            OTSConf conf = GsonParser.jsonToConf(this.getPluginJobConf().getString(OTSConst.OTS_CONF));
            // whether to use the new interface
            if (conf.isNewVersion()) {
                if (conf.getMode() == OTSMode.MULTI_VERSION) {
                    LOG.info("init OtsWriterSlaveProxyMultiVersion");
                    proxy = new OtsWriterSlaveProxyMultiversion();
                } else {
                    LOG.info("init OtsWriterSlaveProxyNormal");
                    proxy = new OtsWriterSlaveProxyNormal();
                }
            } else {
                proxy = new OtsWriterSlaveProxyOld();
            }

            proxy.init(this.getPluginJobConf());
        }

        @Override
        public void destroy() {
            try {
                proxy.close();
            } catch (OTSCriticalException e) {
                LOG.error("OTSCriticalException. ErrorMsg:{}", e.getMessage(), e);
                throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e);
            }
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            LOG.info("startWrite() begin ...");

            try {
                proxy.write(lineReceiver, this.getTaskPluginCollector());
            } catch (TableStoreException e) {
                LOG.error("OTSException: {}", e.toString(), e);
                throw DataXException.asDataXException(new OtsWriterError(e.getErrorCode(), "OTS Client Error"), e.toString(), e);
            } catch (ClientException e) {
                LOG.error("ClientException: {}", e.toString(), e);
                throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e);
            } catch (Exception e) {
                LOG.error("Exception. ErrorMsg:{}", e.toString(), e);
                throw DataXException.asDataXException(OtsWriterError.ERROR, e.toString(), e);
            }

            LOG.info("startWrite() end ...");
        }
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterError.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.spi.ErrorCode;

public class OtsWriterError implements ErrorCode {

    private String code;

    private String description;

    // TODO
    // These categories should be defined uniformly by DataX, with OTS refining them further.
    // For now only two basic error codes are defined; all other errors use OTS's own error codes and messages.

    public final static OtsWriterError ERROR = new OtsWriterError(
            "OtsWriterError",
            "This error represents an internal error of the ots writer plugin, which indicates that the system is not processed.");
    public final static OtsWriterError INVALID_PARAM = new OtsWriterError(
            "OtsWriterInvalidParameter",
            "This error represents a parameter error, indicating that the user entered the wrong parameter format.");

    public OtsWriterError (String code) {
        this.code = code;
        this.description = code;
    }

    public OtsWriterError (String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return "[ code:" + this.code + ", message:" + this.description + "]";
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterMasterProxy.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.otswriter.callable.GetTableMetaCallable;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSConst;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSMode;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSOpType;
import com.alibaba.datax.plugin.writer.otswriter.utils.*;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.TimeseriesClient;
import com.alicloud.openservices.tablestore.model.TableMeta;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class OtsWriterMasterProxy implements IOtsWriterMasterProxy {

    private static final Logger LOG = LoggerFactory.getLogger(OtsWriterMasterProxy.class);
    private OTSConf conf = new OTSConf();
    private SyncClientInterface ots = null;
    private TableMeta meta = null;

    /**
     * @param param
     * @throws Exception
     */
    @Override
    public void init(Configuration param) throws Exception {

        // default parameters
        setStaticParams(param);

        conf.setTimestamp(param.getInt(Key.DEFAULT_TIMESTAMP, -1));
        conf.setRequestTotalSizeLimitation(param.getInt(OTSConst.REQUEST_TOTAL_SIZE_LIMITATION, 1024 * 1024));

        // required parameters
        conf.setEndpoint(ParamChecker.checkStringAndGet(param, Key.OTS_ENDPOINT));
        conf.setAccessId(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSID));
        conf.setAccessKey(ParamChecker.checkStringAndGet(param, Key.OTS_ACCESSKEY));
        conf.setInstanceName(ParamChecker.checkStringAndGet(param, Key.OTS_INSTANCE_NAME));
        conf.setTableName(ParamChecker.checkStringAndGet(param, Key.TABLE_NAME));

        ots = Common.getOTSInstance(conf);

        conf.setNewVersion(param.getBool(Key.NEW_VERISON, false));
        conf.setMode(WriterModelParser.parseOTSMode(param.getString(Key.MODE, "normal")));
        conf.setEnableAutoIncrement(param.getBool(Key.ENABLE_AUTO_INCREMENT, false));
        conf.setTimeseriesTable(param.getBool(Key.IS_TIMESERIES_TABLE, false));
        ParamChecker.checkVersion(conf);

        if (!conf.isTimeseriesTable()) {
            meta = getTableMeta(ots, conf.getTableName());
            LOG.debug("Table Meta : {}", GsonParser.metaToJson(meta));
            conf.setPrimaryKeyColumn(WriterModelParser.parseOTSPKColumnList(meta, ParamChecker.checkListAndGet(param, Key.PRIMARY_KEY, true)));
        }

        if (conf.getMode() == OTSMode.MULTI_VERSION) {
            conf.setOperation(OTSOpType.UPDATE_ROW); // multi-version mode supports only UpdateRow
            conf.setColumnNamePrefixFilter(param.getString(Key.COLUMN_NAME_PREFIX_FILTER, null));
        } else if (!conf.isTimeseriesTable()) {
            // normal mode, writing to a wide table
            conf.setOperation(WriterModelParser.parseOTSOpType(ParamChecker.checkStringAndGet(param, Key.WRITE_MODE), conf.getMode()));
            conf.setAttributeColumn(
                    WriterModelParser.parseOTSAttrColumnList(
                            conf.getPrimaryKeyColumn(),
                            ParamChecker.checkListAndGet(param, Key.COLUMN, false),
                            conf.getMode()
                    )
            );
            ParamChecker.checkAttribute(conf.getAttributeColumn());
        } else {
            // normal mode, writing to a time-series table
            conf.setOperation(OTSOpType.PUT_ROW); // time-series tables support only PutRow
            conf.setAttributeColumn(WriterModelParser.parseOTSTimeseriesRowAttrList(ParamChecker.checkListAndGet(param, Key.COLUMN, true)));
            conf.setTimeUnit(ParamChecker.checkTimeUnitAndGet(param.getString(Key.TIMEUNIT_FORMAT, "MICROSECONDS")));
        }

        /**
         * Auto-increment primary key columns are enabled
         */
        if (conf.getEnableAutoIncrement()) {
            ParamChecker.checkPrimaryKeyWithAutoIncrement(meta, conf.getPrimaryKeyColumn());
            conf.setEncodePkColumnMapping(Common.getEncodePkColumnMappingWithAutoIncrement(meta, conf.getPrimaryKeyColumn()));
        }
        /**
         * Auto-increment primary key columns are not enabled
         */
        else if (!conf.isTimeseriesTable()) {
            ParamChecker.checkPrimaryKey(meta, conf.getPrimaryKeyColumn());
            conf.setEncodePkColumnMapping(Common.getEncodePkColumnMapping(meta, conf.getPrimaryKeyColumn()));
        }
    }

    @Override
    public List<Configuration> split(int mandatoryNumber) {
        LOG.info("Begin split and MandatoryNumber : {}", mandatoryNumber);
        List<Configuration> configurations = new ArrayList<Configuration>();
        String json = GsonParser.confToJson(this.conf);
        for (int i = 0; i < mandatoryNumber; i++) {
            Configuration configuration = Configuration.newDefault();
            configuration.set(OTSConst.OTS_CONF, json);
            configurations.add(configuration);
        }
        LOG.info("End split.");
        return configurations;
    }

    @Override
    public void close() {
        ots.shutdown();
    }

    public OTSConf getOTSConf() {
        return conf;
    }

    // private functions

    private TableMeta getTableMeta(SyncClientInterface ots, String tableName) throws Exception {
        return RetryHelper.executeWithRetry(
                new GetTableMetaCallable(ots, tableName),
                conf.getRetry(),
                conf.getSleepInMillisecond()
        );
    }

    public void setStaticParams(Configuration param) {
        // default parameters
        conf.setRetry(param.getInt(OTSConst.RETRY, 18));
        conf.setSleepInMillisecond(param.getInt(OTSConst.SLEEP_IN_MILLISECOND, 100));
        conf.setBatchWriteCount(param.getInt(OTSConst.BATCH_WRITE_COUNT, 100));
        conf.setConcurrencyWrite(param.getInt(OTSConst.CONCURRENCY_WRITE, 5));
        conf.setIoThreadCount(param.getInt(OTSConst.IO_THREAD_COUNT, 1));
        conf.setSocketTimeoutInMillisecond(param.getInt(OTSConst.SOCKET_TIMEOUTIN_MILLISECOND, 10000));
        conf.setConnectTimeoutInMillisecond(param.getInt(OTSConst.CONNECT_TIMEOUT_IN_MILLISECOND, 10000));
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyMultiversion.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.otswriter.model.*;
import com.alibaba.datax.plugin.writer.otswriter.utils.CollectorUtil;
import com.alibaba.datax.plugin.writer.otswriter.utils.Common;
import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser;
import com.alibaba.datax.plugin.writer.otswriter.utils.ParseRecord;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.PrimaryKey;
import com.alicloud.openservices.tablestore.model.PrimaryKeySchema;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import static com.alibaba.datax.plugin.writer.otswriter.utils.Common.getOTSInstance;

public class OtsWriterSlaveProxyMultiversion implements IOtsWriterSlaveProxy {

    private OTSConf conf = null;
    private SyncClientInterface ots = null;
    private OTSSendBuffer buffer = null;
    private Map pkColumnMapping = null;
    private static final Logger LOG = LoggerFactory.getLogger(OtsWriterSlaveProxyMultiversion.class);

    @Override
    public void init(Configuration configuration) {
        LOG.info("OtsWriterSlaveProxyMultiversion init begin");
        this.conf = GsonParser.jsonToConf(configuration.getString(OTSConst.OTS_CONF));
        this.ots = getOTSInstance(conf);
        this.pkColumnMapping = Common.getPkColumnMapping(conf.getEncodePkColumnMapping());
        buffer = new OTSSendBuffer(ots, conf);
        LOG.info("init end");
    }

    @Override
    public void close() throws OTSCriticalException {
        LOG.info("close begin");
        ots.shutdown();
        LOG.info("close end");
    }

    @Override
    public void write(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) throws OTSCriticalException {
        LOG.info("write begin");
        // initialize the global dirty-data collector
        CollectorUtil.init(taskPluginCollector);
        // Record format: {PK1, PK2, ...} {ColumnName} {TimeStamp} {Value}
        int expectColumnCount = conf.getPrimaryKeyColumn().size() + 3; // 3 stands for {ColumnName} {TimeStamp} {Value}
        Record record = null;
        PrimaryKey lastCellPk = null;
        List<Record> rowBuffer = new ArrayList<Record>();
        while ((record = recordReceiver.getFromReader()) != null) {

            LOG.debug("Record Raw: {}", record.toString());

            int columnCount = record.getColumnNumber();
            if (columnCount != expectColumnCount) {
                // A column count that differs from the expected count is treated as a system fault
                // or a misconfigured column list, so the plugin exits abnormally
                throw new OTSCriticalException(String.format(
                        OTSErrorMessage.RECORD_AND_COLUMN_SIZE_ERROR,
                        columnCount,
                        expectColumnCount,
                        record.toString()
                ));
            }

            PrimaryKey curPk = null;
            if ((curPk = Common.getPKFromRecord(this.pkColumnMapping, record)) == null) {
                continue;
            }

            // check whether this cell belongs to the same row as the previous one
            if (lastCellPk == null) {
                lastCellPk = curPk;
            } else if (!lastCellPk.equals(curPk)) {
                OTSLine line = ParseRecord.parseMultiVersionRecordToOTSLine(
                        conf.getTableName(),
                        conf.getOperation(),
                        pkColumnMapping,
                        conf.getColumnNamePrefixFilter(),
                        lastCellPk,
                        rowBuffer);
                if (line != null) {
                    buffer.write(line);
                }
                rowBuffer.clear();
                lastCellPk = curPk;
            }
            rowBuffer.add(record);
        }
        // flush the remaining data
        if (!rowBuffer.isEmpty()) {
            OTSLine line = ParseRecord.parseMultiVersionRecordToOTSLine(
                    conf.getTableName(),
                    conf.getOperation(),
                    pkColumnMapping,
                    conf.getColumnNamePrefixFilter(),
                    lastCellPk,
                    rowBuffer);
            if (line != null) {
                buffer.write(line);
            }
        }

        buffer.close();
        LOG.info("write end");
    }

    public void setOts(SyncClientInterface ots) {
        this.ots = ots;
    }

    public OTSConf getConf() {
        return conf;
    }

    public void setConf(OTSConf conf) {
        this.conf = conf;
    }

    public void setBuffer(OTSSendBuffer buffer) {
        this.buffer = buffer;
    }

    public void setPkColumnMapping(Map pkColumnMapping) {
        this.pkColumnMapping = pkColumnMapping;
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyNormal.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.otswriter.callable.GetTableMetaCallable;
import com.alibaba.datax.plugin.writer.otswriter.model.*;
import com.alibaba.datax.plugin.writer.otswriter.utils.*;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.PrimaryKeySchema;
import com.alicloud.openservices.tablestore.model.TableMeta;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Map;

import static com.alibaba.datax.plugin.writer.otswriter.utils.Common.getOTSInstance;

public class OtsWriterSlaveProxyNormal implements IOtsWriterSlaveProxy {

    private OTSConf conf = null;
    private SyncClientInterface ots = null;
    private OTSSendBuffer buffer = null;
    private Map pkColumnMapping = null;
    private static final Logger LOG = LoggerFactory.getLogger(OtsWriterSlaveProxyNormal.class);
    private PrimaryKeySchema primaryKeySchema = null;

    @Override
    public void init(Configuration configuration) {
        LOG.info("init begin");
        this.conf = GsonParser.jsonToConf(configuration.getString(OTSConst.OTS_CONF));
        this.ots = getOTSInstance(conf);
        if (!conf.isTimeseriesTable()) {
            this.pkColumnMapping = Common.getPkColumnMapping(conf.getEncodePkColumnMapping());
        }

        buffer = new OTSSendBuffer(ots, conf);

        if (conf.getEnableAutoIncrement()) {
            primaryKeySchema = getAutoIncrementKey();
        }
        LOG.info("init end");
    }

    @Override
    public void close() throws com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException {
        LOG.info("close begin");
        ots.shutdown();
        LOG.info("close end");
    }

    @Override
    public void write(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector) throws com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException {
        LOG.info("write begin");

        // initialize the global dirty-data collector
        CollectorUtil.init(taskPluginCollector);
        int expectColumnCount = conf.getAttributeColumn().size();
        if (!conf.isTimeseriesTable()) {
            expectColumnCount += conf.getPrimaryKeyColumn().size();
        }
        Record record = null;

        while ((record = recordReceiver.getFromReader()) != null) {

            LOG.debug("Record Raw: {}", record.toString());

            int columnCount = record.getColumnNumber();
            if (columnCount != expectColumnCount) {
                // A column count that differs from the expected count is treated as a system fault
                // or a misconfigured column list, so the plugin exits abnormally
                throw new OTSCriticalException(String.format(
                        OTSErrorMessage.RECORD_AND_COLUMN_SIZE_ERROR,
                        columnCount,
                        expectColumnCount,
                        record.toString()
                ));
            }

            OTSLine line;

            if (conf.getEnableAutoIncrement()) {
                line = ParseRecord.parseNormalRecordToOTSLineWithAutoIncrement(
                        conf.getTableName(),
                        conf.getOperation(),
                        pkColumnMapping,
                        conf.getAttributeColumn(),
                        record,
                        conf.getTimestamp(),
                        primaryKeySchema);
            } else if (!conf.isTimeseriesTable()) {
                line = ParseRecord.parseNormalRecordToOTSLine(
                        conf.getTableName(),
                        conf.getOperation(),
                        pkColumnMapping,
                        conf.getAttributeColumn(),
                        record,
                        conf.getTimestamp());
            } else {
                line = ParseRecord.parseNormalRecordToOTSLineOfTimeseriesTable(conf.getAttributeColumn(),
                        record, conf.getTimeUnit());
            }

            if (line != null) {
                buffer.write(line);
            }
        }

        buffer.close();
        LOG.info("write end");
    }

    private PrimaryKeySchema getAutoIncrementKey() {
        TableMeta tableMeta = null;
        try {
            tableMeta = RetryHelper.executeWithRetry(
                    new GetTableMetaCallable(ots, conf.getTableName()),
                    conf.getRetry(),
                    conf.getSleepInMillisecond()
            );
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        for (PrimaryKeySchema primaryKeySchema : tableMeta.getPrimaryKeyList()) {
            if (primaryKeySchema.hasOption()) {
                return primaryKeySchema;
            }
        }
        return null;
    }

    public void setOts(SyncClientInterface ots) {
        this.ots = ots;
    }

    public OTSConf getConf() {
        return conf;
    }

    public void setConf(OTSConf conf) {
        this.conf = conf;
    }

    public void setBuffer(OTSSendBuffer buffer) {
        this.buffer = buffer;
    }

    public void setPkColumnMapping(Map<PrimaryKeySchema, Integer> pkColumnMapping) {
        this.pkColumnMapping = pkColumnMapping;
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/OtsWriterSlaveProxyOld.java
================================================
package com.alibaba.datax.plugin.writer.otswriter;

import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSConst;
import com.alibaba.datax.plugin.writer.otswriter.utils.WriterRetryPolicy;
import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage;
import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord;
import com.alibaba.datax.plugin.writer.otswriter.utils.CommonOld;
import com.alibaba.datax.plugin.writer.otswriter.utils.GsonParser;
import com.aliyun.openservices.ots.*;
import com.aliyun.openservices.ots.internal.OTSCallback;
import com.aliyun.openservices.ots.internal.writer.WriterConfig;
import com.aliyun.openservices.ots.model.*;
import org.apache.commons.math3.util.Pair;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.concurrent.Executors;

public class OtsWriterSlaveProxyOld implements IOtsWriterSlaveProxy {

    private static final Logger LOG = LoggerFactory.getLogger(OtsWriterSlaveProxyOld.class);
    private OTSConf conf;
    private OTSAsync otsAsync;
    private OTSWriter otsWriter;

    private class WriterCallback implements OTSCallback {

        private TaskPluginCollector collector;

        public WriterCallback(TaskPluginCollector collector) {
            this.collector = collector;
        }

        @Override
        public void onCompleted(OTSContext otsContext) {
            LOG.debug("Write row succeed. PrimaryKey: {}.", otsContext.getOTSRequest().getRowPrimaryKey());
        }

        @Override
        public void onFailed(OTSContext otsContext, OTSException ex) {
            LOG.error("Write row failed.", ex);
            WithRecord withRecord = (WithRecord) otsContext.getOTSRequest();
            collector.collectDirtyRecord(withRecord.getRecord(), ex);
        }

        @Override
        public void onFailed(OTSContext otsContext, ClientException ex) {
            LOG.error("Write row failed.", ex);
            WithRecord withRecord = (WithRecord) otsContext.getOTSRequest();
            collector.collectDirtyRecord(withRecord.getRecord(), ex);
        }
    }

    @Override
    public void init(Configuration configuration) {
        conf = GsonParser.jsonToConf(configuration.getString(OTSConst.OTS_CONF));

        ClientConfiguration clientConfigure = new ClientConfiguration();
        clientConfigure.setIoThreadCount(conf.getIoThreadCount());
        clientConfigure.setMaxConnections(conf.getConcurrencyWrite());
        clientConfigure.setSocketTimeoutInMillisecond(conf.getSocketTimeout());
        // TODO
        clientConfigure.setConnectionTimeoutInMillisecond(10000);

        OTSServiceConfiguration otsConfigure = new OTSServiceConfiguration();
        otsConfigure.setRetryStrategy(new WriterRetryPolicy(conf));

        otsAsync = new OTSClientAsync(
                conf.getEndpoint(),
                conf.getAccessId(),
                conf.getAccessKey(),
                conf.getInstanceName(),
                clientConfigure,
                otsConfigure);
    }

    @Override
    public void close() {
        otsAsync.shutdown();
    }

    @Override
    public void write(RecordReceiver recordReceiver, TaskPluginCollector collector) throws OTSCriticalException {
        LOG.info("Writer slave started.");

        WriterConfig writerConfig = new WriterConfig();
        writerConfig.setConcurrency(conf.getConcurrencyWrite());
        writerConfig.setMaxBatchRowsCount(conf.getBatchWriteCount());
        // TODO
        writerConfig.setMaxBatchSize(1024 * 1024);
        writerConfig.setBufferSize(1024);
        writerConfig.setMaxAttrColumnSize(2 * 1024 * 1024);
        writerConfig.setMaxColumnsCount(1024);
        writerConfig.setMaxPKColumnSize(1024);

        otsWriter = new DefaultOTSWriter(otsAsync, conf.getTableName(), writerConfig,
                new WriterCallback(collector),
                Executors.newFixedThreadPool(3));

        int expectColumnCount = conf.getPrimaryKeyColumn().size() + conf.getAttributeColumn().size();
        Record record;
        while ((record = recordReceiver.getFromReader()) != null) {
            LOG.debug("Record Raw: {}", record.toString());

            int columnCount = record.getColumnNumber();
            if (columnCount != expectColumnCount) {
                // A column count that differs from the expected count means either a system fault
                // or a misconfigured 'column'; abort with an exception
                throw new IllegalArgumentException(String.format(OTSErrorMessage.RECORD_AND_COLUMN_SIZE_ERROR, columnCount, expectColumnCount, record.toString()));
            }

            // Type conversion
            try {
                RowPrimaryKey primaryKey = CommonOld.getPKFromRecord(conf.getPrimaryKeyColumn(), record);
                List<Pair<String, ColumnValue>> attributes = CommonOld.getAttrFromRecord(conf.getPrimaryKeyColumn().size(), conf.getAttributeColumn(), record);
                RowChange rowChange = CommonOld.columnValuesToRowChange(conf.getTableName(), conf.getOperation(), primaryKey, attributes);
                WithRecord withRecord = (WithRecord) rowChange;
                withRecord.setRecord(record);
                otsWriter.addRowChange(rowChange);
            } catch (IllegalArgumentException e) {
                LOG.warn("Found dirty data.", e);
                collector.collectDirtyRecord(record, e.getMessage());
            } catch (ClientException e) {
                LOG.warn("Found dirty data.", e);
                collector.collectDirtyRecord(record, e.getMessage());
            }
        }

        otsWriter.close();
        LOG.info("Writer slave finished.");
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/BatchWriteRowCallable.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.callable;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.BatchWriteRowRequest;
import com.alicloud.openservices.tablestore.model.BatchWriteRowResponse;

import java.util.concurrent.Callable;

public class BatchWriteRowCallable implements Callable<BatchWriteRowResponse> {

    private SyncClientInterface ots = null;
    private BatchWriteRowRequest batchWriteRowRequest = null;

    public BatchWriteRowCallable(SyncClientInterface ots, BatchWriteRowRequest batchWriteRowRequest) {
        this.ots = ots;
        this.batchWriteRowRequest = batchWriteRowRequest;
    }

    @Override
    public BatchWriteRowResponse call() throws Exception {
        return ots.batchWriteRow(batchWriteRowRequest);
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallable.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.callable;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.DescribeTableRequest;
import com.alicloud.openservices.tablestore.model.DescribeTableResponse;
import com.alicloud.openservices.tablestore.model.TableMeta;

import java.util.concurrent.Callable;

public class GetTableMetaCallable implements Callable<TableMeta> {

    private SyncClientInterface ots = null;
    private String tableName = null;

    public GetTableMetaCallable(SyncClientInterface ots, String tableName) {
        this.ots = ots;
        this.tableName = tableName;
    }

    @Override
    public TableMeta call() throws Exception {
        DescribeTableRequest describeTableRequest = new DescribeTableRequest(tableName);
        DescribeTableResponse result = ots.describeTable(describeTableRequest);
        return result.getTableMeta();
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/GetTableMetaCallableOld.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.callable;

import com.aliyun.openservices.ots.OTSClient;
import com.aliyun.openservices.ots.model.DescribeTableRequest;
import com.aliyun.openservices.ots.model.DescribeTableResult;
import com.aliyun.openservices.ots.model.TableMeta;

import java.util.concurrent.Callable;

public class GetTableMetaCallableOld implements Callable<TableMeta> {

    private OTSClient ots = null;
    private String tableName = null;

    public GetTableMetaCallableOld(OTSClient ots, String tableName) {
        this.ots = ots;
        this.tableName = tableName;
    }

    @Override
    public TableMeta call() throws Exception {
        DescribeTableRequest describeTableRequest = new DescribeTableRequest();
        describeTableRequest.setTableName(tableName);
        DescribeTableResult result = ots.describeTable(describeTableRequest);
        TableMeta tableMeta = result.getTableMeta();
        return tableMeta;
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutRowChangeCallable.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.callable;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.PutRowRequest;
import com.alicloud.openservices.tablestore.model.PutRowResponse;

import java.util.concurrent.Callable;

public class PutRowChangeCallable implements Callable<PutRowResponse> {

    private SyncClientInterface ots = null;
    private PutRowRequest putRowRequest = null;

    public PutRowChangeCallable(SyncClientInterface ots, PutRowRequest putRowRequest) {
        this.ots = ots;
        this.putRowRequest = putRowRequest;
    }

    @Override
    public PutRowResponse call() throws Exception {
        return ots.putRow(putRowRequest);
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/PutTimeseriesDataCallable.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.callable;

import com.alicloud.openservices.tablestore.TimeseriesClient;
import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataRequest;
import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataResponse;

import java.util.concurrent.Callable;

public class PutTimeseriesDataCallable implements Callable<PutTimeseriesDataResponse> {
    private TimeseriesClient client = null;
    private PutTimeseriesDataRequest putTimeseriesDataRequest = null;

    public PutTimeseriesDataCallable(TimeseriesClient client, PutTimeseriesDataRequest putTimeseriesDataRequest) {
        this.client = client;
        this.putTimeseriesDataRequest = putTimeseriesDataRequest;
    }

    @Override
    public PutTimeseriesDataResponse call() throws Exception {
        return client.putTimeseriesData(putTimeseriesDataRequest);
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/callable/UpdateRowChangeCallable.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.callable;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.model.UpdateRowRequest;
import com.alicloud.openservices.tablestore.model.UpdateRowResponse;

import java.util.concurrent.Callable;

public class UpdateRowChangeCallable implements Callable<UpdateRowResponse> {

    private SyncClientInterface ots = null;
    private UpdateRowRequest updateRowRequest = null;

    public UpdateRowChangeCallable(SyncClientInterface ots, UpdateRowRequest updateRowRequest) {
        this.ots = ots;
        this.updateRowRequest = updateRowRequest;
    }

    @Override
    public UpdateRowResponse call() throws Exception {
        return ots.updateRow(updateRowRequest);
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSAttrColumn.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.model;

import com.alicloud.openservices.tablestore.model.ColumnType;

public class OTSAttrColumn {
    // Used only in multi-version mode: the columnName value in the input source.
    // Cells carrying this columnName are written to the column named by the user-configured 'name'.
    private String srcName = null;
    private String name = null;
    private ColumnType type = null;

    // Used only when writing to a timeseries table: whether this field is a tag of the timeseries data
    private Boolean isTag = false;

    public OTSAttrColumn(String name, ColumnType type) {
        this.name = name;
        this.type = type;
    }

    public OTSAttrColumn(String srcName, String name, ColumnType type) {
        this.srcName = srcName;
        this.name = name;
        this.type = type;
    }

    public OTSAttrColumn(String name, ColumnType type, Boolean isTag) {
        this.name = name;
        this.type = type;
        this.isTag = isTag;
    }

    public String getName() {
        return name;
    }

    public ColumnType getType() {
        return type;
    }

    public String getSrcName() {
        return srcName;
    }

    public Boolean getTag() {
        return isTag;
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriteRowTaskManager.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.model;

import com.alicloud.openservices.tablestore.SyncClientInterface;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

/**
 * Controls the number of concurrently running tasks.
 */
public class OTSBatchWriteRowTaskManager implements OTSTaskManagerInterface {

    private SyncClientInterface ots = null;
    private OTSBlockingExecutor executorService = null;
    private OTSConf conf = null;

    private static final Logger LOG = LoggerFactory.getLogger(OTSBatchWriteRowTaskManager.class);

    public OTSBatchWriteRowTaskManager(
            SyncClientInterface ots,
            OTSConf conf) {
        this.ots = ots;
        this.conf = conf;

        executorService = new OTSBlockingExecutor(conf.getConcurrencyWrite());
    }

    public void execute(List<OTSLine> lines) throws Exception {
        LOG.debug("Begin execute.");
        executorService.execute(new OTSBatchWriterRowTask(ots, conf, lines));
        LOG.debug("End execute.");
    }

    public void close() throws Exception {
        LOG.debug("Begin close.");
        executorService.shutdown();
        LOG.debug("End close.");
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBatchWriterRowTask.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.model;

import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException;
import com.alibaba.datax.plugin.writer.otswriter.OTSErrorCode;
import com.alibaba.datax.plugin.writer.otswriter.callable.BatchWriteRowCallable;
import com.alibaba.datax.plugin.writer.otswriter.callable.PutRowChangeCallable;
import com.alibaba.datax.plugin.writer.otswriter.callable.UpdateRowChangeCallable;
import com.alibaba.datax.plugin.writer.otswriter.utils.CollectorUtil;
import com.alibaba.datax.plugin.writer.otswriter.utils.Common;
import com.alibaba.datax.plugin.writer.otswriter.utils.LineAndError;
import com.alibaba.datax.plugin.writer.otswriter.utils.RetryHelper;
import com.alicloud.openservices.tablestore.SyncClientInterface;
import com.alicloud.openservices.tablestore.TableStoreException;
import com.alicloud.openservices.tablestore.model.*;
import com.alicloud.openservices.tablestore.model.BatchWriteRowResponse.RowResult;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

public class OTSBatchWriterRowTask implements Runnable {
    private SyncClientInterface ots = null;
    private OTSConf conf = null;
    private List<OTSLine> otsLines = new ArrayList<OTSLine>();

    private boolean isDone = false;
    private int retryTimes = 0;

    private static final Logger LOG = LoggerFactory.getLogger(OTSBatchWriterRowTask.class);

    public OTSBatchWriterRowTask(
            final SyncClientInterface ots,
            final OTSConf conf,
            final List<OTSLine> lines
    ) {
        this.ots = ots;
        this.conf = conf;

        this.otsLines.addAll(lines);
    }

    @Override
    public void run() {
        LOG.debug("Begin run");
        sendAll(otsLines);
        LOG.debug("End run");
    }

    public boolean isDone() {
        return this.isDone;
    }

    private boolean isExceptionForSendOneByOne(TableStoreException ee) {
        if (ee.getErrorCode().equals(OTSErrorCode.INVALID_PARAMETER) ||
                ee.getErrorCode().equals(OTSErrorCode.REQUEST_TOO_LARGE)
        ) {
            return true;
        }
        return false;
    }

    private BatchWriteRowRequest createRequest(List<OTSLine> lines) {
        BatchWriteRowRequest newRequest = new BatchWriteRowRequest();
        switch (conf.getOperation()) {
            case PUT_ROW:
            case UPDATE_ROW:
                for (OTSLine l : lines) {
                    newRequest.addRowChange(l.getRowChange());
                }
                break;
            default:
                throw new RuntimeException(String.format(OTSErrorMessage.OPERATION_PARSE_ERROR, conf.getOperation()));
        }
        return newRequest;
    }

    /**
     * Sends a single row.
     * @param line
     */
    public void sendLine(OTSLine line) {
        try {
            switch (conf.getOperation()) {
                case PUT_ROW:
                    PutRowRequest putRowRequest = new PutRowRequest();
                    putRowRequest.setRowChange((RowPutChange) line.getRowChange());
                    PutRowResponse putResult = RetryHelper.executeWithRetry(
                            new PutRowChangeCallable(ots, putRowRequest),
                            conf.getRetry(),
                            conf.getSleepInMillisecond());
                    LOG.debug("Request ID : {}", putResult.getRequestId());
                    break;
                case UPDATE_ROW:
                    UpdateRowRequest updateRowRequest = new UpdateRowRequest();
                    updateRowRequest.setRowChange((RowUpdateChange) line.getRowChange());
                    UpdateRowResponse updateResult = RetryHelper.executeWithRetry(
                            new UpdateRowChangeCallable(ots, updateRowRequest),
                            conf.getRetry(),
                            conf.getSleepInMillisecond());
                    LOG.debug("Request ID : {}", updateResult.getRequestId());
                    break;
            }
        } catch (Exception e) {
            LOG.warn("sendLine fail. ", e);
            CollectorUtil.collect(line.getRecords(), e.getMessage());
        }
    }

    private void sendAllOneByOne(List<OTSLine> lines) {
        for (OTSLine l : lines) {
            sendLine(l);
        }
    }

    /**
     * Sends data in batches.
     * When sending fails, BatchWriteRow either fails as a whole with an exception or returns
     * the status of each individual row:
     * 1. If the whole request fails, the method checks whether the error can be avoided by
     *    splitting the batch into single-row requests. If not, the whole batch is recorded in the
     *    dirty-data collector; if so, sendAllOneByOne is called to send the rows one at a time.
     * 2. If BatchWriteRow succeeds, the method checks the status of each row, collects the rows
     *    that failed and calls sendAll again to resend them. Rows with non-retryable errors, and
     *    rows still failing once the retry limit is reached, go to the dirty-data collector.
     * @param lines
     */
    private void sendAll(List<OTSLine> lines) {
        try {
            Thread.sleep(Common.getDelaySendMillinSeconds(retryTimes, conf.getSleepInMillisecond()));
            BatchWriteRowRequest batchWriteRowRequest = createRequest(lines);
            BatchWriteRowResponse result = RetryHelper.executeWithRetry(
                    new BatchWriteRowCallable(ots, batchWriteRowRequest),
                    conf.getRetry(),
                    conf.getSleepInMillisecond());

            LOG.debug("Request ID : {}", result.getRequestId());
            List<LineAndError> errors = getLineAndError(result, lines);
            if (!errors.isEmpty()) {
                if (retryTimes < conf.getRetry()) {
                    retryTimes++;
                    LOG.warn("Retry times : {}", retryTimes);
                    List<OTSLine> newLines = new ArrayList<OTSLine>();
                    for (LineAndError re : errors) {
                        LOG.warn("Because: {}", re.getError().getMessage());
                        if (RetryHelper.canRetry(re.getError().getCode())) {
                            newLines.add(re.getLine());
                        } else {
                            LOG.warn("Can not retry, record row to collector. {}", re.getError().getMessage());
                            CollectorUtil.collect(re.getLine().getRecords(), re.getError().getMessage());
                        }
                    }
                    if (!newLines.isEmpty()) {
                        sendAll(newLines);
                    }
                } else {
                    LOG.warn("Retry times exceeded the limit. RetryTime : {}", retryTimes);
                    CollectorUtil.collect(errors);
                }
            }
        } catch (TableStoreException e) {
            LOG.warn("Send data failed. {}", e.getMessage());
            if (isExceptionForSendOneByOne(e)) {
                if (lines.size() == 1) {
                    LOG.warn("Can not retry.", e);
                    CollectorUtil.collect(e.getMessage(), lines);
                } else {
                    // Fall back to sending the rows one by one
                    sendAllOneByOne(lines);
                }
            } else {
                LOG.error("Can not send lines to OTS for RuntimeException.", e);
                CollectorUtil.collect(e.getMessage(), lines);
            }
        } catch (Exception e) {
            LOG.error("Can not send lines to OTS for Exception.", e);
            CollectorUtil.collect(e.getMessage(), lines);
        }
    }

    private List<LineAndError> getLineAndError(BatchWriteRowResponse result, List<OTSLine> lines) throws OTSCriticalException {
        List<LineAndError> errors = new ArrayList<LineAndError>();

        switch (conf.getOperation()) {
            case PUT_ROW:
            case UPDATE_ROW: {
                List<RowResult> status = result.getFailedRows();
                for (RowResult r : status) {
                    errors.add(new LineAndError(lines.get(r.getIndex()), r.getError()));
                }
            }
            break;
            default:
                LOG.error("Bug branch.");
                throw new OTSCriticalException(String.format(OTSErrorMessage.OPERATION_PARSE_ERROR, conf.getOperation()));
        }
        return errors;
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSBlockingExecutor.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.model;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.*;

/**
 * A single channel writes data to OTS concurrently from multiple threads, so a fixed thread pool
 * runs the Runnable tasks, and execute() must block while the pool is full. The standard Executor
 * cannot block execute(): when its queue is full it either throws the default
 * RejectedExecutionException or calls a RejectedExecutionHandler that we implement, and neither
 * option blocks the caller. This class therefore uses a semaphore to implement a blocking Executor.
 * @author redchen
 *
 */
public class OTSBlockingExecutor {
    private final ExecutorService exec;
    private final Semaphore semaphore;

    private static final Logger LOG = LoggerFactory.getLogger(OTSBlockingExecutor.class);

    public OTSBlockingExecutor(int concurrency) {
        this.exec = new ThreadPoolExecutor(
                concurrency, concurrency,
                0L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        this.semaphore = new Semaphore(concurrency);
    }

    public void execute(final Runnable task)
            throws InterruptedException {
        LOG.debug("Begin execute");
        try {
            semaphore.acquire();
            exec.execute(new Runnable() {
                public void run() {
                    try {
                        task.run();
                    } finally {
                        semaphore.release();
                    }
                }
            });
        } catch (RejectedExecutionException e) {
            semaphore.release();
            throw new RuntimeException(OTSErrorMessage.INSERT_TASK_ERROR);
        }
        LOG.debug("End execute");
    }

    public void shutdown() throws InterruptedException {
        this.exec.shutdown();
        while (!this.exec.awaitTermination(1, TimeUnit.SECONDS)) {}
    }
}

================================================
FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConf.java
================================================
package com.alibaba.datax.plugin.writer.otswriter.model;

import com.alicloud.openservices.tablestore.model.PrimaryKeySchema;

import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class OTSConf {
    private String endpoint = null;
    private String accessId = null;
    private String accessKey = null;
    private String instanceName = null;
    private String tableName = null;

    private List<PrimaryKeySchema> primaryKeyColumn = null;
    private List<OTSAttrColumn> attributeColumn = null;

    private int retry = -1;
    private int sleepInMillisecond = -1;
    private int batchWriteCount = -1;
    private int concurrencyWrite = -1;
    private int ioThreadCount = -1;
    private int socketTimeoutInMillisecond = -1;
    private int connectTimeoutInMillisecond = -1;
    private OTSOpType operation = null;
    private int requestTotalSizeLimitation = -1;

    private OTSMode mode = null;
    private boolean enableAutoIncrement = false;
    private boolean isNewVersion = false;
    private boolean isTimeseriesTable = false;
    private TimeUnit timeUnit = TimeUnit.MICROSECONDS;
    private long timestamp = -1;
    private Map<String, Integer> encodePkColumnMapping = null;
    private String columnNamePrefixFilter = null;

    public Map<String, Integer> getEncodePkColumnMapping() {
        return encodePkColumnMapping;
    }

    public void setEncodePkColumnMapping(Map<String, Integer> encodePkColumnMapping) {
        this.encodePkColumnMapping = encodePkColumnMapping;
    }

    public int getSocketTimeoutInMillisecond() {
        return socketTimeoutInMillisecond;
    }

    public OTSOpType getOperation() {
        return operation;
    }

    public void setOperation(OTSOpType operation) {
        this.operation = operation;
    }

    public List<PrimaryKeySchema> getPrimaryKeyColumn() {
        return primaryKeyColumn;
    }

    public void setPrimaryKeyColumn(List<PrimaryKeySchema> primaryKeyColumn) {
        this.primaryKeyColumn = primaryKeyColumn;
    }

    public int getConcurrencyWrite() {
        return concurrencyWrite;
    }

    public void setConcurrencyWrite(int concurrencyWrite) {
        this.concurrencyWrite = concurrencyWrite;
    }

    public int getBatchWriteCount() {
        return batchWriteCount;
    }

    public void setBatchWriteCount(int batchWriteCount) {
        this.batchWriteCount = batchWriteCount;
    }

    public String getEndpoint() {
        return endpoint;
    }

    public void setEndpoint(String endpoint) {
        this.endpoint = endpoint;
    }

    public String getAccessId() {
        return accessId;
    }

    public void setAccessId(String accessId) {
        this.accessId = accessId;
    }

    public String getAccessKey() {
        return accessKey;
    }

    public void setAccessKey(String accessKey) {
        this.accessKey = accessKey;
    }

    public String getInstanceName() {
        return instanceName;
    }

    public void setInstanceName(String instanceName) {
        this.instanceName = instanceName;
    }

    public String getTableName() {
        return tableName;
    }

    public void setTableName(String tableName) {
        this.tableName = tableName;
    }

    public List<OTSAttrColumn> getAttributeColumn() {
return attributeColumn; } public void setAttributeColumn(List attributeColumn) { this.attributeColumn = attributeColumn; } public int getRetry() { return retry; } public void setRetry(int retry) { this.retry = retry; } public int getSleepInMillisecond() { return sleepInMillisecond; } public void setSleepInMillisecond(int sleepInMillisecond) { this.sleepInMillisecond = sleepInMillisecond; } public int getIoThreadCount() { return ioThreadCount; } public void setIoThreadCount(int ioThreadCount) { this.ioThreadCount = ioThreadCount; } public int getSocketTimeout() { return socketTimeoutInMillisecond; } public void setSocketTimeoutInMillisecond(int socketTimeoutInMillisecond) { this.socketTimeoutInMillisecond = socketTimeoutInMillisecond; } public int getConnectTimeoutInMillisecond() { return connectTimeoutInMillisecond; } public void setConnectTimeoutInMillisecond(int connectTimeoutInMillisecond) { this.connectTimeoutInMillisecond = connectTimeoutInMillisecond; } public OTSMode getMode() { return mode; } public void setMode(OTSMode mode) { this.mode = mode; } public long getTimestamp() { return timestamp; } public void setTimestamp(long timestamp) { this.timestamp = timestamp; } public String getColumnNamePrefixFilter() { return columnNamePrefixFilter; } public void setColumnNamePrefixFilter(String columnNamePrefixFilter) { this.columnNamePrefixFilter = columnNamePrefixFilter; } public boolean getEnableAutoIncrement() { return enableAutoIncrement; } public void setEnableAutoIncrement(boolean enableAutoIncrement) { this.enableAutoIncrement = enableAutoIncrement; } public boolean isNewVersion() { return isNewVersion; } public void setNewVersion(boolean newVersion) { isNewVersion = newVersion; } public boolean isTimeseriesTable() { return isTimeseriesTable; } public void setTimeseriesTable(boolean timeseriesTable) { isTimeseriesTable = timeseriesTable; } public TimeUnit getTimeUnit() { return timeUnit; } public void setTimeUnit(TimeUnit timeUnit) { this.timeUnit = 
timeUnit; } public int getRequestTotalSizeLimitation() { return requestTotalSizeLimitation; } public void setRequestTotalSizeLimitation(int requestTotalSizeLimitation) { this.requestTotalSizeLimitation = requestTotalSizeLimitation; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSConst.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; public class OTSConst { // Reader support type public final static String TYPE_STRING = "STRING"; public final static String TYPE_INTEGER = "INT"; public final static String TYPE_DOUBLE = "DOUBLE"; public final static String TYPE_BOOLEAN = "BOOL"; public final static String TYPE_BINARY = "BINARY"; // Column public final static String NAME = "name"; public final static String SRC_NAME = "srcName"; public final static String TYPE = "type"; public final static String IS_TAG = "is_timeseries_tag"; public final static String OTS_CONF = "OTS_CONF"; public final static String OTS_MODE_NORMAL = "normal"; public final static String OTS_MODE_MULTI_VERSION = "multiVersion"; public final static String OTS_MODE_TIME_SERIES = "timeseries"; public final static String OTS_OP_TYPE_PUT = "PutRow"; public final static String OTS_OP_TYPE_UPDATE = "UpdateRow"; // only support in old version public final static String OTS_OP_TYPE_DELETE = "DeleteRow"; // options public final static String RETRY = "maxRetryTime"; public final static String SLEEP_IN_MILLISECOND = "retrySleepInMillisecond"; public final static String BATCH_WRITE_COUNT = "batchWriteCount"; public final static String CONCURRENCY_WRITE = "concurrencyWrite"; public final static String IO_THREAD_COUNT = "ioThreadCount"; public final static String MAX_CONNECT_COUNT = "maxConnectCount"; public final static String SOCKET_TIMEOUTIN_MILLISECOND = "socketTimeoutInMillisecond"; public final static String CONNECT_TIMEOUT_IN_MILLISECOND = 
"connectTimeoutInMillisecond"; public final static String REQUEST_TOTAL_SIZE_LIMITATION = "requestTotalSizeLimitation"; public static final String MEASUREMENT_NAME = "_m_name"; public static final String DATA_SOURCE = "_data_source"; public static final String TAGS = "_tags"; public static final String TIME = "_time"; } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSErrorMessage.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; public class OTSErrorMessage { public static final String MODE_PARSE_ERROR = "The 'mode' only support 'normal' and 'multiVersion' not '%s'."; public static final String OPERATION_PARSE_ERROR = "The 'writeMode' only support 'PutRow' and 'UpdateRow' not '%s'."; public static final String MUTLI_MODE_OPERATION_PARSE_ERROR = "When configurion set mode='MultiVersion', the 'writeMode' only support 'UpdateRow' not '%s'."; public static final String UNSUPPORT_PARSE = "Unsupport parse '%s' to '%s'."; public static final String UNSUPPORT = "Unsupport : '%s'."; public static final String RECORD_AND_COLUMN_SIZE_ERROR = "Size of record not equal size of config column. 
record size : %d, config column size : %d, record data : %s."; public static final String PK_TYPE_ERROR = "Primary key type only supports 'string', 'int' and 'binary', not '%s'."; public static final String ATTR_TYPE_ERROR = "Column type only supports 'string', 'int', 'double', 'bool' and 'binary', not '%s'."; public static final String PK_COLUMN_MISSING_ERROR = "Missing the column '%s' in 'primaryKey'."; public static final String INPUT_PK_COUNT_NOT_EQUAL_META_ERROR = "The count of 'primaryKey' does not equal the meta, input count : %d, primary key count : %d in meta."; public static final String INPUT_PK_TYPE_NOT_MATCH_META_ERROR = "The type of 'primaryKey' does not match the meta, column name : %s, input type: %s, primary key type : %s in meta."; public static final String INPUT_PK_NAME_NOT_EXIST_IN_META_ERROR = "The input primary column '%s' does not exist in meta."; public static final String ATTR_REPEAT_COLUMN_ERROR = "Repeated column '%s' in 'column'."; public static final String MISSING_PARAMTER_ERROR = "The param '%s' does not exist."; public static final String PARAMTER_STRING_IS_EMPTY_ERROR = "The length of param '%s' is zero."; public static final String PARAMETER_LIST_IS_EMPTY_ERROR = "The param '%s' is an empty json array."; public static final String PARAMETER_IS_NOT_ARRAY_ERROR = "The param '%s' is not a json array."; public static final String PARAMETER_IS_NOT_MAP_ERROR = "The param '%s' is not a json map."; public static final String PARSE_TO_LIST_ERROR = "Cannot parse '%s' to list."; public static final String PK_MAP_NAME_TYPE_ERROR = "The 'name' and 'type' only support string in json map of 'primaryKey'."; public static final String ATTR_MAP_NAME_TYPE_ERROR = "The 'name' and 'type' only support string in json map of 'column'."; public static final String ATTR_MAP_SRCNAME_NAME_TYPE_ERROR = "The 'srcName', 'name' and 'type' only support string in json map of 'column'."; public static final String PK_MAP_KEY_TYPE_ERROR = "The '%s' only supports string in json map
of 'primaryKey'."; public static final String ATTR_MAP_KEY_TYPE_ERROR = "The '%s' only supports string in json map of 'column'."; public static final String PK_MAP_INCLUDE_NAME_TYPE_ERROR = "Only the 'name' and 'type' fields are supported in json map of 'primaryKey'."; public static final String ATTR_MAP_INCLUDE_NAME_TYPE_ERROR = "Only the 'name' and 'type' fields are supported in json map of 'column'."; public static final String PK_MAP_FILED_MISSING_ERROR = "The '%s' field is missing in json map of 'primaryKey'."; public static final String ATTR_MAP_FILED_MISSING_ERROR = "The '%s' field is missing in json map of 'column'."; public static final String ATTR_MAP_INCLUDE_SRCNAME_NAME_TYPE_ERROR = "Only the 'srcName', 'name' and 'type' fields are supported in json map of 'column'."; public static final String PK_ITEM_IS_ILLEAGAL_ERROR = "The item is not a string or map in 'primaryKey'."; public static final String PK_IS_NOT_EXIST_AT_OTS_ERROR = "Cannot find the pk('%s') at ots in 'primaryKey'."; public static final String ATTR_ITEM_IS_NOT_MAP_ERROR = "The item is not a map in 'column'."; public static final String PK_COLUMN_NAME_IS_EMPTY_ERROR = "The name of item cannot be an empty string in 'primaryKey'."; public static final String PK_COLUMN_TYPE_IS_EMPTY_ERROR = "The type of item cannot be an empty string in 'primaryKey'."; public static final String ATTR_COLUMN_NAME_IS_EMPTY_ERROR = "The name of item cannot be an empty string in 'column'."; public static final String ATTR_COLUMN_SRC_NAME_IS_EMPTY_ERROR = "The srcName of item cannot be an empty string in 'column'."; public static final String ATTR_COLUMN_TYPE_IS_EMPTY_ERROR = "The type of item cannot be an empty string in 'column'."; public static final String MULTI_PK_ATTR_COLUMN_ERROR = "Duplicate item in 'column' and 'primaryKey', column name : %s ."; public static final String MULTI_ATTR_COLUMN_ERROR = "Duplicate item in 'column', column name : %s ."; public static final String MULTI_ATTR_SRC_COLUMN_ERROR = "Duplicate src name in
'column', src name : %s ."; public static final String COLUMN_CONVERSION_ERROR = "Column conversion error, src type : %s, src value: %s, expect type: %s ."; public static final String PK_COLUMN_VALUE_IS_NULL_ERROR = "The column of record is NULL, primary key name : %s ."; public static final String PK_STRING_LENGTH_ERROR = "The length of pk string value exceeds the configuration, conf: %d, input: %d ."; public static final String ATTR_STRING_LENGTH_ERROR = "The length of attr string value exceeds the configuration, conf: %d, input: %d ."; public static final String BINARY_LENGTH_ERROR = "The length of binary value exceeds the configuration, conf: %d, input: %d ."; public static final String LINE_LENGTH_ERROR = "The length of row exceeds the request length configuration, conf: %d, row: %d ."; public static final String INSERT_TASK_ERROR = "Cannot execute the task, because the ExecutorService is shut down."; public static final String COLUMN_NOT_DEFINE = "The column name : '%s' is not defined in column."; public static final String INPUT_RECORDS_IS_EMPTY = "The input records cannot be empty."; public static final String MULTI_VERSION_TIMESTAMP_IS_EMPTY = "The input timestamp cannot be empty in the multiVersion mode."; public static final String MULTI_VERSION_VALUE_IS_EMPTY = "The input value cannot be empty in the multiVersion mode."; public static final String INPUT_COLUMN_COUNT_LIMIT = "The input column count(%d) exceeds the max(%d)."; public static final String PUBLIC_SDK_NO_SUPPORT_MULTI_VERSION = "The old version does not support the multi-version function. Please add config in otswriter: \"newVersion\":\"true\" ."; public static final String PUBLIC_SDK_NO_SUPPORT_AUTO_INCREMENT = "The old version does not support the auto-increment primary key function.
Please add config in otswriter: \"newVersion\":\"true\" ."; public static final String NOT_SUPPORT_MULTI_VERSION_AUTO_INCREMENT = "The multi-version mode does not support the auto-increment primary key function."; public static final String PUBLIC_SDK_NO_SUPPORT_TIMESERIES_TABLE = "The old version does not support writing timeseries tables. Please add config in otswriter: \"newVersion\":\"true\" ."; public static final String NOT_SUPPORT_TIMESERIES_TABLE_AUTO_INCREMENT = "The timeseries table does not support the auto-increment primary key function."; public static final String NO_FOUND_M_NAME_FIELD_ERROR = "The '_m_name' field should be set in columns because 'measurement' is required in timeseries data."; public static final String NO_FOUND_TIME_FIELD_ERROR = "The '_time' field should be set in columns because 'time' is required in timeseries data."; public static final String TIMEUNIT_FORMAT_ERROR = "The value of param 'timeunit' is '%s', which should be in ['NANOSECONDS', 'MICROSECONDS', 'MILLISECONDS', 'SECONDS', 'MINUTES']."; } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSLine.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alibaba.datax.plugin.writer.otswriter.utils.CalculateHelper; import com.alicloud.openservices.tablestore.model.PrimaryKey; import com.alicloud.openservices.tablestore.model.RowChange; import com.alicloud.openservices.tablestore.model.RowPutChange; import com.alicloud.openservices.tablestore.model.RowUpdateChange; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow; import java.util.ArrayList; import java.util.List; public class OTSLine { private int dataSize = 0; private PrimaryKey pk = null; private RowChange change = null; private TimeseriesRow timeseriesRow
= null; private List<Record> records = new ArrayList<Record>(); public OTSLine( PrimaryKey pk, List<Record> records, RowChange change) throws OTSCriticalException { this.pk = pk; this.change = change; this.records.addAll(records); setSize(this.change); } public OTSLine( PrimaryKey pk, Record record, RowChange change) throws OTSCriticalException { this.pk = pk; this.change = change; this.records.add(record); setSize(this.change); } public OTSLine( Record record, TimeseriesRow row) throws OTSCriticalException { this.timeseriesRow = row; this.records.add(record); setSize(this.timeseriesRow); } private void setSize(RowChange change) throws OTSCriticalException { if (change instanceof RowPutChange) { this.dataSize = CalculateHelper.getRowPutChangeSize((RowPutChange) change); } else if (change instanceof RowUpdateChange) { this.dataSize = CalculateHelper.getRowUpdateChangeSize((RowUpdateChange) change); } else { throw new RuntimeException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, change.getClass().toString(), "RowPutChange or RowUpdateChange")); } } private void setSize(TimeseriesRow row) throws OTSCriticalException { this.dataSize = CalculateHelper.getTimeseriesRowDataSize(row); } public List<Record> getRecords() { return records; } public PrimaryKey getPk() { return pk; } public int getDataSize() { return dataSize; } public RowChange getRowChange() { return change; } public TimeseriesRow getTimeseriesRow() { return timeseriesRow; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSMode.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; public enum OTSMode { NORMAL, // normal mode MULTI_VERSION // multi-version mode } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSOpType.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; public enum
OTSOpType { PUT_ROW, UPDATE_ROW, @Deprecated DELETE_ROW } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSSendBuffer.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alicloud.openservices.tablestore.SyncClientInterface; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; public class OTSSendBuffer { private OTSConf conf = null; private OTSTaskManagerInterface manager = null; private int totalSize = 0; private List<OTSLine> buffer = new ArrayList<OTSLine>(); private static final Logger LOG = LoggerFactory.getLogger(OTSSendBuffer.class); public OTSSendBuffer( SyncClientInterface ots, OTSConf conf) { this.conf = conf; if (conf.isTimeseriesTable()){ this.manager = new OTSTimeseriesRowTaskManager(ots, conf); } else { this.manager = new OTSBatchWriteRowTaskManager(ots, conf); } } public void write(OTSLine line) throws OTSCriticalException { LOG.debug("write begin"); // Check whether the send condition is met if (buffer.size() >= conf.getBatchWriteCount() || ((totalSize + line.getDataSize()) > conf.getRequestTotalSizeLimitation() && totalSize > 0) ) { try { manager.execute(new ArrayList<OTSLine>(buffer)); } catch (Exception e) { LOG.error("OTSBatchWriteRowTaskManager execute fail : {}", e.getMessage(), e); throw new OTSCriticalException(e); } buffer.clear(); totalSize = 0; } buffer.add(line); totalSize += line.getDataSize(); LOG.debug("write end"); } public void flush() throws OTSCriticalException { LOG.debug("flush begin"); if (!buffer.isEmpty()) { try { manager.execute(new ArrayList<OTSLine>(buffer)); } catch (Exception e) { LOG.error("OTSBatchWriteRowTaskManager flush fail : {}", e.getMessage(), e); throw new OTSCriticalException(e); } buffer.clear(); totalSize = 0; } LOG.debug("flush end"); } public void close() throws OTSCriticalException { LOG.debug("close begin"); try { flush(); }
finally { try { manager.close(); } catch (Exception e) { LOG.error("OTSBatchWriteRowTaskManager close fail : {}", e.getMessage(), e); throw new OTSCriticalException(e); } } LOG.debug("close end"); } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTaskManagerInterface.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import java.util.List; public interface OTSTaskManagerInterface { public void execute(List<OTSLine> lines) throws Exception; public void close() throws Exception; } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTask.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alibaba.datax.plugin.writer.otswriter.OTSErrorCode; import com.alibaba.datax.plugin.writer.otswriter.callable.PutTimeseriesDataCallable; import com.alibaba.datax.plugin.writer.otswriter.utils.CollectorUtil; import com.alibaba.datax.plugin.writer.otswriter.utils.Common; import com.alibaba.datax.plugin.writer.otswriter.utils.LineAndError; import com.alibaba.datax.plugin.writer.otswriter.utils.RetryHelper; import com.alicloud.openservices.tablestore.TableStoreException; import com.alicloud.openservices.tablestore.TimeseriesClient; import com.alicloud.openservices.tablestore.model.PutRowRequest; import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataRequest; import com.alicloud.openservices.tablestore.model.timeseries.PutTimeseriesDataResponse; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.List; public class OTSTimeseriesRowTask implements Runnable { private static final Logger LOG = LoggerFactory.getLogger(OTSTimeseriesRowTask.class); private
TimeseriesClient client = null; private OTSConf conf = null; private List<OTSLine> otsLines = new ArrayList<OTSLine>(); private boolean isDone = false; private int retryTimes = 0; public OTSTimeseriesRowTask( final TimeseriesClient client, final OTSConf conf, final List<OTSLine> lines ) { this.client = client; this.conf = conf; this.otsLines.addAll(lines); } @Override public void run() { LOG.debug("Begin run"); sendAll(otsLines); LOG.debug("End run"); } public boolean isDone() { return this.isDone; } private boolean isExceptionForSendOneByOne(TableStoreException ee) { if (ee.getErrorCode().equals(OTSErrorCode.INVALID_PARAMETER) || ee.getErrorCode().equals(OTSErrorCode.REQUEST_TOO_LARGE) ) { return true; } return false; } private PutTimeseriesDataRequest createRequest(List<OTSLine> lines) { PutTimeseriesDataRequest newRequest = new PutTimeseriesDataRequest(conf.getTableName()); for (OTSLine l : lines) { newRequest.addRow(l.getTimeseriesRow()); } return newRequest; } /** * Send a single row. * * @param line */ public void sendLine(OTSLine line) { try { PutTimeseriesDataRequest putTimeseriesDataRequest = new PutTimeseriesDataRequest(conf.getTableName()); putTimeseriesDataRequest.addRow(line.getTimeseriesRow()); PutTimeseriesDataResponse result = RetryHelper.executeWithRetry( new PutTimeseriesDataCallable(client, putTimeseriesDataRequest), conf.getRetry(), conf.getSleepInMillisecond()); if (!result.isAllSuccess()){ String errMsg = result.getFailedRows().get(0).getError().getMessage(); LOG.warn("sendLine fail. " + errMsg); CollectorUtil.collect(line.getRecords(), errMsg); }else { LOG.debug("Request ID : {}", result.getRequestId()); } } catch (Exception e) { LOG.warn("sendLine fail.
", e); CollectorUtil.collect(line.getRecords(), e.getMessage()); } } private void sendAllOneByOne(List lines) { for (OTSLine l : lines) { sendLine(l); } } private void sendAll(List lines) { try { Thread.sleep(Common.getDelaySendMillinSeconds(retryTimes, conf.getSleepInMillisecond())); PutTimeseriesDataRequest putTimeseriesDataRequest = createRequest(lines); PutTimeseriesDataResponse result = RetryHelper.executeWithRetry( new PutTimeseriesDataCallable(client, putTimeseriesDataRequest), conf.getRetry(), conf.getSleepInMillisecond()); LOG.debug("Request ID : {}", result.getRequestId()); List errors = getLineAndError(result, lines); if (!errors.isEmpty()) { if (retryTimes < conf.getRetry()) { retryTimes++; LOG.warn("Retry times : {}", retryTimes); List newLines = new ArrayList(); for (LineAndError re : errors) { LOG.warn("Because: {}", re.getError().getMessage()); if (RetryHelper.canRetry(re.getError().getCode())) { newLines.add(re.getLine()); } else { LOG.warn("Can not retry, record row to collector. {}", re.getError().getMessage()); CollectorUtil.collect(re.getLine().getRecords(), re.getError().getMessage()); } } if (!newLines.isEmpty()) { sendAll(newLines); } } else { LOG.warn("Retry times more than limitation. RetryTime : {}", retryTimes); CollectorUtil.collect(errors); } } } catch (TableStoreException e) { LOG.warn("Send data fail. 
{}", e.getMessage()); if (isExceptionForSendOneByOne(e)) { if (lines.size() == 1) { LOG.warn("Can not retry.", e); CollectorUtil.collect(e.getMessage(), lines); } else { // Fall back to sending the rows one by one sendAllOneByOne(lines); } } else { LOG.error("Can not send lines to OTS for RuntimeException.", e); CollectorUtil.collect(e.getMessage(), lines); } } catch (Exception e) { LOG.error("Can not send lines to OTS for Exception.", e); CollectorUtil.collect(e.getMessage(), lines); } } private List<LineAndError> getLineAndError(PutTimeseriesDataResponse result, List<OTSLine> lines) throws OTSCriticalException { List<LineAndError> errors = new ArrayList<LineAndError>(); List<PutTimeseriesDataResponse.FailedRowResult> status = result.getFailedRows(); for (PutTimeseriesDataResponse.FailedRowResult r : status) { errors.add(new LineAndError(lines.get(r.getIndex()), r.getError())); } return errors; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/OTSTimeseriesRowTaskManager.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alicloud.openservices.tablestore.SyncClient; import com.alicloud.openservices.tablestore.SyncClientInterface; import com.alicloud.openservices.tablestore.TimeseriesClient; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public class OTSTimeseriesRowTaskManager implements OTSTaskManagerInterface{ private TimeseriesClient client = null; private OTSBlockingExecutor executorService = null; private OTSConf conf = null; private static final Logger LOG = LoggerFactory.getLogger(OTSTimeseriesRowTaskManager.class); public OTSTimeseriesRowTaskManager( SyncClientInterface ots, OTSConf conf) { this.client = ((SyncClient)ots).asTimeseriesClient(); this.conf = conf; executorService = new OTSBlockingExecutor(conf.getConcurrencyWrite()); } @Override public void execute(List<OTSLine> lines) throws Exception { LOG.debug("Begin execute."); executorService.execute(new OTSTimeseriesRowTask(client, conf, lines));
LOG.debug("End execute."); } @Override public void close() throws Exception { LOG.debug("Begin close."); executorService.shutdown(); LOG.debug("End close."); } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowDeleteChangeWithRecord.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord; public class RowDeleteChangeWithRecord extends com.aliyun.openservices.ots.model.RowDeleteChange implements WithRecord { private Record record; public RowDeleteChangeWithRecord(String tableName) { super(tableName); } @Override public Record getRecord() { return record; } @Override public void setRecord(Record record) { this.record = record; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowPutChangeWithRecord.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord; public class RowPutChangeWithRecord extends com.aliyun.openservices.ots.model.RowPutChange implements WithRecord { private Record record; public RowPutChangeWithRecord(String tableName) { super(tableName); } @Override public Record getRecord() { return record; } @Override public void setRecord(Record record) { this.record = record; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/model/RowUpdateChangeWithRecord.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.model; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.plugin.writer.otswriter.utils.WithRecord; public class 
RowUpdateChangeWithRecord extends com.aliyun.openservices.ots.model.RowUpdateChange implements WithRecord { private Record record; public RowUpdateChangeWithRecord(String tableName) { super(tableName); } @Override public Record getRecord() { return record; } @Override public void setRecord(Record record) { this.record = record; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CalculateHelper.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alicloud.openservices.tablestore.core.utils.Pair; import com.alicloud.openservices.tablestore.model.*; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesKey; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow; import java.util.List; import java.util.Map; import static com.alicloud.openservices.tablestore.model.PrimaryKeyValue.AUTO_INCREMENT; public class CalculateHelper { private static int getPrimaryKeyValueSize(PrimaryKeyValue primaryKeyValue) throws OTSCriticalException { int primaryKeySize = 0; if(primaryKeyValue == AUTO_INCREMENT){ return primaryKeySize; } switch (primaryKeyValue.getType()) { case INTEGER: primaryKeySize = 8; break; case STRING: primaryKeySize = primaryKeyValue.asStringInBytes().length; break; case BINARY: primaryKeySize = primaryKeyValue.asBinary().length; break; default: throw new OTSCriticalException("Bug: not support the type : " + primaryKeyValue.getType() + " in getPrimaryKeyValueSize"); } return primaryKeySize; } private static int getColumnValueSize(ColumnValue columnValue) throws OTSCriticalException { int columnSize = 0; switch (columnValue.getType()) { case INTEGER: columnSize += 8; break; case DOUBLE: columnSize += 8; break; case STRING: columnSize += columnValue.asStringInBytes().length; break; case BINARY: columnSize += 
columnValue.asBinary().length; break; case BOOLEAN: columnSize += 1; break; default: throw new OTSCriticalException("Bug: not support the type : " + columnValue.getType() + " in getColumnValueSize"); } return columnSize; } public static int getRowPutChangeSize(RowPutChange change) throws OTSCriticalException { int primaryKeyTotalSize = 0; int columnTotalSize = 0; // PrimaryKeys Total Size PrimaryKey primaryKey = change.getPrimaryKey(); PrimaryKeyColumn[] primaryKeyColumnArray = primaryKey.getPrimaryKeyColumns(); PrimaryKeyColumn primaryKeyColumn; byte[] primaryKeyName; PrimaryKeyValue primaryKeyValue; for (int i = 0; i < primaryKeyColumnArray.length; i++) { primaryKeyColumn = primaryKeyColumnArray[i]; primaryKeyName = primaryKeyColumn.getNameRawData(); primaryKeyValue = primaryKeyColumn.getValue(); // += PrimaryKey Name Data primaryKeyTotalSize += primaryKeyName.length; // += PrimaryKey Value Data primaryKeyTotalSize += getPrimaryKeyValueSize(primaryKeyValue); } // Columns Total Size List<Column> columnList = change.getColumnsToPut(); for (Column column : columnList) { // += Column Name columnTotalSize += column.getNameRawData().length; // += Column Value ColumnValue columnValue = column.getValue(); columnTotalSize += getColumnValueSize(columnValue); // += Timestamp if (column.hasSetTimestamp()) { columnTotalSize += 8; } } return primaryKeyTotalSize + columnTotalSize; } public static int getRowUpdateChangeSize(RowUpdateChange change) throws OTSCriticalException { int primaryKeyTotalSize = 0; int columnPutSize = 0; int columnDeleteSize = 0; // PrimaryKeys Total Size PrimaryKey primaryKey = change.getPrimaryKey(); PrimaryKeyColumn[] primaryKeyColumnArray = primaryKey.getPrimaryKeyColumns(); PrimaryKeyColumn primaryKeyColumn; byte[] primaryKeyName; PrimaryKeyValue primaryKeyValue; for (int i = 0; i < primaryKeyColumnArray.length; i++) { primaryKeyColumn = primaryKeyColumnArray[i]; primaryKeyName = primaryKeyColumn.getNameRawData(); primaryKeyValue =
primaryKeyColumn.getValue(); // += PrimaryKey Name Data primaryKeyTotalSize += primaryKeyName.length; // += PrimaryKey Value Data primaryKeyTotalSize += getPrimaryKeyValueSize(primaryKeyValue); } // Column Total Size List<Pair<Column, RowUpdateChange.Type>> updatePairList = change.getColumnsToUpdate(); Column column; ColumnValue columnValue; RowUpdateChange.Type type; for (Pair<Column, RowUpdateChange.Type> updatePair : updatePairList) { column = updatePair.getFirst(); type = updatePair.getSecond(); switch (type) { case DELETE: columnDeleteSize += column.getNameRawData().length; columnDeleteSize += 8;// Timestamp break; case DELETE_ALL: columnDeleteSize += column.getNameRawData().length; break; case PUT: // Name columnPutSize += column.getNameRawData().length; // Value columnValue = column.getValue(); columnPutSize += getColumnValueSize(columnValue); break; default: throw new OTSCriticalException("Bug: not support the type : " + type); } } return primaryKeyTotalSize + columnPutSize + columnDeleteSize; } public static int getTimeseriesRowDataSize(TimeseriesRow row) { TimeseriesKey timeseriesKey = row.getTimeseriesKey(); Map<String, ColumnValue> fields = row.getFields(); int totalSize = 0; totalSize += 8; // time size totalSize += com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(timeseriesKey.getMeasurementName()); totalSize += com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(timeseriesKey.getDataSource()); totalSize += com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(timeseriesKey.buildTagsString()); for (Map.Entry<String, ColumnValue> entry : fields.entrySet()) { totalSize += entry.getValue().getDataSize() + com.alicloud.openservices.tablestore.core.utils.CalculateHelper.calcStringSizeInBytes(entry.getKey()); } return totalSize; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CollectorUtil.java ================================================ package
com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.plugin.writer.otswriter.model.OTSLine; import java.util.List; public class CollectorUtil { private static TaskPluginCollector taskPluginCollector = null; public static void init(TaskPluginCollector collector) { taskPluginCollector = collector; } public static void collect(Record dirtyRecord, String errorMessage) { if (taskPluginCollector != null) { taskPluginCollector.collectDirtyRecord(dirtyRecord, errorMessage); } } public static void collect(List<Record> dirtyRecords, String errorMessage) { for (Record r:dirtyRecords) { collect(r, errorMessage); } } public static void collect(List<LineAndError> errors) { for (LineAndError e:errors) { collect(e.getLine().getRecords(), e.getError().getMessage()); } } public static void collect(String errorMessage, List<OTSLine> lines) { for (OTSLine l:lines) { collect(l.getRecords(), errorMessage); } } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversion.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; import com.alicloud.openservices.tablestore.model.ColumnValue; import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; import com.alicloud.openservices.tablestore.model.PrimaryKeyValue; /** * Note: the conversion mechanism provided by DataX has the following limitations; these conversions are not supported: * 1. bool -> binary * 2. binary -> long, double, bool * 3. double -> bool, binary * 4.
long -> binary */ public class ColumnConversion { public static PrimaryKeyValue columnToPrimaryKeyValue(Column c, PrimaryKeySchema col) throws OTSCriticalException { try { switch (col.getType()) { case STRING: return PrimaryKeyValue.fromString(c.asString()); case INTEGER: return PrimaryKeyValue.fromLong(c.asLong()); case BINARY: return PrimaryKeyValue.fromBinary(c.asBytes()); default: throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "PrimaryKeyValue")); } } catch (DataXException e) { throw new IllegalArgumentException(String.format( OTSErrorMessage.COLUMN_CONVERSION_ERROR, c.getType(), c.asString(), col.getType().toString() ), e); } } public static ColumnValue columnToColumnValue(Column c) throws OTSCriticalException { switch (c.getType()) { case STRING: return ColumnValue.fromString(c.asString()); case LONG: return ColumnValue.fromLong(c.asLong()); case BOOL: return ColumnValue.fromBoolean(c.asBoolean()); case DOUBLE: return ColumnValue.fromDouble(c.asDouble()); case BYTES: return ColumnValue.fromBinary(c.asBytes()); default: throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, c.getType(), "ColumnValue")); } } public static ColumnValue columnToColumnValue(Column c, OTSAttrColumn col) throws OTSCriticalException { try { switch (col.getType()) { case STRING: return ColumnValue.fromString(c.asString()); case INTEGER: return ColumnValue.fromLong(c.asLong()); case BOOLEAN: return ColumnValue.fromBoolean(c.asBoolean()); case DOUBLE: return ColumnValue.fromDouble(c.asDouble()); case BINARY: return ColumnValue.fromBinary(c.asBytes()); default: throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "ColumnValue")); } } catch (DataXException e) { throw new IllegalArgumentException(String.format( OTSErrorMessage.COLUMN_CONVERSION_ERROR, c.getType(), c.asString(), col.getType().toString() ), e); } } } ================================================ FILE: 
otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ColumnConversionOld.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; import com.aliyun.openservices.ots.model.ColumnValue; import com.aliyun.openservices.ots.model.PrimaryKeyValue; /** * Note: the conversion mechanism provided by DataX has the following limitations; these conversions are not supported: * 1. bool -> binary * 2. binary -> long, double, bool * 3. double -> bool, binary * 4. long -> binary */ public class ColumnConversionOld { public static PrimaryKeyValue columnToPrimaryKeyValue(Column c, PrimaryKeySchema col) { try { switch (col.getType()) { case STRING: return PrimaryKeyValue.fromString(c.asString()); case INTEGER: return PrimaryKeyValue.fromLong(c.asLong()); default: throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "PrimaryKeyValue")); } } catch (DataXException e) { throw new IllegalArgumentException(String.format( OTSErrorMessage.COLUMN_CONVERSION_ERROR, c.getType(), c.asString(), col.getType().toString() ), e); } } public static ColumnValue columnToColumnValue(Column c, OTSAttrColumn col) { try { switch (col.getType()) { case STRING: return ColumnValue.fromString(c.asString()); case INTEGER: return ColumnValue.fromLong(c.asLong()); case BOOLEAN: return ColumnValue.fromBoolean(c.asBoolean()); case DOUBLE: return ColumnValue.fromDouble(c.asDouble()); case BINARY: return ColumnValue.fromBinary(c.asBytes()); default: throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, col.getType(), "ColumnValue")); } } catch (DataXException e) { throw new IllegalArgumentException(String.format( OTSErrorMessage.COLUMN_CONVERSION_ERROR,
c.getType(), c.asString(), col.getType().toString() )); } } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/Common.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; import com.alicloud.openservices.tablestore.ClientConfiguration; import com.alicloud.openservices.tablestore.SyncClient; import com.alicloud.openservices.tablestore.core.utils.Pair; import com.alicloud.openservices.tablestore.model.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; import java.util.Map.Entry; public class Common { private static final Logger LOG = LoggerFactory.getLogger(Common.class); /** * Extracts the primary key from the record. Returns the PK on success; on failure, returns null and sends the record to the dirty-data collector. * @param pkColumns mapping from each primary-key schema to its column index in the record * @param r the record to parse * @return the primary key, or null if conversion failed * @throws OTSCriticalException if the record has fewer columns than the configured primary key */ public static PrimaryKey getPKFromRecord(Map<PrimaryKeySchema, Integer> pkColumns, Record r) throws OTSCriticalException { if (r.getColumnNumber() < pkColumns.size()) { throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record < count(%d) of (pk) from config.", r.getColumnNumber(), pkColumns.size())); } try { PrimaryKeyBuilder builder = PrimaryKeyBuilder.createPrimaryKeyBuilder(); for (Entry<PrimaryKeySchema, Integer> en : pkColumns.entrySet()) { Column col = r.getColumn(en.getValue()); PrimaryKeySchema expect = en.getKey(); if (col.getRawData() == null) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_VALUE_IS_NULL_ERROR, expect.getName())); } PrimaryKeyValue pk = ColumnConversion.columnToPrimaryKeyValue(col, expect);
builder.addPrimaryKeyColumn(new PrimaryKeyColumn(expect.getName(), pk)); } return builder.build(); } catch (IllegalArgumentException e) { LOG.warn("getPKFromRecord fail : {}", e.getMessage(), e); CollectorUtil.collect(r, e.getMessage()); return null; } } public static PrimaryKey getPKFromRecordWithAutoIncrement(Map<PrimaryKeySchema, Integer> pkColumns, Record r, PrimaryKeySchema autoIncrementPrimaryKey) throws OTSCriticalException { if (r.getColumnNumber() < pkColumns.size()) { throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record < count(%d) of (pk) from config.", r.getColumnNumber(), pkColumns.size())); } try { PrimaryKeyBuilder builder = PrimaryKeyBuilder.createPrimaryKeyBuilder(); for (Entry<PrimaryKeySchema, Integer> en : pkColumns.entrySet()) { Column col = r.getColumn(en.getValue()); PrimaryKeySchema expect = en.getKey(); if (col.getRawData() == null) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_VALUE_IS_NULL_ERROR, expect.getName())); } PrimaryKeyValue pk = ColumnConversion.columnToPrimaryKeyValue(col, expect); builder.addPrimaryKeyColumn(new PrimaryKeyColumn(expect.getName(), pk)); } if(autoIncrementPrimaryKey != null){ if(autoIncrementPrimaryKey.getOption()!= PrimaryKeyOption.AUTO_INCREMENT){ throw new OTSCriticalException(String.format("The auto Increment PrimaryKey [(%s)] option should be PrimaryKeyOption.AUTO_INCREMENT.", autoIncrementPrimaryKey.getName())); } builder.addPrimaryKeyColumn(autoIncrementPrimaryKey.getName(),PrimaryKeyValue.AUTO_INCREMENT); } return builder.build(); } catch (IllegalArgumentException e) { LOG.warn("getPKFromRecord fail : {}", e.getMessage(), e); CollectorUtil.collect(r, e.getMessage()); return null; } } /** * Parses the attribute ColumnValues from the Record. Returns null if the Record cannot be converted to ColumnValues. * @param pkCount number of primary-key columns at the start of the record * @param attrColumns attribute columns from the configuration * @param r the record to parse * @return (column name, value) pairs, or null if conversion failed * @throws OTSCriticalException if the record's column count is not pkCount + attrColumns.size() */ public static List<Pair<String, ColumnValue>> getAttrFromRecord(int pkCount, List<OTSAttrColumn> attrColumns, Record r) throws OTSCriticalException { if (pkCount + attrColumns.size() !=
r.getColumnNumber()) { throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record != count(%d) of (pk + column) from config.", r.getColumnNumber(), (pkCount + attrColumns.size()))); } try { List<Pair<String, ColumnValue>> attr = new ArrayList<Pair<String, ColumnValue>>(r.getColumnNumber()); for (int i = 0; i < attrColumns.size(); i++) { Column col = r.getColumn(i + pkCount); OTSAttrColumn expect = attrColumns.get(i); if (col.getRawData() == null) { attr.add(new Pair<String, ColumnValue>(expect.getName(), null)); continue; } ColumnValue cv = ColumnConversion.columnToColumnValue(col, expect); attr.add(new Pair<String, ColumnValue>(expect.getName(), cv)); } return attr; } catch (IllegalArgumentException e) { LOG.warn("getAttrFromRecord fail : {}", e.getMessage(), e); CollectorUtil.collect(r, e.getMessage()); return null; } } /** * Returns the delay before the next attempt: 0 before the first attempt, otherwise initSleepInMilliSecond doubled once per additional retry, capped at 30000 ms. */ public static long getDelaySendMillinSeconds(int hadRetryTimes, int initSleepInMilliSecond) { if (hadRetryTimes <= 0) { return 0; } int sleepTime = initSleepInMilliSecond; for (int i = 1; i < hadRetryTimes; i++) { sleepTime += sleepTime; if (sleepTime > 30000) { sleepTime = 30000; break; } } return sleepTime; } public static SyncClient getOTSInstance(OTSConf conf) { ClientConfiguration clientConfigure = new ClientConfiguration(); clientConfigure.setIoThreadCount(conf.getIoThreadCount()); clientConfigure.setMaxConnections(conf.getConcurrencyWrite()); clientConfigure.setSocketTimeoutInMillisecond(conf.getSocketTimeout()); clientConfigure.setConnectionTimeoutInMillisecond(conf.getConnectTimeoutInMillisecond()); clientConfigure.setRetryStrategy(new DefaultNoRetry()); SyncClient ots = new SyncClient( conf.getEndpoint(), conf.getAccessId(), conf.getAccessKey(), conf.getInstanceName(), clientConfigure); Map<String, String> extraHeaders = new HashMap<String, String>(); extraHeaders.put("x-ots-sdk-type", "public"); extraHeaders.put("x-ots-request-source", "datax-otswriter"); ots.setExtraHeaders(extraHeaders); return ots; } public static LinkedHashMap<String, Integer> getEncodePkColumnMapping(TableMeta meta, List<PrimaryKeySchema> attrColumns) throws OTSCriticalException { LinkedHashMap<String, Integer>
attrColumnMapping = new LinkedHashMap<String, Integer>(); for (Entry<String, PrimaryKeyType> en : meta.getPrimaryKeyMap().entrySet()) { // linear search; performance is not a concern here int i = 0; for (; i < attrColumns.size(); i++) { if (attrColumns.get(i).getName().equals(en.getKey())) { attrColumnMapping.put(GsonParser.primaryKeySchemaToJson(attrColumns.get(i)), i); break; } } if (i == attrColumns.size()) { // exception branch throw new OTSCriticalException(String.format(OTSErrorMessage.INPUT_PK_NAME_NOT_EXIST_IN_META_ERROR, en.getKey())); } } return attrColumnMapping; } public static LinkedHashMap<String, Integer> getEncodePkColumnMappingWithAutoIncrement(TableMeta meta, List<PrimaryKeySchema> attrColumns) throws OTSCriticalException { LinkedHashMap<String, Integer> attrColumnMapping = new LinkedHashMap<String, Integer>(); for (Entry<String, PrimaryKeySchema> en : meta.getPrimaryKeySchemaMap().entrySet()) { // linear search; performance is not a concern here if(en.getValue().hasOption()){ continue; } int i = 0; for (; i < attrColumns.size(); i++) { if (attrColumns.get(i).getName().equals(en.getKey())) { attrColumnMapping.put(GsonParser.primaryKeySchemaToJson(attrColumns.get(i)), i); break; } } if (i == attrColumns.size()) { // exception branch throw new OTSCriticalException(String.format(OTSErrorMessage.INPUT_PK_NAME_NOT_EXIST_IN_META_ERROR, en.getKey())); } } return attrColumnMapping; } public static Map<PrimaryKeySchema, Integer> getPkColumnMapping(Map<String, Integer> mapping) { Map<PrimaryKeySchema, Integer> target = new LinkedHashMap<PrimaryKeySchema, Integer>(); for (Entry<String, Integer> en : mapping.entrySet()) { target.put(GsonParser.jsonToPrimaryKeySchema(en.getKey()), en.getValue()); } return target; } public static Map<String, OTSAttrColumn> getAttrColumnMapping(List<OTSAttrColumn> attrColumns) { Map<String, OTSAttrColumn> attrColumnMapping = new LinkedHashMap<String, OTSAttrColumn>(); for (OTSAttrColumn c : attrColumns) { attrColumnMapping.put(c.getSrcName(), c); } return attrColumnMapping; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/CommonOld.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Column; import
com.alibaba.datax.common.element.Record; import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; import com.alibaba.datax.plugin.writer.otswriter.model.RowDeleteChangeWithRecord; import com.alibaba.datax.plugin.writer.otswriter.model.RowPutChangeWithRecord; import com.alibaba.datax.plugin.writer.otswriter.model.RowUpdateChangeWithRecord; import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; import com.aliyun.openservices.ots.ClientException; import com.aliyun.openservices.ots.OTSException; import com.aliyun.openservices.ots.model.ColumnValue; import com.aliyun.openservices.ots.model.PrimaryKeyValue; import com.aliyun.openservices.ots.model.RowChange; import com.aliyun.openservices.ots.model.RowPrimaryKey; import org.apache.commons.math3.util.Pair; import java.util.ArrayList; import java.util.List; public class CommonOld { public static RowPrimaryKey getPKFromRecord(List<PrimaryKeySchema> pkColumns, Record r) { RowPrimaryKey primaryKey = new RowPrimaryKey(); int pkCount = pkColumns.size(); for (int i = 0; i < pkCount; i++) { Column col = r.getColumn(i); PrimaryKeySchema expect = pkColumns.get(i); if (col.getRawData() == null) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_VALUE_IS_NULL_ERROR, expect.getName())); } PrimaryKeyValue pk = ColumnConversionOld.columnToPrimaryKeyValue(col, expect); primaryKey.addPrimaryKeyColumn(expect.getName(), pk); } return primaryKey; } public static List<Pair<String, ColumnValue>> getAttrFromRecord(int pkCount, List<com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn> attrColumns, Record r) { List<Pair<String, ColumnValue>> attr = new ArrayList<Pair<String, ColumnValue>>(r.getColumnNumber()); for (int i = 0; i < attrColumns.size(); i++) { Column col = r.getColumn(i + pkCount); com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn expect = attrColumns.get(i); if (col.getRawData() == null) { attr.add(new Pair<String, ColumnValue>(expect.getName(), null)); continue; } ColumnValue cv = ColumnConversionOld.columnToColumnValue(col, expect); attr.add(new Pair<String, ColumnValue>(expect.getName(), cv)); } return attr; } public static RowChange
columnValuesToRowChange(String tableName, com.alibaba.datax.plugin.writer.otswriter.model.OTSOpType type, RowPrimaryKey pk, List<Pair<String, ColumnValue>> values) { switch (type) { case PUT_ROW: RowPutChangeWithRecord rowPutChange = new RowPutChangeWithRecord(tableName); rowPutChange.setPrimaryKey(pk); for (Pair<String, ColumnValue> en : values) { if (en.getValue() != null) { rowPutChange.addAttributeColumn(en.getKey(), en.getValue()); } } return rowPutChange; case UPDATE_ROW: RowUpdateChangeWithRecord rowUpdateChange = new RowUpdateChangeWithRecord(tableName); rowUpdateChange.setPrimaryKey(pk); for (Pair<String, ColumnValue> en : values) { if (en.getValue() != null) { rowUpdateChange.addAttributeColumn(en.getKey(), en.getValue()); } else { rowUpdateChange.deleteAttributeColumn(en.getKey()); } } return rowUpdateChange; case DELETE_ROW: RowDeleteChangeWithRecord rowDeleteChange = new RowDeleteChangeWithRecord(tableName); rowDeleteChange.setPrimaryKey(pk); return rowDeleteChange; default: throw new IllegalArgumentException(String.format(OTSErrorMessage.UNSUPPORT_PARSE, type, "RowChange")); } } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/DefaultNoRetry.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alicloud.openservices.tablestore.model.DefaultRetryStrategy; import com.alicloud.openservices.tablestore.model.RetryStrategy; public class DefaultNoRetry extends DefaultRetryStrategy { public DefaultNoRetry() { super(); } @Override public RetryStrategy clone() { return super.clone(); } @Override public int getRetries() { return super.getRetries(); } @Override public boolean shouldRetry(String action, Exception ex) { return false; } @Override public long nextPause(String action, Exception ex) { return super.nextPause(action, ex); } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/GsonParser.java
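Note on retries: DefaultNoRetry always returns false from `shouldRetry`, and `Common.getOTSInstance` installs it on the client, so the Tablestore SDK should not retry on its own. Retrying is left to `RetryHelper.executeWithRetry`, which sleeps for `Common.getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond)` before each attempt. The standalone sketch below reproduces that delay schedule without the SDK so it can be run in isolation; the class name `RetryDelaySketch` is hypothetical, and the 30,000 ms cap is copied from the method body.

```java
/**
 * Hypothetical standalone copy of the delay schedule in
 * Common.getDelaySendMillinSeconds: no sleep before the first attempt,
 * then initSleepMs doubled once per additional retry, capped at 30 000 ms.
 */
final class RetryDelaySketch {

    // Upper bound copied from the plugin code.
    static final long MAX_DELAY_MS = 30_000L;

    static long delayMillis(int hadRetryTimes, int initSleepMs) {
        if (hadRetryTimes <= 0) {
            return 0L; // the first attempt is sent immediately
        }
        long sleep = initSleepMs;
        for (int i = 1; i < hadRetryTimes; i++) {
            sleep += sleep; // double the previous delay
            if (sleep > MAX_DELAY_MS) {
                return MAX_DELAY_MS; // stop growing once the cap is reached
            }
        }
        return sleep;
    }

    public static void main(String[] args) {
        // Print the schedule for a 100 ms base delay.
        for (int retry = 0; retry <= 10; retry++) {
            System.out.println("retry " + retry + " -> " + delayMillis(retry, 100) + " ms");
        }
    }
}
```

With a 100 ms base, the delays grow as 0, 100, 200, 400, ... and hit the cap on the tenth retry. The sketch uses `long` arithmetic; the plugin uses `int`, which gives the same results because the cap stops the doubling long before overflow.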
================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; import com.alicloud.openservices.tablestore.model.Direction; import com.alicloud.openservices.tablestore.model.PrimaryKey; import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; import com.alicloud.openservices.tablestore.model.TableMeta; import com.google.gson.Gson; import com.google.gson.GsonBuilder; public class GsonParser { private static Gson gsonBuilder() { return new GsonBuilder() .create(); } public static String confToJson (OTSConf conf) { Gson g = gsonBuilder(); return g.toJson(conf); } public static OTSConf jsonToConf (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, OTSConf.class); } public static String directionToJson (Direction direction) { Gson g = gsonBuilder(); return g.toJson(direction); } public static Direction jsonToDirection (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, Direction.class); } public static String metaToJson (TableMeta meta) { Gson g = gsonBuilder(); return g.toJson(meta); } public static String primaryKeyToJson (PrimaryKey row) { Gson g = gsonBuilder(); return g.toJson(row); } public static String primaryKeySchemaToJson (PrimaryKeySchema schema) { Gson g = gsonBuilder(); return g.toJson(schema); } public static PrimaryKeySchema jsonToPrimaryKeySchema (String jsonStr) { Gson g = gsonBuilder(); return g.fromJson(jsonStr, PrimaryKeySchema.class); } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/LineAndError.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.model.OTSLine; public class LineAndError { private OTSLine line; private com.alicloud.openservices.tablestore.model.Error error; public LineAndError(OTSLine record, 
com.alicloud.openservices.tablestore.model.Error error) { this.line = record; this.error = error; } public OTSLine getLine() { return line; } public com.alicloud.openservices.tablestore.model.Error getError() { return error; } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParamChecker.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.exception.CommonErrorCode; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.otswriter.model.OTSAttrColumn; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; import com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage; import com.alibaba.datax.plugin.writer.otswriter.model.OTSMode; import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; import com.alicloud.openservices.tablestore.model.PrimaryKeyType; import com.alicloud.openservices.tablestore.model.TableMeta; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set; import java.util.concurrent.TimeUnit; import static com.alibaba.datax.plugin.writer.otswriter.model.OTSErrorMessage.*; public class ParamChecker { private static void throwNotExistException(String key) { throw new IllegalArgumentException(String.format(OTSErrorMessage.MISSING_PARAMTER_ERROR, key)); } private static void throwStringLengthZeroException(String key) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PARAMTER_STRING_IS_EMPTY_ERROR, key)); } private static void throwEmptyListException(String key) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PARAMETER_LIST_IS_EMPTY_ERROR, key)); } private static void throwNotListException(String key, Throwable t) { throw new 
IllegalArgumentException(String.format(OTSErrorMessage.PARAMETER_IS_NOT_ARRAY_ERROR, key), t); } public static String checkStringAndGet(Configuration param, String key) { String value = param.getString(key); value = value != null ? value.trim() : null; if (null == value) { throwNotExistException(key); } else if (value.length() == 0) { throwStringLengthZeroException(key); } return value; } public static List<Object> checkListAndGet(Configuration param, String key, boolean isCheckEmpty) { List<Object> value = null; try { value = param.getList(key); } catch (ClassCastException e) { throwNotListException(key, e); } if (null == value) { throwNotExistException(key); } else if (isCheckEmpty && value.isEmpty()) { throwEmptyListException(key); } return value; } public static void checkPrimaryKey(TableMeta meta, List<PrimaryKeySchema> pk) { Map<String, PrimaryKeyType> pkNameAndTypeMapping = meta.getPrimaryKeyMap(); // check that the counts match if (pkNameAndTypeMapping.size() != pk.size()) { throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_COUNT_NOT_EQUAL_META_ERROR, pk.size(), pkNameAndTypeMapping.size())); } // check that names and types match for (PrimaryKeySchema col : pk) { PrimaryKeyType type = pkNameAndTypeMapping.get(col.getName()); if (type == null) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_MISSING_ERROR, col.getName())); } if (type != col.getType()) { throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_TYPE_NOT_MATCH_META_ERROR, col.getName(), type, col.getType())); } } } public static void checkVersion(OTSConf conf) { /** * The conf check enforces the following rules: * 1. The old-version plugin does not support auto-increment primary key columns * 2. The old-version plugin does not support multi-version mode * 3. Multi-version mode does not support auto-increment primary key columns * 4. The old-version plugin does not support timeseries tables * 5.
Timeseries tables do not support auto-increment primary key columns */ if (!conf.isNewVersion() && conf.getEnableAutoIncrement()) { throw new IllegalArgumentException(PUBLIC_SDK_NO_SUPPORT_AUTO_INCREMENT); } if (!conf.isNewVersion() && conf.getMode() == OTSMode.MULTI_VERSION) { throw new IllegalArgumentException(PUBLIC_SDK_NO_SUPPORT_MULTI_VERSION); } if (conf.getMode() == OTSMode.MULTI_VERSION && conf.getEnableAutoIncrement()) { throw new IllegalArgumentException(NOT_SUPPORT_MULTI_VERSION_AUTO_INCREMENT); } if (!conf.isNewVersion() && conf.isTimeseriesTable()) { throw new IllegalArgumentException(PUBLIC_SDK_NO_SUPPORT_TIMESERIES_TABLE); } if (conf.isTimeseriesTable() && conf.getEnableAutoIncrement()) { throw new IllegalArgumentException(NOT_SUPPORT_TIMESERIES_TABLE_AUTO_INCREMENT); } } public static void checkPrimaryKeyWithAutoIncrement(TableMeta meta, List<PrimaryKeySchema> pk) { Map<String, PrimaryKeyType> pkNameAndTypeMapping = meta.getPrimaryKeyMap(); int autoIncrementKeySize = 0; for(PrimaryKeySchema p : meta.getPrimaryKeyList()){ if(p.hasOption()){ autoIncrementKeySize++; } } // check that the counts match if (pkNameAndTypeMapping.size() != pk.size() + autoIncrementKeySize) { throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_COUNT_NOT_EQUAL_META_ERROR, pk.size() + autoIncrementKeySize, pkNameAndTypeMapping.size())); } // check that names and types match for (PrimaryKeySchema col : pk) { if(col.hasOption()){ continue; } PrimaryKeyType type = pkNameAndTypeMapping.get(col.getName()); if (type == null) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_COLUMN_MISSING_ERROR, col.getName())); } if (type != col.getType()) { throw new IllegalArgumentException(String.format(OTSErrorMessage.INPUT_PK_TYPE_NOT_MATCH_META_ERROR, col.getName(), type, col.getType())); } } } public static void checkAttribute(List<OTSAttrColumn> attr) { // check for duplicate columns Set<String> names = new HashSet<String>(); for (OTSAttrColumn col : attr) { if (names.contains(col.getName())) { throw new IllegalArgumentException(String.format(OTSErrorMessage.ATTR_REPEAT_COLUMN_ERROR, col.getName())); } else { names.add(col.getName()); }
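Note on `checkVersion`: its five compatibility rules can be restated without the Tablestore SDK or `OTSConf`. In the hypothetical sketch below, `WriterOptions` stands in for the four `OTSConf` flags the check reads (`isNewVersion`, `getEnableAutoIncrement`, `getMode`, `isTimeseriesTable`). Unlike the plugin, which throws an `IllegalArgumentException` at the first failing rule, the sketch collects every violation so that a bad configuration can be reported in one pass.

```java
import java.util.ArrayList;
import java.util.List;

final class VersionRulesSketch {

    enum Mode { NORMAL, MULTI_VERSION }

    /** Hypothetical stand-in for the OTSConf flags that checkVersion reads. */
    static final class WriterOptions {
        final boolean newVersion;
        final boolean autoIncrement;
        final Mode mode;
        final boolean timeseriesTable;

        WriterOptions(boolean newVersion, boolean autoIncrement, Mode mode, boolean timeseriesTable) {
            this.newVersion = newVersion;
            this.autoIncrement = autoIncrement;
            this.mode = mode;
            this.timeseriesTable = timeseriesTable;
        }
    }

    /** Returns every violated rule instead of stopping at the first one. */
    static List<String> violations(WriterOptions o) {
        List<String> out = new ArrayList<>();
        if (!o.newVersion && o.autoIncrement) out.add("old plugin: no auto-increment PK");          // rule 1
        if (!o.newVersion && o.mode == Mode.MULTI_VERSION) out.add("old plugin: no multi-version"); // rule 2
        if (o.mode == Mode.MULTI_VERSION && o.autoIncrement) out.add("multi-version: no auto-increment PK"); // rule 3
        if (!o.newVersion && o.timeseriesTable) out.add("old plugin: no timeseries table");         // rule 4
        if (o.timeseriesTable && o.autoIncrement) out.add("timeseries table: no auto-increment PK"); // rule 5
        return out;
    }

    public static void main(String[] args) {
        WriterOptions ok = new WriterOptions(true, false, Mode.MULTI_VERSION, false);
        WriterOptions bad = new WriterOptions(false, true, Mode.NORMAL, true);
        System.out.println(violations(ok));  // []
        System.out.println(violations(bad)); // rules 1, 4 and 5
    }
}
```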
} } public static TimeUnit checkTimeUnitAndGet(String str) { if (null == str) { return null; } else if ("NANOSECONDS".equalsIgnoreCase(str)) { return TimeUnit.NANOSECONDS; } else if ("MICROSECONDS".equalsIgnoreCase(str)) { return TimeUnit.MICROSECONDS; } else if ("MILLISECONDS".equalsIgnoreCase(str)) { return TimeUnit.MILLISECONDS; } else if ("SECONDS".equalsIgnoreCase(str)) { return TimeUnit.SECONDS; } else if ("MINUTES".equalsIgnoreCase(str)) { return TimeUnit.MINUTES; } else { throw new IllegalArgumentException(String.format(OTSErrorMessage.TIMEUNIT_FORMAT_ERROR, str)); } } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/ParseRecord.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.writer.otswriter.OTSCriticalException; import com.alibaba.datax.plugin.writer.otswriter.model.*; import com.alicloud.openservices.tablestore.core.protocol.timeseries.TimeseriesResponseFactory; import com.alicloud.openservices.tablestore.core.utils.Pair; import com.alicloud.openservices.tablestore.model.*; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesKey; import com.alicloud.openservices.tablestore.model.timeseries.TimeseriesRow; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.concurrent.TimeUnit; public class ParseRecord { private static final Logger LOG = LoggerFactory.getLogger(ParseRecord.class); private static com.alicloud.openservices.tablestore.model.Column buildColumn(String name, ColumnValue value, long timestamp) { if (timestamp > 0) { return new com.alicloud.openservices.tablestore.model.Column( name, value, timestamp ); } else { 
return new com.alicloud.openservices.tablestore.model.Column( name, value ); } } /** * Converts a Record in normal mode. * Returns null when the PK or attribute columns fail to parse. * @param tableName target table name * @param type operation type (PUT_ROW or UPDATE_ROW) * @param pkColumns mapping from each primary-key schema to its column index in the record * @param attrColumns attribute columns from the configuration * @param record the record to convert * @param timestamp cell timestamp; a non-positive value means no explicit timestamp is set * @return the OTSLine, or null if parsing failed or a PUT_ROW has no non-null attribute columns * @throws OTSCriticalException for an unsupported operation type or a column-count mismatch */ public static OTSLine parseNormalRecordToOTSLine( String tableName, OTSOpType type, Map<PrimaryKeySchema, Integer> pkColumns, List<OTSAttrColumn> attrColumns, Record record, long timestamp) throws OTSCriticalException { PrimaryKey pk = Common.getPKFromRecord(pkColumns, record); if (pk == null) { return null; } List<Pair<String, ColumnValue>> values = Common.getAttrFromRecord(pkColumns.size(), attrColumns, record); if (values == null) { return null; } switch (type) { case PUT_ROW: RowPutChange rowPutChange = new RowPutChange(tableName, pk); for (Pair<String, ColumnValue> en : values) { if (en.getSecond() != null) { rowPutChange.addColumn(buildColumn(en.getFirst(), en.getSecond(), timestamp)); } } if (rowPutChange.getColumnsToPut().isEmpty()) { return null; } return new OTSLine(pk, record, rowPutChange); case UPDATE_ROW: RowUpdateChange rowUpdateChange = new RowUpdateChange(tableName, pk); for (Pair<String, ColumnValue> en : values) { if (en.getSecond() != null) { rowUpdateChange.put(buildColumn(en.getFirst(), en.getSecond(), timestamp)); } else { rowUpdateChange.deleteColumns(en.getFirst()); // delete all versions of the column } } return new OTSLine(pk, record, rowUpdateChange); default: LOG.error("Bug branch, can not support : {}(OTSOpType)", type); throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT, type)); } } public static OTSLine parseNormalRecordToOTSLineWithAutoIncrement( String tableName, OTSOpType type, Map<PrimaryKeySchema, Integer> pkColumns, List<OTSAttrColumn> attrColumns, Record record, long timestamp, PrimaryKeySchema autoIncrementPrimaryKey) throws OTSCriticalException { PrimaryKey pk = Common.getPKFromRecordWithAutoIncrement(pkColumns, record, autoIncrementPrimaryKey); if (pk == null) { return null; } List<Pair<String, ColumnValue>> values = Common.getAttrFromRecord(pkColumns.size(), attrColumns, record); if (values == null) { return null; } switch (type)
{ case PUT_ROW: RowPutChange rowPutChange = new RowPutChange(tableName, pk); for (Pair<String, ColumnValue> en : values) { if (en.getSecond() != null) { rowPutChange.addColumn(buildColumn(en.getFirst(), en.getSecond(), timestamp)); } } if (rowPutChange.getColumnsToPut().isEmpty()) { return null; } return new OTSLine(pk, record, rowPutChange); case UPDATE_ROW: RowUpdateChange rowUpdateChange = new RowUpdateChange(tableName, pk); for (Pair<String, ColumnValue> en : values) { if (en.getSecond() != null) { rowUpdateChange.put(buildColumn(en.getFirst(), en.getSecond(), timestamp)); } else { rowUpdateChange.deleteColumns(en.getFirst()); // delete all versions of the column } } return new OTSLine(pk, record, rowUpdateChange); default: LOG.error("Bug branch, can not support : {}(OTSOpType)", type); throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT, type)); } } public static OTSLine parseNormalRecordToOTSLineOfTimeseriesTable( List<OTSAttrColumn> attrColumns, Record record, TimeUnit timeUnit ) throws OTSCriticalException { if (attrColumns.size() != record.getColumnNumber()){ throw new OTSCriticalException(String.format("Bug branch, the count(%d) of record != count(%d) of column from config.", record.getColumnNumber(), (attrColumns.size()))); } Map<String, String> tags = new HashMap<>(); String measurementName = null; String dataSource = null; Long timeInUs = null; Map<String, ColumnValue> columnsValues = new HashMap<>(); try { for (int i = 0; i < attrColumns.size(); i++) { // the column is a field inside tags if (attrColumns.get(i).getTag()){ tags.put(attrColumns.get(i).getName(), record.getColumn(i).asString()); } else if (attrColumns.get(i).getName().equals(OTSConst.MEASUREMENT_NAME)){ measurementName = record.getColumn(i).asString(); } else if (attrColumns.get(i).getName().equals(OTSConst.DATA_SOURCE)){ dataSource = record.getColumn(i).asString(); } else if (attrColumns.get(i).getName().equals(OTSConst.TAGS)){ String tagString = record.getColumn(i).asString(); tags.putAll(TimeseriesResponseFactory.parseTagsOrAttrs(tagString)); } else if (attrColumns.get(i).getName().equals(OTSConst.TIME)){
timeInUs = record.getColumn(i).asLong(); } else{ switch (attrColumns.get(i).getType()){ case INTEGER: columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromLong(record.getColumn(i).asLong())); break; case BOOLEAN: columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromBoolean(record.getColumn(i).asBoolean())); break; case DOUBLE: columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromDouble(record.getColumn(i).asDouble())); break; case BINARY: columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromBinary(record.getColumn(i).asBytes())); break; case STRING: default: columnsValues.put(attrColumns.get(i).getName(), ColumnValue.fromString(record.getColumn(i).asString())); break; } } } // the measurement name and the timestamp must not be empty; otherwise, report an error if (measurementName == null){ throw new IllegalArgumentException("The value of the '_m_name' (measurement) field cannot be empty. Please check the input of writer"); } else if (timeInUs == null){ throw new IllegalArgumentException("The value of the '_time' field cannot be empty.
Please check the input of writer"); } } catch (IllegalArgumentException e) { LOG.warn("getAttrFromRecord fail : {}", e.getMessage(), e); CollectorUtil.collect(record, e.getMessage()); return null; } TimeseriesKey key = new TimeseriesKey(measurementName, dataSource, tags); TimeseriesRow row = new TimeseriesRow(key); switch (timeUnit){ case NANOSECONDS: timeInUs = timeInUs / 1000; break; case MILLISECONDS: timeInUs = timeInUs * 1000; break; case SECONDS: timeInUs = timeInUs * 1000 * 1000; break; case MINUTES: timeInUs = timeInUs * 1000 * 1000 * 60; break; case MICROSECONDS: default: break; } row.setTimeInUs(timeInUs); for (Map.Entry<String, ColumnValue> entry : columnsValues.entrySet()){ row.addField(entry.getKey(), entry.getValue()); } return new OTSLine(record, row); } public static String getDefineCoumnName(String attrColumnNamePrefixFilter, int columnNameIndex, Record r) { String columnName = r.getColumn(columnNameIndex).asString(); if (attrColumnNamePrefixFilter != null) { if (columnName.startsWith(attrColumnNamePrefixFilter) && columnName.length() > attrColumnNamePrefixFilter.length()) { columnName = columnName.substring(attrColumnNamePrefixFilter.length()); } else { throw new IllegalArgumentException(String.format(OTSErrorMessage.COLUMN_NOT_DEFINE, columnName)); } } return columnName; } private static void appendCellToRowUpdateChange( Map<PrimaryKeySchema, Integer> pkColumns, String attrColumnNamePrefixFilter, Record r, RowUpdateChange updateChange ) throws OTSCriticalException { try { String columnName = getDefineCoumnName(attrColumnNamePrefixFilter, pkColumns.size(), r); Column timestamp = r.getColumn(pkColumns.size() + 1); Column value = r.getColumn(pkColumns.size() + 2); if (timestamp.getRawData() == null) { throw new IllegalArgumentException(OTSErrorMessage.MULTI_VERSION_TIMESTAMP_IS_EMPTY); } if (value.getRawData() == null) { updateChange.deleteColumn(columnName, timestamp.asLong()); return; } ColumnValue otsValue = ColumnConversion.columnToColumnValue(value);
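Note on the `timeUnit` switch in `parseNormalRecordToOTSLineOfTimeseriesTable`: it normalizes `_time` to microseconds with plain integer arithmetic. For the five units that `checkTimeUnitAndGet` accepts, `java.util.concurrent.TimeUnit.toMicros` should produce the same values. One difference is that the JDK method saturates at `Long.MAX_VALUE` / `Long.MIN_VALUE` instead of silently overflowing. The sketch below, with the hypothetical class name `TimeToMicrosSketch`, puts both side by side.

```java
import java.util.concurrent.TimeUnit;

final class TimeToMicrosSketch {

    /** Mirrors the switch in parseNormalRecordToOTSLineOfTimeseriesTable. */
    static long toMicrosLikePlugin(long value, TimeUnit unit) {
        switch (unit) {
            case NANOSECONDS:  return value / 1000;               // truncates toward zero
            case MILLISECONDS: return value * 1000;
            case SECONDS:      return value * 1000 * 1000;
            case MINUTES:      return value * 1000 * 1000 * 60;
            case MICROSECONDS:
            default:           return value; // HOURS/DAYS are rejected earlier by checkTimeUnitAndGet
        }
    }

    /** JDK equivalent; saturates instead of overflowing on huge inputs. */
    static long toMicrosJdk(long value, TimeUnit unit) {
        return unit.toMicros(value);
    }

    public static void main(String[] args) {
        long epochSeconds = 1_700_000_000L;
        System.out.println(toMicrosLikePlugin(epochSeconds, TimeUnit.SECONDS)); // 1700000000000000
        System.out.println(toMicrosJdk(epochSeconds, TimeUnit.SECONDS));        // 1700000000000000
    }
}
```

Using `unit.toMicros(timeInUs)` in place of the switch would be a possible simplification, assuming saturation is preferable to wrap-around for out-of-range timestamps.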
com.alicloud.openservices.tablestore.model.Column c = new com.alicloud.openservices.tablestore.model.Column( columnName, otsValue, timestamp.asLong() ); updateChange.put(c); return; } catch (IllegalArgumentException e) { LOG.warn("parseToColumn fail : {}", e.getMessage(), e); CollectorUtil.collect(r, e.getMessage()); return; } catch (DataXException e) { LOG.warn("parseToColumn fail : {}", e.getMessage(), e); CollectorUtil.collect(r, e.getMessage()); return; } } /** * Converts Records in the special (multi-version) mode. * Returns null if none of the Records could be converted to a Column. * @param tableName target table name * @param type operation type; only UPDATE_ROW is supported * @param pkColumns mapping from each primary-key schema to its column index in the record * @param attrColumnNamePrefixFilter optional prefix that attribute column names must start with (stripped before writing) * @param pk primary key shared by all records * @param records the records (cells) of one row * @return the OTSLine, or null if no cell could be converted * @throws OTSCriticalException e.g. when the operation type is not UPDATE_ROW */ public static OTSLine parseMultiVersionRecordToOTSLine( String tableName, OTSOpType type, Map<PrimaryKeySchema, Integer> pkColumns, String attrColumnNamePrefixFilter, PrimaryKey pk, List<Record> records) throws OTSCriticalException { switch(type) { case UPDATE_ROW: RowUpdateChange updateChange = new RowUpdateChange(tableName, pk); for (Record r : records) { appendCellToRowUpdateChange(pkColumns, attrColumnNamePrefixFilter, r, updateChange); } if (updateChange.getColumnsToUpdate().isEmpty()) { return null; } else { return new OTSLine(pk, records, updateChange); } default: LOG.error("Bug branch, can not support : {}(OTSOpType)", type); throw new OTSCriticalException(String.format(OTSErrorMessage.UNSUPPORT, type)); } } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/RetryHelper.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.OTSErrorCode; import com.alicloud.openservices.tablestore.ClientException; import com.alicloud.openservices.tablestore.TableStoreException; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.HashSet; import java.util.Set; import java.util.concurrent.Callable; public class RetryHelper { private static final Logger LOG =
LoggerFactory.getLogger(RetryHelper.class); private static final Set<String> noRetryErrorCode = prepareNoRetryErrorCode(); /** * Retry wrapper. The caller supplies the maximum number of retries and the base sleep interval. * If the call fails with a retryable exception, it is retried; before each retry the method sleeps (see * Common.getDelaySendMillinSeconds for the backoff policy). A non-retryable exception, or any exception once the retry limit is reached, is rethrown. * @param callable the operation to execute * @param maxRetryTimes maximum number of retries * @param sleepInMilliSecond base sleep interval in milliseconds * @return the result of callable * @throws Exception the last exception thrown by callable */ public static <V> V executeWithRetry(Callable<V> callable, int maxRetryTimes, int sleepInMilliSecond) throws Exception { int retryTimes = 0; while (true){ Thread.sleep(Common.getDelaySendMillinSeconds(retryTimes, sleepInMilliSecond)); try { return callable.call(); } catch (Exception e) { LOG.warn("Call callable fail.", e); if (!canRetry(e)){ LOG.error("Can not retry for Exception : {}", e.getMessage()); throw e; } else if (retryTimes >= maxRetryTimes) { LOG.error("Retry times exceed the limit. maxRetryTimes : {}", maxRetryTimes); throw e; } retryTimes++; LOG.warn("Retry time : {}", retryTimes); } } } private static Set<String> prepareNoRetryErrorCode() { final Set<String> pool = new HashSet<String>(); pool.add(OTSErrorCode.AUTHORIZATION_FAILURE); pool.add(OTSErrorCode.INVALID_PARAMETER); pool.add(OTSErrorCode.REQUEST_TOO_LARGE); pool.add(OTSErrorCode.OBJECT_NOT_EXIST); pool.add(OTSErrorCode.OBJECT_ALREADY_EXIST); pool.add(OTSErrorCode.INVALID_PK); pool.add(OTSErrorCode.OUT_OF_COLUMN_COUNT_LIMIT); pool.add(OTSErrorCode.OUT_OF_ROW_SIZE_LIMIT); pool.add(OTSErrorCode.CONDITION_CHECK_FAIL); return pool; } public static boolean canRetry(String otsErrorCode) { if (noRetryErrorCode.contains(otsErrorCode)) { return false; } else { return true; } } public static boolean canRetry(Exception exception) { TableStoreException e = null; if (exception instanceof TableStoreException) { e = (TableStoreException) exception; LOG.warn( "OTSException:ErrorCode:{}, ErrorMsg:{}, RequestId:{}", new Object[]{e.getErrorCode(), e.getMessage(), e.getRequestId()} ); return canRetry(e.getErrorCode()); } else if (exception instanceof ClientException) { ClientException ce
= (ClientException) exception; LOG.warn( "ClientException:ErrorMsg:{}", ce.getMessage() ); return true; } else { return false; } } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WithRecord.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.common.element.Record; public interface WithRecord { Record getRecord(); void setRecord(Record record); } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterModelParser.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.model.*; import com.alicloud.openservices.tablestore.model.ColumnType; import com.alicloud.openservices.tablestore.model.PrimaryKeySchema; import com.alicloud.openservices.tablestore.model.PrimaryKeyType; import com.alicloud.openservices.tablestore.model.TableMeta; import java.util.*; /** * 解析配置中参数 * @author redchen * */ public class WriterModelParser { public static PrimaryKeyType parsePrimaryKeyType(String type) { if (type.equalsIgnoreCase(OTSConst.TYPE_STRING)) { return PrimaryKeyType.STRING; } else if (type.equalsIgnoreCase(OTSConst.TYPE_INTEGER)) { return PrimaryKeyType.INTEGER; } else if (type.equalsIgnoreCase(OTSConst.TYPE_BINARY)) { return PrimaryKeyType.BINARY; } else { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_TYPE_ERROR, type)); } } private static Object columnGetObject(Map column, String key, String error) { Object value = column.get(key); if (value == null) { throw new IllegalArgumentException(error); } return value; } private static String checkString(Object value, String error) { if (!(value instanceof String)) { throw new IllegalArgumentException(error); } return (String)value; } private static void 
checkStringEmpty(String value, String error) { if (value.isEmpty()) { throw new IllegalArgumentException(error); } } public static PrimaryKeySchema parseOTSPKColumn(Map column) { String typeStr = checkString( columnGetObject(column, OTSConst.TYPE, String.format(OTSErrorMessage.PK_MAP_FILED_MISSING_ERROR, OTSConst.TYPE)), String.format(OTSErrorMessage.PK_MAP_KEY_TYPE_ERROR, OTSConst.TYPE) ); String nameStr = checkString( columnGetObject(column, OTSConst.NAME, String.format(OTSErrorMessage.PK_MAP_FILED_MISSING_ERROR, OTSConst.NAME)), String.format(OTSErrorMessage.PK_MAP_KEY_TYPE_ERROR, OTSConst.NAME) ); checkStringEmpty(typeStr, OTSErrorMessage.PK_COLUMN_TYPE_IS_EMPTY_ERROR); checkStringEmpty(nameStr, OTSErrorMessage.PK_COLUMN_NAME_IS_EMPTY_ERROR); if (column.size() == 2) { return new PrimaryKeySchema(nameStr, parsePrimaryKeyType(typeStr)); } else { throw new IllegalArgumentException(OTSErrorMessage.PK_MAP_INCLUDE_NAME_TYPE_ERROR); } } public static List parseOTSPKColumnList(TableMeta meta, List values) { Map pkMapping = meta.getPrimaryKeyMap(); List pks = new ArrayList(); for (Object obj : values) { /** * json 中primary key格式为: * "primaryKey":[ * "userid", * "groupid" *] */ if (obj instanceof String) { String name = (String) obj; PrimaryKeyType type = pkMapping.get(name); if (null == type) { throw new IllegalArgumentException(String.format(OTSErrorMessage.PK_IS_NOT_EXIST_AT_OTS_ERROR, name)); } else { pks.add(new PrimaryKeySchema(name, type)); } } /** * json 中primary key格式为: * "primaryKey" : [ * {"name":"pk1", "type":"string"}, * {"name":"pk2", "type":"int"} *], */ else if (obj instanceof Map) { @SuppressWarnings("unchecked") Map column = (Map) obj; pks.add(parseOTSPKColumn(column)); } else { throw new IllegalArgumentException(OTSErrorMessage.PK_ITEM_IS_ILLEAGAL_ERROR); } } return pks; } public static ColumnType parseColumnType(String type) { if (type.equalsIgnoreCase(OTSConst.TYPE_STRING)) { return ColumnType.STRING; } else if 
(type.equalsIgnoreCase(OTSConst.TYPE_INTEGER)) { return ColumnType.INTEGER; } else if (type.equalsIgnoreCase(OTSConst.TYPE_BOOLEAN)) { return ColumnType.BOOLEAN; } else if (type.equalsIgnoreCase(OTSConst.TYPE_DOUBLE)) { return ColumnType.DOUBLE; } else if (type.equalsIgnoreCase(OTSConst.TYPE_BINARY)) { return ColumnType.BINARY; } else { throw new IllegalArgumentException(String.format(OTSErrorMessage.ATTR_TYPE_ERROR, type)); } } public static OTSAttrColumn parseOTSAttrColumn(Map column, OTSMode mode) { String typeStr = checkString( columnGetObject(column, OTSConst.TYPE, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.TYPE)), String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.TYPE) ); String nameStr = checkString( columnGetObject(column, OTSConst.NAME, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.NAME)), String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.NAME) ); checkStringEmpty(typeStr, OTSErrorMessage.ATTR_COLUMN_TYPE_IS_EMPTY_ERROR); checkStringEmpty(nameStr, OTSErrorMessage.ATTR_COLUMN_NAME_IS_EMPTY_ERROR); if (mode == OTSMode.MULTI_VERSION) { String srcNameStr = checkString( columnGetObject(column, OTSConst.SRC_NAME, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.SRC_NAME)), String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.SRC_NAME) ); checkStringEmpty(srcNameStr, OTSErrorMessage.ATTR_COLUMN_SRC_NAME_IS_EMPTY_ERROR); if (column.size() == 3) { return new OTSAttrColumn(srcNameStr, nameStr, parseColumnType(typeStr)); } else { throw new IllegalArgumentException(OTSErrorMessage.ATTR_MAP_INCLUDE_SRCNAME_NAME_TYPE_ERROR); } } else { if (column.size() == 2) { return new OTSAttrColumn(nameStr, parseColumnType(typeStr)); } else { throw new IllegalArgumentException(OTSErrorMessage.ATTR_MAP_INCLUDE_NAME_TYPE_ERROR); } } } public static List parseOTSTimeseriesRowAttrList(List values) { List attrs = new ArrayList(); // columns内部必须配置_m_name与_time字段,否则报错 boolean 
getMeasurementField = false; boolean getTimeField = false; for (Object obj : values) { if (obj instanceof Map) { @SuppressWarnings("unchecked") Map column = (Map) obj; String nameStr = checkString( columnGetObject(column, OTSConst.NAME, String.format(OTSErrorMessage.ATTR_MAP_FILED_MISSING_ERROR, OTSConst.NAME)), String.format(OTSErrorMessage.ATTR_MAP_KEY_TYPE_ERROR, OTSConst.NAME) ); boolean isTag = column.get(OTSConst.IS_TAG) != null && Boolean.parseBoolean((String) column.get(OTSConst.IS_TAG)); String typeStr = "String"; if (column.get(OTSConst.TYPE) != null){ typeStr = (String) column.get(OTSConst.TYPE); } checkStringEmpty(nameStr, OTSErrorMessage.ATTR_COLUMN_NAME_IS_EMPTY_ERROR); if (nameStr.equals(OTSConst.MEASUREMENT_NAME)){ getMeasurementField = true; } else if (nameStr.equals(OTSConst.TIME)) { getTimeField = true; } attrs.add(new OTSAttrColumn(nameStr, parseColumnType(typeStr), isTag)); } else { throw new IllegalArgumentException(OTSErrorMessage.ATTR_ITEM_IS_NOT_MAP_ERROR); } } if (!getMeasurementField){ throw new IllegalArgumentException(OTSErrorMessage.NO_FOUND_M_NAME_FIELD_ERROR); } else if (!getTimeField) { throw new IllegalArgumentException(OTSErrorMessage.NO_FOUND_TIME_FIELD_ERROR); } return attrs; } private static void checkMultiAttrColumn(List pk, List attrs, OTSMode mode) { // duplicate column name { Set pool = new HashSet(); for (OTSAttrColumn col : attrs) { if (pool.contains(col.getName())) { throw new IllegalArgumentException(String.format(OTSErrorMessage.MULTI_ATTR_COLUMN_ERROR, col.getName())); } else { pool.add(col.getName()); } } for (PrimaryKeySchema col : pk) { if (pool.contains(col.getName())) { throw new IllegalArgumentException(String.format(OTSErrorMessage.MULTI_PK_ATTR_COLUMN_ERROR, col.getName())); } else { pool.add(col.getName()); } } } // duplicate src column name if (mode == OTSMode.MULTI_VERSION) { Set pool = new HashSet(); for (OTSAttrColumn col : attrs) { if (pool.contains(col.getSrcName())) { throw new 
IllegalArgumentException(String.format(OTSErrorMessage.MULTI_ATTR_SRC_COLUMN_ERROR, col.getSrcName())); } else { pool.add(col.getSrcName()); } } } } public static List parseOTSAttrColumnList(List pk, List values, OTSMode mode) { List attrs = new ArrayList(); for (Object obj : values) { if (obj instanceof Map) { @SuppressWarnings("unchecked") Map column = (Map) obj; attrs.add(parseOTSAttrColumn(column, mode)); } else { throw new IllegalArgumentException(OTSErrorMessage.ATTR_ITEM_IS_NOT_MAP_ERROR); } } checkMultiAttrColumn(pk, attrs, mode); return attrs; } public static OTSOpType parseOTSOpType(String value, OTSMode mode) { OTSOpType type = null; if (value.equalsIgnoreCase(OTSConst.OTS_OP_TYPE_PUT)) { type = OTSOpType.PUT_ROW; } else if (value.equalsIgnoreCase(OTSConst.OTS_OP_TYPE_UPDATE)) { type = OTSOpType.UPDATE_ROW; }else if (value.equalsIgnoreCase(OTSConst.OTS_OP_TYPE_DELETE)) { type = OTSOpType.DELETE_ROW; }else { throw new IllegalArgumentException(String.format(OTSErrorMessage.OPERATION_PARSE_ERROR, value)); } if (mode == OTSMode.MULTI_VERSION && type == OTSOpType.PUT_ROW) { throw new IllegalArgumentException(String.format(OTSErrorMessage.MUTLI_MODE_OPERATION_PARSE_ERROR, value)); } return type; } public static OTSMode parseOTSMode(String value) { if (value.equalsIgnoreCase(OTSConst.OTS_MODE_NORMAL)) { return OTSMode.NORMAL; } else if (value.equalsIgnoreCase(OTSConst.OTS_MODE_MULTI_VERSION)) { return OTSMode.MULTI_VERSION; } else { throw new IllegalArgumentException(String.format(OTSErrorMessage.MODE_PARSE_ERROR, value)); } } } ================================================ FILE: otswriter/src/main/java/com/alibaba/datax/plugin/writer/otswriter/utils/WriterRetryPolicy.java ================================================ package com.alibaba.datax.plugin.writer.otswriter.utils; import com.alibaba.datax.plugin.writer.otswriter.model.OTSConf; import com.aliyun.openservices.ots.internal.OTSRetryStrategy; public class WriterRetryPolicy implements OTSRetryStrategy 
{ OTSConf conf; public WriterRetryPolicy(OTSConf conf) { this.conf = conf; } @Override public boolean shouldRetry(String action, Exception ex, int retries) { return retries <= conf.getRetry(); } @Override public long getPauseDelay(String action, Exception ex, int retries) { if (retries <= 0) { return 0; } int sleepTime = conf.getSleepInMillisecond() * retries; return sleepTime > 30000 ? 30000 : sleepTime; } } ================================================ FILE: otswriter/src/main/resources/plugin.json ================================================ { "name": "otswriter", "class": "com.alibaba.datax.plugin.writer.otswriter.OtsWriter", "description": "", "developer": "alibaba" } ================================================ FILE: otswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "otswriter", "parameter": { "endpoint":"", "accessId":"", "accessKey":"", "instanceName":"", "table":"", "primaryKey" : [], "column" : [], "writeMode" : "" } } ================================================ FILE: package.xml ================================================ tar.gz dir false transformer/target/datax/ **/*.* datax core/target/datax/ **/*.* datax mysqlreader/target/datax/ **/*.* datax oceanbasev10reader/target/datax/ **/*.* datax obhbasereader/target/datax/ **/*.* datax drdsreader/target/datax/ **/*.* datax oraclereader/target/datax/ **/*.* datax sqlserverreader/target/datax/ **/*.* datax postgresqlreader/target/datax/ **/*.* datax kingbaseesreader/target/datax/ **/*.* datax rdbmsreader/target/datax/ **/*.* datax odpsreader/target/datax/ **/*.* datax otsreader/target/datax/ **/*.* datax otsstreamreader/target/datax/ **/*.* datax txtfilereader/target/datax/ **/*.* datax ossreader/target/datax/ **/*.* datax mongodbreader/target/datax/ **/*.* datax tdenginereader/target/datax/ **/*.* datax streamreader/target/datax/ **/*.* datax ftpreader/target/datax/ **/*.* datax clickhousereader/target/datax/ **/*.* datax 
hdfsreader/target/datax/ **/*.* datax hbase11xreader/target/datax/ **/*.* datax hbase094xreader/target/datax/ **/*.* datax opentsdbreader/target/datax/ **/*.* datax cassandrareader/target/datax/ **/*.* datax gdbreader/target/datax/ **/*.* datax hbase11xsqlreader/target/datax/ **/*.* datax hbase20xsqlreader/target/datax/ **/*.* datax tsdbreader/target/datax/ **/*.* datax datahubreader/target/datax/ **/*.* datax loghubreader/target/datax/ **/*.* datax starrocksreader/target/datax/ **/*.* datax dorisreader/target/datax/ **/*.* datax sybasereader/target/datax/ **/*.* datax gaussdbreader/target/datax/ **/*.* datax mysqlwriter/target/datax/ **/*.* datax tdenginewriter/target/datax/ **/*.* datax starrockswriter/target/datax/ **/*.* datax drdswriter/target/datax/ **/*.* datax odpswriter/target/datax/ **/*.* datax doriswriter/target/datax/ **/*.* datax txtfilewriter/target/datax/ **/*.* datax ftpwriter/target/datax/ **/*.* datax osswriter/target/datax/ **/*.* datax adswriter/target/datax/ **/*.* datax streamwriter/target/datax/ **/*.* datax otswriter/target/datax/ **/*.* datax mongodbwriter/target/datax/ **/*.* datax oraclewriter/target/datax/ **/*.* datax sqlserverwriter/target/datax/ **/*.* datax postgresqlwriter/target/datax/ **/*.* datax kingbaseeswriter/target/datax/ **/*.* datax rdbmswriter/target/datax/ **/*.* datax ocswriter/target/datax/ **/*.* datax hdfswriter/target/datax/ **/*.* datax hbase11xwriter/target/datax/ **/*.* datax hbase094xwriter/target/datax/ **/*.* datax hbase11xsqlwriter/target/datax/ **/*.* datax elasticsearchwriter/target/datax/ **/*.* datax hbase20xsqlwriter/target/datax/ **/*.* datax tsdbwriter/target/datax/ **/*.* datax adbpgwriter/target/datax/ **/*.* datax cassandrawriter/target/datax/ **/*.* datax clickhousewriter/target/datax/ **/*.* datax databendwriter/target/datax/ **/*.* datax oscarwriter/target/datax/ **/*.* datax oceanbasev10writer/target/datax/ **/*.* datax obhbasewriter/target/datax/ **/*.* datax gdbwriter/target/datax/ **/*.* 
datax kuduwriter/target/datax/ **/*.* datax hologresjdbcwriter/target/datax/ **/*.* datax datahubwriter/target/datax/ **/*.* datax loghubwriter/target/datax/ **/*.* datax selectdbwriter/target/datax/ **/*.* datax neo4jwriter/target/datax/ **/*.* datax sybasewriter/target/datax/ **/*.* datax gaussdbwriter/target/datax/ **/*.* datax milvuswriter/target/datax/ **/*.* datax
================================================
FILE: plugin-rdbms-util/pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT plugin-rdbms-util plugin-rdbms-util jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j commons-collections commons-collections 3.0 mysql mysql-connector-java ${mysql.driver.version} test com.oceanbase oceanbase-client 2.4.11 com.google.guava guava org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba druid 1.0.15 junit junit test org.mockito mockito-all 1.9.5 test com.google.guava guava r05
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/CommonRdbmsReader.java
================================================
package com.alibaba.datax.plugin.rdbms.reader;

import com.alibaba.datax.common.element.BoolColumn;
import com.alibaba.datax.common.element.BytesColumn;
import com.alibaba.datax.common.element.DateColumn;
import com.alibaba.datax.common.element.DoubleColumn;
import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.statistics.PerfRecord;
import com.alibaba.datax.common.statistics.PerfTrace;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.util.OriginalConfPretreatmentUtil;
import com.alibaba.datax.plugin.rdbms.reader.util.PreCheckTask;
import com.alibaba.datax.plugin.rdbms.reader.util.ReaderSplitUtil;
import com.alibaba.datax.plugin.rdbms.reader.util.SingleTableSplitUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.util.RdbmsException;
import com.google.common.collect.Lists;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CommonRdbmsReader {

    public static class Job {
        private static final Logger LOG = LoggerFactory
                .getLogger(Job.class);

        public Job(DataBaseType dataBaseType) {
            OriginalConfPretreatmentUtil.DATABASE_TYPE = dataBaseType;
            SingleTableSplitUtil.DATABASE_TYPE = dataBaseType;
        }

        public void init(Configuration originalConfig) {

            OriginalConfPretreatmentUtil.doPretreatment(originalConfig);

            LOG.debug("After job init(), job config now is:[\n{}\n]",
                    originalConfig.toJSON());
        }

        public void preCheck(Configuration originalConfig, DataBaseType dataBaseType) {
            /* Check that every table is readable, and that querySql and splitPk are valid. */
            Configuration queryConf = ReaderSplitUtil.doPreCheckSplit(originalConfig);
            String splitPK = queryConf.getString(Key.SPLIT_PK);
            List<Object> connList = queryConf.getList(Constant.CONN_MARK, Object.class);
            String username = queryConf.getString(Key.USERNAME);
            String password = queryConf.getString(Key.PASSWORD);
            ExecutorService exec;
            // One thread per connection, capped at 10 threads
            if (connList.size() < 10) {
                exec = Executors.newFixedThreadPool(connList.size());
            } else {
                exec = Executors.newFixedThreadPool(10);
            }
            Collection<PreCheckTask> taskList = new ArrayList<PreCheckTask>();
            for (int i = 0, len = connList.size(); i < len; i++) {
                Configuration connConf = Configuration.from(connList.get(i).toString());
                PreCheckTask t = new PreCheckTask(username, password, connConf, dataBaseType, splitPK);
                taskList.add(t);
            }
            List<Future<Boolean>> results = Lists.newArrayList();
            try {
                results = exec.invokeAll(taskList);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }

            for (Future<Boolean> result : results) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    DataXException de = (DataXException) e.getCause();
                    throw de;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            exec.shutdownNow();
        }

        public List<Configuration> split(Configuration originalConfig,
                                         int adviceNumber) {
            return ReaderSplitUtil.doSplit(originalConfig, adviceNumber);
        }

        public void post(Configuration originalConfig) {
            // do nothing
        }

        public void destroy(Configuration originalConfig) {
            // do nothing
        }

    }

    public static class Task {
        private static final Logger LOG = LoggerFactory
                .getLogger(Task.class);
        private static final boolean IS_DEBUG = LOG.isDebugEnabled();
        protected final byte[] EMPTY_CHAR_ARRAY = new byte[0];

        private DataBaseType dataBaseType;
        private int taskGroupId = -1;
        private int taskId = -1;

        private String username;
        private String password;
        private String jdbcUrl;
        private String mandatoryEncoding;

        // Common context appended to log messages, e.g. the database connection
        // and the table being operated on
        private String basicMsg;

        public Task(DataBaseType dataBaseType) {
            this(dataBaseType, -1, -1);
        }

        public Task(DataBaseType dataBaseType, int taskGroupId, int taskId) {
            this.dataBaseType = dataBaseType;
            this.taskGroupId = taskGroupId;
            this.taskId = taskId;
        }

        public void init(Configuration readerSliceConfig) {

            /* for database connection */

            this.username = readerSliceConfig.getString(Key.USERNAME);
            this.password = readerSliceConfig.getString(Key.PASSWORD);
            this.jdbcUrl = readerSliceConfig.getString(Key.JDBC_URL);

            // Handle OceanBase 1.0 (OB10) JDBC URLs
            if (this.jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING) && this.dataBaseType == DataBaseType.MySql) {
                String[] ss = this.jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN);
                if (ss.length != 3) {
                    throw DataXException
                            .asDataXException(
                                    DBUtilErrorCode.JDBC_OB10_ADDRESS_ERROR, "Invalid JDBC OB10 address format. Please contact askdatax.");
                }
                LOG.info("this is ob1_0 jdbc url.");
                this.username = ss[1].trim() + ":" + this.username;
                this.jdbcUrl = ss[2];
                LOG.info("this is ob1_0 jdbc url. user=" + this.username + " :url=" + this.jdbcUrl);
            }

            this.mandatoryEncoding = readerSliceConfig.getString(Key.MANDATORY_ENCODING, "");

            basicMsg = String.format("jdbcUrl:[%s]", this.jdbcUrl);

        }

        public void startRead(Configuration readerSliceConfig, RecordSender recordSender,
                              TaskPluginCollector taskPluginCollector, int fetchSize) {
            String querySql = readerSliceConfig.getString(Key.QUERY_SQL);
            String table = readerSliceConfig.getString(Key.TABLE);

            PerfTrace.getInstance().addTaskDetails(taskId, table + "," + basicMsg);

            LOG.info("Begin to read record by Sql: [{}\n] {}.",
                    querySql, basicMsg);
            PerfRecord queryPerfRecord = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.SQL_QUERY);
            queryPerfRecord.start();

            Connection conn = DBUtil.getConnection(this.dataBaseType, jdbcUrl,
                    username, password);

            // session config, etc.
            DBUtil.dealWithSessionConfig(conn, readerSliceConfig,
                    this.dataBaseType, basicMsg);

            int columnNumber = 0;
            ResultSet rs = null;
            try {
                rs = DBUtil.query(conn, querySql, fetchSize);
                queryPerfRecord.end();

                ResultSetMetaData metaData = rs.getMetaData();
                columnNumber = metaData.getColumnCount();

                // Measures only the time spent in ResultSet.next()
                PerfRecord allResultPerfRecord = new PerfRecord(taskGroupId, taskId, PerfRecord.PHASE.RESULT_NEXT_ALL);
                allResultPerfRecord.start();

                long rsNextUsedTime = 0;
                long lastTime = System.nanoTime();
                while (rs.next()) {
                    rsNextUsedTime += (System.nanoTime() - lastTime);
                    this.transportOneRecord(recordSender, rs,
                            metaData, columnNumber, mandatoryEncoding, taskPluginCollector);
                    lastTime = System.nanoTime();
                }

                allResultPerfRecord.end(rsNextUsedTime);
                // The monitoring dashboard currently relies on this log line. Previously,
                // "Finished read record" covered the total time of the SQL query plus
                // ResultSet iteration.
                LOG.info("Finished read record by Sql: [{}\n] {}.",
                        querySql, basicMsg);

            } catch (Exception e) {
                throw RdbmsException.asQueryException(this.dataBaseType, e, querySql, table, username);
            } finally {
                DBUtil.closeDBResources(null, conn);
            }
        }

        public void post(Configuration originalConfig) {
            // do nothing
        }

        public void destroy(Configuration originalConfig) {
            // do nothing
        }

        protected Record transportOneRecord(RecordSender recordSender, ResultSet rs,
                                            ResultSetMetaData metaData, int columnNumber, String mandatoryEncoding,
                                            TaskPluginCollector taskPluginCollector) {
            Record record = buildRecord(recordSender, rs, metaData, columnNumber, mandatoryEncoding, taskPluginCollector);
            recordSender.sendToWriter(record);
            return record;
        }

        protected Record buildRecord(RecordSender recordSender, ResultSet rs, ResultSetMetaData metaData, int columnNumber, String mandatoryEncoding,
                                     TaskPluginCollector taskPluginCollector) {
            Record record = recordSender.createRecord();

            try {
                for (int i = 1; i <= columnNumber; i++) {
                    switch (metaData.getColumnType(i)) {

                        case Types.CHAR:
                        case Types.NCHAR:
                        case Types.VARCHAR:
                        case Types.LONGVARCHAR:
                        case Types.NVARCHAR:
                        case Types.LONGNVARCHAR:
                            String rawData;
                            if (StringUtils.isBlank(mandatoryEncoding)) {
                                rawData = rs.getString(i);
                            } else {
                                rawData = new String((rs.getBytes(i) == null ? EMPTY_CHAR_ARRAY :
                                        rs.getBytes(i)), mandatoryEncoding);
                            }
                            record.addColumn(new StringColumn(rawData));
                            break;

                        case Types.CLOB:
                        case Types.NCLOB:
                            record.addColumn(new StringColumn(rs.getString(i)));
                            break;

                        case Types.SMALLINT:
                        case Types.TINYINT:
                        case Types.INTEGER:
                        case Types.BIGINT:
                            record.addColumn(new LongColumn(rs.getString(i)));
                            break;

                        case Types.NUMERIC:
                        case Types.DECIMAL:
                            record.addColumn(new DoubleColumn(rs.getString(i)));
                            break;

                        case Types.FLOAT:
                        case Types.REAL:
                        case Types.DOUBLE:
                            record.addColumn(new DoubleColumn(rs.getString(i)));
                            break;

                        case Types.TIME:
                            record.addColumn(new DateColumn(rs.getTime(i)));
                            break;

                        // for mysql bug, see http://bugs.mysql.com/bug.php?id=35115
                        case Types.DATE:
                            if (metaData.getColumnTypeName(i).equalsIgnoreCase("year")) {
                                record.addColumn(new LongColumn(rs.getInt(i)));
                            } else {
                                record.addColumn(new DateColumn(rs.getDate(i)));
                            }
                            break;

                        case Types.TIMESTAMP:
                            record.addColumn(new DateColumn(rs.getTimestamp(i)));
                            break;

                        case Types.BINARY:
                        case Types.VARBINARY:
                        case Types.BLOB:
                        case Types.LONGVARBINARY:
                            record.addColumn(new BytesColumn(rs.getBytes(i)));
                            break;

                        // Note: bit(1) maps to Types.BIT and can use BoolColumn;
                        // bit(>1) maps to Types.VARBINARY and can use BytesColumn.
                        case Types.BOOLEAN:
                        case Types.BIT:
                            record.addColumn(new BoolColumn(rs.getBoolean(i)));
                            break;

                        case Types.NULL:
                            String stringData = null;
                            if (rs.getObject(i) != null) {
                                stringData = rs.getObject(i).toString();
                            }
                            record.addColumn(new StringColumn(stringData));
                            break;

                        default:
                            throw DataXException
                                    .asDataXException(
                                            DBUtilErrorCode.UNSUPPORTED_TYPE,
                                            String.format(
                                                    "Invalid column configuration in your job file: DataX does not support reading this database column type. Column name:[%s], JDBC type:[%s], Java class:[%s]. Convert it to a DataX-supported type with a database function, or exclude this column from the sync.",
                                                    metaData.getColumnName(i),
                                                    metaData.getColumnType(i),
                                                    metaData.getColumnClassName(i)));
                    }
                }
            } catch (Exception e) {
                if (IS_DEBUG) {
                    LOG.debug("read data " + record.toString()
                            + " occur exception:", e);
                }
                // TODO: is it reliable to classify this as dirty data?
                taskPluginCollector.collectDirtyRecord(record, e);
                if (e instanceof DataXException) {
                    throw (DataXException) e;
                }
            }
            return record;
        }
    }

}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Constant.java
================================================
package com.alibaba.datax.plugin.rdbms.reader;

public final class Constant {
    public static final String PK_TYPE = "pkType";

    public static final Object PK_TYPE_STRING = "pkTypeString";

    public static final Object PK_TYPE_LONG = "pkTypeLong";

    public static final Object PK_TYPE_MONTECARLO = "pkTypeMonteCarlo";

    public static final String SPLIT_MODE_RANDOMSAMPLE = "randomSampling";

    public static String CONN_MARK = "connection";

    public static String TABLE_NUMBER_MARK = "tableNumber";

    public static String IS_TABLE_MODE = "isTableMode";

    public final static String FETCH_SIZE = "fetchSize";

    public static String QUERY_SQL_TEMPLATE_WITHOUT_WHERE = "select %s from %s ";

    public static String QUERY_SQL_TEMPLATE = "select %s from %s where (%s)";

    public static String TABLE_NAME_PLACEHOLDER = "@table";

    public static Integer SPLIT_FACTOR = 5;

}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/Key.java
================================================
package com.alibaba.datax.plugin.rdbms.reader;

/**
 * Encoding, time zone, and similar settings are not yet defined.
 */
public final class Key {
    public final static String JDBC_URL = "jdbcUrl";

    public final static String USERNAME = "username";

    public final static String PASSWORD = "password";

    public final static String TABLE = "table";

    public final static String MANDATORY_ENCODING = "mandatoryEncoding";

    // array-valued setting
    public final static String COLUMN = "column";

    public final static String COLUMN_LIST = "columnList";

    public final static String WHERE = "where";

    public final static String HINT = "hint";

    public final static String SPLIT_PK = "splitPk";

    public final static String SPLIT_MODE = "splitMode";

    public final static String SAMPLE_PERCENTAGE = "samplePercentage";

    public final static String QUERY_SQL = "querySql";

    public final static String SPLIT_PK_SQL = "splitPkSql";

    public final static String PRE_SQL = "preSql";

    public final static String POST_SQL = "postSql";

    public final static String CHECK_SLAVE = "checkSlave";

    public final static String SESSION = "session";

    public final static String DBNAME = "dbName";

    public final static String DRYRUN = "dryRun";

    public static String SPLIT_FACTOR = "splitFactor";

    public final static String WEAK_READ = "weakRead";

    public final static String SAVE_POINT = "savePoint";

    public final static String REUSE_CONN = "reuseConn";

    public final static String PARTITION_NAME = "partitionName";
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/ResultSetReadProxy.java
================================================
package com.alibaba.datax.plugin.rdbms.reader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Types;

public class ResultSetReadProxy {
    private static final Logger LOG = LoggerFactory
            .getLogger(ResultSetReadProxy.class);

    private static final boolean IS_DEBUG = LOG.isDebugEnabled();
    private static final byte[] EMPTY_CHAR_ARRAY = new byte[0];

    //TODO
    public static void transportOneRecord(RecordSender recordSender, ResultSet rs,
                                          ResultSetMetaData metaData, int columnNumber, String mandatoryEncoding,
                                          TaskPluginCollector taskPluginCollector) {
        Record record = recordSender.createRecord();

        try {
            for (int i = 1; i <= columnNumber; i++) {
                switch (metaData.getColumnType(i)) {

                    case Types.CHAR:
                    case Types.NCHAR:
                    case Types.VARCHAR:
                    case Types.LONGVARCHAR:
                    case Types.NVARCHAR:
                    case Types.LONGNVARCHAR:
                        String rawData;
                        if (StringUtils.isBlank(mandatoryEncoding)) {
                            rawData = rs.getString(i);
                        } else {
                            rawData = new String((rs.getBytes(i) == null ? EMPTY_CHAR_ARRAY :
                                    rs.getBytes(i)), mandatoryEncoding);
                        }
                        record.addColumn(new StringColumn(rawData));
                        break;

                    case Types.CLOB:
                    case Types.NCLOB:
                        record.addColumn(new StringColumn(rs.getString(i)));
                        break;

                    case Types.SMALLINT:
                    case Types.TINYINT:
                    case Types.INTEGER:
                    case Types.BIGINT:
                        record.addColumn(new LongColumn(rs.getString(i)));
                        break;

                    case Types.NUMERIC:
                    case Types.DECIMAL:
                        record.addColumn(new DoubleColumn(rs.getString(i)));
                        break;

                    case Types.FLOAT:
                    case Types.REAL:
                    case Types.DOUBLE:
                        record.addColumn(new DoubleColumn(rs.getString(i)));
                        break;

                    case Types.TIME:
                        record.addColumn(new DateColumn(rs.getTime(i)));
                        break;

                    // for mysql bug, see http://bugs.mysql.com/bug.php?id=35115
                    case Types.DATE:
                        if (metaData.getColumnTypeName(i).equalsIgnoreCase("year")) {
                            record.addColumn(new LongColumn(rs.getInt(i)));
                        } else {
                            record.addColumn(new DateColumn(rs.getDate(i)));
                        }
                        break;

                    case Types.TIMESTAMP:
                        record.addColumn(new DateColumn(rs.getTimestamp(i)));
                        break;

                    case Types.BINARY:
                    case Types.VARBINARY:
                    case Types.BLOB:
                    case Types.LONGVARBINARY:
                        record.addColumn(new BytesColumn(rs.getBytes(i)));
                        break;

                    // Note: bit(1) maps to Types.BIT and can use BoolColumn;
                    // bit(>1) maps to Types.VARBINARY and can use BytesColumn.
                    case Types.BOOLEAN:
                    case Types.BIT:
                        record.addColumn(new BoolColumn(rs.getBoolean(i)));
                        break;

                    case Types.NULL:
                        String stringData = null;
                        if (rs.getObject(i) != null) {
                            stringData = rs.getObject(i).toString();
                        }
                        record.addColumn(new StringColumn(stringData));
                        break;

                    // TODO: add BASIC_MESSAGE
                    default:
                        throw DataXException
                                .asDataXException(
                                        DBUtilErrorCode.UNSUPPORTED_TYPE,
                                        String.format(
                                                "Invalid column configuration in your job file: DataX does not support reading this database column type. Column name:[%s], JDBC type:[%s], Java class:[%s]. Convert it to a DataX-supported type with a database function, or exclude this column from the sync.",
                                                metaData.getColumnName(i),
                                                metaData.getColumnType(i),
                                                metaData.getColumnClassName(i)));
                }
            }
        } catch (Exception e) {
            if (IS_DEBUG) {
                LOG.debug("read data " + record.toString()
                        + " occur exception:", e);
            }
            // TODO: is it reliable to classify this as dirty data?
            taskPluginCollector.collectDirtyRecord(record, e);
            if (e instanceof DataXException) {
                throw (DataXException) e;
            }
        }

        recordSender.sendToWriter(record);
    }
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/HintUtil.java
================================================
package com.alibaba.datax.plugin.rdbms.reader.util;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.Constant;
import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Created by liuyi on 15/9/18.
 */
public class HintUtil {
    private static final Logger LOG = LoggerFactory.getLogger(HintUtil.class);

    private static DataBaseType dataBaseType;
    private static String username;
    private static String password;
    private static Pattern tablePattern;
    private static String hintExpression;

    // hint format: "tablePattern#hintExpression"; a bare hintExpression applies to every table
    public static void initHintConf(DataBaseType type, Configuration configuration) {
        dataBaseType = type;
        username = configuration.getString(Key.USERNAME);
        password = configuration.getString(Key.PASSWORD);
        String hint = configuration.getString(Key.HINT);
        if (StringUtils.isNotBlank(hint)) {
            String[] tablePatternAndHint = hint.split("#");
            if (tablePatternAndHint.length == 1) {
                tablePattern = Pattern.compile(".*");
                hintExpression = tablePatternAndHint[0];
            } else {
                tablePattern = Pattern.compile(tablePatternAndHint[0]);
                hintExpression = tablePatternAndHint[1];
            }
        }
    }

    // Oracle only: prefixes the hint to the column list when the table matches the pattern
    public static String buildQueryColumn(String jdbcUrl, String table, String column) {
        try {
            if (tablePattern != null && DataBaseType.Oracle.equals(dataBaseType)) {
                Matcher m = tablePattern.matcher(table);
                if (m.find()) {
                    String[] tableStr = table.split("\\.");
                    String tableWithoutSchema = tableStr[tableStr.length - 1];
                    String finalHint = hintExpression.replaceAll(Constant.TABLE_NAME_PLACEHOLDER, tableWithoutSchema);
                    // Do not read the primary (master) database in parallel; Oracle hints are case-insensitive
                    if (StringUtils.containsIgnoreCase(finalHint, "parallel")
                            && DBUtil.isOracleMaster(jdbcUrl, username, password)) {
                        LOG.info("master:{} will not use hint:{}", jdbcUrl, finalHint);
                    } else {
                        LOG.info("table:{} use hint:{}.", table, finalHint);
                        return finalHint + column;
                    }
                }
            }
        } catch (Exception e) {
            LOG.warn("match hint exception, will not use hint", e);
        }
        return column;
    }
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/ObVersion.java
================================================
package com.alibaba.datax.plugin.rdbms.reader.util;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @author johnrobbet
 */
public class ObVersion implements Comparable<ObVersion> {
private static final Logger LOG = LoggerFactory.getLogger(ObVersion.class); private int majorVersion; private int minorVersion; private int releaseNumber; private int patchNumber; public static final ObVersion V2276 = valueOf("2.2.76"); public static final ObVersion V2252 = valueOf("2.2.52"); public static final ObVersion V3 = valueOf("3.0.0.0"); public static final ObVersion V4000 = valueOf("4.0.0.0"); private static final ObVersion DEFAULT_VERSION = valueOf(System.getProperty("defaultObVersion","3.2.3.0")); private static final int VERSION_PART_COUNT = 4; public ObVersion(String version) { try { String[] versionParts = version.split("\\."); majorVersion = Integer.valueOf(versionParts[0]); minorVersion = Integer.valueOf(versionParts[1]); releaseNumber = Integer.valueOf(versionParts[2]); int tempPatchNum = 0; if (versionParts.length == VERSION_PART_COUNT) { try { tempPatchNum = Integer.valueOf(versionParts[3]); } catch (Exception e) { LOG.warn("fail to parse ob version: " + e.getMessage()); } } patchNumber = tempPatchNum; } catch (Exception ex) { LOG.warn("fail to get ob version, using default {} {}", DEFAULT_VERSION, ex.getMessage()); majorVersion = DEFAULT_VERSION.majorVersion; minorVersion = DEFAULT_VERSION.minorVersion; releaseNumber = DEFAULT_VERSION.releaseNumber; patchNumber = DEFAULT_VERSION.patchNumber; } } public static ObVersion valueOf(String version) { return new ObVersion(version); } @Override public int compareTo(ObVersion o) { if (this.majorVersion > o.majorVersion) { return 1; } else if (this.majorVersion < o.majorVersion) { return -1; } if (this.minorVersion > o.minorVersion) { return 1; } else if (this.minorVersion < o.minorVersion) { return -1; } if (this.releaseNumber > o.releaseNumber) { return 1; } else if (this.releaseNumber < o.releaseNumber) { return -1; } if (this.patchNumber > o.patchNumber) { return 1; } else if (this.patchNumber < o.patchNumber) { return -1; } return 0; } @Override public String toString() { return 
String.format("%d.%d.%d.%d", majorVersion, minorVersion, releaseNumber, patchNumber);
    }
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/OriginalConfPretreatmentUtil.java
================================================
package com.alibaba.datax.plugin.rdbms.reader.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.ListUtil;
import com.alibaba.datax.plugin.rdbms.reader.Constant;
import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import com.alibaba.datax.plugin.rdbms.util.TableExpandUtil;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

public final class OriginalConfPretreatmentUtil {
    private static final Logger LOG = LoggerFactory
            .getLogger(OriginalConfPretreatmentUtil.class);

    public static DataBaseType DATABASE_TYPE;

    public static void doPretreatment(Configuration originalConfig) {
        // username and password are required
        originalConfig.getNecessaryValue(Key.USERNAME, DBUtilErrorCode.REQUIRED_VALUE);
        originalConfig.getNecessaryValue(Key.PASSWORD, DBUtilErrorCode.REQUIRED_VALUE);
        dealWhere(originalConfig);

        simplifyConf(originalConfig);
    }

    // Trims the where clause and strips one trailing semicolon
    public static void dealWhere(Configuration originalConfig) {
        String where = originalConfig.getString(Key.WHERE, null);
        if (StringUtils.isNotBlank(where)) {
            String whereImprove = where.trim();
            // accept both the ASCII semicolon and the full-width semicolon (U+FF1B)
            if (whereImprove.endsWith(";") || whereImprove.endsWith("\uFF1B")) {
                whereImprove = whereImprove.substring(0, whereImprove.length() - 1);
            }
            originalConfig.set(Key.WHERE, whereImprove);
        }
    }

    /**
     * Performs preliminary processing of the configuration:
     *
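     * 1. Choose one usable jdbcUrl when a connection lists several for the same database;
     * 2. Detect and record whether table mode or querySql mode is used;
     * 3. In table mode, count the expanded tables and normalize the column setting (including "*").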
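The table/querySql mode detection listed above is implemented by `recognizeTableOrQuerySqlMode()`: every connection entry must set exactly one of `table` and `querySql`, and all entries must agree on which one. The sketch below restates those rules without DataX dependencies. It assumes each connection has already been reduced to a `{table, querySql}` string pair (null or blank meaning unset) and throws `IllegalArgumentException` where DataX raises `DataXException` with `TABLE_QUERYSQL_MISSING` or `TABLE_QUERYSQL_MIXED`. `ModeCheckSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the rules in recognizeTableOrQuerySqlMode().
// Each String[] is assumed to be {table, querySql} for one connection entry.
public class ModeCheckSketch {

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns true for table mode and false for querySql mode.
    public static boolean isTableMode(List<String[]> connections) {
        Boolean mode = null;
        for (String[] conn : connections) {
            boolean hasTable = !isBlank(conn[0]);
            boolean hasQuerySql = !isBlank(conn[1]);
            if (!hasTable && !hasQuerySql) {
                throw new IllegalArgumentException("table or querySql must be configured");
            }
            if (hasTable && hasQuerySql) {
                throw new IllegalArgumentException("table and querySql cannot both be configured");
            }
            // every entry must use the same mode as the first one
            if (mode != null && mode != hasTable) {
                throw new IllegalArgumentException("connections mix table mode and querySql mode");
            }
            mode = hasTable;
        }
        if (mode == null) {
            throw new IllegalArgumentException("no connection configured");
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(isTableMode(Arrays.asList(
                new String[]{"orders", null}, new String[]{"users", ""})));
        System.out.println(isTableMode(Arrays.asList(
                new String[]{null, "select 1"}, new String[]{"", "select 2"})));
    }
}
```

Running `main` prints `true` and then `false`. Unlike the original, which validates each entry inside the loop and checks cross-entry consistency only afterwards (via `ListUtil.checkIfValueSame`), this sketch reports a mix as soon as it sees one, so the reported error can differ when a configuration has several problems.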
 */
    private static void simplifyConf(Configuration originalConfig) {
        boolean isTableMode = recognizeTableOrQuerySqlMode(originalConfig);
        originalConfig.set(Constant.IS_TABLE_MODE, isTableMode);

        dealJdbcAndTable(originalConfig);

        dealColumnConf(originalConfig);
    }

    private static void dealJdbcAndTable(Configuration originalConfig) {
        String username = originalConfig.getString(Key.USERNAME);
        String password = originalConfig.getString(Key.PASSWORD);
        boolean checkSlave = originalConfig.getBool(Key.CHECK_SLAVE, false);
        boolean isTableMode = originalConfig.getBool(Constant.IS_TABLE_MODE);
        boolean isPreCheck = originalConfig.getBool(Key.DRYRUN, false);

        List<Object> conns = originalConfig.getList(Constant.CONN_MARK, Object.class);
        List<String> preSql = originalConfig.getList(Key.PRE_SQL, String.class);

        int tableNum = 0;

        for (int i = 0, len = conns.size(); i < len; i++) {
            Configuration connConf = Configuration
                    .from(conns.get(i).toString());

            connConf.getNecessaryValue(Key.JDBC_URL, DBUtilErrorCode.REQUIRED_VALUE);

            List<String> jdbcUrls = connConf
                    .getList(Key.JDBC_URL, String.class);

            String jdbcUrl;
            if (isPreCheck) {
                jdbcUrl = DBUtil.chooseJdbcUrlWithoutRetry(DATABASE_TYPE, jdbcUrls,
                        username, password, preSql, checkSlave);
            } else {
                jdbcUrl = DBUtil.chooseJdbcUrl(DATABASE_TYPE, jdbcUrls,
                        username, password, preSql, checkSlave);
            }

            jdbcUrl = DATABASE_TYPE.appendJDBCSuffixForReader(jdbcUrl);

            // write the chosen URL back to connection[i].jdbcUrl
            originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK,
                    i, Key.JDBC_URL), jdbcUrl);

            LOG.info("Available jdbcUrl:{}.", jdbcUrl);

            if (isTableMode) {
                // table mode: expand the table entries configured on this connection
                // (table names have already been quoted with `)
                List<String> tables = connConf.getList(Key.TABLE, String.class);

                List<String> expandedTables = TableExpandUtil.expandTableConf(
                        DATABASE_TYPE, tables);

                if (null == expandedTables || expandedTables.isEmpty()) {
                    throw DataXException.asDataXException(
                            DBUtilErrorCode.ILLEGAL_VALUE, String.format("您所配置的读取数据库表:%s 不正确. 因为DataX根据您的配置找不到这张表. 请检查您的配置并作出修改."
+ "请先了解 DataX 配置.", StringUtils.join(tables, ","))); } tableNum += expandedTables.size(); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.TABLE), expandedTables); } else { // 说明是配置的 querySql 方式,不做处理. } } originalConfig.set(Constant.TABLE_NUMBER_MARK, tableNum); } private static void dealColumnConf(Configuration originalConfig) { boolean isTableMode = originalConfig.getBool(Constant.IS_TABLE_MODE); List userConfiguredColumns = originalConfig.getList(Key.COLUMN, String.class); if (isTableMode) { if (null == userConfiguredColumns || userConfiguredColumns.isEmpty()) { throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, "您未配置读取数据库表的列信息. " + "正确的配置方式是给 column 配置上您需要读取的列名称,用英文逗号分隔. 例如: \"column\": [\"id\", \"name\"],请参考上述配置并作出修改."); } else { String splitPk = originalConfig.getString(Key.SPLIT_PK, null); if (1 == userConfiguredColumns.size() && "*".equals(userConfiguredColumns.get(0))) { LOG.warn("您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改."); // 回填其值,需要以 String 的方式转交后续处理 originalConfig.set(Key.COLUMN, "*"); } else { String jdbcUrl = originalConfig.getString(String.format( "%s[0].%s", Constant.CONN_MARK, Key.JDBC_URL)); String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); String tableName = originalConfig.getString(String.format( "%s[0].%s[0]", Constant.CONN_MARK, Key.TABLE)); List allColumns = DBUtil.getTableColumns( DATABASE_TYPE, jdbcUrl, username, password, tableName); LOG.info("table:[{}] has columns:[{}].", tableName, StringUtils.join(allColumns, ",")); // warn:注意mysql表名区分大小写 allColumns = ListUtil.valueToLowerCase(allColumns); List quotedColumns = new ArrayList(); for (String column : userConfiguredColumns) { if ("*".equals(column)) { throw DataXException.asDataXException( DBUtilErrorCode.ILLEGAL_VALUE, "您的配置文件中的列配置信息有误. 因为根据您的配置,数据库表的列中存在多个*. 请检查您的配置并作出修改. 
"); } quotedColumns.add(column); //以下判断没有任何意义 // if (null == column) { // quotedColumns.add(null); // } else { // if (allColumns.contains(column.toLowerCase())) { // quotedColumns.add(column); // } else { // // 可能是由于用户填写为函数,或者自己对字段进行了`处理或者常量 // quotedColumns.add(column); // } // } } originalConfig.set(Key.COLUMN_LIST, quotedColumns); originalConfig.set(Key.COLUMN, StringUtils.join(quotedColumns, ",")); if (StringUtils.isNotBlank(splitPk)) { if (!allColumns.contains(splitPk.toLowerCase())) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, String.format("您的配置文件中的列配置信息有误. 因为根据您的配置,您读取的数据库表:%s 中没有主键名为:%s. 请检查您的配置并作出修改.", tableName, splitPk)); } } } } } else { // querySql模式,不希望配制 column,那样是混淆不清晰的 if (null != userConfiguredColumns && userConfiguredColumns.size() > 0) { LOG.warn("您的配置有误. 由于您读取数据库表采用了querySql的方式, 所以您不需要再配置 column. 如果您不想看到这条提醒,请移除您源头表中配置中的 column."); originalConfig.remove(Key.COLUMN); } // querySql模式,不希望配制 where,那样是混淆不清晰的 String where = originalConfig.getString(Key.WHERE, null); if (StringUtils.isNotBlank(where)) { LOG.warn("您的配置有误. 由于您读取数据库表采用了querySql的方式, 所以您不需要再配置 where. 如果您不想看到这条提醒,请移除您源头表中配置中的 where."); originalConfig.remove(Key.WHERE); } // querySql模式,不希望配制 splitPk,那样是混淆不清晰的 String splitPk = originalConfig.getString(Key.SPLIT_PK, null); if (StringUtils.isNotBlank(splitPk)) { LOG.warn("您的配置有误. 由于您读取数据库表采用了querySql的方式, 所以您不需要再配置 splitPk. 
如果您不想看到这条提醒,请移除您源头表中配置中的 splitPk.");
                originalConfig.remove(Key.SPLIT_PK);
            }
        }
    }

    private static boolean recognizeTableOrQuerySqlMode(
            Configuration originalConfig) {
        List<Object> conns = originalConfig.getList(Constant.CONN_MARK,
                Object.class);

        List<Boolean> tableModeFlags = new ArrayList<Boolean>();
        List<Boolean> querySqlModeFlags = new ArrayList<Boolean>();

        String table = null;
        String querySql = null;

        boolean isTableMode = false;
        boolean isQuerySqlMode = false;
        for (int i = 0, len = conns.size(); i < len; i++) {
            Configuration connConf = Configuration
                    .from(conns.get(i).toString());
            table = connConf.getString(Key.TABLE, null);
            querySql = connConf.getString(Key.QUERY_SQL, null);

            isTableMode = StringUtils.isNotBlank(table);
            tableModeFlags.add(isTableMode);

            isQuerySqlMode = StringUtils.isNotBlank(querySql);
            querySqlModeFlags.add(isQuerySqlMode);

            if (!isTableMode && !isQuerySqlMode) {
                // neither table nor querySql is configured
                throw DataXException.asDataXException(
                        DBUtilErrorCode.TABLE_QUERYSQL_MISSING,
                        "Invalid configuration: exactly one of table and querySql must be configured. Please check and fix your configuration.");
            } else if (isTableMode && isQuerySqlMode) {
                // both table and querySql are configured
                throw DataXException.asDataXException(DBUtilErrorCode.TABLE_QUERYSQL_MIXED,
                        "Invalid configuration: DataX cannot use table and querySql at the same time. Please check and fix your configuration.");
            }
        }

        // table and querySql are mixed across connections
        if (!ListUtil.checkIfValueSame(tableModeFlags)
                || !ListUtil.checkIfValueSame(querySqlModeFlags)) {
            throw DataXException.asDataXException(DBUtilErrorCode.TABLE_QUERYSQL_MIXED,
                    "您配置凌乱了. 不能同时既配置table又配置querySql.
请检查您的配置并作出修改."); } return tableModeFlags.get(0); } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/PreCheckTask.java ================================================ package com.alibaba.datax.plugin.rdbms.reader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.druid.sql.parser.ParserException; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.ResultSet; import java.util.List; import java.util.concurrent.Callable; /** * Created by judy.lt on 2015/6/4. */ public class PreCheckTask implements Callable{ private static final Logger LOG = LoggerFactory.getLogger(PreCheckTask.class); private String userName; private String password; private String splitPkId; private Configuration connection; private DataBaseType dataBaseType; public PreCheckTask(String userName, String password, Configuration connection, DataBaseType dataBaseType, String splitPkId){ this.connection = connection; this.userName=userName; this.password=password; this.dataBaseType = dataBaseType; this.splitPkId = splitPkId; } @Override public Boolean call() throws DataXException { String jdbcUrl = this.connection.getString(Key.JDBC_URL); List querySqls = this.connection.getList(Key.QUERY_SQL, Object.class); List splitPkSqls = this.connection.getList(Key.SPLIT_PK_SQL, Object.class); List tables = this.connection.getList(Key.TABLE,Object.class); Connection conn = DBUtil.getConnectionWithoutRetry(this.dataBaseType, jdbcUrl, this.userName, password); int fetchSize = 1; if(DataBaseType.MySql.equals(dataBaseType) || DataBaseType.DRDS.equals(dataBaseType)) { fetchSize = 
Integer.MIN_VALUE; } try{ for (int i=0;i doSplit( Configuration originalSliceConfig, int adviceNumber) { boolean isTableMode = originalSliceConfig.getBool(Constant.IS_TABLE_MODE).booleanValue(); int eachTableShouldSplittedNumber = -1; if (isTableMode) { // adviceNumber这里是channel数量大小, 即datax并发task数量 // eachTableShouldSplittedNumber是单表应该切分的份数, 向上取整可能和adviceNumber没有比例关系了已经 eachTableShouldSplittedNumber = calculateEachTableShouldSplittedNumber( adviceNumber, originalSliceConfig.getInt(Constant.TABLE_NUMBER_MARK)); } String column = originalSliceConfig.getString(Key.COLUMN); String where = originalSliceConfig.getString(Key.WHERE, null); List conns = originalSliceConfig.getList(Constant.CONN_MARK, Object.class); List splittedConfigs = new ArrayList(); for (int i = 0, len = conns.size(); i < len; i++) { Configuration sliceConfig = originalSliceConfig.clone(); Configuration connConf = Configuration.from(conns.get(i).toString()); String jdbcUrl = connConf.getString(Key.JDBC_URL); sliceConfig.set(Key.JDBC_URL, jdbcUrl); // 抽取 jdbcUrl 中的 ip/port 进行资源使用的打标,以提供给 core 做有意义的 shuffle 操作 sliceConfig.set(CommonConstant.LOAD_BALANCE_RESOURCE_MARK, DataBaseType.parseIpFromJdbcUrl(jdbcUrl)); sliceConfig.remove(Constant.CONN_MARK); Configuration tempSlice; // 说明是配置的 table 方式 if (isTableMode) { // 已在之前进行了扩展和`处理,可以直接使用 List tables = connConf.getList(Key.TABLE, String.class); Validate.isTrue(null != tables && !tables.isEmpty(), "您读取数据库表配置错误."); String splitPk = originalSliceConfig.getString(Key.SPLIT_PK, null); //最终切分份数不一定等于 eachTableShouldSplittedNumber boolean needSplitTable = eachTableShouldSplittedNumber > 1 && StringUtils.isNotBlank(splitPk); if (needSplitTable) { if (tables.size() == 1) { //原来:如果是单表的,主键切分num=num*2+1 // splitPk is null这类的情况的数据量本身就比真实数据量少很多, 和channel大小比率关系时,不建议考虑 //eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * 2 + 1;// 不应该加1导致长尾 //考虑其他比率数字?(splitPk is null, 忽略此长尾) //eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * 5; //为避免导入hive小文件 
默认基数为5,可以通过 splitFactor 配置基数 // 最终task数为(channel/tableNum)向上取整*splitFactor Integer splitFactor = originalSliceConfig.getInt(Key.SPLIT_FACTOR, Constant.SPLIT_FACTOR); eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * splitFactor; } // 尝试对每个表,切分为eachTableShouldSplittedNumber 份 for (String table : tables) { tempSlice = sliceConfig.clone(); tempSlice.set(Key.TABLE, table); List splittedSlices = SingleTableSplitUtil .splitSingleTable(tempSlice, eachTableShouldSplittedNumber); splittedConfigs.addAll(splittedSlices); } } else { for (String table : tables) { tempSlice = sliceConfig.clone(); tempSlice.set(Key.TABLE, table); String queryColumn = HintUtil.buildQueryColumn(jdbcUrl, table, column); tempSlice.set(Key.QUERY_SQL, SingleTableSplitUtil.buildQuerySql(queryColumn, table, where)); splittedConfigs.add(tempSlice); } } } else { // 说明是配置的 querySql 方式 List sqls = connConf.getList(Key.QUERY_SQL, String.class); // TODO 是否check 配置为多条语句?? for (String querySql : sqls) { tempSlice = sliceConfig.clone(); tempSlice.set(Key.QUERY_SQL, querySql); splittedConfigs.add(tempSlice); } } } return splittedConfigs; } public static Configuration doPreCheckSplit(Configuration originalSliceConfig) { Configuration queryConfig = originalSliceConfig.clone(); boolean isTableMode = originalSliceConfig.getBool(Constant.IS_TABLE_MODE).booleanValue(); String splitPK = originalSliceConfig.getString(Key.SPLIT_PK); String column = originalSliceConfig.getString(Key.COLUMN); String where = originalSliceConfig.getString(Key.WHERE, null); List conns = queryConfig.getList(Constant.CONN_MARK, Object.class); for (int i = 0, len = conns.size(); i < len; i++){ Configuration connConf = Configuration.from(conns.get(i).toString()); List querys = new ArrayList(); List splitPkQuerys = new ArrayList(); String connPath = String.format("connection[%d]",i); // 说明是配置的 table 方式 if (isTableMode) { // 已在之前进行了扩展和`处理,可以直接使用 List tables = connConf.getList(Key.TABLE, String.class); Validate.isTrue(null != tables && 
!tables.isEmpty(), "您读取数据库表配置错误."); for (String table : tables) { querys.add(SingleTableSplitUtil.buildQuerySql(column,table,where)); if (splitPK != null && !splitPK.isEmpty()){ splitPkQuerys.add(SingleTableSplitUtil.genPKSql(splitPK.trim(),table,where)); } } if (!splitPkQuerys.isEmpty()){ connConf.set(Key.SPLIT_PK_SQL,splitPkQuerys); } connConf.set(Key.QUERY_SQL,querys); queryConfig.set(connPath,connConf); } else { // 说明是配置的 querySql 方式 List sqls = connConf.getList(Key.QUERY_SQL, String.class); for (String querySql : sqls) { querys.add(querySql); } connConf.set(Key.QUERY_SQL,querys); queryConfig.set(connPath,connConf); } } return queryConfig; } private static int calculateEachTableShouldSplittedNumber(int adviceNumber, int tableNumber) { double tempNum = 1.0 * adviceNumber / tableNumber; return (int) Math.ceil(tempNum); } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/reader/util/SingleTableSplitUtil.java ================================================ package com.alibaba.datax.plugin.rdbms.reader.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.util.*; import com.alibaba.fastjson2.JSON; import java.text.MessageFormat; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.ImmutablePair; import org.apache.commons.lang3.tuple.Pair; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.math.BigInteger; import java.sql.Connection; import java.sql.ResultSet; import java.sql.ResultSetMetaData; import java.sql.Types; import java.util.ArrayList; import java.util.List; import static org.apache.commons.lang3.StringUtils.EMPTY; public class SingleTableSplitUtil { private static final Logger LOG = LoggerFactory .getLogger(SingleTableSplitUtil.class); 
public static DataBaseType DATABASE_TYPE; private SingleTableSplitUtil() { } public static List splitSingleTable( Configuration configuration, int adviceNum) { List pluginParams = new ArrayList(); List rangeList; String splitPkName = configuration.getString(Key.SPLIT_PK); String column = configuration.getString(Key.COLUMN); String table = configuration.getString(Key.TABLE); String where = configuration.getString(Key.WHERE, null); boolean hasWhere = StringUtils.isNotBlank(where); //String splitMode = configuration.getString(Key.SPLIT_MODE, ""); //if (Constant.SPLIT_MODE_RANDOMSAMPLE.equals(splitMode) && DATABASE_TYPE == DataBaseType.Oracle) { if (DATABASE_TYPE == DataBaseType.Oracle) { rangeList = genSplitSqlForOracle(splitPkName, table, where, configuration, adviceNum); // warn: mysql etc to be added... } else { Pair minMaxPK = getPkRange(configuration); if (null == minMaxPK) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, "根据切分主键切分表失败. DataX 仅支持切分主键为一个,并且类型为整数或者字符串类型. 请尝试使用其他的切分主键或者联系 DBA 进行处理."); } configuration.set(Key.QUERY_SQL, buildQuerySql(column, table, where)); if (null == minMaxPK.getLeft() || null == minMaxPK.getRight()) { // 切分后获取到的start/end 有 Null 的情况 pluginParams.add(configuration); return pluginParams; } boolean isStringType = Constant.PK_TYPE_STRING.equals(configuration .getString(Constant.PK_TYPE)); boolean isLongType = Constant.PK_TYPE_LONG.equals(configuration .getString(Constant.PK_TYPE)); if (isStringType) { rangeList = RdbmsRangeSplitWrap.splitAndWrap( String.valueOf(minMaxPK.getLeft()), String.valueOf(minMaxPK.getRight()), adviceNum, splitPkName, "'", DATABASE_TYPE); } else if (isLongType) { rangeList = RdbmsRangeSplitWrap.splitAndWrap( new BigInteger(minMaxPK.getLeft().toString()), new BigInteger(minMaxPK.getRight().toString()), adviceNum, splitPkName); } else { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, "您配置的切分主键(splitPk) 类型 DataX 不支持. DataX 仅支持切分主键为一个,并且类型为整数或者字符串类型. 
请尝试使用其他的切分主键或者联系 DBA 进行处理."); } } String tempQuerySql; List allQuerySql = new ArrayList(); if (null != rangeList && !rangeList.isEmpty()) { for (String range : rangeList) { Configuration tempConfig = configuration.clone(); tempQuerySql = buildQuerySql(column, table, where) + (hasWhere ? " and " : " where ") + range; allQuerySql.add(tempQuerySql); tempConfig.set(Key.QUERY_SQL, tempQuerySql); tempConfig.set(Key.WHERE, (hasWhere ? ("(" + where + ") and") : "") + range); pluginParams.add(tempConfig); } } else { //pluginParams.add(configuration); // this is wrong for new & old split Configuration tempConfig = configuration.clone(); tempQuerySql = buildQuerySql(column, table, where) + (hasWhere ? " and " : " where ") + String.format(" %s IS NOT NULL", splitPkName); allQuerySql.add(tempQuerySql); tempConfig.set(Key.QUERY_SQL, tempQuerySql); tempConfig.set(Key.WHERE, (hasWhere ? "(" + where + ") and" : "") + String.format(" %s IS NOT NULL", splitPkName)); pluginParams.add(tempConfig); } // deal pk is null Configuration tempConfig = configuration.clone(); tempQuerySql = buildQuerySql(column, table, where) + (hasWhere ? " and " : " where ") + String.format(" %s IS NULL", splitPkName); allQuerySql.add(tempQuerySql); LOG.info("After split(), allQuerySql=[\n{}\n].", StringUtils.join(allQuerySql, "\n")); tempConfig.set(Key.QUERY_SQL, tempQuerySql); tempConfig.set(Key.WHERE, (hasWhere ? 
"(" + where + ") and" : "") + String.format(" %s IS NULL", splitPkName)); pluginParams.add(tempConfig); return pluginParams; } public static String buildQuerySql(String column, String table, String where) { String querySql; if (StringUtils.isBlank(where)) { querySql = String.format(Constant.QUERY_SQL_TEMPLATE_WITHOUT_WHERE, column, table); } else { querySql = String.format(Constant.QUERY_SQL_TEMPLATE, column, table, where); } return querySql; } @SuppressWarnings("resource") private static Pair getPkRange(Configuration configuration) { String pkRangeSQL = genPKRangeSQL(configuration); int fetchSize = configuration.getInt(Constant.FETCH_SIZE); String jdbcURL = configuration.getString(Key.JDBC_URL); String username = configuration.getString(Key.USERNAME); String password = configuration.getString(Key.PASSWORD); String table = configuration.getString(Key.TABLE); Connection conn = DBUtil.getConnection(DATABASE_TYPE, jdbcURL, username, password); Pair minMaxPK = checkSplitPk(conn, pkRangeSQL, fetchSize, table, username, configuration); DBUtil.closeDBResources(null, null, conn); return minMaxPK; } public static void precheckSplitPk(Connection conn, String pkRangeSQL, int fetchSize, String table, String username) { Pair minMaxPK = checkSplitPk(conn, pkRangeSQL, fetchSize, table, username, null); if (null == minMaxPK) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, "根据切分主键切分表失败. DataX 仅支持切分主键为一个,并且类型为整数或者字符串类型. 请尝试使用其他的切分主键或者联系 DBA 进行处理."); } } /** * 检测splitPk的配置是否正确。 * configuration为null, 是precheck的逻辑,不需要回写PK_TYPE到configuration中 * */ private static Pair checkSplitPk(Connection conn, String pkRangeSQL, int fetchSize, String table, String username, Configuration configuration) { LOG.info("split pk [sql={}] is running... 
", pkRangeSQL); ResultSet rs = null; Pair minMaxPK = null; try { try { rs = DBUtil.query(conn, pkRangeSQL, fetchSize); }catch (Exception e) { throw RdbmsException.asQueryException(DATABASE_TYPE, e, pkRangeSQL,table,username); } ResultSetMetaData rsMetaData = rs.getMetaData(); if (isPKTypeValid(rsMetaData)) { if (isStringType(rsMetaData.getColumnType(1))) { if(configuration != null) { configuration .set(Constant.PK_TYPE, Constant.PK_TYPE_STRING); } while (DBUtil.asyncResultSetNext(rs)) { minMaxPK = new ImmutablePair( rs.getString(1), rs.getString(2)); } } else if (isLongType(rsMetaData.getColumnType(1))) { if(configuration != null) { configuration.set(Constant.PK_TYPE, Constant.PK_TYPE_LONG); } while (DBUtil.asyncResultSetNext(rs)) { minMaxPK = new ImmutablePair( rs.getString(1), rs.getString(2)); // check: string shouldn't contain '.', for oracle String minMax = rs.getString(1) + rs.getString(2); if (StringUtils.contains(minMax, '.')) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, "您配置的DataX切分主键(splitPk)有误. 因为您配置的切分主键(splitPk) 类型 DataX 不支持. DataX 仅支持切分主键为一个,并且类型为整数或者字符串类型. 请尝试使用其他的切分主键或者联系 DBA 进行处理."); } } } else { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, "您配置的DataX切分主键(splitPk)有误. 因为您配置的切分主键(splitPk) 类型 DataX 不支持. DataX 仅支持切分主键为一个,并且类型为整数或者字符串类型. 请尝试使用其他的切分主键或者联系 DBA 进行处理."); } } else { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, "您配置的DataX切分主键(splitPk)有误. 因为您配置的切分主键(splitPk) 类型 DataX 不支持. DataX 仅支持切分主键为一个,并且类型为整数或者字符串类型. 请尝试使用其他的切分主键或者联系 DBA 进行处理."); } } catch(DataXException e) { throw e; } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK, "DataX尝试切分表发生错误. 
请检查您的配置并作出修改.", e);
        } finally {
            DBUtil.closeDBResources(rs, null, null);
        }

        return minMaxPK;
    }

    private static boolean isPKTypeValid(ResultSetMetaData rsMetaData) {
        boolean ret = false;
        try {
            int minType = rsMetaData.getColumnType(1);
            int maxType = rsMetaData.getColumnType(2);

            boolean isNumberType = isLongType(minType);

            boolean isStringType = isStringType(minType);

            if (minType == maxType && (isNumberType || isStringType)) {
                ret = true;
            }
        } catch (Exception e) {
            throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_SPLIT_PK,
                    "DataX failed to get the column type of the split primary key (splitPk). This is usually caused by an underlying system exception. Contact askdatax on Wangwang or your DBA.", e);
        }
        return ret;
    }

    // warn: Types.NUMERIC is accepted only for Oracle and OceanBase. Oracle stores
    // INT, SMALLINT, INTEGER, etc. as NUMBER, which JDBC reports as Types.NUMERIC.
    private static boolean isLongType(int type) {
        boolean isValidLongType = type == Types.BIGINT || type == Types.INTEGER
                || type == Types.SMALLINT || type == Types.TINYINT;

        switch (SingleTableSplitUtil.DATABASE_TYPE) {
            case Oracle:
            case OceanBase:
                isValidLongType |= type == Types.NUMERIC;
                break;
            default:
                break;
        }
        return isValidLongType;
    }

    private static boolean isStringType(int type) {
        return type == Types.CHAR || type == Types.NCHAR
                || type == Types.VARCHAR || type == Types.LONGVARCHAR
                || type == Types.NVARCHAR;
    }

    private static String genPKRangeSQL(Configuration configuration) {

        String splitPK = configuration.getString(Key.SPLIT_PK).trim();
        String table = configuration.getString(Key.TABLE).trim();
        String where = configuration.getString(Key.WHERE, null);
        String obMode = configuration.getString("obCompatibilityMode");
        // OceanBase does not rewrite "SELECT MIN(%s),MAX(%s) FROM %s", so it runs a full table scan
        // that can be very slow, or even time out, on large tables. For OceanBase the template
        // therefore fetches the maximum and the minimum in two separate ordered subqueries,
        // which can be orders of magnitude faster.
        if (DATABASE_TYPE == DataBaseType.OceanBase && StringUtils.isNotEmpty(obMode)) {
            boolean isOracleMode = "ORACLE".equalsIgnoreCase(obMode);
            String minMaxTemplate = isOracleMode ?
                "select v2.id as min_a, v1.id as max_a from ("
                        + "select * from (select %s as id from %s {0} order by id desc) where rownum =1 ) v1,"
                        + "(select * from (select %s as id from %s order by id asc) where rownum =1 ) v2;"
                : "select v2.id as min_a, v1.id as max_a from (select %s as id from %s {0} order by id desc limit 1) v1,"
                        + "(select %s as id from %s order by id asc limit 1) v2;";
            String pkRangeSQL = String.format(minMaxTemplate, splitPK, table, splitPK, table);
            String whereString = StringUtils.isNotBlank(where)
                    ? String.format("WHERE (%s AND %s IS NOT NULL)", where, splitPK)
                    : EMPTY;
            pkRangeSQL = MessageFormat.format(pkRangeSQL, whereString);
            return pkRangeSQL;
        }
        return genPKSql(splitPK, table, where);
    }

    public static String genPKSql(String splitPK, String table, String where) {
        String minMaxTemplate = "SELECT MIN(%s),MAX(%s) FROM %s";
        String pkRangeSQL = String.format(minMaxTemplate, splitPK, splitPK, table);
        if (StringUtils.isNotBlank(where)) {
            pkRangeSQL = String.format("%s WHERE (%s AND %s IS NOT NULL)",
                    pkRangeSQL, where, splitPK);
        }
        return pkRangeSQL;
    }

    /**
     * Supports splitting on Number and String columns.
     */
    public static List<String> genSplitSqlForOracle(String splitPK, String table, String where,
                                                    Configuration configuration, int adviceNum) {
        if (adviceNum < 1) {
            throw new IllegalArgumentException(String.format(
                    "The number of splits cannot be less than 1. Here: adviceNum=[%s].", adviceNum));
        } else if (adviceNum == 1) {
            return null;
        }
        String whereSql = String.format("%s IS NOT NULL", splitPK);
        if (StringUtils.isNotBlank(where)) {
            whereSql = String.format(" WHERE (%s) AND (%s) ", whereSql, where);
        } else {
            whereSql = String.format(" WHERE (%s) ", whereSql);
        }
        Double percentage = configuration.getDouble(Key.SAMPLE_PERCENTAGE, 0.1);
        String sampleSqlTemplate = "SELECT * FROM ( SELECT %s FROM %s SAMPLE (%s) %s ORDER BY DBMS_RANDOM.VALUE) WHERE ROWNUM <= %s ORDER by %s ASC";
        String splitSql = String.format(sampleSqlTemplate, splitPK, table,
                percentage, whereSql, adviceNum, splitPK);

        int fetchSize = configuration.getInt(Constant.FETCH_SIZE, 32);
        String jdbcURL = configuration.getString(Key.JDBC_URL);
        String username = configuration.getString(Key.USERNAME);
        String password = configuration.getString(Key.PASSWORD);
        Connection conn = DBUtil.getConnection(DATABASE_TYPE, jdbcURL, username, password);
        LOG.info("split pk [sql={}] is running... ", splitSql);
        ResultSet rs = null;
        List<Pair<Object, Integer>> splitedRange = new ArrayList<Pair<Object, Integer>>();
        try {
            try {
                rs = DBUtil.query(conn, splitSql, fetchSize);
            } catch (Exception e) {
                throw RdbmsException.asQueryException(DATABASE_TYPE, e,
                        splitSql, table, username);
            }
            if (configuration != null) {
                configuration.set(Constant.PK_TYPE, Constant.PK_TYPE_MONTECARLO);
            }
            ResultSetMetaData rsMetaData = rs.getMetaData();
            while (DBUtil.asyncResultSetNext(rs)) {
                ImmutablePair<Object, Integer> eachPoint = new ImmutablePair<Object, Integer>(
                        rs.getObject(1), rsMetaData.getColumnType(1));
                splitedRange.add(eachPoint);
            }
        } catch (DataXException e) {
            throw e;
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    DBUtilErrorCode.ILLEGAL_SPLIT_PK,
                    "An error occurred while DataX was trying to split the table. Please check your configuration and fix it.", e);
        } finally {
            // close the connection as well; it is not used after sampling
            DBUtil.closeDBResources(rs, null, conn);
        }
        LOG.debug(JSON.toJSONString(splitedRange));
        List<String> rangeSql = new ArrayList<String>();
        int splitedRangeSize = splitedRange.size();
        // warn: splitedRangeSize may be 0 or 1; the split rules are then IS NULL and IS NOT NULL
        // demo: Parameter rangeResult can not be null and its length can not <2. detail:rangeResult=[24999930].
        if (splitedRangeSize >= 2) {
            // warn: oracle Number is long type here
            if (isLongType(splitedRange.get(0).getRight())) {
                BigInteger[] integerPoints = new BigInteger[splitedRange.size()];
                for (int i = 0; i < splitedRangeSize; i++) {
                    integerPoints[i] = new BigInteger(splitedRange.get(i)
                            .getLeft().toString());
                }
                rangeSql.addAll(RdbmsRangeSplitWrap.wrapRange(integerPoints,
                        splitPK));
                // it's ok if splitedRangeSize is 1
                rangeSql.add(RdbmsRangeSplitWrap.wrapFirstLastPoint(
                        integerPoints[0], integerPoints[splitedRangeSize - 1],
                        splitPK));
            } else if (isStringType(splitedRange.get(0).getRight())) {
                // warn: treated as string type
                String[] stringPoints = new String[splitedRange.size()];
                for (int i = 0; i < splitedRangeSize; i++) {
                    stringPoints[i] = splitedRange.get(i).getLeft().toString();
                }
                rangeSql.addAll(RdbmsRangeSplitWrap.wrapRange(stringPoints,
                        splitPK, "'", DATABASE_TYPE));
                // it's ok if splitedRangeSize is 1
                rangeSql.add(RdbmsRangeSplitWrap.wrapFirstLastPoint(
                        stringPoints[0], stringPoints[splitedRangeSize - 1],
                        splitPK, "'", DATABASE_TYPE));
            } else {
                throw DataXException
                        .asDataXException(
                                DBUtilErrorCode.ILLEGAL_SPLIT_PK,
                                "The DataX split primary key (splitPk) you configured is invalid because DataX does not support its type. DataX supports only a single split key, and its type must be integer or string. Please try another split key or contact your DBA.");
            }
        }
        return rangeSql;
    }
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/ConnectionFactory.java
================================================
package com.alibaba.datax.plugin.rdbms.util;

import java.sql.Connection;

/**
 * Date: 15/3/16 2:17 PM
 */
public interface ConnectionFactory {

    public Connection getConnecttion();

    public Connection getConnecttionWithoutRetry();

    public String getConnectionInfo();

}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/Constant.java
================================================
package com.alibaba.datax.plugin.rdbms.util;

public final class Constant {
    static final int TIMEOUT_SECONDS = 15;
    static final int MAX_TRY_TIMES = 4;
    static final int SOCKET_TIMEOUT_INSECOND = 172800;

    public static final String MYSQL_DATABASE = "Unknown database";
    public static final String MYSQL_CONNEXP = "Communications link failure";
    public static final String MYSQL_ACCDENIED = "Access denied";
    public static final String MYSQL_TABLE_NAME_ERR1 = "Table";
    public static final String MYSQL_TABLE_NAME_ERR2 = "doesn't exist";
    public static final String MYSQL_SELECT_PRI = "SELECT command denied to user";
    public static final String MYSQL_COLUMN1 = "Unknown column";
    public static final String MYSQL_COLUMN2 = "field list";
    public static final String MYSQL_WHERE = "where clause";

    public static final String ORACLE_DATABASE = "ORA-12505";
    public static final String ORACLE_CONNEXP = "The Network Adapter could not establish the connection";
    public static final String ORACLE_ACCDENIED = "ORA-01017";
    public static final String ORACLE_TABLE_NAME = "table or view does not exist";
    public static final String ORACLE_SELECT_PRI = "insufficient privileges";
    public static final String ORACLE_SQL = "invalid identifier";
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtil.java
================================================
package com.alibaba.datax.plugin.rdbms.util;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.RetryUtil;
import com.alibaba.datax.plugin.rdbms.reader.Key;
import com.alibaba.druid.sql.parser.SQLParserUtils;
import com.alibaba.druid.sql.parser.SQLStatementParser;
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.ImmutableTriple;
import org.apache.commons.lang3.tuple.Triple;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.sql.*;
import java.util.*;
import java.util.concurrent.*;

public final class DBUtil {
    private static final Logger LOG = LoggerFactory.getLogger(DBUtil.class);

    private static final ThreadLocal<ExecutorService> rsExecutors = new ThreadLocal<ExecutorService>() {
        @Override
        protected ExecutorService initialValue() {
            return Executors.newFixedThreadPool(1, new ThreadFactoryBuilder()
                    .setNameFormat("rsExecutors-%d")
                    .setDaemon(true)
                    .build());
        }
    };

    private DBUtil() {
    }

    public static String chooseJdbcUrl(final DataBaseType dataBaseType,
                                       final List<String> jdbcUrls, final String username,
                                       final String password, final List<String> preSql,
                                       final boolean checkSlave) {
        if (null == jdbcUrls || jdbcUrls.isEmpty()) {
            throw DataXException.asDataXException(
                    DBUtilErrorCode.CONF_ERROR,
                    String.format("Your jdbcUrl configuration is invalid because jdbcUrl[%s] must not be empty. Please check your configuration and fix it.",
                            StringUtils.join(jdbcUrls, ",")));
        }
        try {
            return RetryUtil.executeWithRetry(new Callable<String>() {

                @Override
                public String call() throws Exception {
                    boolean connOK = false;
                    for (String url : jdbcUrls) {
                        if (StringUtils.isNotBlank(url)) {
                            url = url.trim();
                            if (null != preSql && !preSql.isEmpty()) {
                                connOK = testConnWithoutRetry(dataBaseType,
                                        url, username, password, preSql);
                            } else {
                                connOK = testConnWithoutRetry(dataBaseType,
                                        url, username, password, checkSlave);
                            }
                            if (connOK) {
                                return url;
                            }
                        }
                    }
                    throw new Exception("DataX cannot connect to the database. Possible causes: 1) the configured ip/port/database/jdbc is wrong, so the connection cannot be established; 2) the configured username/password is wrong, so authentication fails. Please confirm the database connection information with your DBA.");
                    // throw new Exception(DBUtilErrorCode.JDBC_NULL.toString());
                }
            }, 7, 1000L, true);
            //warn: 7 means 2 minutes
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    DBUtilErrorCode.CONN_DB_ERROR,
                    String.format("Database connection failed: based on your connection configuration, no connectable jdbcUrl could be found in: %s. Please check your configuration and fix it.",
                            StringUtils.join(jdbcUrls, ",")), e);
        }
    }

    public static String chooseJdbcUrlWithoutRetry(final DataBaseType dataBaseType,
                                                   final List<String> jdbcUrls, final String username,
                                                   final String password, final List<String> preSql,
                                                   final boolean checkSlave) throws DataXException {

        if (null == jdbcUrls || jdbcUrls.isEmpty()) {
            throw DataXException.asDataXException(
                    DBUtilErrorCode.CONF_ERROR,
                    String.format("Your jdbcUrl configuration is invalid because jdbcUrl[%s] must not be empty. Please check your configuration and fix it.",
                            StringUtils.join(jdbcUrls, ",")));
        }

        boolean connOK = false;
        for (String url : jdbcUrls) {
            if (StringUtils.isNotBlank(url)) {
                url = url.trim();
                if (null != preSql && !preSql.isEmpty()) {
                    connOK = testConnWithoutRetry(dataBaseType,
                            url, username, password, preSql);
                } else {
                    try {
                        connOK = testConnWithoutRetry(dataBaseType,
                                url, username, password, checkSlave);
                    } catch (Exception e) {
                        throw DataXException.asDataXException(
                                DBUtilErrorCode.CONN_DB_ERROR,
                                String.format("Database connection failed: based on your connection configuration, no connectable jdbcUrl could be found in: %s. Please check your configuration and fix it.",
                                        StringUtils.join(jdbcUrls, ",")), e);
                    }
                }
                if (connOK) {
                    return url;
                }
            }
        }
        throw DataXException.asDataXException(
                DBUtilErrorCode.CONN_DB_ERROR,
                String.format("Database connection failed: based on your connection configuration, no connectable jdbcUrl could be found in: %s. Please check your configuration and fix it.",
                        StringUtils.join(jdbcUrls, ",")));
    }

    /**
     * Checks whether the replicated data on the slave has reached 00:00 (midnight) today.
     * Returns true if the slave is behind (its data has not yet reached 00:00, or its
     * replication threads are not both running); otherwise returns false.
     *
     * @author ZiChi
     * @version 1.0 2014-12-01
     */
    private static boolean isSlaveBehind(Connection conn) {
        try {
            ResultSet rs = query(conn, "SHOW VARIABLES LIKE 'read_only'");
            if (DBUtil.asyncResultSetNext(rs)) {
                String readOnly = rs.getString("Value");
                if ("ON".equalsIgnoreCase(readOnly)) { // standby (slave) database
                    ResultSet rs1 = query(conn, "SHOW SLAVE STATUS");
                    if (DBUtil.asyncResultSetNext(rs1)) {
                        String ioRunning = rs1.getString("Slave_IO_Running");
                        String sqlRunning = rs1.getString("Slave_SQL_Running");
                        long secondsBehindMaster = rs1.getLong("Seconds_Behind_Master");
                        if ("Yes".equalsIgnoreCase(ioRunning) && "Yes".equalsIgnoreCase(sqlRunning)) {
                            ResultSet rs2 = query(conn, "SELECT TIMESTAMPDIFF(SECOND, CURDATE(), NOW())");
                            DBUtil.asyncResultSetNext(rs2);
                            long secondsOfDay = rs2.getLong(1);
                            return secondsBehindMaster > secondsOfDay;
                        } else {
                            return true;
                        }
                    } else {
                        LOG.warn("SHOW SLAVE STATUS has no result");
                    }
                }
            } else {
                LOG.warn("SHOW VARIABLES like 'read_only' has no result");
            }
        } catch (Exception e) {
            LOG.warn("checkSlave failed, errorMessage:[{}].", e.getMessage());
        }
        return false;
    }

    /**
     * Checks whether the user has INSERT privilege on the tables.
     * The check passes when INSERT is granted on *.* or on database.*.
     * When INSERT is granted on database.tableName, the check passes only if every table in tableList has INSERT privilege.
     * In all other cases the check fails.
     *
     * @author ZiChi
     * @version 1.0 2015-01-28
     */
    public static boolean hasInsertPrivilege(DataBaseType dataBaseType, String jdbcURL, String userName, String password, List<String> tableList) {
        /* prepare parameters */
        String[] urls = jdbcURL.split("/");
        String dbName;
        // jdbc:mysql://host:port/db splits into ["jdbc:mysql:", "", "host:port", "db"], so index 3 must exist
        if (urls != null && urls.length > 3) {
            dbName = urls[3];
        } else {
            return false;
        }

        String dbPattern = "`" + dbName + "`.*";
        Collection<String> tableNames = new HashSet<String>(tableList.size());
        tableNames.addAll(tableList);

        Connection connection = connect(dataBaseType, jdbcURL, userName, password);
        try {
            ResultSet rs = query(connection, "SHOW GRANTS FOR " + userName);
            while (DBUtil.asyncResultSetNext(rs)) {
                String grantRecord = rs.getString("Grants for " + userName + "@%");
                String[] params = grantRecord.split("\\`");
                // params[3] is read below, so at least 4 elements are required
                if (params != null && params.length > 3) {
                    String tableName = params[3];
                    if (params[0].contains("INSERT") && !tableName.equals("*") && tableNames.contains(tableName)) {
                        tableNames.remove(tableName);
                    }
                } else {
                    if (grantRecord.contains("INSERT") || grantRecord.contains("ALL PRIVILEGES")) {
                        if (grantRecord.contains("*.*")) {
                            return true;
                        } else if (grantRecord.contains(dbPattern)) {
                            return true;
                        }
                    }
                }
            }
        } catch (Exception e) {
            LOG.warn("Check the database has the Insert Privilege failed, errorMessage:[{}]", e.getMessage());
        } finally {
            DBUtil.closeDBResources(null, connection);
        }
        if (tableNames.isEmpty()) {
            return true;
        }
        return false;
    }

    public static boolean checkInsertPrivilege(DataBaseType dataBaseType, String jdbcURL, String userName, String password, List<String> tableList) {
        Connection connection = connect(dataBaseType, jdbcURL, userName, password);
        String insertTemplate = "insert into %s(select * from %s where 1 = 2)";

        boolean hasInsertPrivilege = true;
        Statement insertStmt = null;
        for (String tableName : tableList) {
            String checkInsertPrivilegeSql = String.format(insertTemplate, tableName, tableName);
            try {
                insertStmt = connection.createStatement();
                executeSqlWithoutResultSet(insertStmt, checkInsertPrivilegeSql);
            } catch (Exception e) {
                if (DataBaseType.Oracle.equals(dataBaseType)) {
                    if (e.getMessage() != null && e.getMessage().contains("insufficient privileges")) {
                        hasInsertPrivilege = false;
                        LOG.warn("User [" + userName + "] has no 'insert' privilege on table[" + tableName + "], errorMessage:[{}]", e.getMessage());
                    }
                } else {
                    hasInsertPrivilege = false;
                    LOG.warn("User [" + userName + "] has no 'insert' privilege on table[" + tableName + "], errorMessage:[{}]", e.getMessage());
                }
            } finally {
                // one statement is created per table; close it before the next iteration
                DBUtil.closeDBResources(insertStmt, null);
            }
        }
        try {
            connection.close();
        } catch (SQLException e) {
            LOG.warn("connection close failed, " + e.getMessage());
        }
        return hasInsertPrivilege;
    }

    public static boolean checkDeletePrivilege(DataBaseType dataBaseType, String jdbcURL, String userName, String password, List<String> tableList) {
        Connection connection = connect(dataBaseType, jdbcURL, userName, password);
        String deleteTemplate = "delete from %s WHERE 1 = 2";

        boolean hasDeletePrivilege = true;
        Statement deleteStmt = null;
        for (String tableName : tableList) {
            String checkDeletePrivilegeSQL = String.format(deleteTemplate, tableName);
            try {
                deleteStmt = connection.createStatement();
                executeSqlWithoutResultSet(deleteStmt, checkDeletePrivilegeSQL);
            } catch (Exception e) {
                hasDeletePrivilege = false;
                LOG.warn("User [" + userName + "] has no 'delete' privilege on table[" + tableName + "], errorMessage:[{}]", e.getMessage());
            } finally {
                // one statement is created per table; close it before the next iteration
                DBUtil.closeDBResources(deleteStmt, null);
            }
        }
        try {
            connection.close();
        } catch (SQLException e) {
            LOG.warn("connection close failed, " + e.getMessage());
        }
        return hasDeletePrivilege;
    }

    public static boolean needCheckDeletePrivilege(Configuration originalConfig) {
        List<String> allSqls = new ArrayList<String>();
        List<String> preSQLs = originalConfig.getList(Key.PRE_SQL, String.class);
        List<String> postSQLs = originalConfig.getList(Key.POST_SQL, String.class);
        if (preSQLs != null && !preSQLs.isEmpty()) {
            allSqls.addAll(preSQLs);
        }
        if (postSQLs != null && !postSQLs.isEmpty()) {
            allSqls.addAll(postSQLs);
        }
        for (String sql : allSqls) {
            if (StringUtils.isNotBlank(sql)) {
                if (sql.trim().toUpperCase().startsWith("DELETE")) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Get direct JDBC connection
     *
     * If connecting fails, the attempt is retried via RetryUtil (retryTimes = 9).
     *
     * NOTE: In DataX, we don't need connection pool in fact
     */
    public static Connection getConnection(final DataBaseType dataBaseType,
                                           final String jdbcUrl, final String username, final String password) {

        return getConnection(dataBaseType, jdbcUrl, username, password, String.valueOf(Constant.SOCKET_TIMEOUT_INSECOND * 1000));
    }

    /**
     * @param dataBaseType
     * @param jdbcUrl
     * @param username
     * @param password
     * @param socketTimeout socket timeout in milliseconds, passed as a String
     * @return
     */
    public static Connection getConnection(final DataBaseType dataBaseType,
                                           final String jdbcUrl, final String username, final String password, final String socketTimeout) {

        try {
            return RetryUtil.executeWithRetry(new Callable<Connection>() {
                @Override
                public Connection call() throws Exception {
                    return DBUtil.connect(dataBaseType, jdbcUrl, username,
                            password, socketTimeout);
                }
            }, 9, 1000L, true);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    DBUtilErrorCode.CONN_DB_ERROR,
                    String.format("Database connection failed: could not obtain a database connection with your configured connection information: %s. Please check your configuration and fix it.", jdbcUrl), e);
        }
    }

    /**
     * Get direct JDBC connection
     *
     * Unlike getConnection, this makes a single attempt and does not retry on failure.
     *
     * NOTE: In DataX, we don't need connection pool in fact
     */
    public static Connection getConnectionWithoutRetry(final DataBaseType dataBaseType,
                                                       final String jdbcUrl, final String username, final String password) {
        return getConnectionWithoutRetry(dataBaseType, jdbcUrl, username,
                password, String.valueOf(Constant.SOCKET_TIMEOUT_INSECOND * 1000));
    }

    public static Connection getConnectionWithoutRetry(final DataBaseType dataBaseType,
                                                       final String jdbcUrl, final String username, final String password, String socketTimeout) {
        return DBUtil.connect(dataBaseType, jdbcUrl, username,
                password, socketTimeout);
    }

    private static synchronized Connection connect(DataBaseType dataBaseType,
                                                   String url, String user, String pass) {
        return connect(dataBaseType, url, user, pass, String.valueOf(Constant.SOCKET_TIMEOUT_INSECOND * 1000));
    }

    private static synchronized Connection connect(DataBaseType dataBaseType,
                                                   String url, String user, String pass, String socketTimeout) {

        // handle OB10 (ob1_0) JDBC URLs
        if (url.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) {
            String[] ss = url.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN);
            if (ss.length != 3) {
                throw DataXException
                        .asDataXException(
                                DBUtilErrorCode.JDBC_OB10_ADDRESS_ERROR, "Invalid JDBC OB10 format, please contact askdatax");
            }
            LOG.info("this is ob1_0 jdbc url.");
            user = ss[1].trim() + ":" + user;
            url = ss[2].replace("jdbc:mysql:", "jdbc:oceanbase:");
            LOG.info("this is ob1_0 jdbc url. user=" + user + " :url=" + url);
        }

        Properties prop = new Properties();
        prop.put("user", user);
        prop.put("password", pass);

        if (dataBaseType == DataBaseType.Oracle) {
            // oracle.net.READ_TIMEOUT for jdbc versions < 10.1.0.5, oracle.jdbc.ReadTimeout for jdbc versions >= 10.1.0.5
            // unit: ms
            prop.put("oracle.jdbc.ReadTimeout", socketTimeout);
        }
        if (dataBaseType == DataBaseType.OceanBase) {
            url = url.replace("jdbc:mysql:", "jdbc:oceanbase:");
        }

        return connect(dataBaseType, url, prop);
    }

    private static synchronized Connection connect(DataBaseType dataBaseType,
                                                   String url, Properties prop) {
        try {
            Class.forName(dataBaseType.getDriverClassName());
            DriverManager.setLoginTimeout(Constant.TIMEOUT_SECONDS);
            return DriverManager.getConnection(url, prop);
        } catch (Exception e) {
            throw RdbmsException.asConnException(dataBaseType, e, prop.getProperty("user"), null);
        }
    }

    /**
     * A wrapper method that executes a SELECT-like SQL statement.
     *
     * @param conn Database connection.
     * @param sql  sql statement to be executed
     * @return a {@link ResultSet}
     * @throws SQLException if occurs SQLException.
     */
    public static ResultSet query(Connection conn, String sql, int fetchSize)
            throws SQLException {
        // default query timeout: Constant.SOCKET_TIMEOUT_INSECOND (172800 s, i.e. 48 h)
        return query(conn, sql, fetchSize, Constant.SOCKET_TIMEOUT_INSECOND);
    }

    /**
     * A wrapper method that executes a SELECT-like SQL statement.
     *
     * @param conn         Database connection.
     * @param sql          sql statement to be executed
     * @param fetchSize
     * @param queryTimeout unit: second
     * @return
     * @throws SQLException
     */
    public static ResultSet query(Connection conn, String sql, int fetchSize, int queryTimeout)
            throws SQLException {
        // make sure autocommit is off
        conn.setAutoCommit(false);
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(fetchSize);
        stmt.setQueryTimeout(queryTimeout);
        return query(stmt, sql);
    }

    /**
     * A wrapper method that executes a SELECT-like SQL statement.
     *
     * @param stmt {@link Statement}
     * @param sql  sql statement to be executed
     * @return a {@link ResultSet}
     * @throws SQLException if occurs SQLException.
     */
    public static ResultSet query(Statement stmt, String sql)
            throws SQLException {
        return stmt.executeQuery(sql);
    }

    public static void executeSqlWithoutResultSet(Statement stmt, String sql)
            throws SQLException {
        stmt.execute(sql);
    }

    /**
     * Close {@link ResultSet}, {@link Statement} referenced by this
     * {@link ResultSet}
     *
     * @param rs {@link ResultSet} to be closed
     * @throws IllegalStateException if closing the ResultSet or its Statement fails
     */
    public static void closeResultSet(ResultSet rs) {
        try {
            if (null != rs) {
                Statement stmt = rs.getStatement();
                if (null != stmt) {
                    stmt.close();
                    stmt = null;
                }
                rs.close();
            }
            rs = null;
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void closeDBResources(ResultSet rs, Statement stmt,
                                        Connection conn) {
        if (null != rs) {
            try {
                rs.close();
            } catch (SQLException unused) {
            }
        }

        if (null != stmt) {
            try {
                stmt.close();
            } catch (SQLException unused) {
            }
        }

        if (null != conn) {
            try {
                conn.close();
            } catch (SQLException unused) {
            }
        }
    }

    public static void closeDBResources(Statement stmt, Connection conn) {
        closeDBResources(null, stmt, conn);
    }

    public static List<String> getTableColumns(DataBaseType dataBaseType,
                                               String jdbcUrl, String user, String pass, String tableName) {
        Connection conn = getConnection(dataBaseType, jdbcUrl, user, pass);
        return getTableColumnsByConn(dataBaseType, conn, tableName, "jdbcUrl:" + jdbcUrl);
    }

    public static List<String> getTableColumnsByConn(DataBaseType dataBaseType, Connection conn, String tableName, String basicMsg) {
        List<String> columns = new ArrayList<String>();
        Statement statement = null;
        ResultSet rs = null;
        String queryColumnSql = null;
        try {
            statement = conn.createStatement();
            queryColumnSql = String.format("select * from %s where 1=2",
                    tableName);
            rs = statement.executeQuery(queryColumnSql);
            ResultSetMetaData rsMetaData = rs.getMetaData();
            for (int i = 0, len = rsMetaData.getColumnCount(); i < len; i++) {
                columns.add(rsMetaData.getColumnName(i + 1));
            }

        } catch (SQLException e) {
            throw RdbmsException.asQueryException(dataBaseType, e, queryColumnSql, tableName, null);
        } finally {
            DBUtil.closeDBResources(rs, statement, conn);
        }

        return columns;
    }

    /**
     * @return Left:ColumnName Middle:ColumnType Right:ColumnTypeName
     */
    public static Triple<List<String>, List<Integer>, List<String>> getColumnMetaData(
            DataBaseType dataBaseType, String jdbcUrl, String user,
            String pass, String tableName, String column) {
        Connection conn = null;
        try {
            conn = getConnection(dataBaseType, jdbcUrl, user, pass);
            return getColumnMetaData(conn, tableName, column);
        } finally {
            DBUtil.closeDBResources(null, null, conn);
        }
    }

    /**
     * @return Left:ColumnName Middle:ColumnType Right:ColumnTypeName
     */
    public static Triple<List<String>, List<Integer>, List<String>> getColumnMetaData(
            Connection conn, String tableName, String column) {
        Statement statement = null;
        ResultSet rs = null;

        Triple<List<String>, List<Integer>, List<String>> columnMetaData = new ImmutableTriple<List<String>, List<Integer>, List<String>>(
                new ArrayList<String>(), new ArrayList<Integer>(),
                new ArrayList<String>());
        try {
            statement = conn.createStatement();
            String queryColumnSql = "select " + column + " from " + tableName
                    + " where 1=2";

            rs = statement.executeQuery(queryColumnSql);
            ResultSetMetaData rsMetaData = rs.getMetaData();
            for (int i = 0, len = rsMetaData.getColumnCount(); i < len; i++) {

                columnMetaData.getLeft().add(rsMetaData.getColumnName(i + 1));
                columnMetaData.getMiddle().add(rsMetaData.getColumnType(i + 1));
                columnMetaData.getRight().add(
                        rsMetaData.getColumnTypeName(i + 1));
            }
            return columnMetaData;

        } catch (SQLException e) {
            throw DataXException
                    .asDataXException(DBUtilErrorCode.GET_COLUMN_INFO_FAILED,
                            String.format("Failed to get the column metadata of table: %s. Please ask your DBA to verify the database and table information.", tableName), e);
        } finally {
            DBUtil.closeDBResources(rs, statement, null);
        }
    }

    public static boolean testConnWithoutRetry(DataBaseType dataBaseType,
                                               String url, String user, String pass, boolean checkSlave) {
        Connection connection = null;

        try {
            connection = connect(dataBaseType, url, user, pass);
            if (connection != null) {
                if (DataBaseType.MySql.equals(dataBaseType) && checkSlave) {
                    boolean connOk = !isSlaveBehind(connection);
                    return connOk;
                } else {
                    return true;
                }
            }
        } catch (Exception e) {
            LOG.warn("test connection of [{}] failed, for {}.", url,
                    e.getMessage());
        } finally {
            DBUtil.closeDBResources(null, connection);
        }
        return false;
    }

    public static boolean testConnWithoutRetry(DataBaseType dataBaseType,
                                               String url, String user, String pass, List<String> preSql) {
        Connection connection = null;
        try {
            connection = connect(dataBaseType, url, user, pass);
            if (null != connection) {
                for (String pre : preSql) {
                    if (!doPreCheck(connection, pre)) {
                        LOG.warn("doPreCheck failed.");
                        return false;
                    }
                }
                return true;
            }
        } catch (Exception e) {
            LOG.warn("test connection of [{}] failed, for {}.", url,
                    e.getMessage());
        } finally {
            DBUtil.closeDBResources(null, connection);
        }

        return false;
    }

    public static boolean isOracleMaster(final String url, final String user, final String pass) {
        try {
            return RetryUtil.executeWithRetry(new Callable<Boolean>() {
                @Override
                public Boolean call() throws Exception {
                    Connection conn = null;
                    try {
                        conn = connect(DataBaseType.Oracle, url, user, pass);
                        ResultSet rs = query(conn, "select DATABASE_ROLE from V$DATABASE");
                        if (DBUtil.asyncResultSetNext(rs, 5)) {
                            String role = rs.getString("DATABASE_ROLE");
                            return "PRIMARY".equalsIgnoreCase(role);
                        }
                        throw DataXException.asDataXException(DBUtilErrorCode.RS_ASYNC_ERROR,
                                String.format("select DATABASE_ROLE from V$DATABASE failed, please check your jdbcUrl: %s.", url));
                    } finally {
                        DBUtil.closeDBResources(null, conn);
                    }
                }
            }, 3, 1000L, true);
        } catch (Exception e) {
            throw DataXException.asDataXException(DBUtilErrorCode.CONN_DB_ERROR,
                    String.format("select DATABASE_ROLE from V$DATABASE failed, url: %s", url), e);
        }
    }

    public static ResultSet query(Connection conn, String sql)
            throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        // default query timeout: Constant.SOCKET_TIMEOUT_INSECOND (172800 s, i.e. 48 h)
        stmt.setQueryTimeout(Constant.SOCKET_TIMEOUT_INSECOND);
        return query(stmt, sql);
    }

    private static boolean doPreCheck(Connection conn, String pre) {
        ResultSet rs = null;
        try {
            rs = query(conn, pre);

            int checkResult = -1;
            if (DBUtil.asyncResultSetNext(rs)) {
                checkResult = rs.getInt(1);
                if (DBUtil.asyncResultSetNext(rs)) {
                    LOG.warn(
                            "pre check failed. It should return one result:0, pre:[{}].",
                            pre);
                    return false;
                }

            }

            if (0 == checkResult) {
                return true;
            }

            LOG.warn(
                    "pre check failed. It should return one result:0, pre:[{}].",
                    pre);
        } catch (Exception e) {
            LOG.warn("pre check failed. pre:[{}], errorMessage:[{}].", pre,
                    e.getMessage());
        } finally {
            DBUtil.closeResultSet(rs);
        }
        return false;
    }

    // warn: session config is only applied for Oracle, DRDS, MySQL and SQLServer; other types are ignored.
    public static void dealWithSessionConfig(Connection conn,
                                             Configuration config, DataBaseType databaseType, String message) {
        List<String> sessionConfig = null;
        switch (databaseType) {
            case Oracle:
                sessionConfig = config.getList(Key.SESSION,
                        new ArrayList<String>(), String.class);
                DBUtil.doDealWithSessionConfig(conn, sessionConfig, message);
                break;
            case DRDS:
                // turns off the DRDS distributed-transaction switch
                sessionConfig = new ArrayList<String>();
                sessionConfig.add("set transaction policy 4");
                DBUtil.doDealWithSessionConfig(conn, sessionConfig, message);
                break;
            case MySql:
                sessionConfig = config.getList(Key.SESSION,
                        new ArrayList<String>(), String.class);
                DBUtil.doDealWithSessionConfig(conn, sessionConfig, message);
                break;
            case SQLServer:
                sessionConfig = config.getList(Key.SESSION,
                        new ArrayList<String>(), String.class);
                DBUtil.doDealWithSessionConfig(conn, sessionConfig, message);
                break;
            default:
                break;
        }
    }

    private static void doDealWithSessionConfig(Connection conn,
                                                List<String> sessions, String message) {
        if (null == sessions || sessions.isEmpty()) {
            return;
        }

        Statement stmt;
        try {
            stmt = conn.createStatement();
        } catch (SQLException e) {
            throw DataXException
                    .asDataXException(DBUtilErrorCode.SET_SESSION_ERROR, String
                            .format("Invalid session configuration: executing the session settings from your configuration failed. Context: [%s]. Please check your configuration and fix it.",
                                    message), e);
        }

        for (String sessionSql : sessions) {
            LOG.info("execute sql:[{}]", sessionSql);
            try {
                DBUtil.executeSqlWithoutResultSet(stmt, sessionSql);
            } catch (SQLException e) {
                throw DataXException.asDataXException(
                        DBUtilErrorCode.SET_SESSION_ERROR, String.format(
                                "Invalid session configuration: executing the session settings from your configuration failed. Context: [%s]. Please check your configuration and fix it.",
                                message), e);
            }
        }
        DBUtil.closeDBResources(stmt, null);
    }

    public static void sqlValid(String sql, DataBaseType dataBaseType) {
        SQLStatementParser statementParser = SQLParserUtils.createSQLStatementParser(sql, dataBaseType.getTypeName());
        statementParser.parseStatementList();
    }

    /**
     * Calls resultSet.next() asynchronously. Note: never use this while reading data rows;
     * it is only meant for fetching metadata.
     *
     * @param resultSet
     * @return
     */
    public static boolean asyncResultSetNext(final ResultSet resultSet) {
        return asyncResultSetNext(resultSet, 3600);
    }

    public static boolean asyncResultSetNext(final ResultSet resultSet, int timeout) {
        Future<Boolean> future = rsExecutors.get().submit(new Callable<Boolean>() {
            @Override
            public Boolean call() throws Exception {
                return resultSet.next();
            }
        });
        try {
            return future.get(timeout, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    DBUtilErrorCode.RS_ASYNC_ERROR, "Failed to fetch ResultSet asynchronously", e);
        }
    }

    public static void loadDriverClass(String pluginType, String pluginName) {
        try {
            String pluginJsonPath = StringUtils.join(
                    new String[]{System.getProperty("datax.home"), "plugin",
                            pluginType,
                            String.format("%s%s", pluginName, pluginType),
                            "plugin.json"}, File.separator);
            Configuration configuration = Configuration.from(new File(
                    pluginJsonPath));
            List<String> drivers = configuration.getList("drivers",
                    String.class);
            for (String driver : drivers) {
                Class.forName(driver);
            }
        } catch (ClassNotFoundException e) {
            throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR,
                    "Failed to load the database driver. Please make sure the driver jar is in the libs directory and that the driver classes configured under drivers in plugin.json are correct!", e);
        }
    }
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DBUtilErrorCode.java
================================================
package com.alibaba.datax.plugin.rdbms.util;

import com.alibaba.datax.common.spi.ErrorCode;

//TODO
public enum DBUtilErrorCode implements ErrorCode {
    // connection errors
    MYSQL_CONN_USERPWD_ERROR("MYSQLErrCode-01", "Incorrect database username or password. Please check the account and password you entered, or ask your DBA to confirm them."),
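    MYSQL_CONN_IPPORT_ERROR("MYSQLErrCode-02", "Incorrect IP address or port for the database service. Please check the IP address and port you entered, or ask your DBA to confirm them. If you are a sync-center user, ask your DBA to confirm that the IP and PORT recorded in idb match the database's current actual settings."),
    MYSQL_CONN_DB_ERROR("MYSQLErrCode-03", "Incorrect database name. Please check the database instance name, or ask your DBA to confirm that the instance exists and is in normal service."),

    ORACLE_CONN_USERPWD_ERROR("ORACLEErrCode-01", "Incorrect database username or password. Please check the account and password you entered, or ask your DBA to confirm them."),
    ORACLE_CONN_IPPORT_ERROR("ORACLEErrCode-02", "Incorrect IP address or port for the database service. Please check the IP address and port you entered, or ask your DBA to confirm them. If you are a sync-center user, ask your DBA to confirm that the IP and PORT recorded in idb match the database's current actual settings."),
    ORACLE_CONN_DB_ERROR("ORACLEErrCode-03", "Incorrect database name. Please check the database instance name, or ask your DBA to confirm that the instance exists and is in normal service."),

    // query execution errors
    MYSQL_QUERY_TABLE_NAME_ERROR("MYSQLErrCode-04", "The table does not exist. Please check the table name, or ask your DBA to confirm that the table exists."),
    MYSQL_QUERY_SQL_ERROR("MYSQLErrCode-05", "SQL statement execution failed. Please check the WHERE condition for spelling or syntax errors."),
    MYSQL_QUERY_COLUMN_ERROR("MYSQLErrCode-06", "Invalid column information. Please check that the column exists; if it is a constant or variable, wrap it in ASCII single quotes (')."),
    MYSQL_QUERY_SELECT_PRI_ERROR("MYSQLErrCode-07", "Failed to read table data because the account lacks read privilege on the table. Please ask your DBA to check the account's privileges and grant access."),

    ORACLE_QUERY_TABLE_NAME_ERROR("ORACLEErrCode-04", "The table does not exist. Please check the table name, or ask your DBA to confirm that the table exists."),
    ORACLE_QUERY_SQL_ERROR("ORACLEErrCode-05", "SQL statement execution failed, possibly because a column you specified does not exist or the WHERE condition is invalid. 1) Check that the column exists; if it is a constant or variable, wrap it in ASCII single quotes ('); 2) Check the WHERE condition for spelling or syntax errors."),
    ORACLE_QUERY_SELECT_PRI_ERROR("ORACLEErrCode-06", "Failed to read table data because the account lacks read privilege on the table. Please ask your DBA to check the account's privileges and grant access."),
    ORACLE_QUERY_SQL_PARSER_ERROR("ORACLEErrCode-07", "SQL syntax error. Please check the WHERE condition for spelling or syntax errors."),

    // preSql / postSql errors
    MYSQL_PRE_SQL_ERROR("MYSQLErrCode-08", "PreSQL syntax error, please check"),
    MYSQL_POST_SQL_ERROR("MYSQLErrCode-09", "PostSql syntax error, please check"),
    MYSQL_QUERY_SQL_PARSER_ERROR("MYSQLErrCode-10", "SQL syntax error. Please check the WHERE condition for spelling or syntax errors."),

    ORACLE_PRE_SQL_ERROR("ORACLEErrCode-08", "PreSQL syntax error, please check"),
    ORACLE_POST_SQL_ERROR("ORACLEErrCode-09", "PostSql syntax error, please check"),

    // splitPk errors
    MYSQL_SPLIT_PK_ERROR("MYSQLErrCode-11", "Invalid SplitPK, please check"),
    ORACLE_SPLIT_PK_ERROR("ORACLEErrCode-10", "Invalid SplitPK, please check"),

    // INSERT / DELETE privilege errors
    MYSQL_INSERT_ERROR("MYSQLErrCode-12", "No write privilege on the database, please contact your DBA"),
    MYSQL_DELETE_ERROR("MYSQLErrCode-13", "No DELETE privilege on the database, please contact your DBA"),
    ORACLE_INSERT_ERROR("ORACLEErrCode-11", "No write privilege on the database, please contact your DBA"),
    ORACLE_DELETE_ERROR("ORACLEErrCode-12", "No DELETE privilege on the database, please contact your DBA"),

    JDBC_NULL("DBUtilErrorCode-20", "JDBC URL is empty, please check your configuration"),
    JDBC_OB10_ADDRESS_ERROR("DBUtilErrorCode-OB10-01", "Invalid JDBC OB10 format, please contact askdatax"),
    CONF_ERROR("DBUtilErrorCode-00", "Your configuration is invalid."),
    CONN_DB_ERROR("DBUtilErrorCode-10", "Failed to connect to the database. Please check your account, password, database name, IP and port, or ask your DBA for help (mind the network environment)."),
    GET_COLUMN_INFO_FAILED("DBUtilErrorCode-01", "Failed to get table column information."),
    UNSUPPORTED_TYPE("DBUtilErrorCode-12", "Unsupported database type. Please check the database types and versions that DataX already supports."),
    COLUMN_SPLIT_ERROR("DBUtilErrorCode-13", "Failed to split by primary key."),
    SET_SESSION_ERROR("DBUtilErrorCode-14", "Failed to set session."),
    RS_ASYNC_ERROR("DBUtilErrorCode-15", "Failed to fetch ResultSet next asynchronously."),

    REQUIRED_VALUE("DBUtilErrorCode-03", "A required parameter value is missing."),
    ILLEGAL_VALUE("DBUtilErrorCode-02", "The parameter value you entered is invalid."),
    ILLEGAL_SPLIT_PK("DBUtilErrorCode-04", "The primary key column you specified is invalid. DataX supports only a single split key, and its type must be integer or string."),
    SPLIT_FAILED_ILLEGAL_SQL("DBUtilErrorCode-15", "Executing database SQL failed while DataX was trying to split the table. Please check your table/splitPk/where configuration and fix it."),
    SQL_EXECUTE_FAIL("DBUtilErrorCode-06", "Failed to execute database SQL. Please check your column/table/where/querySql configuration or ask your DBA for help."),

    // only for reader
    READ_RECORD_FAIL("DBUtilErrorCode-07", "Failed to read database data. Please check your column/table/where/querySql configuration or ask your DBA for help."),
    TABLE_QUERYSQL_MIXED("DBUtilErrorCode-08", "Conflicting configuration: you cannot configure both table and querySql."),
    TABLE_QUERYSQL_MISSING("DBUtilErrorCode-09", "Invalid configuration: exactly one of table and querySql must be configured."),

    // only for writer
    WRITE_DATA_ERROR("DBUtilErrorCode-05", "Failed to write data to the configured target table."),
    NO_INSERT_PRIVILEGE("DBUtilErrorCode-11", "No write privilege on the database, please contact your DBA"),
    NO_DELETE_PRIVILEGE("DBUtilErrorCode-16", "No DELETE privilege on the database, please contact your DBA"),
    ;

    private final String code;

    private final String description;

    private DBUtilErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code,
                this.description);
    }
}
================================================
FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java
================================================
package com.alibaba.datax.plugin.rdbms.util;

import com.alibaba.datax.common.exception.DataXException;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * refer:http://blog.csdn.net/ring0hx/article/details/6152528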
    */ public enum DataBaseType { MySql("mysql", "com.mysql.jdbc.Driver"), Tddl("mysql", "com.mysql.jdbc.Driver"), DRDS("drds", "com.mysql.jdbc.Driver"), Oracle("oracle", "oracle.jdbc.OracleDriver"), SQLServer("sqlserver", "com.microsoft.sqlserver.jdbc.SQLServerDriver"), PostgreSQL("postgresql", "org.postgresql.Driver"), RDBMS("rdbms", "com.alibaba.datax.plugin.rdbms.util.DataBaseType"), DB2("db2", "com.ibm.db2.jcc.DB2Driver"), ADB("adb","com.mysql.jdbc.Driver"), ADS("ads","com.mysql.jdbc.Driver"), ClickHouse("clickhouse", "ru.yandex.clickhouse.ClickHouseDriver"), KingbaseES("kingbasees", "com.kingbase8.Driver"), Oscar("oscar", "com.oscar.Driver"), OceanBase("oceanbase", "com.alipay.oceanbase.jdbc.Driver"), StarRocks("starrocks", "com.mysql.jdbc.Driver"), Sybase("sybase", "com.sybase.jdbc4.jdbc.SybDriver"), GaussDB("gaussdb", "org.opengauss.Driver"), Databend("databend", "com.databend.jdbc.DatabendDriver"), Doris("doris","com.mysql.jdbc.Driver"); private String typeName; private String driverClassName; DataBaseType(String typeName, String driverClassName) { this.typeName = typeName; this.driverClassName = driverClassName; } public String getDriverClassName() { return this.driverClassName; } public String appendJDBCSuffixForReader(String jdbc) { String result = jdbc; String suffix = null; switch (this) { case MySql: case DRDS: case OceanBase: suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true"; if (jdbc.contains("?")) { result = jdbc + "&" + suffix; } else { result = jdbc + "?" 
+ suffix; } break; case Oracle: break; case SQLServer: break; case DB2: break; case PostgreSQL: break; case ClickHouse: break; case RDBMS: break; case KingbaseES: break; case Oscar: break; case StarRocks: break; case GaussDB: break; case Doris: break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } return result; } public String appendJDBCSuffixForWriter(String jdbc) { String result = jdbc; String suffix = null; switch (this) { case MySql: suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false"; if (jdbc.contains("?")) { result = jdbc + "&" + suffix; } else { result = jdbc + "?" + suffix; } break; case ADB: suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false"; if (jdbc.contains("?")) { result = jdbc + "&" + suffix; } else { result = jdbc + "?" + suffix; } break; case DRDS: suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull"; if (jdbc.contains("?")) { result = jdbc + "&" + suffix; } else { result = jdbc + "?" + suffix; } break; case Oracle: break; case SQLServer: break; case DB2: break; case PostgreSQL: break; case ClickHouse: break; case RDBMS: break; case Databend: break; case KingbaseES: break; case Oscar: break; case OceanBase: suffix = "yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true"; if (jdbc.contains("?")) { result = jdbc + "&" + suffix; } else { result = jdbc + "?" 
+ suffix; } break; case Sybase: break; case GaussDB: break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } return result; } public String formatPk(String splitPk) { String result = splitPk; switch (this) { case MySql: case Oracle: if (splitPk.length() >= 2 && splitPk.startsWith("`") && splitPk.endsWith("`")) { result = splitPk.substring(1, splitPk.length() - 1).toLowerCase(); } break; case SQLServer: if (splitPk.length() >= 2 && splitPk.startsWith("[") && splitPk.endsWith("]")) { result = splitPk.substring(1, splitPk.length() - 1).toLowerCase(); } break; case DB2: case PostgreSQL: case KingbaseES: case Oscar: break; case GaussDB: break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type."); } return result; } public String quoteColumnName(String columnName) { String result = columnName; switch (this) { case MySql: result = "`" + columnName.replace("`", "``") + "`"; break; case Oracle: break; case SQLServer: result = "[" + columnName + "]"; break; case DB2: case PostgreSQL: case KingbaseES: case Oscar: break; case GaussDB: break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type"); } return result; } public String quoteTableName(String tableName) { String result = tableName; switch (this) { case MySql: result = "`" + tableName.replace("`", "``") + "`"; break; case Oracle: break; case SQLServer: break; case DB2: break; case PostgreSQL: break; case KingbaseES: break; case Oscar: break; case GaussDB: break; default: throw DataXException.asDataXException(DBUtilErrorCode.UNSUPPORTED_TYPE, "unsupported database type"); } return result; } private static Pattern mysqlPattern = Pattern.compile("jdbc:mysql://(.+):\\d+/.+"); private static Pattern oraclePattern = Pattern.compile("jdbc:oracle:thin:@(.+):\\d+:.+"); /** * Note: currently only the IP (host) part of MySQL and Oracle JDBC URLs is recognized; returns null if nothing is recognized.
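To show which URL shapes parseIpFromJdbcUrl recognizes, here is a standalone sketch that reuses its two patterns verbatim. The class name JdbcHostParseDemo is hypothetical. Because matches() requires a full match, the Oracle pattern only accepts the thin host:port:SID form. The //host:port/service form falls through to null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; both patterns are copied from DataBaseType.
class JdbcHostParseDemo {
    private static final Pattern MYSQL = Pattern.compile("jdbc:mysql://(.+):\\d+/.+");
    private static final Pattern ORACLE = Pattern.compile("jdbc:oracle:thin:@(.+):\\d+:.+");

    // Returns the host of a MySQL or Oracle (SID form) URL, or null otherwise.
    public static String parseHost(String jdbcUrl) {
        Matcher m = MYSQL.matcher(jdbcUrl);
        if (m.matches()) {
            return m.group(1);
        }
        m = ORACLE.matcher(jdbcUrl);
        if (m.matches()) {
            return m.group(1);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseHost("jdbc:mysql://10.0.0.1:3306/test"));         // 10.0.0.1
        System.out.println(parseHost("jdbc:oracle:thin:@192.168.1.5:1521:orcl")); // 192.168.1.5
        System.out.println(parseHost("jdbc:oracle:thin:@//db:1521/svc"));         // null (service-name form)
        System.out.println(parseHost("jdbc:postgresql://pg:5432/app"));           // null (unsupported type)
    }
}
```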
*/ public static String parseIpFromJdbcUrl(String jdbcUrl) { Matcher mysql = mysqlPattern.matcher(jdbcUrl); if (mysql.matches()) { return mysql.group(1); } Matcher oracle = oraclePattern.matcher(jdbcUrl); if (oracle.matches()) { return oracle.group(1); } return null; } public String getTypeName() { return typeName; } public void setTypeName(String typeName) { this.typeName = typeName; } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/JdbcConnectionFactory.java ================================================ package com.alibaba.datax.plugin.rdbms.util; import java.sql.Connection; /** * Date: 15/3/16 3:12 PM */ public class JdbcConnectionFactory implements ConnectionFactory { private DataBaseType dataBaseType; private String jdbcUrl; private String userName; private String password; public JdbcConnectionFactory(DataBaseType dataBaseType, String jdbcUrl, String userName, String password) { this.dataBaseType = dataBaseType; this.jdbcUrl = jdbcUrl; this.userName = userName; this.password = password; } @Override public Connection getConnecttion() { return DBUtil.getConnection(dataBaseType, jdbcUrl, userName, password); } @Override public Connection getConnecttionWithoutRetry() { return DBUtil.getConnectionWithoutRetry(dataBaseType, jdbcUrl, userName, password); } @Override public String getConnectionInfo() { return "jdbcUrl:" + jdbcUrl; } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/RdbmsException.java ================================================ package com.alibaba.datax.plugin.rdbms.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by judy.lt on 2015/6/5.
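asConnException picks the error code to report by substring-matching the driver's exception message (mySqlConnectionErrorAna, oracleConnectionErrorAna). The following is a minimal sketch of that classification order. The marker strings are assumptions modeled on common MySQL / Connector/J messages. They are NOT the values of DataX's Constant fields. ConnErrorClassifierSketch is a hypothetical name.

```java
// Hypothetical sketch of substring-based connection-error classification,
// checked in the same order as mySqlConnectionErrorAna: database, connection, access.
// The marker strings below are assumptions, not DataX's Constant values.
class ConnErrorClassifierSketch {
    enum ConnError { DB_NAME, IP_PORT, USER_PASSWORD, GENERIC }

    static final String DATABASE_MARKER = "Unknown database";
    static final String CONN_MARKER = "Communications link failure";
    static final String ACCESS_MARKER = "Access denied";

    public static ConnError classify(String message) {
        if (message == null) {
            return ConnError.GENERIC; // getMessage() may be null
        }
        if (message.contains(DATABASE_MARKER)) {
            return ConnError.DB_NAME;
        }
        if (message.contains(CONN_MARKER)) {
            return ConnError.IP_PORT;
        }
        if (message.contains(ACCESS_MARKER)) {
            return ConnError.USER_PASSWORD;
        }
        return ConnError.GENERIC;
    }

    public static void main(String[] args) {
        System.out.println(classify("Unknown database 'sales'"));                // DB_NAME
        System.out.println(classify("Access denied for user 'etl'@'10.0.0.2'")); // USER_PASSWORD
        System.out.println(classify("Table 'x' doesn't exist"));                 // GENERIC
    }
}
```

The null check is an addition in the sketch. Throwable.getMessage() may return null, and the original would then throw a NullPointerException inside the classifier.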
*/ public class RdbmsException extends DataXException{ public RdbmsException(ErrorCode errorCode, String message){ super(errorCode,message); } public static DataXException asConnException(DataBaseType dataBaseType,Exception e,String userName,String dbName){ if (dataBaseType.equals(DataBaseType.MySql)){ DBUtilErrorCode dbUtilErrorCode = mySqlConnectionErrorAna(e.getMessage()); if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_CONN_DB_ERROR && dbName !=null ){ return DataXException.asDataXException(dbUtilErrorCode,"Database name: "+dbName+" Error details: "+e); } if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_CONN_USERPWD_ERROR ){ return DataXException.asDataXException(dbUtilErrorCode,"Database username: "+userName+" Error details: "+e); } return DataXException.asDataXException(dbUtilErrorCode,"Error details: "+e); } if (dataBaseType.equals(DataBaseType.Oracle)){ DBUtilErrorCode dbUtilErrorCode = oracleConnectionErrorAna(e.getMessage()); if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_CONN_DB_ERROR && dbName != null){ return DataXException.asDataXException(dbUtilErrorCode,"Database name: "+dbName+" Error details: "+e); } if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_CONN_USERPWD_ERROR ){ return DataXException.asDataXException(dbUtilErrorCode,"Database username: "+userName+" Error details: "+e); } return DataXException.asDataXException(dbUtilErrorCode,"Error details: "+e); } return DataXException.asDataXException(DBUtilErrorCode.CONN_DB_ERROR,"Error details: "+e); } public static DBUtilErrorCode mySqlConnectionErrorAna(String e){ if (e.contains(Constant.MYSQL_DATABASE)){ return DBUtilErrorCode.MYSQL_CONN_DB_ERROR; } if (e.contains(Constant.MYSQL_CONNEXP)){ return DBUtilErrorCode.MYSQL_CONN_IPPORT_ERROR; } if (e.contains(Constant.MYSQL_ACCDENIED)){ return DBUtilErrorCode.MYSQL_CONN_USERPWD_ERROR; } return DBUtilErrorCode.CONN_DB_ERROR; } public static DBUtilErrorCode oracleConnectionErrorAna(String e){ if (e.contains(Constant.ORACLE_DATABASE)){ return DBUtilErrorCode.ORACLE_CONN_DB_ERROR; } if (e.contains(Constant.ORACLE_CONNEXP)){ return
DBUtilErrorCode.ORACLE_CONN_IPPORT_ERROR; } if (e.contains(Constant.ORACLE_ACCDENIED)){ return DBUtilErrorCode.ORACLE_CONN_USERPWD_ERROR; } return DBUtilErrorCode.CONN_DB_ERROR; } public static DataXException asQueryException(DataBaseType dataBaseType, Exception e,String querySql,String table,String userName){ if (dataBaseType.equals(DataBaseType.MySql)) { DBUtilErrorCode dbUtilErrorCode = mySqlQueryErrorAna(e.getMessage()); if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_QUERY_TABLE_NAME_ERROR && table != null) { return DataXException.asDataXException(dbUtilErrorCode, "Table name: " + table + " Executed SQL: " + querySql + " Error details: " + e, e); } if (dbUtilErrorCode == DBUtilErrorCode.MYSQL_QUERY_SELECT_PRI_ERROR && userName != null) { return DataXException.asDataXException(dbUtilErrorCode, "Username: " + userName + " Error details: " + e, e); } return DataXException.asDataXException(dbUtilErrorCode, "Executed SQL: " + querySql + " Error details: " + e, e); } if (dataBaseType.equals(DataBaseType.Oracle)) { DBUtilErrorCode dbUtilErrorCode = oracleQueryErrorAna(e.getMessage()); if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_QUERY_TABLE_NAME_ERROR && table != null) { return DataXException.asDataXException(dbUtilErrorCode, "Table name: " + table + " Executed SQL: " + querySql + " Error details: " + e, e); } if (dbUtilErrorCode == DBUtilErrorCode.ORACLE_QUERY_SELECT_PRI_ERROR) { return DataXException.asDataXException(dbUtilErrorCode, "Username: " + userName + " Error details: " + e, e); } return DataXException.asDataXException(dbUtilErrorCode, "Executed SQL: " + querySql + " Error details: " + e, e); } return DataXException.asDataXException(DBUtilErrorCode.SQL_EXECUTE_FAIL, "Executed SQL: " + querySql + " Error details: " + e, e); } public static DBUtilErrorCode mySqlQueryErrorAna(String e){ if (e.contains(Constant.MYSQL_TABLE_NAME_ERR1) && e.contains(Constant.MYSQL_TABLE_NAME_ERR2)){ return DBUtilErrorCode.MYSQL_QUERY_TABLE_NAME_ERROR; }else if (e.contains(Constant.MYSQL_SELECT_PRI)){ return 
DBUtilErrorCode.MYSQL_QUERY_SELECT_PRI_ERROR; }else if
(e.contains(Constant.MYSQL_COLUMN1) && e.contains(Constant.MYSQL_COLUMN2)){ return DBUtilErrorCode.MYSQL_QUERY_COLUMN_ERROR; }else if (e.contains(Constant.MYSQL_WHERE)){ return DBUtilErrorCode.MYSQL_QUERY_SQL_ERROR; } return DBUtilErrorCode.READ_RECORD_FAIL; } public static DBUtilErrorCode oracleQueryErrorAna(String e){ if (e.contains(Constant.ORACLE_TABLE_NAME)){ return DBUtilErrorCode.ORACLE_QUERY_TABLE_NAME_ERROR; }else if (e.contains(Constant.ORACLE_SQL)){ return DBUtilErrorCode.ORACLE_QUERY_SQL_ERROR; }else if (e.contains(Constant.ORACLE_SELECT_PRI)){ return DBUtilErrorCode.ORACLE_QUERY_SELECT_PRI_ERROR; } return DBUtilErrorCode.READ_RECORD_FAIL; } public static DataXException asSqlParserException(DataBaseType dataBaseType, Exception e,String querySql){ if (dataBaseType.equals(DataBaseType.MySql)){ throw DataXException.asDataXException(DBUtilErrorCode.MYSQL_QUERY_SQL_PARSER_ERROR, "Executed SQL: "+querySql+" Error details: " + e); } if (dataBaseType.equals(DataBaseType.Oracle)){ throw DataXException.asDataXException(DBUtilErrorCode.ORACLE_QUERY_SQL_PARSER_ERROR,"Executed SQL: "+querySql+" Error details: " +e); } throw DataXException.asDataXException(DBUtilErrorCode.READ_RECORD_FAIL,"Executed SQL: "+querySql+" Error details: "+e); } public static DataXException asPreSQLParserException(DataBaseType dataBaseType, Exception e,String querySql){ if (dataBaseType.equals(DataBaseType.MySql)){ throw DataXException.asDataXException(DBUtilErrorCode.MYSQL_PRE_SQL_ERROR, "Executed SQL: "+querySql+" Error details: " + e); } if (dataBaseType.equals(DataBaseType.Oracle)){ throw DataXException.asDataXException(DBUtilErrorCode.ORACLE_PRE_SQL_ERROR,"Executed SQL: "+querySql+" Error details: " +e); } throw DataXException.asDataXException(DBUtilErrorCode.READ_RECORD_FAIL,"Executed SQL: "+querySql+" Error details: "+e); } public static DataXException asPostSQLParserException(DataBaseType dataBaseType, Exception e,String querySql){ if (dataBaseType.equals(DataBaseType.MySql)){ throw 
DataXException.asDataXException(DBUtilErrorCode.MYSQL_POST_SQL_ERROR,
"Executed SQL: "+querySql+" Error details: " + e); } if (dataBaseType.equals(DataBaseType.Oracle)){ throw DataXException.asDataXException(DBUtilErrorCode.ORACLE_POST_SQL_ERROR,"Executed SQL: "+querySql+" Error details: " +e); } throw DataXException.asDataXException(DBUtilErrorCode.READ_RECORD_FAIL,"Executed SQL: "+querySql+" Error details: "+e); } public static DataXException asInsertPriException(DataBaseType dataBaseType, String userName,String jdbcUrl){ if (dataBaseType.equals(DataBaseType.MySql)){ throw DataXException.asDataXException(DBUtilErrorCode.MYSQL_INSERT_ERROR, "Username: "+userName+" jdbcURL: "+jdbcUrl); } if (dataBaseType.equals(DataBaseType.Oracle)){ throw DataXException.asDataXException(DBUtilErrorCode.ORACLE_INSERT_ERROR,"Username: "+userName+" jdbcURL: "+jdbcUrl); } throw DataXException.asDataXException(DBUtilErrorCode.NO_INSERT_PRIVILEGE,"Username: "+userName+" jdbcURL: "+jdbcUrl); } public static DataXException asDeletePriException(DataBaseType dataBaseType, String userName,String jdbcUrl){ if (dataBaseType.equals(DataBaseType.MySql)){ throw DataXException.asDataXException(DBUtilErrorCode.MYSQL_DELETE_ERROR, "Username: "+userName+" jdbcURL: "+jdbcUrl); } if (dataBaseType.equals(DataBaseType.Oracle)){ throw DataXException.asDataXException(DBUtilErrorCode.ORACLE_DELETE_ERROR,"Username: "+userName+" jdbcURL: "+jdbcUrl); } throw DataXException.asDataXException(DBUtilErrorCode.NO_DELETE_PRIVILEGE,"Username: "+userName+" jdbcURL: "+jdbcUrl); } public static DataXException asSplitPKException(DataBaseType dataBaseType, Exception e,String splitSql,String splitPkID){ if (dataBaseType.equals(DataBaseType.MySql)){ return DataXException.asDataXException(DBUtilErrorCode.MYSQL_SPLIT_PK_ERROR,"Configured splitPk: "+splitPkID+", executed SQL: "+splitSql+" Error details: "+e); } if (dataBaseType.equals(DataBaseType.Oracle)){ return DataXException.asDataXException(DBUtilErrorCode.ORACLE_SPLIT_PK_ERROR,"Configured splitPk: "+splitPkID+", executed SQL: "+splitSql+" Error details: "+e); } return 
DataXException.asDataXException(DBUtilErrorCode.READ_RECORD_FAIL,"Executed SQL: "+splitSql+" Error details: "+e);
} } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/RdbmsRangeSplitWrap.java ================================================ package com.alibaba.datax.plugin.rdbms.util; import com.alibaba.datax.common.util.RangeSplitUtil; import org.apache.commons.lang3.StringUtils; import java.math.BigInteger; import java.util.ArrayList; import java.util.List; public final class RdbmsRangeSplitWrap { public static List<String> splitAndWrap(String left, String right, int expectSliceNumber, String columnName, String quote, DataBaseType dataBaseType) { String[] tempResult = RangeSplitUtil.doAsciiStringSplit(left, right, expectSliceNumber); return RdbmsRangeSplitWrap.wrapRange(tempResult, columnName, quote, dataBaseType); } // warn: do not use this method; convert long to BigInteger and use the BigInteger overload instead public static List<String> splitAndWrap(long left, long right, int expectSliceNumber, String columnName) { long[] tempResult = RangeSplitUtil.doLongSplit(left, right, expectSliceNumber); return RdbmsRangeSplitWrap.wrapRange(tempResult, columnName); } public static List<String> splitAndWrap(BigInteger left, BigInteger right, int expectSliceNumber, String columnName) { BigInteger[] tempResult = RangeSplitUtil.doBigIntegerSplit(left, right, expectSliceNumber); return RdbmsRangeSplitWrap.wrapRange(tempResult, columnName); } public static List<String> wrapRange(long[] rangeResult, String columnName) { String[] rangeStr = new String[rangeResult.length]; for (int i = 0, len = rangeResult.length; i < len; i++) { rangeStr[i] = String.valueOf(rangeResult[i]); } return wrapRange(rangeStr, columnName, "", null); } public static List<String> wrapRange(BigInteger[] rangeResult, String columnName) { String[] rangeStr = new String[rangeResult.length]; for (int i = 0, len = rangeResult.length; i < len; i++) { rangeStr[i] = rangeResult[i].toString(); } return wrapRange(rangeStr, columnName, "", null); } public static List<String> 
wrapRange(String[] rangeResult, String columnName, String quote, DataBaseType
dataBaseType) { if (null == rangeResult || rangeResult.length < 2) { throw new IllegalArgumentException(String.format( "Parameter rangeResult can not be null and its length can not be less than 2. detail:rangeResult=[%s].", StringUtils.join(rangeResult, ","))); } List<String> result = new ArrayList<String>(); // TODO change to StringBuilder.append(..) if (2 == rangeResult.length) { result.add(String.format(" (%s%s%s <= %s AND %s <= %s%s%s) ", quote, quoteConstantValue(rangeResult[0], dataBaseType), quote, columnName, columnName, quote, quoteConstantValue(rangeResult[1], dataBaseType), quote)); return result; } else { for (int i = 0, len = rangeResult.length - 2; i < len; i++) { result.add(String.format(" (%s%s%s <= %s AND %s < %s%s%s) ", quote, quoteConstantValue(rangeResult[i], dataBaseType), quote, columnName, columnName, quote, quoteConstantValue(rangeResult[i + 1], dataBaseType), quote)); } result.add(String.format(" (%s%s%s <= %s AND %s <= %s%s%s) ", quote, quoteConstantValue(rangeResult[rangeResult.length - 2], dataBaseType), quote, columnName, columnName, quote, quoteConstantValue(rangeResult[rangeResult.length - 1], dataBaseType), quote)); return result; } } public static String wrapFirstLastPoint(String firstPoint, String lastPoint, String columnName, String quote, DataBaseType dataBaseType) { return String.format(" ((%s < %s%s%s) OR (%s%s%s < %s)) ", columnName, quote, quoteConstantValue(firstPoint, dataBaseType), quote, quote, quoteConstantValue(lastPoint, dataBaseType), quote, columnName); } public static String wrapFirstLastPoint(Long firstPoint, Long lastPoint, String columnName) { return wrapFirstLastPoint(firstPoint.toString(), lastPoint.toString(), columnName, "", null); } public static String wrapFirstLastPoint(BigInteger firstPoint, BigInteger lastPoint, String columnName) { return wrapFirstLastPoint(firstPoint.toString(), lastPoint.toString(), columnName, "", null); } private static String quoteConstantValue(String aString, DataBaseType dataBaseType) { if (null ==
dataBaseType) { return aString; } if (dataBaseType.equals(DataBaseType.MySql)) { return aString.replace("'", "''").replace("\\", "\\\\"); } else if (dataBaseType.equals(DataBaseType.Oracle) || dataBaseType.equals(DataBaseType.SQLServer)) { return aString.replace("'", "''"); } else { // TODO support other database types return aString; } } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/SplitedSlice.java ================================================ package com.alibaba.datax.plugin.rdbms.util; public class SplitedSlice { private String begin; private String end; private String range; public SplitedSlice(String begin, String end, String range) { this.begin = begin; this.end = end; this.range = range; } public String getBegin() { return begin; } public void setBegin(String begin) { this.begin = begin; } public String getEnd() { return end; } public void setEnd(String end) { this.end = end; } public String getRange() { return range; } public void setRange(String range) { this.range = range; } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/TableExpandUtil.java ================================================ package com.alibaba.datax.plugin.rdbms.util; import java.util.ArrayList; import java.util.List; import java.util.regex.Matcher; import java.util.regex.Pattern; public final class TableExpandUtil { // schema.table[0-2]more // 1 2 3 4 5 public static Pattern pattern = Pattern .compile("(\\w+\\.)?(\\w+)\\[(\\d+)-(\\d+)\\](.*)"); private TableExpandUtil() { } /** * Splits a table string (usually containing one or more table names) into a * formatted List. For example, table[0-32] is expanded into `table0`, * `table1`, `table2`, ..., `table32` in a {@link List}. * * @param tables * a string containing one or more table names. * @return the list of expanded table names. *
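The expansion rule of splitTables can be sketched standalone as follows. This is a minimal re-implementation that reuses the class's regex. The class name TableRangeExpandSketch is hypothetical, and unlike the original it parses each bound once with Integer.parseInt. Reversed bounds are swapped. A leading zero in the lower bound turns on zero-padding to the width of that bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone re-implementation of the splitTables expansion rule,
// using the same regex as TableExpandUtil.pattern; for illustration only.
class TableRangeExpandSketch {
    private static final Pattern RANGE =
            Pattern.compile("(\\w+\\.)?(\\w+)\\[(\\d+)-(\\d+)\\](.*)");

    public static List<String> expand(String tables) {
        List<String> result = new ArrayList<>();
        for (String raw : tables.split(",")) {
            String item = raw.trim();
            Matcher m = RANGE.matcher(item);
            if (!m.matches()) {
                result.add(item); // plain table name, kept as-is
                continue;
            }
            String start = m.group(3);
            String end = m.group(4);
            if (Integer.parseInt(start) > Integer.parseInt(end)) { // accept reversed ranges
                String tmp = start;
                start = end;
                end = tmp;
            }
            String schema = m.group(1) == null ? "" : m.group(1);
            // A leading zero in the lower bound means "pad to the lower bound's width".
            String fmt = start.startsWith("0") ? "%0" + start.length() + "d" : "%d";
            int hi = Integer.parseInt(end);
            for (int k = Integer.parseInt(start); k <= hi; k++) {
                result.add(schema + m.group(2) + String.format(fmt, k) + m.group(5).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("db.t_[00-02],user")); // [db.t_00, db.t_01, db.t_02, user]
    }
}
```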

* TODO remove the DataBaseType parameter */ public static List<String> splitTables(DataBaseType dataBaseType, String tables) { List<String> splittedTables = new ArrayList<String>(); String[] tableArrays = tables.split(","); String tableName = null; for (String tableArray : tableArrays) { Matcher matcher = pattern.matcher(tableArray.trim()); if (!matcher.matches()) { tableName = tableArray.trim(); splittedTables.add(tableName); } else { String start = matcher.group(3).trim(); String end = matcher.group(4).trim(); String tmp = ""; if (Integer.valueOf(start) > Integer.valueOf(end)) { tmp = start; start = end; end = tmp; } int len = start.length(); String schema = null; for (int k = Integer.valueOf(start); k <= Integer.valueOf(end); k++) { schema = (null == matcher.group(1)) ? "" : matcher.group(1) .trim(); if (start.startsWith("0")) { tableName = schema + matcher.group(2).trim() + String.format("%0" + len + "d", k) + matcher.group(5).trim(); splittedTables.add(tableName); } else { tableName = schema + matcher.group(2).trim() + String.format("%d", k) + matcher.group(5).trim(); splittedTables.add(tableName); } } } } return splittedTables; } public static List<String> expandTableConf(DataBaseType dataBaseType, List<String> tables) { List<String> parsedTables = new ArrayList<String>(); for (String table : tables) { List<String> splittedTables = splitTables(dataBaseType, table); parsedTables.addAll(splittedTables); } return parsedTables; } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/CommonRdbmsWriter.java ================================================ package com.alibaba.datax.plugin.rdbms.writer; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import 
com.alibaba.datax.plugin.rdbms.util.DBUtil;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil; import com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil; import java.util.concurrent.atomic.AtomicLong; import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.tuple.Triple; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.SQLException; import java.sql.Types; import java.util.ArrayList; import java.util.List; public class CommonRdbmsWriter { public static class Job { private DataBaseType dataBaseType; private static final Logger LOG = LoggerFactory .getLogger(Job.class); public Job(DataBaseType dataBaseType) { this.dataBaseType = dataBaseType; OriginalConfPretreatmentUtil.DATABASE_TYPE = this.dataBaseType; } public void init(Configuration originalConfig) { OriginalConfPretreatmentUtil.doPretreatment(originalConfig, this.dataBaseType); LOG.debug("After job init(), originalConfig now is:[\n{}\n]", originalConfig.toJSON()); } /* Currently only MySQL Writer and Oracle Writer are supported; checks the preSql/postSql syntax and the insert/delete privileges */ public void writerPreCheck(Configuration originalConfig, DataBaseType dataBaseType) { /* Check the preSql and postSql statements */ prePostSqlValid(originalConfig, dataBaseType); /* Check the insert and delete privileges */ privilegeValid(originalConfig, dataBaseType); } public void prePostSqlValid(Configuration originalConfig, DataBaseType dataBaseType) { /* Check the preSql and postSql statements */ WriterUtil.preCheckPrePareSQL(originalConfig, dataBaseType); WriterUtil.preCheckPostSQL(originalConfig, dataBaseType); } public void privilegeValid(Configuration originalConfig, DataBaseType dataBaseType) { /* Check the insert and delete privileges */ String username = 
originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); List<Object> connections =
originalConfig.getList(Constant.CONN_MARK, Object.class); for (int i = 0, len = connections.size(); i < len; i++) { Configuration connConf = Configuration.from(connections.get(i).toString()); String jdbcUrl = connConf.getString(Key.JDBC_URL); List<String> expandedTables = connConf.getList(Key.TABLE, String.class); boolean hasInsertPri = DBUtil.checkInsertPrivilege(dataBaseType, jdbcUrl, username, password, expandedTables); if (!hasInsertPri) { throw RdbmsException.asInsertPriException(dataBaseType, originalConfig.getString(Key.USERNAME), jdbcUrl); } if (DBUtil.needCheckDeletePrivilege(originalConfig)) { boolean hasDeletePri = DBUtil.checkDeletePrivilege(dataBaseType, jdbcUrl, username, password, expandedTables); if (!hasDeletePri) { throw RdbmsException.asDeletePriException(dataBaseType, originalConfig.getString(Key.USERNAME), jdbcUrl); } } } } // In general, preSql execution is deferred to the tasks (the single-table case is the exception) public void prepare(Configuration originalConfig) { int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); if (tableNumber == 1) { String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); List<Object> conns = originalConfig.getList(Constant.CONN_MARK, Object.class); Configuration connConf = Configuration.from(conns.get(0) .toString()); // This jdbcUrl already has the appropriate suffix parameters appended String jdbcUrl = connConf.getString(Key.JDBC_URL); originalConfig.set(Key.JDBC_URL, jdbcUrl); String table = connConf.getList(Key.TABLE, String.class).get(0); originalConfig.set(Key.TABLE, table); List<String> preSqls = originalConfig.getList(Key.PRE_SQL, String.class); List<String> renderedPreSqls = WriterUtil.renderPreOrPostSqls( preSqls, table); originalConfig.remove(Constant.CONN_MARK); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { // preSql is configured, so remove it from the config here originalConfig.remove(Key.PRE_SQL); Connection conn = 
DBUtil.getConnection(dataBaseType, jdbcUrl, username, password); LOG.info("Begin to execute preSqls:[{}]. context info:{}.", StringUtils.join(renderedPreSqls, ";"), jdbcUrl);
WriterUtil.executeSqls(conn, renderedPreSqls, jdbcUrl, dataBaseType); DBUtil.closeDBResources(null, null, conn); } } LOG.debug("After job prepare(), originalConfig now is:[\n{}\n]", originalConfig.toJSON()); } public List<Configuration> split(Configuration originalConfig, int mandatoryNumber) { return WriterUtil.doSplit(originalConfig, mandatoryNumber); } // In general, postSql execution is deferred to the tasks (the single-table case is the exception) public void post(Configuration originalConfig) { int tableNumber = originalConfig.getInt(Constant.TABLE_NUMBER_MARK); if (tableNumber == 1) { String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); // appendJDBCSuffix has already been applied by prepare() String jdbcUrl = originalConfig.getString(Key.JDBC_URL); String table = originalConfig.getString(Key.TABLE); List<String> postSqls = originalConfig.getList(Key.POST_SQL, String.class); List<String> renderedPostSqls = WriterUtil.renderPreOrPostSqls( postSqls, table); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { // postSql is configured, so remove it from the config here originalConfig.remove(Key.POST_SQL); Connection conn = DBUtil.getConnection(this.dataBaseType, jdbcUrl, username, password); LOG.info( "Begin to execute postSqls:[{}]. context info:{}.", StringUtils.join(renderedPostSqls, ";"), jdbcUrl);
WriterUtil.executeSqls(conn, renderedPostSqls, jdbcUrl, dataBaseType); DBUtil.closeDBResources(null, null, conn); } } } public void destroy(Configuration originalConfig) { } } public static class Task { protected static final Logger LOG = LoggerFactory .getLogger(Task.class); protected DataBaseType dataBaseType; private static final String VALUE_HOLDER = "?"; protected String username; protected String password; protected String jdbcUrl; protected String table; protected List<String> columns; protected List<String> preSqls; protected List<String> postSqls; protected int batchSize; protected int batchByteSize; protected int columnNumber = 0; protected TaskPluginCollector taskPluginCollector; // Common context attached to log messages, e.g. the database connection and the table being operated on protected static String BASIC_MESSAGE; protected static String INSERT_OR_REPLACE_TEMPLATE; protected String writeRecordSql; protected String writeMode; protected boolean emptyAsNull; protected Triple<List<String>, List<Integer>, List<String>> resultSetMetaData; private int dumpRecordLimit = Constant.DEFAULT_DUMP_RECORD_LIMIT; private AtomicLong dumpRecordCount = new AtomicLong(0); public Task(DataBaseType dataBaseType) { this.dataBaseType = dataBaseType; } public void init(Configuration writerSliceConfig) { this.username = writerSliceConfig.getString(Key.USERNAME); this.password = writerSliceConfig.getString(Key.PASSWORD); this.jdbcUrl = writerSliceConfig.getString(Key.JDBC_URL); // Special handling for ob10 (OceanBase 1.0) JDBC URLs if (this.jdbcUrl.startsWith(Constant.OB10_SPLIT_STRING)) { String[] ss = this.jdbcUrl.split(Constant.OB10_SPLIT_STRING_PATTERN); if (ss.length != 3) { throw DataXException .asDataXException( DBUtilErrorCode.JDBC_OB10_ADDRESS_ERROR, "Invalid JDBC OB10 format, please contact askdatax"); } LOG.info("this is ob1_0 jdbc url."); this.username = ss[1].trim() + ":" + this.username; this.jdbcUrl = ss[2]; LOG.info("this is ob1_0 jdbc url. user=" + this.username + " :url=" + this.jdbcUrl);
} this.table = writerSliceConfig.getString(Key.TABLE); this.columns = writerSliceConfig.getList(Key.COLUMN, String.class); this.columnNumber = this.columns.size(); this.preSqls = writerSliceConfig.getList(Key.PRE_SQL, String.class); this.postSqls = writerSliceConfig.getList(Key.POST_SQL, String.class); this.batchSize = writerSliceConfig.getInt(Key.BATCH_SIZE, Constant.DEFAULT_BATCH_SIZE); this.batchByteSize = writerSliceConfig.getInt(Key.BATCH_BYTE_SIZE, Constant.DEFAULT_BATCH_BYTE_SIZE); writeMode = writerSliceConfig.getString(Key.WRITE_MODE, "INSERT"); emptyAsNull = writerSliceConfig.getBool(Key.EMPTY_AS_NULL, true); INSERT_OR_REPLACE_TEMPLATE = writerSliceConfig.getString(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK); this.writeRecordSql = String.format(INSERT_OR_REPLACE_TEMPLATE, this.table); BASIC_MESSAGE = String.format("jdbcUrl:[%s], table:[%s]", this.jdbcUrl, this.table); } public void prepare(Configuration writerSliceConfig) { Connection connection = DBUtil.getConnection(this.dataBaseType, this.jdbcUrl, username, password); DBUtil.dealWithSessionConfig(connection, writerSliceConfig, this.dataBaseType, BASIC_MESSAGE); int tableNumber = writerSliceConfig.getInt( Constant.TABLE_NUMBER_MARK); if (tableNumber != 1) { LOG.info("Begin to execute preSqls:[{}]. context info:{}.", StringUtils.join(this.preSqls, ";"), BASIC_MESSAGE);
WriterUtil.executeSqls(connection, this.preSqls, BASIC_MESSAGE, dataBaseType); } DBUtil.closeDBResources(null, null, connection); } public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector, Connection connection) { this.taskPluginCollector = taskPluginCollector; // Target-table column metadata, used to convert values to the target column types when writing this.resultSetMetaData = DBUtil.getColumnMetaData(connection, this.table, StringUtils.join(this.columns, ",")); // Build the SQL statement used to write to the database calcWriteRecordSql(); List<Record> writeBuffer = new ArrayList<Record>(this.batchSize); int bufferBytes = 0; try { Record record; while ((record = recordReceiver.getFromReader()) != null) { if (record.getColumnNumber() != this.columnNumber) { // The number of columns read from the source differs from the number of target columns: fail immediately throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "Column configuration error: in your job, the number of columns read from the source (%s) does not equal the number of columns to write to the target table (%s). Please check and correct your configuration.", record.getColumnNumber(), this.columnNumber)); } writeBuffer.add(record); bufferBytes += record.getMemorySize(); if (writeBuffer.size() >= batchSize || bufferBytes >= batchByteSize) { doBatchInsert(connection, writeBuffer); writeBuffer.clear(); bufferBytes = 0; } } if (!writeBuffer.isEmpty()) { doBatchInsert(connection, writeBuffer); writeBuffer.clear(); bufferBytes = 0; } } catch (Exception e) { throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { writeBuffer.clear(); bufferBytes = 0; DBUtil.closeDBResources(null, null, connection); } } // TODO switch to a connection pool so that every acquired connection is usable (note: each connection may need its session initialized) public void startWrite(RecordReceiver recordReceiver, Configuration writerSliceConfig, TaskPluginCollector 
taskPluginCollector) { Connection connection = DBUtil.getConnection(this.dataBaseType, this.jdbcUrl, username, password); DBUtil.dealWithSessionConfig(connection, writerSliceConfig, this.dataBaseType, BASIC_MESSAGE); startWriteWithConnection(recordReceiver, taskPluginCollector, connection); } public void
post(Configuration writerSliceConfig) { int tableNumber = writerSliceConfig.getInt( Constant.TABLE_NUMBER_MARK); boolean hasPostSql = (this.postSqls != null && this.postSqls.size() > 0); if (tableNumber == 1 || !hasPostSql) { return; } Connection connection = DBUtil.getConnection(this.dataBaseType, this.jdbcUrl, username, password); LOG.info("Begin to execute postSqls:[{}]. context info:{}.", StringUtils.join(this.postSqls, ";"), BASIC_MESSAGE); WriterUtil.executeSqls(connection, this.postSqls, BASIC_MESSAGE, dataBaseType); DBUtil.closeDBResources(null, null, connection); } public void destroy(Configuration writerSliceConfig) { } protected void doBatchInsert(Connection connection, List buffer) throws SQLException { PreparedStatement preparedStatement = null; try { connection.setAutoCommit(false); preparedStatement = connection .prepareStatement(this.writeRecordSql); for (Record record : buffer) { preparedStatement = fillPreparedStatement( preparedStatement, record); preparedStatement.addBatch(); } preparedStatement.executeBatch(); connection.commit(); } catch (SQLException e) { LOG.warn("回滚此次写入, 采用每次写入一行方式提交. 
因为:" + e.getMessage()); connection.rollback(); doOneInsert(connection, buffer); } catch (Exception e) { throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { DBUtil.closeDBResources(preparedStatement, null); } } public boolean needToDumpRecord() { return dumpRecordCount.incrementAndGet() <= dumpRecordLimit; } public void doOneInsert(Connection connection, List buffer) { PreparedStatement preparedStatement = null; try { connection.setAutoCommit(true); preparedStatement = connection .prepareStatement(this.writeRecordSql); for (Record record : buffer) { try { preparedStatement = fillPreparedStatement( preparedStatement, record); preparedStatement.execute(); } catch (SQLException e) { if (needToDumpRecord()) { LOG.warn("ERROR : record {}", record); LOG.warn("Insert fatal error SqlState ={}, errorCode = {}, {}", e.getSQLState(), e.getErrorCode(), e); } this.taskPluginCollector.collectDirtyRecord(record, e); } finally { // clear the bound parameters so the same PreparedStatement can be reused for the next record (it is closed in the outer finally) preparedStatement.clearParameters(); } } } catch (Exception e) { throw DataXException.asDataXException( DBUtilErrorCode.WRITE_DATA_ERROR, e); } finally { DBUtil.closeDBResources(preparedStatement, null); } } // uses two instance fields directly: columnNumber and resultSetMetaData protected PreparedStatement fillPreparedStatement(PreparedStatement preparedStatement, Record record) throws SQLException { for (int i = 0; i < this.columnNumber; i++) { int columnSqltype = this.resultSetMetaData.getMiddle().get(i); String typeName = this.resultSetMetaData.getRight().get(i); preparedStatement = fillPreparedStatementColumnType(preparedStatement, i, columnSqltype, typeName, record.getColumn(i)); } return preparedStatement; } protected PreparedStatement fillPreparedStatementColumnType(PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException { java.util.Date utilDate; switch (columnSqltype) { case Types.CHAR: case Types.NCHAR: case Types.CLOB: case Types.NCLOB: 
case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: preparedStatement.setString(columnIndex + 1, column .asString()); break; case Types.SMALLINT: case Types.INTEGER: case Types.BIGINT: case Types.NUMERIC: case Types.DECIMAL: case Types.FLOAT: case Types.REAL: case Types.DOUBLE: String strValue = column.asString(); if (emptyAsNull && "".equals(strValue)) { preparedStatement.setString(columnIndex + 1, null); } else { preparedStatement.setString(columnIndex + 1, strValue); } break; // tinyint is a little special in some databases such as MySQL {boolean->tinyint(1)} case Types.TINYINT: Long longValue = column.asLong(); if (null == longValue) { preparedStatement.setString(columnIndex + 1, null); } else { preparedStatement.setString(columnIndex + 1, longValue.toString()); } break; // for mysql bug, see http://bugs.mysql.com/bug.php?id=35115 case Types.DATE: if (typeName == null) { typeName = this.resultSetMetaData.getRight().get(columnIndex); } if (typeName.equalsIgnoreCase("year")) { if (column.asBigInteger() == null) { preparedStatement.setString(columnIndex + 1, null); } else { preparedStatement.setInt(columnIndex + 1, column.asBigInteger().intValue()); } } else { java.sql.Date sqlDate = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "Date 类型转换错误:[%s]", column)); } if (null != utilDate) { sqlDate = new java.sql.Date(utilDate.getTime()); } preparedStatement.setDate(columnIndex + 1, sqlDate); } break; case Types.TIME: java.sql.Time sqlTime = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( "TIME 类型转换错误:[%s]", column)); } if (null != utilDate) { sqlTime = new java.sql.Time(utilDate.getTime()); } preparedStatement.setTime(columnIndex + 1, sqlTime); break; case Types.TIMESTAMP: java.sql.Timestamp sqlTimestamp = null; try { utilDate = column.asDate(); } catch (DataXException e) { throw new SQLException(String.format( 
"TIMESTAMP 类型转换错误:[%s]", column)); } if (null != utilDate) { sqlTimestamp = new java.sql.Timestamp( utilDate.getTime()); } preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp); break; case Types.BINARY: case Types.VARBINARY: case Types.BLOB: case Types.LONGVARBINARY: preparedStatement.setBytes(columnIndex + 1, column .asBytes()); break; case Types.BOOLEAN: preparedStatement.setBoolean(columnIndex + 1, column.asBoolean()); break; // warn: bit(1) -> Types.BIT, can use setBoolean // warn: bit(>1) -> Types.VARBINARY, can use setBytes case Types.BIT: if (this.dataBaseType == DataBaseType.MySql) { preparedStatement.setBoolean(columnIndex + 1, column.asBoolean()); } else { preparedStatement.setString(columnIndex + 1, column.asString()); } break; default: throw DataXException .asDataXException( DBUtilErrorCode.UNSUPPORTED_TYPE, String.format( "您的配置文件中的列配置信息有误. 因为DataX 不支持数据库写入这种字段类型. 字段名:[%s], 字段类型:[%d], 字段Java类型:[%s]. 请修改表中该字段的类型或者不同步该字段.", this.resultSetMetaData.getLeft() .get(columnIndex), this.resultSetMetaData.getMiddle() .get(columnIndex), this.resultSetMetaData.getRight() .get(columnIndex))); } return preparedStatement; } private void calcWriteRecordSql() { if (!VALUE_HOLDER.equals(calcValueHolder(""))) { List valueHolders = new ArrayList(columnNumber); for (int i = 0; i < columns.size(); i++) { String type = resultSetMetaData.getRight().get(i); valueHolders.add(calcValueHolder(type)); } boolean forceUseUpdate = false; // special handling for OB10 if (dataBaseType != null && dataBaseType == DataBaseType.MySql && OriginalConfPretreatmentUtil.isOB10(jdbcUrl)) { forceUseUpdate = true; } INSERT_OR_REPLACE_TEMPLATE = WriterUtil.getWriteTemplate(columns, valueHolders, writeMode, dataBaseType, forceUseUpdate); writeRecordSql = String.format(INSERT_OR_REPLACE_TEMPLATE, this.table); } } protected String calcValueHolder(String columnType) { return VALUE_HOLDER; } } } ================================================ FILE: 
plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Constant.java ================================================ package com.alibaba.datax.plugin.rdbms.writer; /** * Declarations of the constants used as marker (MARK) keys when the plugin parses the user configuration. */ public final class Constant { public static final int DEFAULT_BATCH_SIZE = 2048; public static final int DEFAULT_BATCH_BYTE_SIZE = 32 * 1024 * 1024; public static String TABLE_NAME_PLACEHOLDER = "@table"; public static String CONN_MARK = "connection"; public static String TABLE_NUMBER_MARK = "tableNumber"; public static String INSERT_OR_REPLACE_TEMPLATE_MARK = "insertOrReplaceTemplate"; public static final String OB10_SPLIT_STRING = "||_dsc_ob10_dsc_||"; public static final String OB10_SPLIT_STRING_PATTERN = "\\|\\|_dsc_ob10_dsc_\\|\\|"; public static final int DEFAULT_DUMP_RECORD_LIMIT = 10; } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/Key.java ================================================ package com.alibaba.datax.plugin.rdbms.writer; public final class Key { public final static String JDBC_URL = "jdbcUrl"; public final static String USERNAME = "username"; public final static String PASSWORD = "password"; public final static String TABLE = "table"; public final static String COLUMN = "column"; public final static String ONCONFLICT_COLUMN = "onConflictColumn"; // allowed values: insert, replace, update (update only takes effect on MySQL/TDDL); default: insert. MySQL supports replace; Oracle has no replace mechanism and can only insert, so Oracle plugins need not expose this parameter public final static String WRITE_MODE = "writeMode"; public final static String PRE_SQL = "preSql"; public final static String POST_SQL = "postSql"; public final static String TDDL_APP_NAME = "appName"; // default: 2048 (Constant.DEFAULT_BATCH_SIZE) public final static String BATCH_SIZE = "batchSize"; // default: 32 MB (Constant.DEFAULT_BATCH_BYTE_SIZE) public final static String BATCH_BYTE_SIZE = "batchByteSize"; public final static String EMPTY_AS_NULL = "emptyAsNull"; public final static String DB_NAME_PATTERN = "dbNamePattern"; public final static String DB_RULE = "dbRule"; public final 
static String TABLE_NAME_PATTERN = "tableNamePattern"; public final static String TABLE_RULE = "tableRule"; public final static String DRYRUN = "dryRun"; } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/MysqlWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.rdbms.writer; import com.alibaba.datax.common.spi.ErrorCode; // TODO consider merging this with DBUtilErrorCode in the util package later (keeping read and write error codes distinct) public enum MysqlWriterErrorCode implements ErrorCode { ; private final String code; private final String describe; private MysqlWriterErrorCode(String code, String describe) { this.code = code; this.describe = describe; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.describe; } @Override public String toString() { return String.format("Code:[%s], Describe:[%s]. ", this.code, this.describe); } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/OriginalConfPretreatmentUtil.java ================================================ package com.alibaba.datax.plugin.rdbms.writer.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.ListUtil; import com.alibaba.datax.plugin.rdbms.util.*; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.datax.plugin.rdbms.writer.Key; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.List; public final class OriginalConfPretreatmentUtil { private static final Logger LOG = LoggerFactory .getLogger(OriginalConfPretreatmentUtil.class); public static DataBaseType DATABASE_TYPE; // public static void doPretreatment(Configuration originalConfig) { // 
doPretreatment(originalConfig,null); // } public static void doPretreatment(Configuration originalConfig, DataBaseType dataBaseType) { // check username/password (required) originalConfig.getNecessaryValue(Key.USERNAME, DBUtilErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.PASSWORD, DBUtilErrorCode.REQUIRED_VALUE); doCheckBatchSize(originalConfig); simplifyConf(originalConfig); dealColumnConf(originalConfig); dealWriteMode(originalConfig, dataBaseType); } public static void doCheckBatchSize(Configuration originalConfig) { // check batchSize (optional; the default is used when it is not set) int batchSize = originalConfig.getInt(Key.BATCH_SIZE, Constant.DEFAULT_BATCH_SIZE); if (batchSize < 1) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, String.format( "您的batchSize配置有误. 您所配置的写入数据库表的 batchSize:%s 不能小于1. 推荐配置范围为:[100-1000], 该值越大, 内存溢出可能性越大. 请检查您的配置并作出修改.", batchSize)); } originalConfig.set(Key.BATCH_SIZE, batchSize); } public static void simplifyConf(Configuration originalConfig) { List connections = originalConfig.getList(Constant.CONN_MARK, Object.class); int tableNum = 0; for (int i = 0, len = connections.size(); i < len; i++) { Configuration connConf = Configuration.from(connections.get(i).toString()); String jdbcUrl = connConf.getString(Key.JDBC_URL); if (StringUtils.isBlank(jdbcUrl)) { throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, "您未配置写入数据库表的 jdbcUrl."); } jdbcUrl = DATABASE_TYPE.appendJDBCSuffixForWriter(jdbcUrl); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.JDBC_URL), jdbcUrl); List tables = connConf.getList(Key.TABLE, String.class); if (null == tables || tables.isEmpty()) { throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, "您未配置写入数据库表的表名称. 根据配置DataX找不到您配置的表. 
请检查您的配置并作出修改."); } // expand the table entries configured on each connection List expandedTables = TableExpandUtil .expandTableConf(DATABASE_TYPE, tables); if (null == expandedTables || expandedTables.isEmpty()) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, "您配置的写入数据库表名称错误. DataX找不到您配置的表,请检查您的配置并作出修改."); } tableNum += expandedTables.size(); originalConfig.set(String.format("%s[%d].%s", Constant.CONN_MARK, i, Key.TABLE), expandedTables); } originalConfig.set(Constant.TABLE_NUMBER_MARK, tableNum); } public static void dealColumnConf(Configuration originalConfig, ConnectionFactory connectionFactory, String oneTable) { List userConfiguredColumns = originalConfig.getList(Key.COLUMN, String.class); if (null == userConfiguredColumns || userConfiguredColumns.isEmpty()) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, "您的配置文件中的列配置信息有误. 因为您未配置写入数据库表的列名称,DataX获取不到列信息. 请检查您的配置并作出修改."); } else { boolean isPreCheck = originalConfig.getBool(Key.DRYRUN, false); List allColumns; if (isPreCheck){ allColumns = DBUtil.getTableColumnsByConn(DATABASE_TYPE,connectionFactory.getConnecttionWithoutRetry(), oneTable, connectionFactory.getConnectionInfo()); }else{ allColumns = DBUtil.getTableColumnsByConn(DATABASE_TYPE,connectionFactory.getConnecttion(), oneTable, connectionFactory.getConnectionInfo()); } LOG.info("table:[{}] all columns:[\n{}\n].", oneTable, StringUtils.join(allColumns, ",")); if (1 == userConfiguredColumns.size() && "*".equals(userConfiguredColumns.get(0))) { LOG.warn("您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改."); // write the expanded column list back; later steps consume it as Strings originalConfig.set(Key.COLUMN, allColumns); } else if (userConfiguredColumns.size() > allColumns.size()) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, String.format("您的配置文件中的列配置信息有误. 因为您所配置的写入数据库表的字段个数:%s 大于目的表的总字段个数:%s. 
请检查您的配置并作出修改.", userConfiguredColumns.size(), allColumns.size())); } else { // make sure the user-configured columns contain no duplicates ListUtil.makeSureNoValueDuplicate(userConfiguredColumns, false); Connection connection = null; try { connection = connectionFactory.getConnecttion(); // verify that every configured column exists in the target table (by running a select column from table query) DBUtil.getColumnMetaData(connection, oneTable,StringUtils.join(userConfiguredColumns, ",")); } finally { DBUtil.closeDBResources(null, connection); } } } } public static void dealColumnConf(Configuration originalConfig) { String jdbcUrl = originalConfig.getString(String.format("%s[0].%s", Constant.CONN_MARK, Key.JDBC_URL)); String username = originalConfig.getString(Key.USERNAME); String password = originalConfig.getString(Key.PASSWORD); String oneTable = originalConfig.getString(String.format( "%s[0].%s[0]", Constant.CONN_MARK, Key.TABLE)); JdbcConnectionFactory jdbcConnectionFactory = new JdbcConnectionFactory(DATABASE_TYPE, jdbcUrl, username, password); dealColumnConf(originalConfig, jdbcConnectionFactory, oneTable); } public static void dealWriteMode(Configuration originalConfig, DataBaseType dataBaseType) { List columns = originalConfig.getList(Key.COLUMN, String.class); String jdbcUrl = originalConfig.getString(String.format("%s[0].%s", Constant.CONN_MARK, Key.JDBC_URL)); // defaults to insert mode String writeMode = originalConfig.getString(Key.WRITE_MODE, "INSERT"); List valueHolders = new ArrayList(columns.size()); for (int i = 0; i < columns.size(); i++) { valueHolders.add("?"); } boolean forceUseUpdate = false; // special handling for OB10 if (dataBaseType == DataBaseType.MySql && isOB10(jdbcUrl)) { forceUseUpdate = true; } String writeDataSqlTemplate = WriterUtil.getWriteTemplate(columns, valueHolders, writeMode,dataBaseType, forceUseUpdate); LOG.info("Write data [\n{}\n], which jdbcUrl like:[{}]", writeDataSqlTemplate, jdbcUrl); originalConfig.set(Constant.INSERT_OR_REPLACE_TEMPLATE_MARK, writeDataSqlTemplate); } public static boolean isOB10(String jdbcUrl) { 
// special handling for OB10 if (jdbcUrl.startsWith(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING)) { String[] ss = jdbcUrl.split(com.alibaba.datax.plugin.rdbms.writer.Constant.OB10_SPLIT_STRING_PATTERN); if (ss.length != 3) { throw DataXException .asDataXException( DBUtilErrorCode.JDBC_OB10_ADDRESS_ERROR, "JDBC OB10格式错误,请联系askdatax"); } return true; } return false; } } ================================================ FILE: plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/writer/util/WriterUtil.java ================================================ package com.alibaba.datax.plugin.rdbms.writer.util; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.datax.plugin.rdbms.writer.Key; import com.alibaba.druid.sql.parser.ParserException; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.Statement; import java.util.*; public final class WriterUtil { private static final Logger LOG = LoggerFactory.getLogger(WriterUtil.class); // TODO improve error reporting when splitting public static List doSplit(Configuration simplifiedConf, int adviceNumber) { List splitResultConfigs = new ArrayList(); int tableNumber = simplifiedConf.getInt(Constant.TABLE_NUMBER_MARK); // single-table case if (tableNumber == 1) { // table and jdbcUrl were already extracted during the master's prepare phase, so the config only needs to be cloned adviceNumber times for (int j = 0; j < adviceNumber; j++) { splitResultConfigs.add(simplifiedConf.clone()); } return splitResultConfigs; } if (tableNumber != adviceNumber) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, String.format("您的配置文件中的列配置信息有误. 您要写入的目的端的表个数是:%s , 但是根据系统建议需要切分的份数是:%s. 
请检查您的配置并作出修改.", tableNumber, adviceNumber)); } String jdbcUrl; List preSqls = simplifiedConf.getList(Key.PRE_SQL, String.class); List postSqls = simplifiedConf.getList(Key.POST_SQL, String.class); List conns = simplifiedConf.getList(Constant.CONN_MARK, Object.class); for (Object conn : conns) { Configuration sliceConfig = simplifiedConf.clone(); Configuration connConf = Configuration.from(conn.toString()); jdbcUrl = connConf.getString(Key.JDBC_URL); sliceConfig.set(Key.JDBC_URL, jdbcUrl); sliceConfig.remove(Constant.CONN_MARK); List tables = connConf.getList(Key.TABLE, String.class); for (String table : tables) { Configuration tempSlice = sliceConfig.clone(); tempSlice.set(Key.TABLE, table); tempSlice.set(Key.PRE_SQL, renderPreOrPostSqls(preSqls, table)); tempSlice.set(Key.POST_SQL, renderPreOrPostSqls(postSqls, table)); splitResultConfigs.add(tempSlice); } } return splitResultConfigs; } public static List renderPreOrPostSqls(List preOrPostSqls, String tableName) { if (null == preOrPostSqls) { return Collections.emptyList(); } List renderedSqls = new ArrayList(); for (String sql : preOrPostSqls) { // blank pre/post SQL entries are not queued for execution if (StringUtils.isNotBlank(sql)) { renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName)); } } return renderedSqls; } public static void executeSqls(Connection conn, List sqls, String basicMessage,DataBaseType dataBaseType) { Statement stmt = null; String currentSql = null; try { stmt = conn.createStatement(); for (String sql : sqls) { currentSql = sql; DBUtil.executeSqlWithoutResultSet(stmt, sql); } } catch (Exception e) { throw RdbmsException.asQueryException(dataBaseType,e,currentSql,null,null); } finally { DBUtil.closeDBResources(null, stmt, null); } } public static String getWriteTemplate(List columnHolders, List valueHolders, String writeMode, DataBaseType dataBaseType, boolean forceUseUpdate) { boolean isWriteModeLegal = writeMode.trim().toLowerCase().startsWith("insert") || writeMode.trim().toLowerCase().startsWith("replace") 
|| writeMode.trim().toLowerCase().startsWith("update"); if (!isWriteModeLegal) { throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE, String.format("您所配置的 writeMode:%s 错误. 因为DataX 目前仅支持replace,update 或 insert 方式. 请检查您的配置并作出修改.", writeMode)); } String writeDataSqlTemplate; if (forceUseUpdate || ((dataBaseType == DataBaseType.MySql || dataBaseType == DataBaseType.Tddl) && writeMode.trim().toLowerCase().startsWith("update")) ) { // update mode (INSERT ... ON DUPLICATE KEY UPDATE) is only used for MySQL/TDDL, or when forced (OB10) writeDataSqlTemplate = new StringBuilder() .append("INSERT INTO %s (").append(StringUtils.join(columnHolders, ",")) .append(") VALUES(").append(StringUtils.join(valueHolders, ",")) .append(")") .append(onDuplicateKeyUpdateString(columnHolders)) .toString(); } else { // safeguard: if update was requested for any other database, fall back to replace if (writeMode.trim().toLowerCase().startsWith("update")) { writeMode = "replace"; } writeDataSqlTemplate = new StringBuilder().append(writeMode) .append(" INTO %s (").append(StringUtils.join(columnHolders, ",")) .append(") VALUES(").append(StringUtils.join(valueHolders, ",")) .append(")").toString(); } return writeDataSqlTemplate; } public static String onDuplicateKeyUpdateString(List columnHolders){ if (columnHolders == null || columnHolders.size() < 1) { return ""; } StringBuilder sb = new StringBuilder(); sb.append(" ON DUPLICATE KEY UPDATE "); boolean first = true; for(String column:columnHolders){ if(!first){ sb.append(","); }else{ first = false; } sb.append(column); sb.append("=VALUES("); sb.append(column); sb.append(")"); } return sb.toString(); } public static void preCheckPrePareSQL(Configuration originalConfig, DataBaseType type) { List conns = originalConfig.getList(Constant.CONN_MARK, Object.class); Configuration connConf = Configuration.from(conns.get(0).toString()); String table = connConf.getList(Key.TABLE, String.class).get(0); List preSqls = originalConfig.getList(Key.PRE_SQL, String.class); List renderedPreSqls = 
WriterUtil.renderPreOrPostSqls( preSqls, table); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { LOG.info("Begin to preCheck preSqls:[{}].", StringUtils.join(renderedPreSqls, ";")); for(String sql : renderedPreSqls) { try{ DBUtil.sqlValid(sql, type); }catch(ParserException e) { throw RdbmsException.asPreSQLParserException(type,e,sql); } } } } public static void preCheckPostSQL(Configuration originalConfig, DataBaseType type) { List conns = originalConfig.getList(Constant.CONN_MARK, Object.class); Configuration connConf = Configuration.from(conns.get(0).toString()); String table = connConf.getList(Key.TABLE, String.class).get(0); List postSqls = originalConfig.getList(Key.POST_SQL, String.class); List renderedPostSqls = WriterUtil.renderPreOrPostSqls( postSqls, table); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { LOG.info("Begin to preCheck postSqls:[{}].", StringUtils.join(renderedPostSqls, ";")); for(String sql : renderedPostSqls) { try{ DBUtil.sqlValid(sql, type); }catch(ParserException e){ throw RdbmsException.asPostSQLParserException(type,e,sql); } } } } } ================================================ FILE: plugin-unstructured-storage-util/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT plugin-unstructured-storage-util plugin-unstructured-storage-util plugin-unstructured-storage-util: common read/write methods for file-based formats, used by TxtFileReader/Writer, OSSReader/Writer, FtpReader/Writer and HdfsReader/Writer. jar 2.7.1 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 net.sourceforge.javacsv javacsv 2.0 org.apache.commons commons-compress 1.9 org.anarres.lzo lzo-core 1.0.5 com.aliyun.oss aliyun-sdk-oss 2.0.2 test io.airlift aircompressor 0.3 com.facebook.presto.hadoop hadoop-apache2 0.3 provided junit junit test commons-beanutils commons-beanutils 1.9.2 org.apache.hadoop hadoop-common 
${hadoop.version} org.apache.commons commons-compress ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/FileFormat.java ================================================ package com.alibaba.datax.plugin.unstructuredstorage; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.unstructuredstorage.reader.Constant; import com.alibaba.datax.plugin.unstructuredstorage.reader.Key; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode; import org.apache.commons.lang3.StringUtils; import java.util.Arrays; /** * @Author: guxuan * @Date 2022-05-17 16:04 */ public enum FileFormat { TEXT("text"), CSV("csv"), EXCEL("excel"), BINARY("binary"); private String fileFormat; private boolean isText; private boolean isCsv; private boolean isExcel; private boolean isBinary; FileFormat(String fileFormat) { this.fileFormat = fileFormat.toLowerCase(); } /** * Returns the file format configured under Key.FILE_FORMAT (defaults to Constant.DEFAULT_FILE_FORMAT); currently supported: text, csv, excel, binary * @param configuration the plugin configuration * @return the matching FileFormat */ public static FileFormat getFileFormatByConfiguration(Configuration configuration) { String fileFormat = configuration.getString(Key.FILE_FORMAT, Constant.DEFAULT_FILE_FORMAT); return FileFormat.getByTypeName(fileFormat); } public String getFileFormat() { return this.fileFormat; } public static FileFormat getByTypeName(String fileFormat) { for (FileFormat fFormat : values()) { if (fFormat.fileFormat.equalsIgnoreCase(fileFormat)) { return fFormat; } } throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE, String.format("DataX 不支持该 fileFormat 类型:%s, 目前支持的 fileFormat 类型是:%s", fileFormat, Arrays.asList(values()))); } public boolean equalsIgnoreCase(String fileFormat){ return StringUtils.equalsIgnoreCase(fileFormat, this.fileFormat); } public boolean isText() { return this.equalsIgnoreCase(Constant.FILE_FORMAT_TEXT); 
} public void setText(boolean text) { isText = text; } public boolean isCsv() { return this.equalsIgnoreCase(Constant.FILE_FORMAT_CSV); } public void setCsv(boolean csv) { isCsv = csv; } public boolean isExcel() { return this.equalsIgnoreCase(Constant.FILE_FORMAT_EXCEL); } public void setExcel(boolean excel) { isExcel = excel; } public boolean isBinary() { return this.equalsIgnoreCase(Constant.FILE_FORMAT_BINARY); } public void setBinary(boolean binary) { isBinary = binary; } @Override public String toString(){ return this.fileFormat; } } ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/LocalStrings.properties ================================================ fileformaterror.1=DataX \u4E0D\u652F\u6301\u8BE5 fileFormat \u7C7B\u578B:{0}, \u76EE\u524D\u652F\u6301\u7684 fileFormat \u7C7B\u578B\u662F:{1} ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/LocalStrings_en_US.properties ================================================ fileformaterror.1=DataX \u4E0D\u652F\u6301\u8BE5 fileFormat \u7C7B\u578B:{0}, \u76EE\u524D\u652F\u6301\u7684 fileFormat \u7C7B\u578B\u662F:{1} ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/LocalStrings_ja_JP.properties ================================================ fileformaterror.1=DataX \u4E0D\u652F\u6301\u8BE5 fileFormat \u7C7B\u578B:{0}, \u76EE\u524D\u652F\u6301\u7684 fileFormat \u7C7B\u578B\u662F:{1} ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/LocalStrings_zh_CN.properties ================================================ fileformaterror.1=DataX \u4E0D\u652F\u6301\u8BE5 fileFormat \u7C7B\u578B:{0}, \u76EE\u524D\u652F\u6301\u7684 fileFormat 
\u7C7B\u578B\u662F:{1} ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/LocalStrings_zh_HK.properties ================================================ fileformaterror.1=DataX \u4E0D\u652F\u6301\u8BE5 fileFormat \u7C7B\u578B:{0}, \u76EE\u524D\u652F\u6301\u7684 fileFormat \u7C7B\u578B\u662F:{1}fileformaterror.1=DataX不支持該fileFormat類型:{0},現時支持的fileFormat類型是:{1} ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/LocalStrings_zh_TW.properties ================================================ fileformaterror.1=DataX \u4E0D\u652F\u6301\u8BE5 fileFormat \u7C7B\u578B:{0}, \u76EE\u524D\u652F\u6301\u7684 fileFormat \u7C7B\u578B\u662F:{1}fileformaterror.1=DataX不支持該fileFormat類型:{0},現時支持的fileFormat類型是:{1} ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/ColumnEntry.java ================================================ package com.alibaba.datax.plugin.unstructuredstorage.reader; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import java.text.DateFormat; import java.text.SimpleDateFormat; public class ColumnEntry { private Integer index; private String type; private String value; private String format; private DateFormat dateParse; private String name; public String getName() { return name; } public void setName(String name) { this.name = name; } public Integer getIndex() { return index; } public void setIndex(Integer index) { this.index = index; } public String getType() { return type; } public void setType(String type) { this.type = type; } public String getValue() { return value; } public void setValue(String value) { this.value = value; } public String getFormat() { return format; } public void setFormat(String format) { this.format = format; if 
(StringUtils.isNotBlank(this.format)) { this.dateParse = new SimpleDateFormat(this.format); } } public DateFormat getDateFormat() { return this.dateParse; } public String toJSONString() { return ColumnEntry.toJSONString(this); } public static String toJSONString(ColumnEntry columnEntry) { return JSON.toJSONString(columnEntry); } } ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/Constant.java ================================================ package com.alibaba.datax.plugin.unstructuredstorage.reader; public class Constant { public static final String DEFAULT_ENCODING = "UTF-8"; public static final char DEFAULT_FIELD_DELIMITER = ','; public static final boolean DEFAULT_SKIP_HEADER = false; public static final String DEFAULT_NULL_FORMAT = "\\N"; public static final Integer DEFAULT_BUFFER_SIZE = 8192; public static final String FILE_FORMAT_CSV = "csv"; public static final String FILE_FORMAT_TEXT = "text"; public static final String FILE_FORMAT_EXCEL = "excel"; public static final String FILE_FORMAT_BINARY = "binary"; public static final String DEFAULT_FILE_FORMAT = "csv"; public static final Boolean DEFAULE_SKIP_TEXT_EMPTY_RECORDS = true; public static final String EXCEL_VERSION_03_OR_EARLIER = "03_OR_EARLIER"; public static final String EXCEL_VERSION_07_OR_LATER = "07_OR_LATER"; /** * fully qualified file name * */ public static final String SOURCE_FILE = "sourceFile"; /** * bare file name (without the path) * */ public static final String SOURCE_FILE_NAME = "sourceFileName"; public static final boolean DEFAULT_OUTPUT_SHEET_NAME = false; /** * TODO whole-directory sync is not considered for now * When syncing binary files such as audio or video: * the semi-structured reader plugins (txtfilereader, ftpreader, hdfsreader, ossreader) must inject the relative file path into the RELATIVE_SOURCE_FILE attribute, * so that semi-structured writer plugins can uniformly use RELATIVE_SOURCE_FILE to obtain every binary file name and its relative path from the reader. * For example: * the reader plugin's PATH is configured as /home/admin/myapp/ */ public static final String RELATIVE_SOURCE_FILE = "relativeSourceFile"; /** * Default number of bytes read at a time from a binary file: 1048576 bytes [1 MB] */ public 
static final int DEFAULT_BLOCK_SIZE_IN_BYTE = 1048576; } ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/ExpandLzopInputStream.java ================================================ /* * description: * * Uses the LZO decompression code that shevek open-sourced on GitHub (https://github.com/shevek/lzo-java) * * This class extends LzopInputStream because the open-source code defines LZO_LIBRARY_VERSION as: * public static final short LZO_LIBRARY_VERSION = 0x2050; * while many lzo files carry LZO_LIBRARY_VERSION 0x2060. Decompressing such files without an exception requires changing * LZO_LIBRARY_VERSION, but it is final and cannot be modified, * so this class extends LzopInputStream and redefines the LZO_LIBRARY_VERSION value. * */ package com.alibaba.datax.plugin.unstructuredstorage.reader; import org.anarres.lzo.LzoVersion; import org.anarres.lzo.LzopConstants; import org.anarres.lzo.LzopInputStream; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import javax.annotation.Nonnegative; import javax.annotation.Nonnull; import java.io.IOException; import java.io.InputStream; import java.util.Arrays; import java.util.zip.Adler32; import java.util.zip.CRC32; /** * Created by mingya.wmy on 16/8/26. */ public class ExpandLzopInputStream extends LzopInputStream { public ExpandLzopInputStream(@Nonnull InputStream in) throws IOException { super(in); } /** * Read and verify an lzo header, setting relevant block checksum options * and ignoring almost everything else. 
     */
    @Override
    protected int readHeader() throws IOException {
        short LZO_LIBRARY_VERSION = 0x2060;
        Log LOG = LogFactory.getLog(LzopInputStream.class);
        byte[] LZOP_MAGIC = new byte[]{
                -119, 'L', 'Z', 'O', 0, '\r', '\n', '\032', '\n'};
        byte[] buf = new byte[9];
        readBytes(buf, 0, 9);
        if (!Arrays.equals(buf, LZOP_MAGIC))
            throw new IOException("Invalid LZO header");
        Arrays.fill(buf, (byte) 0);
        Adler32 adler = new Adler32();
        CRC32 crc32 = new CRC32();
        int hitem = readHeaderItem(buf, 2, adler, crc32); // lzop version
        if (hitem > LzopConstants.LZOP_VERSION) {
            LOG.debug("Compressed with later version of lzop: "
                    + Integer.toHexString(hitem) + " (expected 0x"
                    + Integer.toHexString(LzopConstants.LZOP_VERSION) + ")");
        }
        hitem = readHeaderItem(buf, 2, adler, crc32); // lzo library version
        if (hitem > LZO_LIBRARY_VERSION) {
            // Report the relaxed local limit (0x2060) that was actually checked,
            // not the library's LzoVersion.LZO_LIBRARY_VERSION (0x2050).
            throw new IOException("Compressed with incompatible lzo version: 0x"
                    + Integer.toHexString(hitem) + " (expected 0x"
                    + Integer.toHexString(LZO_LIBRARY_VERSION) + ")");
        }
        hitem = readHeaderItem(buf, 2, adler, crc32); // lzop extract version
        if (hitem > LzopConstants.LZOP_VERSION) {
            throw new IOException("Compressed with incompatible lzop version: 0x"
                    + Integer.toHexString(hitem) + " (expected 0x"
                    + Integer.toHexString(LzopConstants.LZOP_VERSION) + ")");
        }
        hitem = readHeaderItem(buf, 1, adler, crc32); // method
        switch (hitem) {
            case LzopConstants.M_LZO1X_1:
            case LzopConstants.M_LZO1X_1_15:
            case LzopConstants.M_LZO1X_999:
                break;
            default:
                throw new IOException("Invalid strategy " + Integer.toHexString(hitem));
        }
        readHeaderItem(buf, 1, adler, crc32); // ignore level

        // flags
        int flags = readHeaderItem(buf, 4, adler, crc32);
        boolean useCRC32 = (flags & LzopConstants.F_H_CRC32) != 0;
        boolean extraField = (flags & LzopConstants.F_H_EXTRA_FIELD) != 0;
        if ((flags & LzopConstants.F_MULTIPART) != 0)
            throw new IOException("Multipart lzop not supported");
        if ((flags & LzopConstants.F_H_FILTER) != 0)
            throw new IOException("lzop filter not supported");
        if ((flags &
LzopConstants.F_RESERVED) != 0)
            throw new IOException("Unknown flags in header");
        // known !F_H_FILTER, so no optional block

        readHeaderItem(buf, 4, adler, crc32); // ignore mode
        readHeaderItem(buf, 4, adler, crc32); // ignore mtime
        readHeaderItem(buf, 4, adler, crc32); // ignore gmtdiff
        hitem = readHeaderItem(buf, 1, adler, crc32); // fn len
        if (hitem > 0) {
            byte[] tmp = (hitem > buf.length) ? new byte[hitem] : buf;
            readHeaderItem(tmp, hitem, adler, crc32); // skip filename
        }
        int checksum = (int) (useCRC32 ? crc32.getValue() : adler.getValue());
        hitem = readHeaderItem(buf, 4, adler, crc32); // read checksum
        if (hitem != checksum) {
            throw new IOException("Invalid header checksum: 0x"
                    + Integer.toHexString(checksum) + " (expected 0x"
                    + Integer.toHexString(hitem) + ")");
        }
        if (extraField) { // lzop 1.08 ultimately ignores this
            LOG.debug("Extra header field not processed");
            adler.reset();
            crc32.reset();
            hitem = readHeaderItem(buf, 4, adler, crc32);
            readHeaderItem(new byte[hitem], hitem, adler, crc32);
            checksum = (int) (useCRC32 ? crc32.getValue() : adler.getValue());
            if (checksum != readHeaderItem(buf, 4, adler, crc32)) {
                throw new IOException("Invalid checksum for extra header field");
            }
        }

        return flags;
    }

    private int readHeaderItem(@Nonnull byte[] buf, @Nonnegative int len,
                               @Nonnull Adler32 adler, @Nonnull CRC32 crc32) throws IOException {
        int ret = readInt(buf, len);
        adler.update(buf, 0, len);
        crc32.update(buf, 0, len);
        Arrays.fill(buf, (byte) 0);
        return ret;
    }

    /**
     * Read len bytes into buf, such that the LSB of the returned int is the
     * last byte of the first word read.
     */
    // @Nonnegative ?
    private int readInt(@Nonnull byte[] buf, @Nonnegative int len)
            throws IOException {
        readBytes(buf, 0, len);
        int ret = (0xFF & buf[0]) << 24;
        ret |= (0xFF & buf[1]) << 16;
        ret |= (0xFF & buf[2]) << 8;
        ret |= (0xFF & buf[3]);
        return (len > 3) ?
ret : (ret >>> (8 * (4 - len)));
    }

}
================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/Key.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader;

/**
 * Created by haiwei.luo on 14-12-5.
 */
public class Key {
    public static final String COLUMN = "column";

    public static final String ENCODING = "encoding";

    public static final String FIELD_DELIMITER = "fieldDelimiter";

    public static final String SKIP_HEADER = "skipHeader";

    public static final String TYPE = "type";

    public static final String FORMAT = "format";

    public static final String INDEX = "index";

    public static final String VALUE = "value";

    public static final String COMPRESS = "compress";

    public static final String NULL_FORMAT = "nullFormat";

    public static final String BUFFER_SIZE = "bufferSize";

    public static final String CSV_READER_CONFIG = "csvReaderConfig";

    public static final String MARK_DONE_FILE_NAME = "markDoneFileName";

    public static final String MARK_DOING_FILE_NAME = "markDoingFileName";

    // public static final String RETRY_TIME = "retryTime";
    public final static String MAX_RETRY_TIME = "maxRetryTime";

    public final static String RETRY_INTERVAL = "retryInterval";

    public static final String TEXT_READER_CONFIG = "textReaderConfig";

    public static final String SKIP_EMPTY_RECORDS = "skipEmptyRecords";

    public static final String EXCEL_READER_CONFIG = "excelReaderConfig";

    public static final String EXCEL_SHEET_NAME = "excelSheetName";

    public static final String VERSION = "version";

    public static final String OUTPUT_SHEET_NAME = "outputSheetName";

    /**
     * csv or text or excel
     */
    public static final String FILE_FORMAT = "fileFormat";

    /**
     * Whether to treat an entire file as a single column
     */
    public static final String FILE_AS_COLUMN = "fileAsColumn";

    /**
     * Number of bytes read at a time from a binary file
     */
    public static final String BLOCK_SIZE_IN_BYTE = "blockSizeInByte";

    /**
     *
     * In semi-structured storage, the absolute path of the file a Record comes from;
     * it may be an FTP file, an OSS object, etc.
     */
    public static final String META_KEY_FILE_PATH = "filePath";

    /**
     * Work item for multi-file splitting: a Task uses this config item to describe its work.
     * The keys below relate to splitting within a single file.
     */
    public static final String SPLIT_SLICE_CONFIG = "__splitSliceConfig";
    public static final String SPLIT_SLICE_FILE_PATH = "filePath";
    public static final String SPLIT_SLICE_START_POINT = "startPoint";
    public static final String SPLIT_SLICE_END_POINT = "endPoint";

    /**
     * For tar.gz archives, the tarFileFilterPattern parameter can be configured to filter the files to sync.
     * For Example:
     *     "tarFileFilterPattern" : "*.dat"
     *
     * During sync, only the files inside the tar.gz whose names end in .dat are synced.
     */
    public static final String TAR_FILE_FILTER_PATTERN = "tarFileFilterPattern";
    public static final String ENABLE_INNER_SPLIT = "enableInnerSplit";

    public static final String HIVE_PARTION_COLUMN = "hivePartitionColumn";
}
================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/UnstructuredStorageReaderErrorCode.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader;

import com.alibaba.datax.common.spi.ErrorCode;

/**
 * Created by haiwei.luo on 14-9-20.
 */
public enum UnstructuredStorageReaderErrorCode implements ErrorCode {
    CONFIG_INVALID_EXCEPTION("UnstructuredStorageReader-00", "Your parameter configuration is invalid."),
    NOT_SUPPORT_TYPE("UnstructuredStorageReader-01", "The column type you configured is not supported yet."),
    REQUIRED_VALUE("UnstructuredStorageReader-02", "A required parameter value is missing."),
    ILLEGAL_VALUE("UnstructuredStorageReader-03", "A parameter value you provided is invalid."),
    MIXED_INDEX_VALUE("UnstructuredStorageReader-04", "Your column configuration contains both index and value."),
    NO_INDEX_VALUE("UnstructuredStorageReader-05", "You configured columns explicitly but did not provide the corresponding index or value."),
    FILE_NOT_EXISTS("UnstructuredStorageReader-06", "The source path you configured does not exist."),
    OPEN_FILE_WITH_CHARSET_ERROR("UnstructuredStorageReader-07", "The encoding you configured does not match the file's actual encoding."),
    OPEN_FILE_ERROR("UnstructuredStorageReader-08", "An error occurred while opening the configured source. Check whether the source contains hidden entries, pipe files, or other special files."),
    READ_FILE_IO_ERROR("UnstructuredStorageReader-09", "An I/O error occurred while reading the configured file."),
    SECURITY_NOT_ENOUGH("UnstructuredStorageReader-10", "You lack the permissions required to read the file."),
    RUNTIME_EXCEPTION("UnstructuredStorageReader-11", "A runtime exception occurred. Please contact us.");

    private final String code;
    private final String description;

    private UnstructuredStorageReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code,
                this.description);
    }
}
================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/UnstructuredStorageReaderUtil.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader;

import com.alibaba.datax.common.element.*;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson2.JSON;
import
com.alibaba.fastjson2.JSONObject;
import com.alibaba.fastjson2.TypeReference;
import com.csvreader.CsvReader;
import org.apache.commons.beanutils.BeanUtils;
import io.airlift.compress.snappy.SnappyCodec;
import io.airlift.compress.snappy.SnappyFramedInputStream;
import org.anarres.lzo.*;
import org.apache.commons.compress.compressors.CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.io.Charsets;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.*;
import java.nio.charset.UnsupportedCharsetException;
import java.text.DateFormat;
import java.util.*;

public class UnstructuredStorageReaderUtil {
    private static final Logger LOG = LoggerFactory
            .getLogger(UnstructuredStorageReaderUtil.class);
    public static HashMap<String, Object> csvReaderConfigMap;

    private UnstructuredStorageReaderUtil() {

    }

    /**
     * @param inputLine
     *            the string to split
     * @param delimiter
     *            the field delimiter
     * @return the fields after splitting on the delimiter, or null if an exception occurs.
     *         Escaping is supported, i.e. the data may contain the delimiter.
     */
    public static String[] splitOneLine(String inputLine, char delimiter) {
        String[] splitedResult = null;
        if (null != inputLine) {
            try {
                CsvReader csvReader = new CsvReader(new StringReader(inputLine));
                csvReader.setDelimiter(delimiter);

                setCsvReaderConfig(csvReader);

                if (csvReader.readRecord()) {
                    splitedResult = csvReader.getValues();
                }
            } catch (IOException e) {
                // nothing to do
            }
        }
        return splitedResult;
    }

    public static String[] splitBufferedReader(CsvReader csvReader)
            throws IOException {
        String[] splitedResult = null;
        if (csvReader.readRecord()) {
            splitedResult = csvReader.getValues();
        }
        return splitedResult;
    }

    /**
     * Escaping is not supported.
     *
     * @return the fields after splitting on the delimiter
     */
    public static String[] splitOneLine(String inputLine, String delimiter) {
        String[] splitedResult =
StringUtils.split(inputLine, delimiter);
        return splitedResult;
    }

    public static void readFromStream(InputStream inputStream, String context,
                                      Configuration readerSliceConfig, RecordSender recordSender,
                                      TaskPluginCollector taskPluginCollector) {
        String compress = readerSliceConfig.getString(Key.COMPRESS, null);
        if (StringUtils.isBlank(compress)) {
            compress = null;
        }
        String encoding = readerSliceConfig.getString(Key.ENCODING,
                Constant.DEFAULT_ENCODING);
        // handle blank encoding
        if (StringUtils.isBlank(encoding)) {
            // Log the configured (blank) value before it is replaced by the default
            LOG.warn(String.format("The configured encoding is [%s]; using the default [%s]",
                    encoding, Constant.DEFAULT_ENCODING));
            encoding = Constant.DEFAULT_ENCODING;
        }

        List<Configuration> column = readerSliceConfig
                .getListConfiguration(Key.COLUMN);
        // handle ["*"] -> [], null
        if (null != column && 1 == column.size()
                && "\"*\"".equals(column.get(0).toString())) {
            readerSliceConfig.set(Key.COLUMN, null);
            column = null;
        }

        BufferedReader reader = null;
        int bufferSize = readerSliceConfig.getInt(Key.BUFFER_SIZE,
                Constant.DEFAULT_BUFFER_SIZE);

        // compress logic
        try {
            if (null == compress) {
                reader = new BufferedReader(new InputStreamReader(inputStream,
                        encoding), bufferSize);
            } else {
                // TODO compress
                if ("lzo_deflate".equalsIgnoreCase(compress)) {
                    LzoInputStream lzoInputStream = new LzoInputStream(
                            inputStream, new LzoDecompressor1x_safe());
                    reader = new BufferedReader(new InputStreamReader(
                            lzoInputStream, encoding));
                } else if ("lzo".equalsIgnoreCase(compress)) {
                    LzoInputStream lzopInputStream = new ExpandLzopInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            lzopInputStream, encoding));
                } else if ("gzip".equalsIgnoreCase(compress)) {
                    CompressorInputStream compressorInputStream = new GzipCompressorInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            compressorInputStream, encoding), bufferSize);
                } else if ("bzip2".equalsIgnoreCase(compress)) {
                    CompressorInputStream compressorInputStream = new BZip2CompressorInputStream(
                            inputStream);
                    reader = new
BufferedReader(new InputStreamReader(
                            compressorInputStream, encoding), bufferSize);
                } else if ("hadoop-snappy".equalsIgnoreCase(compress)) {
                    CompressionCodec snappyCodec = new SnappyCodec();
                    InputStream snappyInputStream = snappyCodec.createInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            snappyInputStream, encoding));
                } else if ("framing-snappy".equalsIgnoreCase(compress)) {
                    InputStream snappyInputStream = new SnappyFramedInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            snappyInputStream, encoding));
                }/* else if ("lzma".equalsIgnoreCase(compress)) {
                    CompressorInputStream compressorInputStream = new LZMACompressorInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            compressorInputStream, encoding));
                } *//*else if ("pack200".equalsIgnoreCase(compress)) {
                    CompressorInputStream compressorInputStream = new Pack200CompressorInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            compressorInputStream, encoding));
                } *//*else if ("xz".equalsIgnoreCase(compress)) {
                    CompressorInputStream compressorInputStream = new XZCompressorInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            compressorInputStream, encoding));
                } else if ("ar".equalsIgnoreCase(compress)) {
                    ArArchiveInputStream arArchiveInputStream = new ArArchiveInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            arArchiveInputStream, encoding));
                } else if ("arj".equalsIgnoreCase(compress)) {
                    ArjArchiveInputStream arjArchiveInputStream = new ArjArchiveInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            arjArchiveInputStream, encoding));
                } else if ("cpio".equalsIgnoreCase(compress)) {
                    CpioArchiveInputStream cpioArchiveInputStream = new CpioArchiveInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            cpioArchiveInputStream, encoding));
                } else if ("dump".equalsIgnoreCase(compress)) {
                    DumpArchiveInputStream
dumpArchiveInputStream = new DumpArchiveInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            dumpArchiveInputStream, encoding));
                } else if ("jar".equalsIgnoreCase(compress)) {
                    JarArchiveInputStream jarArchiveInputStream = new JarArchiveInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            jarArchiveInputStream, encoding));
                } else if ("tar".equalsIgnoreCase(compress)) {
                    TarArchiveInputStream tarArchiveInputStream = new TarArchiveInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            tarArchiveInputStream, encoding));
                }*/
                else if ("zip".equalsIgnoreCase(compress)) {
                    ZipCycleInputStream zipCycleInputStream = new ZipCycleInputStream(
                            inputStream);
                    reader = new BufferedReader(new InputStreamReader(
                            zipCycleInputStream, encoding), bufferSize);
                } else {
                    throw DataXException
                            .asDataXException(
                                    UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                                    String.format("Only the gzip, bzip2, zip, lzo, lzo_deflate, hadoop-snappy and framing-snappy "
                                            + "compression formats are supported; the configured compression format is not supported: [%s]",
                                            compress));
                }
            }
            UnstructuredStorageReaderUtil.doReadFromStream(reader, context,
                    readerSliceConfig, recordSender, taskPluginCollector);
        } catch (UnsupportedEncodingException uee) {
            throw DataXException
                    .asDataXException(
                            UnstructuredStorageReaderErrorCode.OPEN_FILE_WITH_CHARSET_ERROR,
                            String.format("Unsupported encoding: [%s]", encoding), uee);
        } catch (NullPointerException e) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.RUNTIME_EXCEPTION,
                    "Runtime error, please contact us", e);
        }/* catch (ArchiveException e) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.READ_FILE_IO_ERROR,
                    String.format("Error reading compressed file stream: [%s]", context), e);
        } */catch (IOException e) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.READ_FILE_IO_ERROR,
                    String.format("Stream read error: [%s]", context), e);
        } finally {
            IOUtils.closeQuietly(reader);
        }

    }

    public static void doReadFromStream(BufferedReader reader, String context,
                                        Configuration
readerSliceConfig, RecordSender recordSender,
                                        TaskPluginCollector taskPluginCollector) {
        String encoding = readerSliceConfig.getString(Key.ENCODING,
                Constant.DEFAULT_ENCODING);
        Character fieldDelimiter = null;
        String delimiterInStr = readerSliceConfig
                .getString(Key.FIELD_DELIMITER);
        if (null != delimiterInStr && 1 != delimiterInStr.length()) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                    String.format("Only single-character delimiters are supported; the configured delimiter is: [%s]",
                            delimiterInStr));
        }
        if (null == delimiterInStr) {
            LOG.warn(String.format("No field delimiter configured; using the default [%s]",
                    Constant.DEFAULT_FIELD_DELIMITER));
        }

        // warn: default value ',', fieldDelimiter could be \n(lineDelimiter)
        // for no fieldDelimiter
        fieldDelimiter = readerSliceConfig.getChar(Key.FIELD_DELIMITER,
                Constant.DEFAULT_FIELD_DELIMITER);
        Boolean skipHeader = readerSliceConfig.getBool(Key.SKIP_HEADER,
                Constant.DEFAULT_SKIP_HEADER);
        // warn: no default value '\N'
        String nullFormat = readerSliceConfig.getString(Key.NULL_FORMAT);

        // warn: Configuration -> List<ColumnEntry> for performance
        // List<Configuration> column = readerSliceConfig
        // .getListConfiguration(Key.COLUMN);
        List<ColumnEntry> column = UnstructuredStorageReaderUtil
                .getListColumnEntry(readerSliceConfig, Key.COLUMN);
        CsvReader csvReader = null;

        // every line logic
        try {
            // TODO lineDelimiter
            if (skipHeader) {
                String fetchLine = reader.readLine();
                LOG.info(String.format("Header line %s has been skipped.",
                        fetchLine));
            }
            csvReader = new CsvReader(reader);
            csvReader.setDelimiter(fieldDelimiter);

            setCsvReaderConfig(csvReader);

            String[] parseRows;
            while ((parseRows = UnstructuredStorageReaderUtil
                    .splitBufferedReader(csvReader)) != null) {
                UnstructuredStorageReaderUtil.transportOneRecord(recordSender,
                        column, parseRows, nullFormat, taskPluginCollector);
            }
        } catch (UnsupportedEncodingException uee) {
            throw DataXException
                    .asDataXException(
                            UnstructuredStorageReaderErrorCode.OPEN_FILE_WITH_CHARSET_ERROR,
                            String.format("Unsupported encoding: [%s]", encoding), uee);
        } catch (FileNotFoundException fnfe) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.FILE_NOT_EXISTS,
                    String.format("File not found: [%s]", context), fnfe);
        } catch (IOException ioe) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.READ_FILE_IO_ERROR,
                    String.format("Error reading file: [%s]", context), ioe);
        } catch (Exception e) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.RUNTIME_EXCEPTION,
                    String.format("Runtime exception: %s", e.getMessage()), e);
        } finally {
            // csvReader is still null if skipping the header line failed;
            // guard it so an NPE does not mask the original exception
            if (null != csvReader) {
                csvReader.close();
            }
            IOUtils.closeQuietly(reader);
        }
    }

    public static Record transportOneRecord(RecordSender recordSender, Configuration configuration,
                                            TaskPluginCollector taskPluginCollector,
                                            String line) {
        List<ColumnEntry> column = UnstructuredStorageReaderUtil
                .getListColumnEntry(configuration, Key.COLUMN);
        // Note: nullFormat has no default value
        String nullFormat = configuration.getString(Key.NULL_FORMAT);
        String delimiterInStr = configuration.getString(Key.FIELD_DELIMITER);
        if (null != delimiterInStr && 1 != delimiterInStr.length()) {
            throw DataXException.asDataXException(
                    UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                    String.format("Only single-character delimiters are supported; the configured delimiter is: [%s]",
                            delimiterInStr));
        }
        if (null == delimiterInStr) {
            LOG.warn(String.format("No field delimiter configured; using the default [%s]",
                    Constant.DEFAULT_FIELD_DELIMITER));
        }
        // warn: default value ',', fieldDelimiter could be \n(lineDelimiter)
        // for no fieldDelimiter
        Character fieldDelimiter = configuration.getChar(Key.FIELD_DELIMITER,
                Constant.DEFAULT_FIELD_DELIMITER);

        String[] sourceLine = StringUtils.split(line, fieldDelimiter);

        return transportOneRecord(recordSender, column, sourceLine, nullFormat, taskPluginCollector);
    }

    public static Record transportOneRecord(RecordSender recordSender,
                                            List<ColumnEntry> columnConfigs, String[] sourceLine,
                                            String nullFormat, TaskPluginCollector taskPluginCollector) {
        Record record = recordSender.createRecord();
        Column columnGenerated = null;

        // Create a record whose columns are all of String type
        if (null == columnConfigs || columnConfigs.size() == 0) {
            for (String columnValue : sourceLine) {
                // not equalsIgnoreCase, it's all ok if nullFormat is null
                if (columnValue.equals(nullFormat)) {
                    columnGenerated = new StringColumn(null);
                } else {
                    columnGenerated = new StringColumn(columnValue);
                }
                record.addColumn(columnGenerated);
            }
            recordSender.sendToWriter(record);
        } else {
            try {
                for (ColumnEntry columnConfig : columnConfigs) {
                    String columnType = columnConfig.getType();
                    Integer columnIndex = columnConfig.getIndex();
                    String columnConst = columnConfig.getValue();

                    String columnValue = null;

                    if (null == columnIndex && null == columnConst) {
                        throw DataXException
                                .asDataXException(
                                        UnstructuredStorageReaderErrorCode.NO_INDEX_VALUE,
                                        "Since type is configured, at least one of index or value must also be configured");
                    }

                    if (null != columnIndex && null != columnConst) {
                        throw DataXException
                                .asDataXException(
                                        UnstructuredStorageReaderErrorCode.MIXED_INDEX_VALUE,
                                        "You configured both index and value; each column may use only one of them");
                    }

                    if (null != columnIndex) {
                        if (columnIndex >= sourceLine.length) {
                            String message = String
                                    .format("Column index out of range: this line of the source file has [%s] columns, but you tried to read column [%s]. Data: [%s]",
                                            sourceLine.length, columnIndex + 1,
                                            StringUtils.join(sourceLine, ","));
                            LOG.warn(message);
                            throw new IndexOutOfBoundsException(message);
                        }

                        columnValue = sourceLine[columnIndex];
                    } else {
                        columnValue = columnConst;
                    }
                    Type type = Type.valueOf(columnType.toUpperCase());
                    // it's all ok if nullFormat is null
                    if (columnValue.equals(nullFormat)) {
                        columnValue = null;
                    }
                    switch (type) {
                        case STRING:
                            columnGenerated = new StringColumn(columnValue);
                            break;
                        case LONG:
                            try {
                                columnGenerated = new LongColumn(columnValue);
                            } catch (Exception e) {
                                throw new IllegalArgumentException(String.format(
                                        "Type conversion error: cannot convert [%s] to [%s]", columnValue,
                                        "LONG"));
                            }
                            break;
                        case DOUBLE:
                            try {
                                columnGenerated = new DoubleColumn(columnValue);
                            } catch (Exception e) {
                                throw new IllegalArgumentException(String.format(
                                        "Type conversion error: cannot convert [%s] to [%s]", columnValue,
                                        "DOUBLE"));
                            }
                            break;
                        case BOOLEAN:
                            try {
                                columnGenerated = new BoolColumn(columnValue);
                            } catch (Exception e) {
                                throw new
IllegalArgumentException(String.format(
                                        "Type conversion error: cannot convert [%s] to [%s]", columnValue,
                                        "BOOLEAN"));
                            }

                            break;
                        case DATE:
                            try {
                                if (columnValue == null) {
                                    Date date = null;
                                    columnGenerated = new DateColumn(date);
                                } else {
                                    String formatString = columnConfig.getFormat();
                                    //if (null != formatString) {
                                    if (StringUtils.isNotBlank(formatString)) {
                                        // Conversion with the user-configured format; dirty-data behavior changes accordingly
                                        DateFormat format = columnConfig
                                                .getDateFormat();
                                        columnGenerated = new DateColumn(
                                                format.parse(columnValue));
                                    } else {
                                        // Let the framework attempt the conversion
                                        columnGenerated = new DateColumn(
                                                new StringColumn(columnValue)
                                                        .asDate());
                                    }
                                }
                            } catch (Exception e) {
                                throw new IllegalArgumentException(String.format(
                                        "Type conversion error: cannot convert [%s] to [%s]", columnValue,
                                        "DATE"));
                            }
                            break;
                        default:
                            String errorMessage = String.format(
                                    "The column type you configured is not supported yet: [%s]",
                                    columnType);
                            LOG.error(errorMessage);
                            throw DataXException
                                    .asDataXException(
                                            UnstructuredStorageReaderErrorCode.NOT_SUPPORT_TYPE,
                                            errorMessage);
                    }

                    record.addColumn(columnGenerated);

                }
                recordSender.sendToWriter(record);
            } catch (IllegalArgumentException iae) {
                taskPluginCollector
                        .collectDirtyRecord(record, iae.getMessage());
            } catch (IndexOutOfBoundsException ioe) {
                taskPluginCollector
                        .collectDirtyRecord(record, ioe.getMessage());
            } catch (Exception e) {
                if (e instanceof DataXException) {
                    throw (DataXException) e;
                }
                // Every conversion failure, including number and date format errors, is treated as dirty data
                taskPluginCollector.collectDirtyRecord(record, e.getMessage());
            }
        }

        return record;
    }

    public static List<ColumnEntry> getListColumnEntry(
            Configuration configuration, final String path) {
        List<JSONObject> lists = configuration.getList(path, JSONObject.class);
        if (lists == null) {
            return null;
        }
        List<ColumnEntry> result = new ArrayList<ColumnEntry>();
        for (final JSONObject object : lists) {
            result.add(JSON.parseObject(object.toJSONString(),
                    ColumnEntry.class));
        }
        return result;
    }

    private enum Type {
        STRING, LONG, BOOLEAN, DOUBLE, DATE,
        ;
    }

    /**
     * check parameter: encoding, compress, fieldDelimiter
     */
    public static void validateParameter(Configuration readerConfiguration) {

        // encoding check
        validateEncoding(readerConfiguration);

        //only support compress types
        validateCompress(readerConfiguration);

        //fieldDelimiter check
        validateFieldDelimiter(readerConfiguration);

        // column: 1. index type 2.value type 3.when type is Date, may have format
        validateColumn(readerConfiguration);

    }

    public static void validateEncoding(Configuration readerConfiguration) {
        // encoding check
        String encoding = readerConfiguration
                .getString(
                        com.alibaba.datax.plugin.unstructuredstorage.reader.Key.ENCODING,
                        com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_ENCODING);
        try {
            encoding = encoding.trim();
            readerConfiguration.set(Key.ENCODING, encoding);
            Charsets.toCharset(encoding);
        } catch (UnsupportedCharsetException uce) {
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                    String.format("The encoding you configured is not supported: [%s]", encoding), uce);
        } catch (Exception e) {
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.CONFIG_INVALID_EXCEPTION,
                    String.format("Encoding configuration error, please contact us: %s", e.getMessage()), e);
        }
    }

    public static void validateCompress(Configuration readerConfiguration) {
        String compress = readerConfiguration
                .getUnnecessaryValue(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COMPRESS, null, null);
        if (StringUtils.isNotBlank(compress)) {
            compress = compress.toLowerCase().trim();
            boolean compressTag = "gzip".equals(compress) || "bzip2".equals(compress) || "zip".equals(compress)
                    || "lzo".equals(compress) || "lzo_deflate".equals(compress) || "hadoop-snappy".equals(compress)
                    || "framing-snappy".equals(compress);
            if (!compressTag) {
                throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                        String.format("Only the gzip, bzip2, zip, lzo, lzo_deflate, hadoop-snappy and framing-snappy "
                                + "compression formats are supported; the configured compression format is not supported: [%s]",
                                compress));
            }
        } else {
            // The user may have configured compress as "" (an empty string); normalize it to null
            compress = null;
        }
        readerConfiguration.set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COMPRESS, compress);

    }
    public static void validateFieldDelimiter(Configuration readerConfiguration) {
        //fieldDelimiter check
        String delimiterInStr = readerConfiguration.getString(
                com.alibaba.datax.plugin.unstructuredstorage.reader.Key.FIELD_DELIMITER, null);
        if (null == delimiterInStr) {
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.REQUIRED_VALUE,
                    String.format("Your configuration is invalid: [%s] is a required parameter.",
                            com.alibaba.datax.plugin.unstructuredstorage.reader.Key.FIELD_DELIMITER));
        } else if (1 != delimiterInStr.length()) {
            // warn: if have, length must be one
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                    String.format("Only single-character delimiters are supported; the configured delimiter is: [%s]",
                            delimiterInStr));
        }
    }

    public static void validateColumn(Configuration readerConfiguration) {
        // column: 1. index type 2.value type 3.when type is Date, may have
        // format
        List<Configuration> columns = readerConfiguration
                .getListConfiguration(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN);
        if (null == columns || columns.size() == 0) {
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.REQUIRED_VALUE,
                    "You must specify columns");
        }
        // handle ["*"]
        if (null != columns && 1 == columns.size()) {
            String columnsInStr = columns.get(0).toString();
            if ("\"*\"".equals(columnsInStr) || "'*'".equals(columnsInStr)) {
                readerConfiguration.set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN, null);
                columns = null;
            }
        }

        if (null != columns && columns.size() != 0) {
            for (Configuration eachColumnConf : columns) {
                eachColumnConf.getNecessaryValue(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.TYPE,
                        UnstructuredStorageReaderErrorCode.REQUIRED_VALUE);
                Integer columnIndex = eachColumnConf
                        .getInt(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.INDEX);
                String columnValue = eachColumnConf
                        .getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.VALUE);

                if (null == columnIndex && null == columnValue) {
                    throw
DataXException.asDataXException(UnstructuredStorageReaderErrorCode.NO_INDEX_VALUE,
                            "Since type is configured, at least one of index or value must also be configured");
                }

                if (null != columnIndex && null != columnValue) {
                    throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.MIXED_INDEX_VALUE,
                            "You configured both index and value; each column may use only one of them");
                }
                if (null != columnIndex && columnIndex < 0) {
                    throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                            String.format("index must be greater than or equal to 0; the configured index is [%s]",
                                    columnIndex));
                }
            }
        }
    }

    public static void validateCsvReaderConfig(Configuration readerConfiguration) {
        String csvReaderConfig = readerConfiguration.getString(Key.CSV_READER_CONFIG);
        if (StringUtils.isNotBlank(csvReaderConfig)) {
            try {
                UnstructuredStorageReaderUtil.csvReaderConfigMap = JSON.parseObject(csvReaderConfig,
                        new TypeReference<HashMap<String, Object>>() {});
            } catch (Exception e) {
                LOG.info(String.format("WARN!!!! Ignoring the csvReaderConfig setting because it is invalid: "
                        + "its value must be empty or a Map. The configured value is: %s", csvReaderConfig));
            }
        }
    }

    /**
     *
     * @Title: getRegexPathParent
     * @Description: Gets the parent directory of a path that contains wildcards
     * @param @param regexPath
     * @param @return
     * @return String
     * @throws
     */
    public static String getRegexPathParent(String regexPath) {
        int endMark;
        for (endMark = 0; endMark < regexPath.length(); endMark++) {
            if ('*' != regexPath.charAt(endMark) && '?' != regexPath.charAt(endMark)) {
                continue;
            } else {
                break;
            }
        }

        int lastDirSeparator = regexPath.substring(0, endMark).lastIndexOf(IOUtils.DIR_SEPARATOR);
        String parentPath = regexPath.substring(0, lastDirSeparator + 1);

        return parentPath;
    }

    /**
     *
     * @Title: getRegexPathParentPath
     * @Description: Gets the parent directory of a path that contains wildcards; currently the wildcards * or ? are only supported in the last path segment.
     * (a limitation of the jcraft.jsch.ChannelSftp.ls(String path) API) http://epaul.github.io/jsch-documentation/javadoc/
     * @param @param regexPath
     * @param @return
     * @return String
     * @throws
     */
    public static String getRegexPathParentPath(String regexPath) {
        int lastDirSeparator = regexPath.lastIndexOf(IOUtils.DIR_SEPARATOR);
        String parentPath = "";
        parentPath = regexPath.substring(0, lastDirSeparator + 1);
        if (parentPath.contains("*") || parentPath.contains("?")) {
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE,
                    String.format("The path setting [%s] is invalid: the wildcards * or ? are currently only supported in the last path segment",
                            regexPath));
        }
        return parentPath;
    }

    public static void setCsvReaderConfig(CsvReader csvReader) {
        if (null != UnstructuredStorageReaderUtil.csvReaderConfigMap
                && !UnstructuredStorageReaderUtil.csvReaderConfigMap.isEmpty()) {
            try {
                BeanUtils.populate(csvReader, UnstructuredStorageReaderUtil.csvReaderConfigMap);
                LOG.info(String.format("csvReaderConfig applied successfully; CsvReader after configuration: %s",
                        JSON.toJSONString(csvReader)));
            } catch (Exception e) {
                LOG.info(String.format("WARN!!!! Ignoring the csvReaderConfig setting: BeanUtils.populate failed to apply it. "
                                + "The configured value is: %s. Please check your configuration! CsvReader uses the defaults [%s]",
                        JSON.toJSONString(UnstructuredStorageReaderUtil.csvReaderConfigMap),
                        JSON.toJSONString(csvReader)));
            }
        } else {
            // By default, disable safety mode to lift the 100,000-byte limit
            csvReader.setSafetySwitch(false);
            LOG.info(String.format("CsvReader uses the defaults [%s]; csvReaderConfig is [%s]",
                    JSON.toJSONString(csvReader),
                    JSON.toJSONString(UnstructuredStorageReaderUtil.csvReaderConfigMap)));
        }
    }

    public static Map<String, String> buildRecordMeta(String filePath) {

        Map<String, String> meta = new HashMap<String, String>();
        // Inject the filePath context metadata. Passing only the bare file name
        // (file.getName()) is disabled below; the full filePath is passed instead.
        // File file = new File(filePath);
        // meta.put(Key.META_KEY_FILE_PATH, file.getName());
        meta.put(Key.META_KEY_FILE_PATH, filePath);
        return meta;
    }

    public static void setSourceFileName(Configuration configuration, List<String> sourceFiles) {
        List<String> sourceFilesName = new ArrayList<String>();
        File file;
        for (String sourceFile : sourceFiles) {
            file = new File(sourceFile);
            sourceFilesName.add(file.getName());
        }
        configuration.set(Constant.SOURCE_FILE_NAME, sourceFilesName);
    }

    public static void setSourceFile(Configuration configuration, List<String> sourceFiles) {
        configuration.set(Constant.SOURCE_FILE, sourceFiles);
    }

    public static ArrayList<Column> getHivePartitionColumns(String filePath, List<ColumnEntry> hivePartitionColumnEntrys) {
        ArrayList<Column> hivePartitionColumns = new ArrayList<>();

        if (null == hivePartitionColumnEntrys) {
            return hivePartitionColumns;
        }

        // For a partition column pt, find /pt=xxx/ in the path, where xxx is the partition column's value,
        // and make sure the pattern occurs only once in the path
        for (ColumnEntry columnEntry : hivePartitionColumnEntrys) {
            String parColName = columnEntry.getValue();
            String patten = String.format("/%s=", parColName);
            int index = filePath.indexOf(patten);
            if (index != filePath.lastIndexOf(patten)) {
                throw new DataXException(String.format("Found multiple partition folders in filePath %s, partition: %s", filePath, parColName));
            }

            String subPath = filePath.substring(index + 1);
            int firstSeparatorIndex = subPath.indexOf(File.separator);
            if (firstSeparatorIndex > 0) {
                subPath = subPath.substring(0, firstSeparatorIndex);
            }

            if (subPath.split("=").length != 2) {
                throw new DataXException(String.format("Failed to parse the partition column value from filePath %s, partition: %s", filePath, parColName));
            }
            String parColVal = subPath.split("=")[1];

            String colType = columnEntry.getType().toUpperCase();
            Type type = Type.valueOf(colType);

            Column generateColumn;
            switch (type) {
                case STRING:
                    generateColumn = new StringColumn(parColVal);
                    break;

                case DOUBLE:
                    generateColumn = new DoubleColumn(parColVal);
                    break;

                case LONG:
                    generateColumn = new LongColumn(parColVal);
                    break;

                case BOOLEAN:
                    generateColumn = new BoolColumn(parColVal);
                    break;

                case DATE:
                    generateColumn = new DateColumn(new StringColumn(parColVal.toString()).asDate());
                    break;

                default:
                    // Report the unsupported type, not the partition value
                    String errorMessage = String.format("The column type you configured is not currently supported: %s", colType);
                    LOG.error(errorMessage);
                    throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.NOT_SUPPORT_TYPE, errorMessage);
            }

            hivePartitionColumns.add(generateColumn);
        }

        return hivePartitionColumns;
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/ZipCycleInputStream.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader;

import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ZipCycleInputStream extends InputStream {
    private static final Logger LOG = LoggerFactory
            .getLogger(ZipCycleInputStream.class);

    private ZipInputStream zipInputStream;
    private ZipEntry currentZipEntry;

    public ZipCycleInputStream(InputStream in) {
        this.zipInputStream = new ZipInputStream(in);
    }

    @Override
    public int read() throws IOException {
        // Position at the start of the next entry's data stream
        if (null == this.currentZipEntry) {
            this.currentZipEntry = this.zipInputStream.getNextEntry();
            if (null == this.currentZipEntry) {
                return -1;
            } else {
                LOG.info(String.format("Validate zipEntry with name: %s",
                        this.currentZipEntry.getName()));
            }
        }

        // Nested archives are not supported; directory entries are skipped
        if (this.currentZipEntry.isDirectory()) {
            LOG.warn(String.format("meet a directory %s, ignore...",
                    this.currentZipEntry.getName()));
            this.currentZipEntry = null;
            return this.read();
        }

        // Read one byte from the current entry's data stream
        int result = this.zipInputStream.read();

        // The current entry is exhausted; move on to the next entry
        if (-1 == result) {
            this.currentZipEntry = null;
            return this.read();
        } else {
            return result;
        }
    }

    @Override
    public void close() throws IOException {
        this.zipInputStream.close();
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/binaryFileUtil/BinaryFileReaderUtil.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader.binaryFileUtil;

import com.alibaba.datax.common.element.BytesColumn;
import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.plugin.unstructuredstorage.reader.Key;
import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode;
import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * @Author: guxuan
 * @Date 2022-05-17 15:59
 */
public class BinaryFileReaderUtil {
    private static final Logger LOG = LoggerFactory.getLogger(BinaryFileReaderUtil.class);

    public static void readFromStream(InputStream inputStream, String filePath, RecordSender recordSender, int blockSizeInByte) {
        try {
            Map<String, String> meta = UnstructuredStorageReaderUtil.buildRecordMeta(filePath);
            byte[] tmp = new byte[blockSizeInByte];
            int len;
            ByteUtils byteUtils = new ByteUtils();
            while ((len = inputStream.read(tmp)) != -1) {
                /*
                 * read() can return fewer than blockSizeInByte bytes (it always does for the last
                 * chunk), so the array must be trimmed to the number of bytes actually read.
                 * Otherwise the destination file would receive more bytes than the source file
                 * contains, which can corrupt it (e.g. pptx or docx files).
                 */
                // warn: this could be optimized: the extra array copy is unnecessary, byte[] tmp could be reused directly
                byte[] readBytesArray = Arrays.copyOf(tmp, len);
                byteUtils.append(readBytesArray);
                if (byteUtils.getSize() >= blockSizeInByte) {
                    recordSenderBytesColumn(recordSender, byteUtils.getBuffer(), meta);
                    byteUtils.clear();
                }
            }
            recordSenderBytesColumn(recordSender, byteUtils.getBuffer(), meta);
            LOG.info("End read!!!");
        } catch (IOException e) {
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.READ_FILE_IO_ERROR, e);
        }
    }

    private static void recordSenderBytesColumn(RecordSender recordSender, byte[] tmp, Map<String, String> meta) {
        Record record = recordSender.createRecord();
        Column column = new BytesColumn(tmp);
        record.addColumn(column);
        record.setMeta(meta);
        recordSender.sendToWriter(record);
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/binaryFileUtil/ByteUtils.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader.binaryFileUtil;

import java.util.Arrays;

/**
 * @Author: guxuan
 * @Date 2022-05-17 16:00
 */
public class ByteUtils {

    private int size;
    private int kDefaultBufferSize = 0;
    private byte[] buffer;

    public byte[] getBuffer() {
        return buffer;
    }

    public ByteUtils() {
        buffer = new byte[0];
        size = 0;
    }

    public long getSize() {
        return size;
    }

    public void setSize(int size) {
        this.size = size;
    }

    public ByteUtils append(byte[] buf) {
        if (buf == null) {
            return this;
        }
        buffer = Arrays.copyOf(buffer, buffer.length + buf.length);
        System.arraycopy(buf, 0, buffer, size, buf.length);
        size += buf.length;
        return this;
    }

    public void clear() {
        buffer = new byte[kDefaultBufferSize];
        size = 0;
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/split/StartEndPair.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader.split;

/**
 * @Author: guxuan
 * @Date 2022-05-17 15:50
 */
public class StartEndPair {
    private Long start;
    private Long end;
    private String filePath;

    public StartEndPair() {
    }

    public StartEndPair(Long start, Long end, String filePath) {
        this.start = start;
        this.end = end;
        this.filePath = filePath;
    }

    public Long getEnd() {
        return end;
    }

    public void setEnd(Long end) {
        this.end = end;
    }

    public Long getStart() {
        return start;
    }

    public void setStart(Long start) {
        this.start = start;
    }

    public String getFilePath() {
        return filePath;
    }

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }

    @Override
    public String toString() {
        return "StartEndPair [start=" + start + ", end=" + end + ", filePath=" + filePath + "]";
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/reader/split/UnstructuredSplitUtil.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.reader.split;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.common.util.RangeSplitUtil;
import com.alibaba.datax.plugin.unstructuredstorage.reader.Key;
import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode;
import com.alibaba.fastjson2.JSON;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.tuple.ImmutableTriple;
import org.apache.commons.lang3.tuple.Triple;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * @Author: guxuan
 * @Date 2022-05-17 15:49
 */
public abstract class UnstructuredSplitUtil {
    private static final Logger LOG = LoggerFactory.getLogger(UnstructuredSplitUtil.class);
    private boolean needInnerSplit;
    // Each file is split into blocks of 64 MB.
    // warn: ideally this should be configurable. If a user configures 2 channels but has 10 files,
    // splitting inside the files is not necessarily needed; a configurable size would, in some cases,
    // keep intra-file splitting from producing overly fragmented tasks.
    private static final Long BLOCK_BYTE_CAPACITY = 64 * FileUtils.ONE_MB;

    public UnstructuredSplitUtil(boolean needInnerSplit) {
        this.needInnerSplit = needInnerSplit;
    }

    public List<Configuration> getSplitConfiguration(Configuration originConfiguration, List<String> sourceObjectList,
                                                     int adviceNumber) {

        List<Configuration> splitConfiguration = new ArrayList<Configuration>();
        List<StartEndPair> regulateSplitStartEndPairList = new ArrayList<StartEndPair>();
        for (String object : sourceObjectList) {
            boolean realNeedInnerSplit = false;
            Long contentTotalLength = -1L;
            if (this.needInnerSplit) {
                // Fetch the file length only when needed, to avoid unnecessary OSS API calls
                contentTotalLength = this.getFileTotalLength(object);
                if (isNeedSplit(contentTotalLength)) {
                    realNeedInnerSplit = true;
                }
            }

            // warn: split inside the file only if the read mode allows it and the file is large enough
            if (realNeedInnerSplit) {
                List<StartEndPair> startEndPairList = getSplitStartEndPairList(contentTotalLength, object);
                List<Triple<Long, Long, InputStream>> startEndInputStreamTripleList = new ArrayList<Triple<Long, Long, InputStream>>();
                for (int i = 0; i < startEndPairList.size(); i++) {
                    StartEndPair startEndPair = startEndPairList.get(i);
                    InputStream inputStream = this.getFileInputStream(startEndPair);
                    Triple<Long, Long, InputStream> startEndInputStreamTriple = new ImmutableTriple<Long, Long, InputStream>(
                            startEndPair.getStart(), startEndPair.getEnd(), inputStream);
                    startEndInputStreamTripleList.add(startEndInputStreamTriple);
                }
                regulateSplitStartEndPairList.addAll(regulateSplitStartEndPair(startEndInputStreamTripleList, object));
            } else {
                // If the specified range is invalid (e.g. a negative start or end position, or one beyond the file size), the whole file is downloaded
                StartEndPair startEndPair = new StartEndPair(0L, -1L, object);
                regulateSplitStartEndPairList.add(startEndPair);
            }
        }

        // Merge tasks: pack multiple files into a single task
        List<List<StartEndPair>> splitResult = RangeSplitUtil.doListSplit(regulateSplitStartEndPairList, adviceNumber);
        // At this point the object list is neither null nor empty
        for (List<StartEndPair> eachSlice : splitResult) {
            Configuration splitedConfig = originConfiguration.clone();
            splitedConfig.set(Key.SPLIT_SLICE_CONFIG, eachSlice);
            splitConfiguration.add(splitedConfig);
            LOG.info(String.format("File to be read:%s", JSON.toJSONString(eachSlice)));
        }
        return splitConfiguration;
    }

    /**
     * Adjusts the raw split points so that each one falls on a line break ('\n').
     *
     * @param startEndInputStreamTripleList
     *            the raw split points and their input streams (start, end, inputStream)
     * @param filePath
     *            path of the file being split
     * @return the adjusted (start, end) pairs
     */
    private List<StartEndPair> regulateSplitStartEndPair(
            List<Triple<Long, Long, InputStream>> startEndInputStreamTripleList, String filePath) {
        List<StartEndPair> regulatedStartEndPairList = new ArrayList<StartEndPair>();
        for (int i = 0; i < startEndInputStreamTripleList.size(); i++) {
            if (i == 0) {
                Triple<Long, Long, InputStream> firstBlock = startEndInputStreamTripleList.get(i);
                StartEndPair startEndPair = new StartEndPair(firstBlock.getLeft(), null, filePath);
                regulatedStartEndPairList.add(startEndPair);
                continue;
            }
            Triple<Long, Long, InputStream> block = startEndInputStreamTripleList.get(i);
            long start = block.getLeft();
            long offset = 0;
            // Adjust the split point: move the split start forward to the end of the line (onto '\n')
            if (i < startEndInputStreamTripleList.size()) {
                offset = getLFIndex(block.getRight());
            }
            // The adjusted split point
            long regulatedPoint = start + offset;
            // Move the end of the previous block to this line end
            regulatedStartEndPairList.get(i - 1).setEnd(regulatedPoint);
            if (i < startEndInputStreamTripleList.size() - 1) {
                // Adjust this block's start point; its end point is adjusted in the next iteration
                regulatedStartEndPairList.add(new StartEndPair(regulatedPoint + 1, null, filePath));
            } else {
                // Last block: adjust the start point; the end point stays at the last byte of the file
                regulatedStartEndPairList.add(new StartEndPair(regulatedPoint + 1, block.getMiddle(), filePath));
            }
        }
        return regulatedStartEndPairList;
    }

    /**
     * Returns the offset of the first '\n' from the start of the input stream. If the stream
     * ends before a '\n' is found, the number of bytes read is returned instead.
     *
     * @param inputStream
     *            input stream
     * @return offset of the first '\n', or the number of bytes read if there is none
     */
    private Long getLFIndex(InputStream inputStream) {
        Long hasReadByteIndex = -1L;
        int ch = 0;
        while (ch != -1) {
            try {
                ch = inputStream.read();
            } catch (IOException e) {
                throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.READ_FILE_IO_ERROR,
                        String.format("inputstream read Byte has exception: %s", e.getMessage()), e);
            }
            hasReadByteIndex++;

            if (ch == '\n') {
                return hasReadByteIndex;
            }
        }
        return hasReadByteIndex;
    }

    /**
     * Returns the (start, end) byte ranges into which a file is split; a file is split into at
     * most ceil(fileTotalLength / BLOCK_BYTE_CAPACITY) pieces.
     *
     * @param fileTotalLength
     *            total number of bytes in the file
     * @param filePath
     *            path of the file being split
     * @return the raw (start, end) pairs
     */
    private List<StartEndPair> getSplitStartEndPairList(Long fileTotalLength, String filePath) {
        long splitNum = (long) Math.ceil(fileTotalLength * 1.0 / BLOCK_BYTE_CAPACITY);
        List<StartEndPair> startEndPairList = new ArrayList<StartEndPair>();
        long start, end;
        for (int i = 1; i <= splitNum; i++) {
            if (i == 1) {
                start = (i - 1) * BLOCK_BYTE_CAPACITY;
                end = i * BLOCK_BYTE_CAPACITY;
            } else if (i < splitNum) {
                start = (i - 1) * BLOCK_BYTE_CAPACITY + 1;
                end = i * BLOCK_BYTE_CAPACITY;
            } else {
                start = (i - 1) * BLOCK_BYTE_CAPACITY + 1;
                end = fileTotalLength - 1;
            }
            StartEndPair startEndPair = new StartEndPair(start, end, filePath);
            startEndPairList.add(startEndPair);
        }
        return startEndPairList;
    }

    /**
     * Determines whether a file needs to be split: it must be larger than BLOCK_BYTE_CAPACITY,
     * and inner splitting must be enabled.
     *
     * @param fileTotalLength
     *            total number of bytes in the file
     * @return
     */
    private boolean isNeedSplit(Long fileTotalLength) {
        boolean fileSizeCouldSplit = fileTotalLength > BLOCK_BYTE_CAPACITY;
        return fileSizeCouldSplit && this.needInnerSplit;
    }

    public abstract Long getFileTotalLength(String filePath);

    public abstract InputStream getFileInputStream(StartEndPair startEndPair);
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/util/ColumnTypeUtil.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.util;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.unstructuredstorage.reader.ColumnEntry;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONObject;

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * @Author: guxuan
 * @Date 2022-05-17 16:40
 */
public class ColumnTypeUtil {

    private static final String TYPE_NAME = "decimal";
    private static final String LEFT_BRACKETS = "(";
    private static final String RIGHT_BRACKETS = ")";
    private static final String DELIM = ",";

    public static boolean isDecimalType(String typeName) {
        return typeName.toLowerCase().startsWith(TYPE_NAME);
    }

    public static DecimalInfo getDecimalInfo(String typeName, DecimalInfo defaultInfo) {
        if (!isDecimalType(typeName)) {
            throw new IllegalArgumentException("Unsupported column type:" + typeName);
        }

        if (typeName.contains(LEFT_BRACKETS) && typeName.contains(RIGHT_BRACKETS)) {
            int precision = Integer.parseInt(typeName.substring(typeName.indexOf(LEFT_BRACKETS) + 1, typeName.indexOf(DELIM)).trim());
            int scale = Integer.parseInt(typeName.substring(typeName.indexOf(DELIM) + 1, typeName.indexOf(RIGHT_BRACKETS)).trim());
            return new DecimalInfo(precision, scale);
        } else {
            return defaultInfo;
        }
    }

    public static class DecimalInfo {
        private int precision;
        private int scale;

        public DecimalInfo(int precision, int scale) {
            this.precision = precision;
            this.scale = scale;
        }

        public int getPrecision() {
            return precision;
        }

        public int getScale() {
            return scale;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }

            if (o == null || getClass() != o.getClass()) {
                return false;
            }

            DecimalInfo that = (DecimalInfo) o;
            return precision == that.precision && scale == that.scale;
        }

        @Override
        public int hashCode() {
            return Objects.hash(precision, scale);
        }
    }

    public static List<ColumnEntry> getListColumnEntry(
            Configuration configuration, final String path) {
        List<JSONObject> lists = configuration.getList(path, JSONObject.class);
        if (lists == null) {
            return null;
        }
        List<ColumnEntry> result = new ArrayList<>();
        for (final JSONObject object : lists) {
            result.add(JSON.parseObject(object.toJSONString(), ColumnEntry.class));
        }
        return result;
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/util/HdfsUtil.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.util;

public class HdfsUtil {
    private static final double SCALE_TWO = 2.0;
    private static final double SCALE_TEN = 10.0;
    private static final int BIT_SIZE = 8;

    public static int computeMinBytesForPrecision(int precision) {
        int numBytes = 1;
        while (Math.pow(SCALE_TWO, BIT_SIZE * numBytes - 1.0) < Math.pow(SCALE_TEN, precision)) {
            numBytes += 1;
        }
        return numBytes;
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Constant.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer;

public class Constant {

    public static final String DEFAULT_ENCODING = "UTF-8";

    public static final char DEFAULT_FIELD_DELIMITER = ',';

    public static final String DEFAULT_NULL_FORMAT = "\\N";

    public static final String FILE_FORMAT_CSV = "csv";

    public static final String FILE_FORMAT_TEXT = "text";

    public static final String FILE_FORMAT_SQL = "sql";

    // Each part is 10 MB, with at most 10,000 parts; MAX_FILE_SIZE is in MB
    public static final Long MAX_FILE_SIZE = 10 * 10000L;

    public static final int DEFAULT_COMMIT_SIZE = 2000;

    public static final String DEFAULT_SUFFIX = "";

    public static final String TRUNCATE = "truncate";
    public static final String APPEND = "append";
    public static final String NOCONFLICT = "nonConflict";

    /**
     * When synchronizing binary files such as audio and video, semi-structured writer plugins
     * can uniformly use SOURCE_FILE to obtain the split file paths from the reader plugin.
     */
    public static final String SOURCE_FILE = "sourceFile";

    public static final String SOURCE_FILE_NAME = "sourceFileName";

    /**
     * Whether the files are unstructured (binary) files, such as audio or video
     */
    public static final String BINARY = "binary";

    /**
     * File synchronization mode; "copy" means a pure file copy
     */
    public static final String SYNC_MODE_VALUE_COPY = "copy";
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/DataXCsvWriter.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer;

import org.apache.commons.io.IOUtils;

import java.io.IOException;
import java.io.Writer;

/**
 * @Author: guxuan
 * @Date 2022-05-19 10:44
 */
public class DataXCsvWriter {
    private Writer writer;
    @SuppressWarnings("unused")
    private String fileName;
    private boolean firstColumn;
    private boolean useCustomRecordDelimiter;
    private UserSettings userSettings;
    private boolean initialized;
    private boolean closed;
    public static final int ESCAPE_MODE_DOUBLED = 1;
    public static final int ESCAPE_MODE_BACKSLASH = 2;

    public DataXCsvWriter(Writer writer, char delimiter) {
        this.writer = null;
        this.fileName = null;
        this.firstColumn = true;
        this.useCustomRecordDelimiter = false;
        this.userSettings = new UserSettings();
        this.initialized = false;
        this.closed = false;
        if (writer == null) {
            throw new IllegalArgumentException("Parameter writer can not be null.");
        } else {
            this.writer = writer;
            this.userSettings.Delimiter = delimiter;
            this.initialized = true;
        }
    }

    public char getDelimiter() {
        return this.userSettings.Delimiter;
    }

    public void setDelimiter(char var1) {
        this.userSettings.Delimiter = var1;
    }

    public char getRecordDelimiter() {
        return this.userSettings.RecordDelimiter;
    }

    public void setRecordDelimiter(char var1) {
        this.useCustomRecordDelimiter = true;
        this.userSettings.RecordDelimiter = var1;
    }

    public char getTextQualifier() {
        return this.userSettings.TextQualifier;
    }

    public void setTextQualifier(char var1) {
        this.userSettings.TextQualifier = var1;
    }

    public boolean getUseTextQualifier() {
        return this.userSettings.UseTextQualifier;
    }

    public void setUseTextQualifier(boolean var1) {
        this.userSettings.UseTextQualifier = var1;
    }

    public int getEscapeMode() {
        return this.userSettings.EscapeMode;
    }

    public void setEscapeMode(int var1) {
        this.userSettings.EscapeMode = var1;
    }

    public void setComment(char var1) {
        this.userSettings.Comment = var1;
    }

    public char getComment() {
        return this.userSettings.Comment;
    }

    public boolean getForceQualifier() {
        return this.userSettings.ForceQualifier;
    }

    public void setForceQualifier(boolean var1) {
        this.userSettings.ForceQualifier = var1;
    }

    public void write(String var1, boolean var2) throws IOException {
        this.checkClosed();
        if (var1 == null) {
            var1 = "";
        }

        if (!this.firstColumn) {
            this.writer.write(this.userSettings.Delimiter);
        }

        boolean var3 = this.userSettings.ForceQualifier;
        if (!var2 && var1.length() > 0) {
            var1 = var1.trim();
        }

        if (!var3 && this.userSettings.UseTextQualifier
                && (var1.indexOf(this.userSettings.TextQualifier) > -1
                || var1.indexOf(this.userSettings.Delimiter) > -1
                || !this.useCustomRecordDelimiter && (var1.indexOf(10) > -1 || var1.indexOf(13) > -1)
                || this.useCustomRecordDelimiter && var1.indexOf(this.userSettings.RecordDelimiter) > -1
                || this.firstColumn && var1.length() > 0 && var1.charAt(0) == this.userSettings.Comment
                || this.firstColumn && var1.length() == 0)) {
            var3 = true;
        }

        if (this.userSettings.UseTextQualifier && !var3 && var1.length() > 0 && var2) {
            char var4 = var1.charAt(0);
            if (var4 == 32 || var4 == 9) {
                var3 = true;
            }

            if (!var3 && var1.length() > 1) {
                char var5 = var1.charAt(var1.length() - 1);
                if (var5 == 32 || var5 == 9) {
                    var3 = true;
                }
            }
        }

        if (var3) {
            this.writer.write(this.userSettings.TextQualifier);
            if (this.userSettings.EscapeMode == 2) {
                var1 = replace(var1, "\\", "\\\\");
                var1 = replace(var1, "" + this.userSettings.TextQualifier, "\\" + this.userSettings.TextQualifier);
            } else {
                var1 = replace(var1, "" + this.userSettings.TextQualifier, "" + this.userSettings.TextQualifier + this.userSettings.TextQualifier);
            }
        } else if (this.userSettings.EscapeMode == 2) {
            var1 = replace(var1, "\\", "\\\\");
            var1 = replace(var1, "" + this.userSettings.Delimiter, "\\" + this.userSettings.Delimiter);
            if (this.useCustomRecordDelimiter) {
                var1 = replace(var1, "" + this.userSettings.RecordDelimiter, "\\" + this.userSettings.RecordDelimiter);
            } else {
                var1 = replace(var1, "\r", "\\\r");
                var1 = replace(var1, "\n", "\\\n");
            }

            if (this.firstColumn && var1.length() > 0 && var1.charAt(0) == this.userSettings.Comment) {
                if (var1.length() > 1) {
                    var1 = "\\" + this.userSettings.Comment + var1.substring(1);
                } else {
                    var1 = "\\" + this.userSettings.Comment;
                }
            }
        }

        this.writer.write(var1);
        if (var3) {
            this.writer.write(this.userSettings.TextQualifier);
        }

        this.firstColumn = false;
    }

    public void write(String var1) throws IOException {
        this.write(var1, false);
    }

    public void writeComment(String var1) throws IOException {
        this.checkClosed();
        this.writer.write(this.userSettings.Comment);
        this.writer.write(var1);
        if (this.useCustomRecordDelimiter) {
            this.writer.write(this.userSettings.RecordDelimiter);
        } else {
            this.writer.write(IOUtils.LINE_SEPARATOR);
        }

        this.firstColumn = true;
    }

    public void writeRecord(String[] var1, boolean var2) throws IOException {
        if (var1 != null && var1.length > 0) {
            for (int var3 = 0; var3 < var1.length; ++var3) {
                this.write(var1[var3], var2);
            }

            this.endRecord();
        }
    }

    public void writeRecord(String[] var1) throws IOException {
        this.writeRecord(var1,
                false);
    }

    public void endRecord() throws IOException {
        this.checkClosed();
        if (this.useCustomRecordDelimiter) {
            this.writer.write(this.userSettings.RecordDelimiter);
        } else {
            this.writer.write(IOUtils.LINE_SEPARATOR);
        }

        this.firstColumn = true;
    }

    public void flush() throws IOException {
        this.writer.flush();
    }

    public void close() {
        if (!this.closed) {
            this.close(true);
            this.closed = true;
        }
    }

    private void close(boolean var1) {
        if (!this.closed) {
            try {
                if (this.initialized) {
                    this.writer.close();
                }
            } catch (Exception var3) {
                // ignored: closing is best effort
            }

            this.writer = null;
            this.closed = true;
        }
    }

    private void checkClosed() throws IOException {
        if (this.closed) {
            throw new IOException("This instance of the CsvWriter class has already been closed.");
        }
    }

    @Override
    protected void finalize() {
        this.close(false);
    }

    public static String replace(String var0, String var1, String var2) {
        int var3 = var1.length();
        int var4 = var0.indexOf(var1);
        if (var4 <= -1) {
            return var0;
        } else {
            StringBuffer var5 = new StringBuffer();

            int var6;
            for (var6 = 0; var4 != -1; var4 = var0.indexOf(var1, var6)) {
                var5.append(var0.substring(var6, var4));
                var5.append(var2);
                var6 = var4 + var3;
            }

            var5.append(var0.substring(var6));
            return var5.toString();
        }
    }

    private class UserSettings {
        public char TextQualifier = 34;
        public boolean UseTextQualifier = true;
        public char Delimiter = 44;
        public char RecordDelimiter = 0;
        public char Comment = 35;
        public int EscapeMode = 1;
        public boolean ForceQualifier = false;

        public UserSettings() {
        }
    }

    @SuppressWarnings("unused")
    private class Letters {
        public static final char LF = '\n';
        public static final char CR = '\r';
        public static final char QUOTE = '\"';
        public static final char COMMA = ',';
        public static final char SPACE = ' ';
        public static final char TAB = '\t';
        public static final char POUND = '#';
        public static final char BACKSLASH = '\\';
        public static final char NULL = '\u0000';

        private Letters() {
        }
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/Key.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer;

public class Key {
    public static final String PATH = "path";

    // required
    public static final String FILE_NAME = "fileName";

    public static final String TABLE_NAME = "table";

    // required
    public static final String WRITE_MODE = "writeMode";

    // optional, default ','
    public static final String FIELD_DELIMITER = "fieldDelimiter";

    public static final String QUOTE_CHARACTER = "quoteChar";

    // optional, defaults to the OS line separator
    public static final String LINE_DELIMITER = "lineDelimiter";

    public static final String CSV_WRITER_CONFIG = "csvWriterConfig";

    // optional, default UTF-8
    public static final String ENCODING = "encoding";

    // optional, default: no compression
    public static final String COMPRESS = "compress";

    // optional, default \N
    public static final String NULL_FORMAT = "nullFormat";

    // optional; old-style date format, do not use
    public static final String FORMAT = "format";

    // date format used by writers
    public static final String DATE_FORMAT = "dateFormat";

    // csv or plain text
    public static final String FILE_FORMAT = "fileFormat";

    // writer headers
    public static final String HEADER = "header";

    // writer maxFileSize
    public static final String MAX_FILE_SIZE = "maxFileSize";

    public static final String COMMIT_SIZE = "commitSize";

    // writer file type suffix, like .txt .csv
    public static final String SUFFIX = "suffix";

    public static final String MARK_DONE_FILE_NAME = "markDoneFileName";

    public static final String MARK_DOING_FILE_NAME = "markDoingFileName";

    // public static final String RETRY_TIME = "retryTime";
    public final static String MAX_RETRY_TIME = "maxRetryTime";

    /**
     * For semi-structured storage, identifies the absolute path of the file a Record came from;
     * it can be an FTP file, an OSS object, etc.
     */
    public static final String META_KEY_FILE_PATH = "filePath";

    /**
     * Work items for multi-file splitting: a Task uses this key to describe its work.
     * The keys below relate to splitting inside a file.
     */
    public static final String SPLIT_SLICE_CONFIG = "__splitSliceConfig";
    public static final String SPLIT_SLICE_FILE_PATH = "filePath";
    public static final String SPLIT_SLICE_START_POINT = "startPoint";
    public static final String SPLIT_SLICE_END_POINT = "endPoint";

    /**
     * File synchronization mode; "copy" means a pure file copy
     */
    public static final String SYNC_MODE = "syncMode";

    public static final String BYTE_ENCODING = "byteEncoding";
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/SqlWriter.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.stream.Collectors;

public class SqlWriter implements UnstructuredWriter {
    private static final Logger LOG = LoggerFactory.getLogger(SqlWriter.class);

    private Writer sqlWriter;
    private String quoteChar;
    private String lineSeparator;
    private String tableName;
    private String nullFormat;
    private StringBuilder insertPrefix;

    public SqlWriter(Writer writer, String quoteChar, String tableName, String lineSeparator, List<String> columnNames, String nullFormat) {
        this.sqlWriter = writer;
        this.quoteChar = quoteChar;
        this.lineSeparator = lineSeparator;
        this.tableName = quoteChar + tableName + quoteChar;
        this.nullFormat = nullFormat;
        buildInsertPrefix(columnNames);
    }

    @Override
    public void writeOneRecord(List<String> splitedRows) throws IOException {
        if (splitedRows.isEmpty()) {
            LOG.info("Found one record line which is empty.");
            return;
        }

        StringBuilder sqlPatten = new StringBuilder(4096).append(insertPrefix);
        sqlPatten.append(splitedRows.stream().map(e -> {
            if (nullFormat.equals(e)) {
                return "NULL";
            }
            return "'" + DataXCsvWriter.replace(e, "'", "''") + "'";
        }).collect(Collectors.joining(",")));
        sqlPatten.append(");").append(lineSeparator);
        this.sqlWriter.write(sqlPatten.toString());
    }

    private void buildInsertPrefix(List<String> columnNames) {
        StringBuilder sb = new StringBuilder(columnNames.size() * 32);

        for (String columnName : columnNames) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(quoteChar).append(columnName).append(quoteChar);
        }

        int capacity = 16 + tableName.length() + sb.length();
        this.insertPrefix = new StringBuilder(capacity);
        this.insertPrefix.append("INSERT INTO ").append(tableName).append(" (").append(sb).append(")").append(" VALUES(");
    }

    public void appendCommit() throws IOException {
        this.sqlWriter.write("commit;" + lineSeparator);
    }

    @Override
    public void flush() throws IOException {
        this.sqlWriter.flush();
    }

    @Override
    public void close() throws IOException {
        this.sqlWriter.close();
    }
}

================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/TextCsvWriterManager.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer;

import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.List;

import com.alibaba.datax.common.util.Configuration;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import org.apache.commons.beanutils.BeanUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.csvreader.CsvWriter;

public class TextCsvWriterManager {
    public static UnstructuredWriter produceTextWriter(
            Writer writer, String fieldDelimiter, Configuration config) {
        return new TextWriterImpl(writer, fieldDelimiter, config);
    }

    public static UnstructuredWriter produceCsvWriter(
            Writer writer, char fieldDelimiter, Configuration config) {
        return new CsvWriterImpl(writer, fieldDelimiter, config);
    }
}

class CsvWriterImpl implements UnstructuredWriter {
    private static final Logger LOG =
LoggerFactory .getLogger(CsvWriterImpl.class); // csv 严格符合csv语法, 有标准的转义等处理 private char fieldDelimiter; private String lineDelimiter; private DataXCsvWriter csvWriter; public CsvWriterImpl(Writer writer, char fieldDelimiter, Configuration config) { this.fieldDelimiter = fieldDelimiter; this.lineDelimiter = config.getString(Key.LINE_DELIMITER, IOUtils.LINE_SEPARATOR); this.csvWriter = new DataXCsvWriter(writer, this.fieldDelimiter); this.csvWriter.setTextQualifier('"'); this.csvWriter.setUseTextQualifier(true); // warn: in linux is \n , in windows is \r\n this.csvWriter.setRecordDelimiter(this.lineDelimiter.charAt(0)); String csvWriterConfig = config.getString(Key.CSV_WRITER_CONFIG); if (StringUtils.isNotBlank(csvWriterConfig)) { try { HashMap csvWriterConfigMap = JSON.parseObject(csvWriterConfig, new TypeReference>() { }); if (!csvWriterConfigMap.isEmpty()) { // this.csvWriter.setComment(var1); // this.csvWriter.setDelimiter(var1); // this.csvWriter.setEscapeMode(var1); // this.csvWriter.setForceQualifier(var1); // this.csvWriter.setRecordDelimiter(var1); // this.csvWriter.setTextQualifier(var1); // this.csvWriter.setUseTextQualifier(var1); BeanUtils.populate(this.csvWriter, csvWriterConfigMap); LOG.info(String.format("csvwriterConfig is set successfully. 
After setting, csvwriter:%s", JSON.toJSONString(this.csvWriter))); } } catch (Exception e) { LOG.warn(String.format("invalid csvWriterConfig config: %s, DataX will ignore it.", csvWriterConfig), e); } } } @Override public void writeOneRecord(List splitedRows) throws IOException { if (splitedRows.isEmpty()) { LOG.info("Found one record line which is empty."); } this.csvWriter.writeRecord(splitedRows.toArray(new String[0])); } @Override public void flush() throws IOException { this.csvWriter.flush(); } @Override public void close() throws IOException { this.csvWriter.close(); } } class TextWriterImpl implements UnstructuredWriter { private static final Logger LOG = LoggerFactory .getLogger(TextWriterImpl.class); // text StringUtils的join方式, 简单的字符串拼接 private String fieldDelimiter; private Writer textWriter; private String lineDelimiter; public TextWriterImpl(Writer writer, String fieldDelimiter, Configuration config) { this.fieldDelimiter = fieldDelimiter; this.textWriter = writer; this.lineDelimiter = config.getString(Key.LINE_DELIMITER, IOUtils.LINE_SEPARATOR); } @Override public void writeOneRecord(List splitedRows) throws IOException { if (splitedRows.isEmpty()) { LOG.info("Found one record line which is empty."); } this.textWriter.write(String.format("%s%s", StringUtils.join(splitedRows, this.fieldDelimiter), this.lineDelimiter)); } @Override public void flush() throws IOException { this.textWriter.flush(); } @Override public void close() throws IOException { this.textWriter.close(); } } ================================================ FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/UnstructuredStorageWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.unstructuredstorage.writer; import com.alibaba.datax.common.spi.ErrorCode; public enum UnstructuredStorageWriterErrorCode implements ErrorCode { ILLEGAL_VALUE("UnstructuredStorageWriter-00", "您填写的参数值不合法."), 
    Write_FILE_WITH_CHARSET_ERROR("UnstructuredStorageWriter-01", "您配置的编码未能正常写入."), // "The configured encoding could not be written."
    Write_FILE_IO_ERROR("UnstructuredStorageWriter-02", "您配置的文件在写入时出现IO异常."), // "An I/O exception occurred while writing the configured file."
    RUNTIME_EXCEPTION("UnstructuredStorageWriter-03", "出现运行时异常, 请联系我们"), // "A runtime exception occurred; please contact us."
    REQUIRED_VALUE("UnstructuredStorageWriter-04", "您缺失了必须填写的参数值."), // "A required parameter value is missing."
    Write_ERROR("UnstructuredStorageWriter-05", "errorcode.write_error"),;

    private final String code;
    private final String description;

    private UnstructuredStorageWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code,
                this.description);
    }
}
================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/UnstructuredStorageWriterUtil.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer;

import java.io.*;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

import com.alibaba.datax.common.element.BytesColumn;
import com.google.common.base.Preconditions;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.compress.compressors.CompressorOutputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.io.Charsets;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.DateColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.google.common.collect.Sets;

public class UnstructuredStorageWriterUtil {
    private UnstructuredStorageWriterUtil() {

    }

    private static final Logger LOG = LoggerFactory
            .getLogger(UnstructuredStorageWriterUtil.class);

    /**
     * check parameter: writeMode, encoding, compress, fieldDelimiter
     */
    public static void validateParameter(Configuration writerConfiguration) {
        // writeMode check
        String writeMode = writerConfiguration.getNecessaryValue(
                Key.WRITE_MODE,
                UnstructuredStorageWriterErrorCode.REQUIRED_VALUE);
        writeMode = writeMode.trim();
        Set<String> supportedWriteModes = Sets.newHashSet("truncate", "append",
                "nonConflict");
        if (!supportedWriteModes.contains(writeMode)) {
            throw DataXException
                    .asDataXException(
                            UnstructuredStorageWriterErrorCode.ILLEGAL_VALUE,
                            writeMode);
        }
        writerConfiguration.set(Key.WRITE_MODE, writeMode);

        // encoding check
        String encoding = writerConfiguration.getString(Key.ENCODING);
        if (StringUtils.isBlank(encoding)) {
            // like " ", null
            writerConfiguration.set(Key.ENCODING, Constant.DEFAULT_ENCODING);
        } else {
            try {
                encoding = encoding.trim();
                writerConfiguration.set(Key.ENCODING, encoding);
                Charsets.toCharset(encoding);
            } catch (Exception e) {
                throw DataXException.asDataXException(
                        UnstructuredStorageWriterErrorCode.ILLEGAL_VALUE, e);
            }
        }

        // only support compress types
        String compress = writerConfiguration.getString(Key.COMPRESS);
        if (StringUtils.isBlank(compress)) {
            writerConfiguration.set(Key.COMPRESS, null);
        } else {
            Set<String> supportedCompress = Sets.newHashSet("gzip", "bzip2");
            if (!supportedCompress.contains(compress.toLowerCase().trim())) {
                throw DataXException.asDataXException(
                        UnstructuredStorageWriterErrorCode.ILLEGAL_VALUE,
                        String.format("unsupported compress format %s ", compress));
            }
        }

        // fileFormat check
        String fileFormat = writerConfiguration.getString(Key.FILE_FORMAT);
        if (StringUtils.isBlank(fileFormat)) {
            fileFormat = Constant.FILE_FORMAT_TEXT;
            writerConfiguration.set(Key.FILE_FORMAT, fileFormat);
        }
        if (!Constant.FILE_FORMAT_CSV.equals(fileFormat)
                && !Constant.FILE_FORMAT_TEXT.equals(fileFormat)
                && !Constant.FILE_FORMAT_SQL.equals(fileFormat)) {
            throw DataXException.asDataXException(
                    UnstructuredStorageWriterErrorCode.ILLEGAL_VALUE,
                    String.format("unsupported fileFormat %s ", fileFormat));
        }

        // fieldDelimiter check
        String delimiterInStr = writerConfiguration.getString(Key.FIELD_DELIMITER);
        if (StringUtils.equalsIgnoreCase(fileFormat, Constant.FILE_FORMAT_CSV)
                && null != delimiterInStr && 1 != delimiterInStr.length()) {
            throw DataXException.asDataXException(
                    UnstructuredStorageWriterErrorCode.ILLEGAL_VALUE,
                    String.format("unsupported delimiterInStr %s ", delimiterInStr));
        }
        if (null == delimiterInStr) {
            delimiterInStr = String.valueOf(Constant.DEFAULT_FIELD_DELIMITER);
            writerConfiguration.set(Key.FIELD_DELIMITER, delimiterInStr);
        }
    }

    public static List<Configuration> split(Configuration writerSliceConfig,
            Set<String> originAllFileExists, int mandatoryNumber) {
        LOG.info("begin do split...");
        Set<String> allFileExists = new HashSet<String>();
        allFileExists.addAll(originAllFileExists);
        List<Configuration> writerSplitConfigs = new ArrayList<Configuration>();
        String filePrefix = writerSliceConfig.getString(Key.FILE_NAME);

        String fileSuffix;
        for (int i = 0; i < mandatoryNumber; i++) {
            // handle same file name
            Configuration splitedTaskConfig = writerSliceConfig.clone();
            String fullFileName = null;
            fileSuffix = UUID.randomUUID().toString().replace('-', '_');
            fullFileName = String.format("%s__%s", filePrefix, fileSuffix);
            while (allFileExists.contains(fullFileName)) {
                fileSuffix = UUID.randomUUID().toString().replace('-', '_');
                fullFileName = String.format("%s__%s", filePrefix, fileSuffix);
            }
            allFileExists.add(fullFileName);
            splitedTaskConfig.set(Key.FILE_NAME, fullFileName);

            LOG.info(String
                    .format("splited write file name:[%s]", fullFileName));

            writerSplitConfigs.add(splitedTaskConfig);
        }
        LOG.info("end do split.");
        return writerSplitConfigs;
    }

    public static String buildFilePath(String path, String fileName,
            String suffix) {
        boolean isEndWithSeparator = false;
        switch (IOUtils.DIR_SEPARATOR) {
        case IOUtils.DIR_SEPARATOR_UNIX:
            isEndWithSeparator = path.endsWith(String
                    .valueOf(IOUtils.DIR_SEPARATOR));
            break;
        case IOUtils.DIR_SEPARATOR_WINDOWS:
            isEndWithSeparator = path.endsWith(String
                    .valueOf(IOUtils.DIR_SEPARATOR_WINDOWS));
            break;
        default:
            break;
        }
        if (!isEndWithSeparator) {
            path = path + IOUtils.DIR_SEPARATOR;
        }
        if (null == suffix) {
            suffix = "";
        } else {
            suffix = suffix.trim();
        }
        return String.format("%s%s%s", path, fileName, suffix);
    }

    public static void writeToStream(RecordReceiver lineReceiver,
            OutputStream outputStream, Configuration config, String context,
            TaskPluginCollector taskPluginCollector) {
        String encoding = config.getString(Key.ENCODING,
                Constant.DEFAULT_ENCODING);
        // handle blank encoding
        if (StringUtils.isBlank(encoding)) {
            encoding = Constant.DEFAULT_ENCODING;
        }
        String compress = config.getString(Key.COMPRESS);

        BufferedWriter writer = null;
        // compress logic
        try {
            if (null == compress) {
                writer = new BufferedWriter(new OutputStreamWriter(
                        outputStream, encoding));
            } else {
                // TODO more compress
                if ("gzip".equalsIgnoreCase(compress)) {
                    CompressorOutputStream compressorOutputStream = new GzipCompressorOutputStream(
                            outputStream);
                    writer = new BufferedWriter(new OutputStreamWriter(
                            compressorOutputStream, encoding));
                } else if ("bzip2".equalsIgnoreCase(compress)) {
                    CompressorOutputStream compressorOutputStream = new BZip2CompressorOutputStream(
                            outputStream);
                    writer = new BufferedWriter(new OutputStreamWriter(
                            compressorOutputStream, encoding));
                } else {
                    throw DataXException
                            .asDataXException(
                                    UnstructuredStorageWriterErrorCode.ILLEGAL_VALUE,
                                    compress);
                }
            }
            UnstructuredStorageWriterUtil.doWriteToStream(lineReceiver, writer,
                    context, config, taskPluginCollector);
        } catch (UnsupportedEncodingException uee) {
            throw DataXException
                    .asDataXException(
                            UnstructuredStorageWriterErrorCode.Write_FILE_WITH_CHARSET_ERROR,
                            uee);
        } catch (NullPointerException e) {
            throw DataXException.asDataXException(
                    UnstructuredStorageWriterErrorCode.RUNTIME_EXCEPTION, e);
        } catch (IOException e) {
            throw DataXException.asDataXException(
                    UnstructuredStorageWriterErrorCode.Write_FILE_IO_ERROR, e);
        } finally {
            IOUtils.closeQuietly(writer);
        }
    }

    private static void doWriteToStream(RecordReceiver lineReceiver,
            BufferedWriter writer, String contex, Configuration config,
            TaskPluginCollector taskPluginCollector) throws IOException {
        String nullFormat = config.getString(Key.NULL_FORMAT);

        // compatible with both format & dateFormat
        String dateFormat = config.getString(Key.DATE_FORMAT);
        DateFormat dateParse = null; // warn: may be incompatible
        if (StringUtils.isNotBlank(dateFormat)) {
            dateParse = new SimpleDateFormat(dateFormat);
        }

        // warn: default false
        String fileFormat = config.getString(Key.FILE_FORMAT,
                Constant.FILE_FORMAT_TEXT);
        boolean isSqlFormat = Constant.FILE_FORMAT_SQL.equalsIgnoreCase(fileFormat);
        int commitSize = config.getInt(Key.COMMIT_SIZE, Constant.DEFAULT_COMMIT_SIZE);
        UnstructuredWriter unstructuredWriter = produceUnstructuredWriter(fileFormat, config, writer);

        List<String> headers = config.getList(Key.HEADER, String.class);
        if (null != headers && !headers.isEmpty() && !isSqlFormat) {
            unstructuredWriter.writeOneRecord(headers);
        }

        Record record = null;
        int receivedCount = 0;
        String byteEncoding = config.getString(Key.BYTE_ENCODING);
        while ((record = lineReceiver.getFromReader()) != null) {
            UnstructuredStorageWriterUtil.transportOneRecord(record,
                    nullFormat, dateParse, taskPluginCollector,
                    unstructuredWriter, byteEncoding);
            receivedCount++;
            if (isSqlFormat && receivedCount % commitSize == 0) {
                ((SqlWriter) unstructuredWriter).appendCommit();
            }
        }

        if (isSqlFormat) {
            ((SqlWriter) unstructuredWriter).appendCommit();
        }
        // warn: the caller is responsible for closing the stream
        // IOUtils.closeQuietly(unstructuredWriter);
    }

    public static UnstructuredWriter produceUnstructuredWriter(String fileFormat, Configuration config, Writer writer) {
        UnstructuredWriter unstructuredWriter = null;
        if (StringUtils.equalsIgnoreCase(fileFormat, Constant.FILE_FORMAT_CSV)) {
            Character fieldDelimiter = config.getChar(Key.FIELD_DELIMITER, Constant.DEFAULT_FIELD_DELIMITER);
            unstructuredWriter = TextCsvWriterManager.produceCsvWriter(writer, fieldDelimiter, config);
        } else if (StringUtils.equalsIgnoreCase(fileFormat, Constant.FILE_FORMAT_TEXT)) {
            String fieldDelimiter = config.getString(Key.FIELD_DELIMITER, String.valueOf(Constant.DEFAULT_FIELD_DELIMITER));
            unstructuredWriter = TextCsvWriterManager.produceTextWriter(writer, fieldDelimiter, config);
        } else if (StringUtils.equalsIgnoreCase(fileFormat, Constant.FILE_FORMAT_SQL)) {
            String tableName = config.getString(Key.TABLE_NAME);
            Preconditions.checkArgument(StringUtils.isNotEmpty(tableName), "table name is empty");
            String quoteChar = config.getString(Key.QUOTE_CHARACTER);
            Preconditions.checkArgument(StringUtils.isNotEmpty(quoteChar), "quote character is empty");
            String lineSeparator = config.getString(Key.LINE_DELIMITER, IOUtils.LINE_SEPARATOR);
            List<String> headers = config.getList(Key.HEADER, String.class);
            Preconditions.checkArgument(CollectionUtils.isNotEmpty(headers), "column names are empty");
            String nullFormat = config.getString(Key.NULL_FORMAT, Constant.DEFAULT_NULL_FORMAT);
            unstructuredWriter = new SqlWriter(writer, quoteChar, tableName, lineSeparator, headers, nullFormat);
        }

        return unstructuredWriter;
    }

    /**
     * An exception thrown here marks the record as dirty data.
     */
    public static void transportOneRecord(Record record, String nullFormat,
            DateFormat dateParse, TaskPluginCollector taskPluginCollector,
            UnstructuredWriter unstructuredWriter, String byteEncoding) {
        // warn: default is null
        if (null == nullFormat) {
            nullFormat = "null";
        }
        try {
            List<String> splitedRows = new ArrayList<String>();
            int recordLength = record.getColumnNumber();
            if (0 != recordLength) {
                Column column;
                for (int i = 0; i < recordLength; i++) {
                    column = record.getColumn(i);
                    if (null != column.getRawData()) {
                        boolean isDateColumn = column instanceof DateColumn;
                        if (!isDateColumn) {
                            if (column instanceof BytesColumn) {
                                if ("base64".equalsIgnoreCase(byteEncoding)) {
                                    splitedRows.add(Base64.encodeBase64String(column.asBytes()));
                                } else {
                                    splitedRows.add(column.asString());
                                }
                            } else {
                                splitedRows.add(column.asString());
                            }
                        } else {
                            if (null != dateParse) {
                                splitedRows.add(dateParse.format(column
                                        .asDate()));
                            } else {
                                splitedRows.add(column.asString());
                            }
                        }
                    } else {
                        // warn: it's all ok if nullFormat is null
                        splitedRows.add(nullFormat);
                    }
                }
            }

            unstructuredWriter.writeOneRecord(splitedRows);
        } catch (IllegalArgumentException e) {
            // warn: dirty data
            taskPluginCollector.collectDirtyRecord(record, e);
        } catch (DataXException e) {
            // warn: dirty data
            taskPluginCollector.collectDirtyRecord(record, e);
        } catch (Exception e) {
            // throw exception, it is not dirty data,
            // may be network unreachable and the other problem
            throw DataXException.asDataXException(
                    UnstructuredStorageWriterErrorCode.Write_ERROR, e.getMessage(), e);
        }
    }
}
================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/UnstructuredWriter.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer;

import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public interface UnstructuredWriter extends Closeable {

    public void writeOneRecord(List<String> splitedRows) throws IOException;

    public void flush() throws IOException;

    public void close() throws IOException;
}
================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/binaryFileUtil/BinaryFileWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer.binaryFileUtil;

import com.alibaba.datax.common.spi.ErrorCode;

public enum BinaryFileWriterErrorCode implements ErrorCode {
    ILLEGAL_VALUE("UnstructuredStorageWriter-00", "errorcode.illegal_value"),
    REPEATED_FILE_NAME("UnstructuredStorageWriter-01", "errorcode.repeated_file_name"),
    REQUIRED_VALUE("UnstructuredStorageWriter-02", "errorcode.required_value"),;

    private final String code;
    private final String description;

    private BinaryFileWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code,
                this.description);
    }
}
================================================
FILE: plugin-unstructured-storage-util/src/main/java/com/alibaba/datax/plugin/unstructuredstorage/writer/binaryFileUtil/BinaryFileWriterUtil.java
================================================
package com.alibaba.datax.plugin.unstructuredstorage.writer.binaryFileUtil;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode;
import com.alibaba.datax.plugin.unstructuredstorage.writer.Key;
import com.alibaba.datax.plugin.unstructuredstorage.writer.UnstructuredStorageWriterErrorCode;
import com.google.common.collect.Sets;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import static com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.*;

/**
 * @Author: guxuan
 * @Date 2022-05-17 17:01
 */
public class BinaryFileWriterUtil {
    private static final Logger LOG = LoggerFactory.getLogger(BinaryFileWriterUtil.class);

    /**
     * Reads the source file's byte arrays from the RecordReceiver and writes them to the destination.
     *
     * @param outputStream:   the output stream to write to
     * @param recordReceiver: RecordReceiver
     */
    public static void writeFileFromRecordReceiver(OutputStream outputStream, RecordReceiver recordReceiver) {
        try {
            Record record;
            while ((record = recordReceiver.getFromReader()) != null) {
                Column column = record.getColumn(0);
                outputStream.write(column.asBytes());
            }
            outputStream.flush();
            LOG.info("End write!!!");
        } catch (IOException e) {
            throw DataXException.asDataXException(UnstructuredStorageReaderErrorCode.READ_FILE_IO_ERROR, e);
        }
    }

    /**
     * Validates the parameters for synchronizing binary files.
     *
     * @param writerConfiguration: the writer configuration
     */
    public static void validateParameter(Configuration writerConfiguration) {
        // writeMode check
        String writeMode = writerConfiguration.getNecessaryValue(
                Key.WRITE_MODE,
                UnstructuredStorageWriterErrorCode.REQUIRED_VALUE);
        writeMode = writeMode.trim();
        Set<String> supportedWriteModes = Sets.newHashSet(TRUNCATE, NOCONFLICT);
        if (!supportedWriteModes.contains(writeMode)) {
            throw DataXException
                    .asDataXException(
                            BinaryFileWriterErrorCode.ILLEGAL_VALUE,
                            String.format("Binary file sync supports only the truncate and nonConflict write modes; the configured writeMode is not supported: %s", writeMode));
        }
        writerConfiguration.set(Key.WRITE_MODE, writeMode);
    }

    /**
     * Checks the file names for duplicates and throws an exception if any name is repeated.
     *
     * @param fileNameList
     */
    public static void checkFileNameIfRepeatedThrowException(List<String> fileNameList) {
        Set<String> sourceFileNameSet = new HashSet<String>();
        for (String fileName : fileNameList) {
            if (!sourceFileNameSet.contains(fileName)) {
                sourceFileNameSet.add(fileName);
            } else {
                throw DataXException.asDataXException(BinaryFileWriterErrorCode.REPEATED_FILE_NAME,
                        String.format("Source File Name [%s] is repeated!", fileName));
            }
        }
    }

    /**
     *
     * @param readerSplitConfigs
     * @param writerSliceConfig
     * @return the split result
     */
    public static List<Configuration> split(List<Configuration> readerSplitConfigs, Configuration writerSliceConfig) {
        List<Configuration> writerSplitConfigs = new ArrayList<Configuration>();

        for (Configuration readerSliceConfig : readerSplitConfigs) {
            Configuration splitedTaskConfig = writerSliceConfig.clone();
            String fileName = getFileName(readerSliceConfig.getString(SOURCE_FILE));
            splitedTaskConfig
                    .set(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME, fileName);
            splitedTaskConfig
                    .set(com.alibaba.datax.plugin.unstructuredstorage.writer.Constant.BINARY, true);
            writerSplitConfigs.add(splitedTaskConfig);
        }
        LOG.info("end do split.");
        return writerSplitConfigs;
    }

    /**
     * Extracts the file name from a file path; filePath is expected to include the file name.
     *
     * @param filePath: file path
     */
    public static String getFileName(String filePath) {
        if (StringUtils.isBlank(filePath)) {
            return null;
        }
        File file = new File(filePath);
        return file.getName();
    }
}
================================================
FILE: pom.xml
================================================
4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT org.hamcrest hamcrest-core 1.3 datax-all pom 1.8 0.0.1-SNAPSHOT 3.3.2 1.10 1.2 2.0.23 16.0.1 3.7.2.1-SNAPSHOT 1.7.10 1.0.13 2.4 4.13.1 5.1.22-1 1.0.0 UTF-8 UTF-8 UTF-8 UTF-8 5.1.47 common core transformer mysqlreader drdsreader sqlserverreader postgresqlreader kingbaseesreader oraclereader cassandrareader oceanbasev10reader obhbasereader rdbmsreader odpsreader otsreader otsstreamreader hbase11xreader hbase094xreader hbase11xsqlreader hbase20xsqlreader ossreader hdfsreader ftpreader txtfilereader streamreader clickhousereader mongodbreader tdenginereader gdbreader tsdbreader opentsdbreader loghubreader datahubreader starrocksreader sybasereader dorisreader mysqlwriter starrockswriter drdswriter databendwriter oraclewriter sqlserverwriter postgresqlwriter kingbaseeswriter adswriter oceanbasev10writer obhbasewriter
adbpgwriter hologresjdbcwriter rdbmswriter odpswriter osswriter otswriter hbase11xwriter hbase094xwriter hbase11xsqlwriter hbase20xsqlwriter kuduwriter ftpwriter hdfswriter txtfilewriter streamwriter elasticsearchwriter mongodbwriter tdenginewriter ocswriter tsdbwriter gdbwriter oscarwriter loghubwriter datahubwriter cassandrawriter clickhousewriter doriswriter selectdbwriter adbmysqlwriter sybasewriter neo4jwriter milvuswriter plugin-rdbms-util plugin-unstructured-storage-util gaussdbreader gaussdbwriter datax-example org.apache.commons commons-lang3 ${commons-lang3-version} com.alibaba.fastjson2 fastjson2 ${fastjson-version} commons-io commons-io ${commons-io-version} org.slf4j slf4j-api ${slf4j-api-version} ch.qos.logback logback-classic ${logback-classic-version} com.taobao.tddl tddl-client ${tddl.version} com.google.guava guava com.taobao.diamond diamond-client com.taobao.diamond diamond-client ${diamond.version} com.alibaba.search.swift swift_client ${swift-version} junit junit ${junit-version} org.mockito mockito-all 1.9.5 test org.apache.logging.log4j log4j-api 2.17.1 org.apache.logging.log4j log4j-core 2.17.1 central Nexus aliyun https://maven.aliyun.com/repository/central true true spring spring https://maven.aliyun.com/repository/spring true true central Nexus aliyun https://maven.aliyun.com/repository/central true true src/main/java **/*.properties maven-assembly-plugin datax package.xml make-assembly package org.apache.maven.plugins maven-compiler-plugin 2.3.2 ${jdk-version} ${jdk-version} ${project-sourceEncoding}
================================================
FILE: postgresqlreader/doc/postgresqlreader.md
================================================
# PostgresqlReader Plugin Documentation

___

## 1 Quick Introduction

The PostgresqlReader plugin reads data from PostgreSQL. Under the hood, PostgresqlReader connects to a remote PostgreSQL database over JDBC and executes the corresponding SQL statements to SELECT data out of PostgreSQL.

## 2 Implementation Principle

In short, PostgresqlReader connects to a remote PostgreSQL database through a JDBC connector, generates SELECT SQL statements from the user's configuration, and sends them to the remote PostgreSQL database. It then assembles the returned results into an abstract dataset using DataX's own data types and passes it to the downstream Writer.

For user-configured Table, Column, and Where information, PostgresqlReader concatenates them into a SQL statement and sends it to PostgreSQL. A user-configured querySql is sent to PostgreSQL as-is.

## 3 Features

### 3.1 Sample Configurations

* A job that extracts data from a PostgreSQL database and syncs it to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed in bytes/s. DataX tries to reach this rate without exceeding it.
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // Maximum number of error records; the job fails once this is exceeded.
                "record": 0,
                // Maximum fraction of error records: 1.0 means 100%, 0.02 means 2%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "postgresqlreader",
                    "parameter": {
                        // Database username
                        "username": "xx",
                        // Database password
                        "password": "xx",
                        "column": [
                            "id","name"
                        ],
                        // Split key
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:postgresql://host:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // Writer type
                    "name": "streamwriter",
                    // Whether to print the records
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that syncs the result of a custom SQL query to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                "byte": 1048576
            }
        },
        "content": [
            {
                "reader": {
                    "name": "postgresqlreader",
                    "parameter": {
                        "username": "xx",
                        "password": "xx",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:postgresql://host:port/database", "jdbc:postgresql://host:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the source database, given as a JSON array; several connection addresses may be listed for one database. A JSON array is used because Alibaba Group internally supports probing multiple IPs: if several are configured, PostgresqlReader checks each IP's connectivity in turn until it finds a usable one, and reports an error if all of them fail. Note that jdbcUrl must be placed inside the connection configuration block. Outside Alibaba Group, a single JDBC URL in the array is enough. jdbcUrl follows the official PostgreSQL conventions and may include additional connection-control parameters; see the [PostgreSQL official documentation](http://jdbc.postgresql.org/documentation/93/connect.html) for details.
    * Required: yes
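    * Default: none
* **username**
    * Description: the username for the data source.
    * Required: yes
    * Default: none
* **password**
    * Description: the password for the specified username.
    * Required: yes
    * Default: none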
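* **table**
    * Description: the tables to sync, given as a JSON array, so several tables can be extracted at once. When multiple tables are configured, the user must ensure they share the same schema; PostgresqlReader does not check whether they form one logical table. Note that table must be placed inside the connection configuration block.
    * Required: yes
    * Default: none
* **column**
    * Description: the columns of the configured table to sync, given as a JSON array. Use \* to select all columns, e.g. ['\*']. Column pruning is supported: you can export only a subset of the columns. Column reordering is supported: columns need not follow the table schema order. Constants are supported and must use PostgreSQL syntax, e.g. ["id", "'hello'::varchar", "true", "2.5::real", "power(2,3)"], where id is an ordinary column name, 'hello'::varchar is a string constant, true is a boolean, 2.5 is a floating-point number, and power(2,3) is a function call. **column must explicitly list the columns to sync and must not be empty!**
    * Required: yes
    * Default: none
* **splitPk**
    * Description: if splitPk is set, PostgresqlReader shards the data on that column and DataX runs concurrent tasks over the shards, which can greatly improve sync throughput. Using the table's primary key is recommended, since primary keys are usually evenly distributed and the resulting shards are less likely to have hot spots. splitPk currently supports integer columns only; `floating-point, string, date, and other types are not supported`. If an unsupported type is specified, PostgresqlReader reports an error. If splitPk is empty, the table is treated as unsplittable and is extracted over a single channel.
    * Required: no
    * Default: empty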
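* **where**
    * Description: filter condition. PostgresqlReader builds the SQL from the configured column, table, and where settings and extracts data with it. In practice, jobs often sync only the current day's data, which can be expressed as gmt_create > $bizdate. Note: where cannot be set to limit 10, because LIMIT is not a valid SQL WHERE clause. The where condition enables efficient incremental syncs. If where is omitted or empty, the whole table is synced.
    * Required: no
    * Default: none
* **querySql**
    * Description: in some scenarios the where option is not expressive enough to describe the filter, and users can define a custom query with this option instead. When it is set, DataX ignores table, column, and similar options and filters data with this query directly, e.g. to sync the result of a multi-table join: select a,b from table_a join table_b on table_a.id = table_b.id. `When querySql is configured, PostgresqlReader ignores the table, column, and where settings.`
    * Required: no
    * Default: none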
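* **fetchSize**
    * Description: the number of rows fetched from the database server per batch. It determines the number of network round trips between DataX and the server and can substantially improve extraction performance. `Note: a value that is too large (>2048) may cause the DataX process to run out of memory (OOM).`
    * Required: no
    * Default: 1000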
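PostgresqlReader combines table, column, where, and an integer splitPk into the SQL each task runs. The sketch below shows one plausible way to build those per-task queries. It is not DataX's actual `CommonRdbmsReader` logic. The class and method names (`SplitSqlSketch`, `buildQueries`) are hypothetical, and the sketch assumes the minimum and maximum split-key values have already been queried. It does not apply when querySql is set, because the reader then sends that query unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not DataX's CommonRdbmsReader: turns table/column/where
 * plus an integer splitPk range into one SELECT statement per task.
 */
final class SplitSqlSketch {

    /**
     * Splits the closed key range [min, max] into at most adviceNumber
     * contiguous, non-overlapping sub-ranges and returns one query per range.
     * Assumes min/max were obtained beforehand, e.g. with
     * SELECT min(pk), max(pk) FROM table WHERE ...
     */
    static List<String> buildQueries(String table, List<String> columns, String where,
                                     String splitPk, long min, long max, int adviceNumber) {
        if (max < min || adviceNumber < 1) {
            throw new IllegalArgumentException("need max >= min and adviceNumber >= 1");
        }
        String base = "SELECT " + String.join(",", columns) + " FROM " + table + " WHERE ";
        boolean hasWhere = where != null && !where.trim().isEmpty();

        long total = max - min + 1;                      // key values in range (overflow not handled)
        int parts = (int) Math.min(adviceNumber, total); // never create empty ranges
        long step = total / parts;
        long remainder = total % parts;                  // first `remainder` ranges get one extra key

        List<String> queries = new ArrayList<>();
        long lower = min;
        for (int i = 0; i < parts; i++) {
            long upper = lower + step - 1 + (i < remainder ? 1 : 0); // inclusive upper bound
            StringBuilder sql = new StringBuilder(base);
            if (hasWhere) {
                // keep the user's filter intact and AND it with the range predicate
                sql.append('(').append(where).append(") AND ");
            }
            sql.append('(').append(splitPk).append(" >= ").append(lower)
               .append(" AND ").append(splitPk).append(" <= ").append(upper).append(')');
            queries.add(sql.toString());
            lower = upper + 1;
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> queries = buildQueries("table", Arrays.asList("id", "name"),
                "gmt_create > '2024-01-01'", "id", 1, 10, 3);
        queries.forEach(System.out::println);
    }
}
```

With `min = 1`, `max = 10`, and `adviceNumber = 3`, the ranges are [1, 4], [5, 7], and [8, 10], so every key lands in exactly one task's query. A production reader would also need to handle rows whose split key is NULL (range predicates never match them) and quote identifiers, both of which this sketch omits.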
### 3.3 Type Conversion

PostgresqlReader currently supports most PostgreSQL types, but a few are not supported, so please check your column types.

The PostgreSQL type mapping used by PostgresqlReader is:

| DataX internal type | PostgreSQL data type |
| -------- | ----- |
| Long | bigint, bigserial, integer, smallint, serial |
| Double | double precision, money, numeric, real |
| String | varchar, char, text, bit, inet |
| Date | date, time, timestamp |
| Boolean | bool |
| Bytes | bytea |

Note:

* `Types other than those listed above are not supported; money, inet, and bit must be cast by the user with syntax such as a_inet::varchar`.

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table definition:

    create table pref_test(
        id serial,
        a_bigint bigint,
        a_bit bit(10),
        a_boolean boolean,
        a_char character(5),
        a_date date,
        a_double double precision,
        a_integer integer,
        a_money money,
        a_num numeric(10,2),
        a_real real,
        a_smallint smallint,
        a_text text,
        a_time time,
        a_timestamp timestamp
    )

#### 4.1.2 Machine Specifications

* Machine running DataX:
    1. cpu: 16 cores, Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
    2. mem: MemTotal: 24676836kB MemFree: 6365080kB
    3. net: dual 100 Mbps NICs

* PostgreSQL database machine: D12, 24 logical cores, 192 GB RAM, 12*480G SSD array

### 4.2 Test Report

#### 4.2.1 Single-Table Test Report

| Channels | Split by primary key | DataX speed (Rec/s) | DataX throughput (MB/s) | DataX machine load |
|--------|--------| --------|--------|--------|
| 1 | no | 10211 | 0.63 | 0.2 |
| 1 | yes | 10211 | 0.63 | 0.2 |
| 4 | no | 10211 | 0.63 | 0.2 |
| 4 | yes | 40000 | 2.48 | 0.5 |
| 8 | no | 10211 | 0.63 | 0.2 |
| 8 | yes | 78048 | 4.84 | 0.8 |

Notes:

1. The single table here has a serial primary key with evenly distributed data.
2. If a single table is not split by primary key, adding channels does not improve speed; the result is the same as with 1 channel.
================================================
FILE: postgresqlreader/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 postgresqlreader postgresqlreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} org.postgresql postgresql 42.3.3 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single
================================================
FILE: postgresqlreader/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/postgresqlreader target/ postgresqlreader-0.0.1-SNAPSHOT.jar plugin/reader/postgresqlreader false plugin/reader/postgresqlreader/libs runtime
================================================
FILE: postgresqlreader/src/main/java/com/alibaba/datax/plugin/reader/postgresqlreader/Constant.java
================================================
package com.alibaba.datax.plugin.reader.postgresqlreader;

public class Constant {

    public static final int DEFAULT_FETCH_SIZE = 1000;

}
================================================
FILE: postgresqlreader/src/main/java/com/alibaba/datax/plugin/reader/postgresqlreader/PostgresqlReader.java
================================================
package com.alibaba.datax.plugin.reader.postgresqlreader;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader;
import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode;
import com.alibaba.datax.plugin.rdbms.util.DataBaseType;
import java.util.List; public class PostgresqlReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.PostgreSQL; public static class Job extends Reader.Job { private Configuration originalConfig; private CommonRdbmsReader.Job commonRdbmsReaderMaster; @Override public void init() { this.originalConfig = super.getPluginJobConf(); int fetchSize = this.originalConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, Constant.DEFAULT_FETCH_SIZE); if (fetchSize < 1) { throw DataXException.asDataXException(DBUtilErrorCode.REQUIRED_VALUE, String.format("您配置的fetchSize有误,根据DataX的设计,fetchSize : [%d] 设置值不能小于 1.", fetchSize)); } this.originalConfig.set(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); this.commonRdbmsReaderMaster = new CommonRdbmsReader.Job(DATABASE_TYPE); this.commonRdbmsReaderMaster.init(this.originalConfig); } @Override public List split(int adviceNumber) { return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber); } @Override public void post() { this.commonRdbmsReaderMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderMaster.destroy(this.originalConfig); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderSlave; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderSlave = new CommonRdbmsReader.Task(DATABASE_TYPE,super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderSlave.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig.getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderSlave.post(this.readerSliceConfig); } 
@Override public void destroy() { this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig); } } } ================================================ FILE: postgresqlreader/src/main/resources/plugin.json ================================================ { "name": "postgresqlreader", "class": "com.alibaba.datax.plugin.reader.postgresqlreader.PostgresqlReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: postgresqlreader/src/main/resources/plugin_job_template.json ================================================ { "name": "postgresqlreader", "parameter": { "username": "", "password": "", "connection": [ { "table": [], "jdbcUrl": [] } ] } } ================================================ FILE: postgresqlwriter/doc/postgresqlwriter.md ================================================
# DataX PostgresqlWriter

---

## 1 Quick Introduction

The PostgresqlWriter plugin writes data into a target table on a PostgreSQL primary database. Internally, PostgresqlWriter connects to the remote PostgreSQL database over JDBC and executes `insert into ...` SQL statements to write the data. It commits the data in batches.

PostgresqlWriter is aimed at ETL developers who use it to load data from a data warehouse into PostgreSQL. DBAs and other users can also use it as a data migration tool.

## 2 How It Works

PostgresqlWriter receives the protocol data produced by the Reader through the DataX framework. From your configuration, it generates the following SQL insert statement:

* `insert into...` (rows that conflict on a primary key or unique index are not written)
Note:

1. The database that hosts the target table must be a primary database, or writes will fail. The job needs at least `insert into...` privileges. Any other privileges it needs depend on the statements you specify in preSql and postSql.
2. Unlike MysqlWriter, PostgresqlWriter does not support the writeMode parameter.

## 3 Features

### 3.1 Sample Configuration

* This sample generates data in memory and loads it with PostgresqlWriter.

```json
{ "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column" : [ { "value": "DataX", "type": "string" }, { "value": 19880808, "type": "long" }, { "value": "1988-08-08 08:08:08", "type": "date" }, { "value": true, "type": "bool" }, { "value": "test", "type": "bytes" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "postgresqlwriter", "parameter": { "username": "xx", "password": "xx", "column": [ "id", "name" ], "preSql": [ "delete from test" ], "connection": [ { "jdbcUrl": "jdbc:postgresql://127.0.0.1:3002/datax", "table": [ "test" ] } ] } } } ] } }
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the target database. jdbcUrl must be placed inside the connection block. Notes: 1. Only one value can be configured per database. 2. jdbcUrl follows the official PostgreSQL format and may include additional connection parameters. For details, see the official PostgreSQL documentation or ask your DBA.
    * Required: yes
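    * Default: none
* **username**
    * Description: username for the target database
    * Required: yes
    * Default: none
* **password**
    * Description: password for the target database
    * Required: yes
    * Default: none
* **table**
    * Description: name of the target table. You can write to one table or to several. If you configure several tables, they must all have the same schema. Note: table and jdbcUrl must be placed inside the connection block.
    * Required: yes
    * Default: none
* **column**
    * Description: the columns of the target table to write, separated by commas, e.g. "column": ["id","name","age"]. To write all columns in order, use \*, e.g. "column": ["\*"]. Notes: 1. We strongly advise against using \*. If the number or types of columns in the target table change, the job may produce wrong results or fail. 2. column must not contain constant values.
    * Required: yes
    * Default: none
* **preSql**
    * Description: standard SQL statements executed before data is written to the target table. If a statement refers to the table name, write `@table`; at execution time it is replaced with the actual table name. For example, suppose your job writes to 100 sharded tables with identical schemas (named datax_00, datax_01, ... datax_98, datax_99) and you want to delete existing data before loading. Configure `"preSql":["delete from @table"]`, and the matching `delete from <table name>` runs before data is written to each table.
    * Required: no
    * Default: none
* **postSql**
    * Description: standard SQL statements executed after data is written to the target table (works the same way as preSql).
    * Required: no
    * Default: none
* **batchSize**
    * Description: number of records committed in one batch. A larger value greatly reduces the number of network round trips between DataX and PostgreSQL and improves overall throughput. If the value is too large, the DataX process may run out of memory (OOM).
    * Required: no
    * Default: 1024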
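The `@table` placeholder in preSql/postSql can be thought of as a plain string substitution applied once per target table. The sketch below assumes that model. `PreSqlRenderer` and `render` are hypothetical names, not DataX APIs, and the real logic in plugin-rdbms-util may differ in details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DataX: expands "@table" in preSql/postSql templates.
public class PreSqlRenderer {
    private static final String TABLE_PLACEHOLDER = "@table";

    /** For each table in order, returns every template with "@table" replaced by that table's name. */
    public static List<String> render(List<String> tables, List<String> templates) {
        List<String> statements = new ArrayList<>();
        for (String table : tables) {
            for (String template : templates) {
                // String.replace substitutes every literal occurrence (no regex semantics)
                statements.add(template.replace(TABLE_PLACEHOLDER, table));
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> sql = render(Arrays.asList("datax_00", "datax_01"),
                Arrays.asList("delete from @table"));
        // prints "delete from datax_00" then "delete from datax_01"
        sql.forEach(System.out::println);
    }
}
```

This grouping runs every statement for one table before moving to the next. That mirrors the described behavior of running the statements "before writing data to each table".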
### 3.3 Type Conversion

PostgresqlWriter supports most PostgreSQL types, but not all of them, so check the types you use.

PostgresqlWriter maps PostgreSQL types as follows:

| DataX internal type | PostgreSQL data type |
| -------- | ----- |
| Long | bigint, bigserial, integer, smallint, serial |
| Double | double precision, money, numeric, real |
| String | varchar, char, text, bit |
| Date | date, time, timestamp |
| Boolean | bool |
| Bytes | bytea |

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table DDL:

    create table pref_test(
        id serial,
        a_bigint bigint,
        a_bit bit(10),
        a_boolean boolean,
        a_char character(5),
        a_date date,
        a_double double precision,
        a_integer integer,
        a_money money,
        a_num numeric(10,2),
        a_real real,
        a_smallint smallint,
        a_text text,
        a_time time,
        a_timestamp timestamp
    )

#### 4.1.2 Machine Specifications

* Machine running DataX:
    1. cpu: 16 cores, Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
    2. mem: MemTotal: 24676836kB, MemFree: 6365080kB
    3. net: dual 100 Mbps NICs
* PostgreSQL database machine: D12, 24 logical cores, 192 GB RAM, 12*480 GB SSD array

### 4.2 Test Report

#### 4.2.1 Single-Table Test Report

| Channels | batchSize | DataX speed (Rec/s) | DataX throughput (M/s) | DataX machine load |
|--------|--------|--------|--------|--------|
| 1 | 128 | 9259 | 0.55 | 0.3 |
| 1 | 512 | 10869 | 0.653 | 0.3 |
| 1 | 2048 | 9803 | 0.589 | 0.8 |
| 4 | 128 | 30303 | 1.82 | 1 |
| 4 | 512 | 36363 | 2.18 | 1 |
| 4 | 2048 | 36363 | 2.18 | 1 |
| 8 | 128 | 57142 | 3.43 | 2 |
| 8 | 512 | 66666 | 4.01 | 1.5 |
| 8 | 2048 | 66666 | 4.01 | 1.1 |
| 16 | 128 | 88888 | 5.34 | 1.8 |
| 16 | 2048 | 94117 | 5.65 | 2.5 |
| 32 | 512 | 76190 | 4.58 | 3 |

#### 4.2.2 Performance Summary

1. `The number of channels has a large impact on performance.`
2. `When writing to a database, more than 32 channels is generally not recommended.`

## FAQ

***

**Q: PostgresqlWriter fails while executing the postSql statements. Has the data already been imported into the target database?**

A: A DataX import has three stages: the pre operation, the import itself, and the post operation. If any stage fails, the DataX job fails. DataX cannot guarantee that these stages complete within a single transaction, so the data may already have reached the target.

***

**Q: In that case, some dirty data may have been imported. What if it affects the online database?**

A: There are currently two solutions. First, configure a pre statement that removes the data imported that day. Each DataX run then cleans up the previous load before importing the complete data. Second, import into a temporary table and rename it to the online table once the import completes.

***

================================================ FILE: postgresqlwriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT postgresqlwriter postgresqlwriter jar writer data into postgresql database com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} org.postgresql postgresql 42.3.3 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: postgresqlwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/postgresqlwriter target/ postgresqlwriter-0.0.1-SNAPSHOT.jar plugin/writer/postgresqlwriter false plugin/writer/postgresqlwriter/libs runtime ================================================ FILE: postgresqlwriter/src/main/java/com/alibaba/datax/plugin/writer/postgresqlwriter/PostgresqlWriter.java ================================================ package com.alibaba.datax.plugin.writer.postgresqlwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import
com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import java.util.List; public class PostgresqlWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.PostgreSQL; public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterMaster; @Override public void init() { this.originalConfig = super.getPluginJobConf(); // warn:not like mysql, PostgreSQL only support insert mode, don't use String writeMode = this.originalConfig.getString(Key.WRITE_MODE); if (null != writeMode) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, String.format("写入模式(writeMode)配置有误. 因为PostgreSQL不支持配置参数项 writeMode: %s, PostgreSQL仅使用insert sql 插入数据. 请检查您的配置并作出修改.", writeMode)); } this.commonRdbmsWriterMaster = new CommonRdbmsWriter.Job(DATABASE_TYPE); this.commonRdbmsWriterMaster.init(this.originalConfig); } @Override public void prepare() { this.commonRdbmsWriterMaster.prepare(this.originalConfig); } @Override public List split(int mandatoryNumber) { return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber); } @Override public void post() { this.commonRdbmsWriterMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterMaster.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterSlave; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterSlave = new CommonRdbmsWriter.Task(DATABASE_TYPE){ @Override public String calcValueHolder(String columnType){ if("serial".equalsIgnoreCase(columnType)){ return "?::int"; }else if("bigserial".equalsIgnoreCase(columnType)){ return "?::int8"; }else if("bit".equalsIgnoreCase(columnType)){ return "?::bit varying"; } return "?::" + columnType; } }; 
this.commonRdbmsWriterSlave.init(this.writerSliceConfig); } @Override public void prepare() { this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig); } @Override public void startWrite(RecordReceiver recordReceiver) { this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); } @Override public void post() { this.commonRdbmsWriterSlave.post(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig); } } } ================================================ FILE: postgresqlwriter/src/main/resources/plugin.json ================================================ { "name": "postgresqlwriter", "class": "com.alibaba.datax.plugin.writer.postgresqlwriter.PostgresqlWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba" } ================================================ FILE: postgresqlwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "postgresqlwriter", "parameter": { "username": "", "password": "", "column": [], "connection": [ { "jdbcUrl": "", "table": [] } ], "preSql": [], "postSql": [] } } ================================================ FILE: rdbmsreader/doc/rdbmsreader.md ================================================
# RDBMSReader Plugin Documentation

___

## 1 Quick Introduction

The RDBMSReader plugin reads data from an RDBMS. Internally, RDBMSReader connects to the remote RDBMS database over JDBC and executes SQL statements that SELECT the data out of it. It currently supports reading from Dameng (DM), DB2, PPAS and Sybase databases. RDBMSReader is a generic relational database reader, so you can add read support for other relational databases by registering their drivers.

## 2 How It Works

In short, RDBMSReader connects to the remote RDBMS database through a JDBC connector and builds a SELECT query from the user's configuration. It sends the query to the database, converts the returned results into an abstract dataset of DataX's own data types, and passes that dataset to the downstream Writer.

If the user configures table, column and where, RDBMSReader combines them into a SQL statement and sends it to the RDBMS database. If the user configures querySql, RDBMSReader sends that statement to the database as-is.

## 3 Features

### 3.1 Sample Configurations

* A job that extracts data from an RDBMS database:

```
{ "job": { "setting": { "speed": { "byte": 1048576 }, "errorLimit": { "record": 0, "percentage": 0.02 } }, "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "xxx", "password": "xxx", "column": [ "id", "name" ], "splitPk": "pk", "connection": [ { "table": [ "table" ], "jdbcUrl": [ "jdbc:dm://ip:port/database" ] } ], "fetchSize": 1024, "where": "1 = 1" } }, "writer": { "name": "streamwriter", "parameter": { "print": true } } } ] } }
```

* A job that synchronizes data using a custom SQL query:

```
{ "job": { "setting": { "speed": { "byte": 1048576 }, "errorLimit": { "record": 0, "percentage": 0.02 } }, "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "xxx", "password": "xxx", "column": [ "id", "name" ], "splitPk": "pk", "connection": [ { "querySql": [ "SELECT * from dual" ], "jdbcUrl": [ "jdbc:dm://ip:port/database" ] } ], "fetchSize": 1024, "where": "1 = 1" } }, "writer": { "name": "streamwriter", "parameter": { "print": true } } } ] } }
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the peer database. jdbcUrl follows the official format of the specific RDBMS and may include additional connection control parameters. JDBC URL formats differ between databases, and DataX picks the database driver that matches the jdbcUrl format to read the data.
        - Dameng: `jdbc:dm://ip:port/database`
        - DB2: `jdbc:db2://ip:port/database`
        - PPAS: `jdbc:edb://ip:port/database`

        **How to add support for a new database to rdbmsreader:**

        - Go to the rdbmsreader directory, ${DATAX_HOME}/plugin/reader/rdbmsreader, where ${DATAX_HOME} is the DataX home directory.
        - The rdbmsreader plugin directory contains a plugin.json configuration file. Register your database driver in its drivers array. At run time, rdbmsreader picks the appropriate driver to connect to the database.

        ```
        { "name": "rdbmsreader", "class": "com.alibaba.datax.plugin.reader.rdbmsreader.RdbmsReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba", "drivers": [ "dm.jdbc.driver.DmDriver", "com.ibm.db2.jcc.DB2Driver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver" ] }
        ```

        - The rdbmsreader plugin directory has a libs subdirectory. Put your database driver there.

        ```
        $tree .
        |-- libs
        |   |-- Dm7JdbcDriver16.jar
        |   |-- commons-collections-3.0.jar
        |   |-- commons-io-2.4.jar
        |   |-- commons-lang3-3.3.2.jar
        |   |-- commons-math3-3.1.1.jar
        |   |-- datax-common-0.0.1-SNAPSHOT.jar
        |   |-- datax-service-face-1.0.23-20160120.024328-1.jar
        |   |-- db2jcc4.jar
        |   |-- druid-1.0.15.jar
        |   |-- edb-jdbc16.jar
        |   |-- fastjson-1.1.46.sec01.jar
        |   |-- guava-r05.jar
        |   |-- hamcrest-core-1.3.jar
        |   |-- jconn3-1.0.0-SNAPSHOT.jar
        |   |-- logback-classic-1.0.13.jar
        |   |-- logback-core-1.0.13.jar
        |   |-- plugin-rdbms-util-0.0.1-SNAPSHOT.jar
        |   `-- slf4j-api-1.7.10.jar
        |-- plugin.json
        |-- plugin_job_template.json
        `-- rdbmsreader-0.0.1-SNAPSHOT.jar
        ```
    * Required: yes
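    * Default: none
* **username**
    * Description: username for the data source.
    * Required: yes
    * Default: none
* **password**
    * Description: password for the specified data source username.
    * Required: yes
    * Default: none
* **table**
    * Description: names of the tables to synchronize.
    * Required: yes
    * Default: none
* **column**
    * Description: the columns to synchronize from the configured table, given as a JSON array. Use * to select all columns, e.g. ["*"]. Column pruning is supported, so you can export only some columns. Column reordering is supported, so columns do not have to follow the table schema order. Constants are supported in JSON form: ["id", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3" , "true"]. Here id is an ordinary column name, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean. column must be specified explicitly and must not be empty.
    * Required: yes
    * Default: none
* **splitPk**
    * Description: if you specify splitPk during extraction, RDBMSReader shards the data on that column. DataX then runs concurrent tasks, which can greatly improve synchronization performance. We recommend using the table's primary key as splitPk, because primary keys are usually evenly distributed and the resulting shards are less likely to have data hot spots. splitPk currently supports integer columns only; `floating-point, string, date and other types are not supported`. If you specify an unsupported type, RDBMSReader reports an error. If splitPk is not set, the table is not split and RDBMSReader synchronizes all data through a single channel.
    * Required: no
    * Default: empty
* **where**
    * Description: filter condition. RDBMSReader builds the SQL from the configured column, table and where settings and extracts data with that SQL. For example, when testing you can set where to limit 10. In real business scenarios you usually synchronize only the current day's data, so you can set where to gmt_create > $bizdate. The where condition makes incremental business synchronization efficient. If where is not configured or is empty, the whole table is synchronized.
    * Required: no
    * Default: none
* **querySql**
    * Description: in some business scenarios, the where option cannot express the required filter. You can then use this option to write the filtering SQL yourself. Once querySql is configured, DataX ignores the table and column options and filters data with this SQL alone. For example, to synchronize data from a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id. `When querySql is configured, RDBMSReader ignores the table, column and where settings.`
    * Required: no
    * Default: none
* **fetchSize**
    * Description: number of records the plugin fetches from the database server per batch. This value sets the number of network round trips between DataX and the server and can greatly improve extraction performance. `Note: too large a value (>2048) may cause the DataX process to run out of memory (OOM).`
    * Required: no
    * Default: 1000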
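The splitPk description above explains why a table without splitPk gets no benefit from extra channels. The simplified sketch below shows the idea: it splits an integer key range into contiguous slices, one per concurrent task. It assumes the minimum and maximum of splitPk are already known (for example from `SELECT MIN(pk), MAX(pk)`). The class name and WHERE format are hypothetical, and DataX's own splitting in plugin-rdbms-util differs in details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not DataX's implementation: even range splitting on an integer splitPk.
public class SplitPkRanges {

    /**
     * Splits the closed interval [min, max] into at most `parts` contiguous, non-overlapping
     * ranges and renders each one as a WHERE fragment on column `pk`.
     * Assumes max - min + 1 does not overflow a long.
     */
    public static List<String> split(String pk, long min, long max, int parts) {
        if (parts < 1 || min > max) {
            throw new IllegalArgumentException("need parts >= 1 and min <= max");
        }
        long span = max - min + 1;                 // number of integer keys in [min, max]
        int n = (int) Math.min(parts, span);       // never produce an empty range
        List<String> ranges = new ArrayList<>(n);
        long lower = min;
        for (int i = 0; i < n; i++) {
            // spread the remainder over the first ranges so sizes differ by at most 1
            long size = span / n + (i < span % n ? 1 : 0);
            long upper = lower + size - 1;         // inclusive upper bound
            ranges.add(String.format("(%d <= %s AND %s <= %d)", lower, pk, pk, upper));
            lower = upper + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Three slices over ids 1..10, e.g. for 3 channels
        for (String where : split("id", 1, 10, 3)) {
            System.out.println(where);
        }
        // prints:
        // (1 <= id AND id <= 4)
        // (5 <= id AND id <= 7)
        // (8 <= id AND id <= 10)
    }
}
```

Each slice becomes an independent query, so parallelism is limited by the number of slices. With no splitPk there is only one slice, which is why extra channels stay idle for a single table. The sketch does not cover rows whose key is NULL.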
### 3.3 Type Conversion

RDBMSReader supports most common relational database types, such as numeric and character types, but a few types are not supported. Check your column types against your specific database.

================================================ FILE: rdbmsreader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 rdbmsreader rdbmsreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.dameng Dm7JdbcDriver17 7.6.0.142 com.sybase jconn3 1.0.0-SNAPSHOT system ${basedir}/src/main/libs/jconn3-1.0.0-SNAPSHOT.jar ppas ppas 16 system ${basedir}/src/main/libs/edb-jdbc16.jar com.ibm.db2.jcc db2jcc db2jcc4 org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: rdbmsreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/rdbmsreader target/ rdbmsreader-0.0.1-SNAPSHOT.jar plugin/reader/rdbmsreader src/main/libs *.* plugin/reader/rdbmsreader/libs false plugin/reader/rdbmsreader/libs runtime ================================================ FILE: rdbmsreader/src/main/java/com/alibaba/datax/plugin/reader/rdbmsreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.rdbmsreader; public class Constant { public static final int DEFAULT_FETCH_SIZE = 1000; } ================================================ FILE: rdbmsreader/src/main/java/com/alibaba/datax/plugin/reader/rdbmsreader/RdbmsReader.java ================================================ package com.alibaba.datax.plugin.reader.rdbmsreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import
com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import java.util.List; public class RdbmsReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.RDBMS; static { //加载插件下面配置的驱动类 DBUtil.loadDriverClass("reader", "rdbms"); } public static class Job extends Reader.Job { private Configuration originalConfig; private CommonRdbmsReader.Job commonRdbmsReaderMaster; @Override public void init() { this.originalConfig = super.getPluginJobConf(); int fetchSize = this.originalConfig.getInt( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, Constant.DEFAULT_FETCH_SIZE); if (fetchSize < 1) { throw DataXException .asDataXException( DBUtilErrorCode.REQUIRED_VALUE, String.format( "您配置的fetchSize有误,根据DataX的设计,fetchSize : [%d] 设置值不能小于 1.", fetchSize)); } this.originalConfig.set( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); this.commonRdbmsReaderMaster = new SubCommonRdbmsReader.Job( DATABASE_TYPE); this.commonRdbmsReaderMaster.init(this.originalConfig); } @Override public List split(int adviceNumber) { return this.commonRdbmsReaderMaster.split(this.originalConfig, adviceNumber); } @Override public void post() { this.commonRdbmsReaderMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderMaster.destroy(this.originalConfig); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderSlave; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderSlave = new SubCommonRdbmsReader.Task( DATABASE_TYPE); this.commonRdbmsReaderSlave.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = 
this.readerSliceConfig .getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); this.commonRdbmsReaderSlave.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderSlave.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderSlave.destroy(this.readerSliceConfig); } } } ================================================ FILE: rdbmsreader/src/main/java/com/alibaba/datax/plugin/reader/rdbmsreader/SubCommonRdbmsReader.java ================================================ package com.alibaba.datax.plugin.reader.rdbmsreader; import com.alibaba.datax.common.element.BoolColumn; import com.alibaba.datax.common.element.BytesColumn; import com.alibaba.datax.common.element.DateColumn; import com.alibaba.datax.common.element.DoubleColumn; import com.alibaba.datax.common.element.LongColumn; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.element.StringColumn; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import java.sql.ResultSet; import java.sql.ResultSetMetaData; import java.sql.Types; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class SubCommonRdbmsReader extends CommonRdbmsReader { static { DBUtil.loadDriverClass("reader", "rdbms"); } public static class Job extends CommonRdbmsReader.Job { public Job(DataBaseType dataBaseType) { super(dataBaseType); } } public static class Task extends CommonRdbmsReader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private static final 
boolean IS_DEBUG = LOG.isDebugEnabled(); public Task(DataBaseType dataBaseType) { super(dataBaseType); } @Override protected Record transportOneRecord(RecordSender recordSender, ResultSet rs, ResultSetMetaData metaData, int columnNumber, String mandatoryEncoding, TaskPluginCollector taskPluginCollector) { Record record = recordSender.createRecord(); try { for (int i = 1; i <= columnNumber; i++) { switch (metaData.getColumnType(i)) { case Types.CHAR: case Types.NCHAR: case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: String rawData; if (StringUtils.isBlank(mandatoryEncoding)) { rawData = rs.getString(i); } else { rawData = new String( (rs.getBytes(i) == null ? EMPTY_CHAR_ARRAY : rs.getBytes(i)), mandatoryEncoding); } record.addColumn(new StringColumn(rawData)); break; case Types.CLOB: case Types.NCLOB: record.addColumn(new StringColumn(rs.getString(i))); break; case Types.SMALLINT: case Types.TINYINT: case Types.INTEGER: case Types.BIGINT: record.addColumn(new LongColumn(rs.getString(i))); break; case Types.NUMERIC: case Types.DECIMAL: record.addColumn(new DoubleColumn(rs.getString(i))); break; case Types.FLOAT: case Types.REAL: case Types.DOUBLE: record.addColumn(new DoubleColumn(rs.getString(i))); break; case Types.TIME: record.addColumn(new DateColumn(rs.getTime(i))); break; // for mysql bug, see http://bugs.mysql.com/bug.php?id=35115 case Types.DATE: if (metaData.getColumnTypeName(i).equalsIgnoreCase( "year")) { record.addColumn(new LongColumn(rs.getInt(i))); } else { record.addColumn(new DateColumn(rs.getDate(i))); } break; case Types.TIMESTAMP: record.addColumn(new DateColumn(rs.getTimestamp(i))); break; case Types.BINARY: case Types.VARBINARY: case Types.BLOB: case Types.LONGVARBINARY: record.addColumn(new BytesColumn(rs.getBytes(i))); break; // warn: bit(1) -> Types.BIT 可使用BoolColumn // warn: bit(>1) -> Types.VARBINARY 可使用BytesColumn case Types.BOOLEAN: case Types.BIT: record.addColumn(new 
BoolColumn(rs.getBoolean(i))); break; case Types.NULL: String stringData = null; if (rs.getObject(i) != null) { stringData = rs.getObject(i).toString(); } record.addColumn(new StringColumn(stringData)); break; //case Types.TIME_WITH_TIMEZONE: //case Types.TIMESTAMP_WITH_TIMEZONE: // record.addColumn(new StringColumn(rs.getString(i))); // break; default: // warn:not support INTERVAL etc: Types.JAVA_OBJECT throw DataXException .asDataXException( DBUtilErrorCode.UNSUPPORTED_TYPE, String.format( "您的配置文件中的列配置信息有误. 因为DataX 不支持数据库读取这种字段类型. 字段名:[%s], 字段名称:[%s], 字段Java类型:[%s]. 请尝试使用数据库函数将其转换datax支持的类型 或者不同步该字段 .", metaData.getColumnName(i), metaData.getColumnType(i), metaData.getColumnClassName(i))); } } } catch (Exception e) { if (IS_DEBUG) { LOG.debug("read data " + record.toString() + " occur exception:", e); } // TODO 这里识别为脏数据靠谱吗? taskPluginCollector.collectDirtyRecord(record, e); if (e instanceof DataXException) { throw (DataXException) e; } } recordSender.sendToWriter(record); return record; } } } ================================================ FILE: rdbmsreader/src/main/resources/plugin.json ================================================ { "name": "rdbmsreader", "class": "com.alibaba.datax.plugin.reader.rdbmsreader.RdbmsReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. 
warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba", "drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver", "com.ibm.db2.jcc.DB2Driver"] } ================================================ FILE: rdbmsreader/src/main/resources/plugin_job_template.json ================================================ { "name": "rdbmsreader", "parameter": { "username": "", "password": "", "column": [], "connection": [ { "jdbcUrl": [], "table": [] } ], "where": "" } } ================================================ FILE: rdbmswriter/doc/rdbmswriter.md ================================================
# RDBMSWriter Plugin Documentation

---

## 1 Quick Introduction

The RDBMSWriter plugin writes data into a target table on an RDBMS primary database. Internally, RDBMSWriter connects to the remote RDBMS database over JDBC and executes `insert into ...` SQL statements to write the data. RDBMSWriter is a generic relational database writer, so you can add write support for other relational databases by registering their drivers.

RDBMSWriter is aimed at ETL developers who use it to load data from a data warehouse into an RDBMS. DBAs and other users can also use it as a data migration tool.

## 2 How It Works

RDBMSWriter receives the protocol data produced by the Reader through the DataX framework. It connects to the remote RDBMS database over JDBC and executes `insert into ...` SQL statements to write the data.

## 3 Features

### 3.1 Sample Configuration

* A job that writes to an RDBMS:

```
{ "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column": [ { "value": "DataX", "type": "string" }, { "value": 19880808, "type": "long" }, { "value": "1988-08-08 08:08:08", "type": "date" }, { "value": true, "type": "bool" }, { "value": "test", "type": "bytes" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "rdbmswriter", "parameter": { "connection": [ { "jdbcUrl": "jdbc:dm://ip:port/database", "table": [ "table" ] } ], "username": "username", "password": "password", "table": "table", "column": [ "*" ], "preSql": [ "delete from XXX;" ] } } } ] } }
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the peer database. jdbcUrl follows the official format of the specific RDBMS and may include additional connection control parameters. JDBC URL formats differ between databases, and DataX picks the database driver that matches the jdbcUrl format to write the data.
        - Dameng: `jdbc:dm://ip:port/database`
        - DB2: `jdbc:db2://ip:port/database`
        - PPAS: `jdbc:edb://ip:port/database`

        **How to add support for a new database to rdbmswriter:**

        - Go to the rdbmswriter directory, ${DATAX_HOME}/plugin/writer/rdbmswriter, where ${DATAX_HOME} is the DataX home directory.
        - The rdbmswriter plugin directory contains a plugin.json configuration file. Register your database driver in its drivers array. At run time, rdbmswriter picks the appropriate driver to connect to the database.

        ```json
        { "name": "rdbmswriter", "class": "com.alibaba.datax.plugin.reader.rdbmswriter.RdbmsWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet. warn: The more you know about the database, the less problems you encounter.", "developer": "alibaba", "drivers": [ "dm.jdbc.driver.DmDriver", "com.ibm.db2.jcc.DB2Driver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver" ] }
        ```

        - The rdbmswriter plugin directory has a libs subdirectory. Put your database driver there.

        ```
        $tree .
        |-- libs
        |   |-- Dm7JdbcDriver16.jar
        |   |-- commons-collections-3.0.jar
        |   |-- commons-io-2.4.jar
        |   |-- commons-lang3-3.3.2.jar
        |   |-- commons-math3-3.1.1.jar
        |   |-- datax-common-0.0.1-SNAPSHOT.jar
        |   |-- datax-service-face-1.0.23-20160120.024328-1.jar
        |   |-- db2jcc4.jar
        |   |-- druid-1.0.15.jar
        |   |-- edb-jdbc16.jar
        |   |-- fastjson-1.1.46.sec01.jar
        |   |-- guava-r05.jar
        |   |-- hamcrest-core-1.3.jar
        |   |-- jconn3-1.0.0-SNAPSHOT.jar
        |   |-- logback-classic-1.0.13.jar
        |   |-- logback-core-1.0.13.jar
        |   |-- plugin-rdbms-util-0.0.1-SNAPSHOT.jar
        |   `-- slf4j-api-1.7.10.jar
        |-- plugin.json
        |-- plugin_job_template.json
        `-- rdbmswriter-0.0.1-SNAPSHOT.jar
        ```
    * Required: yes
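    * Default: none
* **username**
    * Description: username for the data source
    * Required: yes
    * Default: none
* **password**
    * Description: password for the specified data source username
    * Required: yes
    * Default: none
* **table**
    * Description: name of the target table. If the table's schema differs from the configured username, write the table as schema.table.
    * Required: yes
    * Default: none
* **column**
    * Description: the columns of the configured table to write, separated by commas (,). `We strongly advise against relying on the default (all-columns) configuration.`
    * Required: yes
    * Default: none
* **preSql**
    * Description: a SQL statement executed before the data synchronization task starts, e.g. to clear old data. Only one SQL statement is currently allowed.
    * Required: no
    * Default: none
* **postSql**
    * Description: a SQL statement executed after the data synchronization task finishes, e.g. to add a timestamp. Only one SQL statement is currently allowed.
    * Required: no
    * Default: none
* **batchSize**
    * Description: number of records committed in one batch. A larger value greatly reduces the number of network round trips between DataX and the RDBMS and improves overall throughput. If the value is too large, the DataX process may run out of memory (OOM).
    * Required: no
    * Default: 1024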
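The batchSize trade-off of fewer network round trips against more records held in memory comes from buffering records and flushing them in groups. The generic sketch below assumes a caller-supplied flush callback. In a JDBC writer, that callback would typically call `PreparedStatement.addBatch()` for each row, then `executeBatch()` and `Connection.commit()`. `BatchBuffer` is a hypothetical name, not a DataX class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not DataX code: buffers records and hands them off in groups of batchSize.
public class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;   // e.g. addBatch() per row, then executeBatch() + commit()
    private final List<T> buffer = new ArrayList<>();

    public BatchBuffer(int batchSize, Consumer<List<T>> flusher) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    /** Buffers one record and flushes automatically once batchSize records are held. */
    public void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands off any buffered records; call once more after the last record. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer)); // pass a copy so the buffer can be reused
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchBuffer<Integer> writer = new BatchBuffer<>(1024, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            writer.add(i);
        }
        writer.flush();                  // push the final partial batch
        System.out.println(batchSizes);  // prints [1024, 1024, 452]
    }
}
```

For N records the buffer flushes ceil(N / batchSize) times and holds up to batchSize records in memory at once. That memory footprint is the source of the OOM risk described for very large values.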
### 3.3 Type Conversion

RDBMSWriter supports most common relational database types, such as numeric and character types, but a few types are not supported. Check your column types against your specific database.

================================================ FILE: rdbmswriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 rdbmswriter rdbmswriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.dameng Dm7JdbcDriver17 7.6.0.142 com.sybase jconn3 1.0.0-SNAPSHOT system ${basedir}/src/main/libs/jconn3-1.0.0-SNAPSHOT.jar ppas ppas 16 system ${basedir}/src/main/libs/edb-jdbc16.jar com.ibm.db2.jcc db2jcc db2jcc4 org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: rdbmswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/rdbmswriter target/ rdbmswriter-0.0.1-SNAPSHOT.jar plugin/writer/rdbmswriter src/main/libs *.* plugin/writer/rdbmswriter/libs false plugin/writer/rdbmswriter/libs runtime ================================================ FILE: rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/RdbmsWriter.java ================================================ package com.alibaba.datax.plugin.reader.rdbmswriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key;
import java.util.List; public class RdbmsWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.RDBMS; static { //加载插件下面配置的驱动类 DBUtil.loadDriverClass("writer", "rdbms"); } public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterMaster; @Override public void init() { this.originalConfig = super.getPluginJobConf(); // warn:not like mysql, only support insert mode, don't use String writeMode = this.originalConfig.getString(Key.WRITE_MODE); if (null != writeMode) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "写入模式(writeMode)配置有误. 因为不支持配置参数项 writeMode: %s, 仅使用insert sql 插入数据. 请检查您的配置并作出修改.", writeMode)); } this.commonRdbmsWriterMaster = new SubCommonRdbmsWriter.Job( DATABASE_TYPE); this.commonRdbmsWriterMaster.init(this.originalConfig); } @Override public void prepare() { this.commonRdbmsWriterMaster.prepare(this.originalConfig); } @Override public List split(int mandatoryNumber) { return this.commonRdbmsWriterMaster.split(this.originalConfig, mandatoryNumber); } @Override public void post() { this.commonRdbmsWriterMaster.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterMaster.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterSlave; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterSlave = new SubCommonRdbmsWriter.Task( DATABASE_TYPE); this.commonRdbmsWriterSlave.init(this.writerSliceConfig); } @Override public void prepare() { this.commonRdbmsWriterSlave.prepare(this.writerSliceConfig); } public void startWrite(RecordReceiver recordReceiver) { this.commonRdbmsWriterSlave.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); } @Override public void post() { 
this.commonRdbmsWriterSlave.post(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterSlave.destroy(this.writerSliceConfig); } } } ================================================ FILE: rdbmswriter/src/main/java/com/alibaba/datax/plugin/reader/rdbmswriter/SubCommonRdbmsWriter.java ================================================ package com.alibaba.datax.plugin.reader.rdbmswriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import java.sql.PreparedStatement; import java.sql.SQLException; import java.sql.Types; public class SubCommonRdbmsWriter extends CommonRdbmsWriter { static { DBUtil.loadDriverClass("writer", "rdbms"); } public static class Job extends CommonRdbmsWriter.Job { public Job(DataBaseType dataBaseType) { super(dataBaseType); } } public static class Task extends CommonRdbmsWriter.Task { public Task(DataBaseType dataBaseType) { super(dataBaseType); } @Override protected PreparedStatement fillPreparedStatementColumnType( PreparedStatement preparedStatement, int columnIndex, int columnSqltype, String typeName, Column column) throws SQLException { java.util.Date utilDate; try { switch (columnSqltype) { case Types.CHAR: case Types.NCHAR: case Types.CLOB: case Types.NCLOB: case Types.VARCHAR: case Types.LONGVARCHAR: case Types.NVARCHAR: case Types.LONGNVARCHAR: if (null == column.getRawData()) { preparedStatement.setObject(columnIndex + 1, null); } else { preparedStatement.setString(columnIndex + 1, column.asString()); } break; case Types.SMALLINT: case Types.INTEGER: case Types.BIGINT: case Types.TINYINT: String strLongValue = column.asString(); if (emptyAsNull && "".equals(strLongValue)) { preparedStatement.setObject(columnIndex + 1, null); } else if (null == column.getRawData()) { 
preparedStatement.setObject(columnIndex + 1, null); } else { preparedStatement.setLong(columnIndex + 1, column.asLong()); } break; case Types.NUMERIC: case Types.DECIMAL: case Types.FLOAT: case Types.REAL: case Types.DOUBLE: String strValue = column.asString(); if (emptyAsNull && "".equals(strValue)) { preparedStatement.setObject(columnIndex + 1, null); } else if (null == column.getRawData()) { preparedStatement.setObject(columnIndex + 1, null); } else { preparedStatement.setDouble(columnIndex + 1, column.asDouble()); } break; case Types.DATE: java.sql.Date sqlDate = null; utilDate = column.asDate(); if (null != utilDate) { sqlDate = new java.sql.Date(utilDate.getTime()); preparedStatement.setDate(columnIndex + 1, sqlDate); } else { preparedStatement.setNull(columnIndex + 1, Types.DATE); } break; case Types.TIME: java.sql.Time sqlTime = null; utilDate = column.asDate(); if (null != utilDate) { sqlTime = new java.sql.Time(utilDate.getTime()); preparedStatement.setTime(columnIndex + 1, sqlTime); } else { preparedStatement.setNull(columnIndex + 1, Types.TIME); } break; case Types.TIMESTAMP: java.sql.Timestamp sqlTimestamp = null; utilDate = column.asDate(); if (null != utilDate) { sqlTimestamp = new java.sql.Timestamp( utilDate.getTime()); preparedStatement.setTimestamp(columnIndex + 1, sqlTimestamp); } else { preparedStatement.setNull(columnIndex + 1, Types.TIMESTAMP); } break; case Types.BINARY: case Types.VARBINARY: case Types.BLOB: case Types.LONGVARBINARY: if (null == column.getRawData()) { preparedStatement.setObject(columnIndex + 1, null); } else { preparedStatement.setBytes(columnIndex + 1, column.asBytes()); } break; case Types.BOOLEAN: if (null == column.getRawData()) { preparedStatement.setNull(columnIndex + 1, Types.BOOLEAN); } else { preparedStatement.setBoolean(columnIndex + 1, column.asBoolean()); } break; // warn: bit(1) maps to Types.BIT and can use setBoolean // warn: bit(>1) maps to Types.VARBINARY and can use setBytes case Types.BIT: if (null == column.getRawData()) { 
preparedStatement.setObject(columnIndex + 1, null); } else if (this.dataBaseType == DataBaseType.MySql) { preparedStatement.setBoolean(columnIndex + 1, column.asBoolean()); } else { preparedStatement.setString(columnIndex + 1, column.asString()); } break; default: preparedStatement.setObject(columnIndex + 1, column.getRawData()); break; } } catch (DataXException e) { throw new SQLException(String.format( "Type conversion error: [%s] column name: [%s], column type: [%d], column Java type: [%s].", column, this.resultSetMetaData.getLeft().get(columnIndex), this.resultSetMetaData.getMiddle().get(columnIndex), this.resultSetMetaData.getRight().get(columnIndex))); } return preparedStatement; } } } ================================================ FILE: rdbmswriter/src/main/resources/plugin.json ================================================ { "name": "rdbmswriter", "class": "com.alibaba.datax.plugin.reader.rdbmswriter.RdbmsWriter", "description": "useScene: prod. mechanism: connects to the database via JDBC and writes data with INSERT statements. 
warn: The more you know about the database, the fewer problems you encounter.", "developer": "alibaba", "drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver", "com.ibm.db2.jcc.DB2Driver"] } ================================================ FILE: rdbmswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "rdbmswriter", "parameter": { "username": "", "password": "", "column": [], "session": [], "preSql": [], "connection": [ { "jdbcUrl": "", "table": [] } ] } } ================================================ FILE: rpm/t_dp_dw_datax_3_core_all-build.sh ================================================ #!/bin/bash export PATH=/home/tops/bin/:${PATH} export temppath=$1 cd $temppath/rpm sed -i "s/^Release:.*$/Release: "$4"/" $2.spec sed -i "s/^Version:.*$/Version: "$3"/" $2.spec sed -i "s/UNKNOWN_DATAX_VERSION/$3-$4/g" ../core/src/main/bin/datax.py sed -i "s/UNKNOWN_DATAX_VERSION/$3-$4/g" ../core/src/main/bin/perftrace.py export TAGS=TAG:`svn info|grep "URL"|cut -d ":" -f 2-|sed "s/^ //g"|awk -F "trunk|tags|branche" '{print $1}'`tags/$2_A_`echo $3|tr "." "_"`_$4 sed -i "s#%description#%description \n $TAGS#g" $2.spec /usr/local/bin/rpm_create -p /home/admin -v $3 -r $4 $2.spec -k mv `find . -name $2-$3-$4*rpm` . ================================================ FILE: rpm/t_dp_dw_datax_3_hook_dqc-build.sh ================================================ #!/bin/bash export PATH=/home/tops/bin/:${PATH} export temppath=$1 cd $temppath/rpm sed -i "s/^Release:.*$/Release: "$4"/" $2.spec sed -i "s/^Version:.*$/Version: "$3"/" $2.spec export TAGS=TAG:`svn info|grep "URL"|cut -d ":" -f 2-|sed "s/^ //g"|awk -F "trunk|tags|branche" '{print $1}'`tags/$2_A_`echo $3|tr "." "_"`_$4 sed -i "s#%description#%description \n $TAGS#g" $2.spec /usr/local/bin/rpm_create -p /home/admin -v $3 -r $4 $2.spec -k mv `find . -name $2-$3-$4*rpm` . 
================================================ FILE: selectdbwriter/doc/selectdbwriter.md ================================================ # SelectdbWriter Plugin Documentation ## 1 Quick Introduction SelectdbWriter writes large batches of data into SelectDB. ## 2 How It Works SelectdbWriter calls the SelectDB API (/copy/upload), which returns a redirected S3 address. It then sends the byte stream to that S3 address over HTTP and executes COPY INTO once a batch reaches the configured thresholds. ## 3 Build 1. Run init-env.sh. 2. Build selectdbwriter: i. Build only the selectdbwriter plugin: ```text mvn clean install -pl plugin-rdbms-util,selectdbwriter -DskipTests ``` ii. Build the whole DataX project: ```text mvn package assembly:assembly -Dmaven.test.skip=true ``` The output is placed in target/datax/datax/. The hdfsreader, hdfswriter and oscarwriter plugins require additional jar packages; if you do not need them, you can remove their modules from DataX/pom.xml. iii. Build errors If you encounter the following build error: ```text Could not find artifact com.alibaba.datax:datax-all:pom:0.0.1-SNAPSHOT ``` try the following: a. Download alibaba-datax-maven-m2-20210928.tar.gz. b. Extract it and copy the resulting alibaba/datax/ directory into the .m2/repository/com/alibaba/ directory of the Maven installation you use. c. Try building again. ## 4 Features ### 4.1 Sample Configuration The following configuration file reads data from Stream and imports it into SelectDB. ``` { "job":{ "content":[ { "reader":{ "name":"streamreader", "parameter":{ "column":[ { "type":"string", "random":"0,31" }, { "type":"string", "random":"0,31" }, { "type":"string", "random":"0,31" }, { "type":"string", "random":"0,31" }, { "type":"long", "random":"0,5" }, { "type":"string", "random":"0,10" }, { "type":"string", "random":"0,5" }, { "type":"string", "random":"0,31" }, { "type":"string", "random":"0,31" }, { "type":"string", "random":"0,21" }, { "type":"string", "random":"0,31" }, { "type":"long", "random":"0,10" }, { "type":"long", "random":"0,20" }, { "type":"date", "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" }, { "type":"long", "random":"0,10" }, { "type":"date", "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" }, { "type":"string", "random":"0,10" }, { "type":"long", "random":"0,10" }, { "type":"date", "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" }, { "type":"long", "random":"0,10" }, { "type":"date", "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" }, { "type":"long", 
"random":"0,10" }, { "type":"date", "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" }, { "type":"long", "random":"0,10" }, { "type":"date", "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" }, { "type":"string", "random":"0,100" }, { "type":"string", "random":"0,1" }, { "type":"long", "random":"0,1" }, { "type":"string", "random":"0,64" }, { "type":"string", "random":"0,20" }, { "type":"string", "random":"0,31" }, { "type":"long", "random":"0,3" }, { "type":"long", "random":"0,3" }, { "type":"long", "random":"0,19" }, { "type":"date", "random":"2022-01-01 12:00:00,2023-01-01 12:00:00" }, { "type":"string", "random":"0,1" } ], "sliceRecordCount":10 } }, "writer":{ "name":"selectdbwriter", "parameter":{ "loadUrl":[ "xxx:47150" ], "loadProps":{ "file.type":"json", "file.strip_outer_array":"true" }, "column":[ "id", "table_id", "table_no", "table_name", "table_status", "no_disturb", "dinner_type", "member_id", "reserve_bill_no", "pre_order_no", "queue_num", "person_num", "open_time", "open_time_format", "order_time", "order_time_format", "table_bill_id", "offer_time", "offer_time_format", "confirm_bill_time", "confirm_bill_time_format", "bill_time", "bill_time_format", "clear_time", "clear_time_format", "table_message", "bill_close", "table_type", "pad_mac", "company_id", "shop_id", "is_sync", "table_split_no", "ts", "ts_format", "dr" ], "username":"admin", "password":"SelectDB2022", "postSql":[ ], "preSql":[ ], "connection":[ { "jdbcUrl":"jdbc:mysql://xxx:34142/cl_test", "table":[ "ods_pos_pro_table_dynamic_delta_v4" ], "selectedDatabase":"cl_test" } ], "maxBatchRows":1000000, "batchSize":536870912000 } } } ], "setting":{ "errorLimit":{ "percentage":0.02, "record":0 }, "speed":{ "channel":5 } } } } ``` ### 4.2 Parameters ```text * **jdbcUrl** - Description: JDBC connection string for SelectDB, used to execute preSql or postSql. - Required: yes - Default: none * **loadUrl** - Description: connection target for SelectDB, in the format "ip:port", where ip is the SelectDB private-link address and port is the http_port of the SelectDB cluster. - Required: yes - Default: none * **username** - Description: username for accessing the SelectDB database - 
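Required: yes - Default: none * **password** - Description: password for accessing the SelectDB database - Required: no - Default: empty * **connection.selectedDatabase** - Description: name of the SelectDB database to write to. - Required: yes - Default: none * **connection.table** - Description: name of the SelectDB table to write to. - Required: yes - Default: none * **column** - Description: columns of the target table **to be written**; they are also used as the field names of the generated JSON data. Separate columns with commas, for example: "column": ["id","name","age"]. - Required: yes - Default: none * **preSql** - Description: standard SQL statements executed before data is written to the target table. - Required: no - Default: none * **postSql** - Description: standard SQL statements executed after data is written to the target table. - Required: no - Default: none * **maxBatchRows** - Description: maximum number of rows per import batch. Together with **batchSize**, it controls the size of each batch: a batch is imported as soon as it reaches either threshold. - Required: no - Default: 500000 * **batchSize** - Description: maximum data size, in bytes, per import batch. Together with **maxBatchRows**, it controls the size of each batch: a batch is imported as soon as it reaches either threshold. - Required: no - Default: 90M * **maxRetries** - Description: number of retries after a batch import fails. - Required: no - Default: 3 * **labelPrefix** - Description: label prefix for each uploaded batch file. The final label is `labelPrefix + UUID`, which is globally unique and ensures that data is not imported twice. - Required: no - Default: `datax_selectdb_writer_` * **loadProps** - Description: request parameters for COPY INTO, including the import data format (file.type, etc.). CSV is used by default and JSON is also supported; see the Type Conversion section below. - Required: no - Default: none * **clusterName** - Description: name of the SelectDB Cloud cluster. - Required: no - Default: none * **flushQueueLength** - Description: length of the flush queue. - Required: no - Default: 1 * **flushInterval** - Description: time interval between batch writes. If maxBatchRows and batchSize are set very large, a batch may never reach those thresholds; the import is then triggered when this interval elapses. - Required: no - Default: 30000ms ``` ### Type Conversion By default, all incoming data is converted to strings and assembled into a `csv` file for the SelectDB import, with `\t` as the column separator and `\n` as the line delimiter. CSV is the default import format; to change the column separator or line delimiter, configure `loadProps` accordingly: ```json "loadProps": { "file.column_separator": "\\x01", "file.line_delimiter": "\\x02" } ``` To switch the import format to `json`, configure `loadProps` as follows: ```json "loadProps": { "file.type": "json", "file.strip_outer_array": true } ``` ================================================ FILE: selectdbwriter/doc/stream2selectdb.json ================================================ { "core": { "transport": { "channel": { "speed": { "byte": 10485760 } } } }, "job": { "content": [ { "reader": {}, "writer": { "name": "selectdbwriter", "parameter": { "loadUrl": [ "xxx:35871" ], "loadProps": { "file.type": "json", "file.strip_outer_array": "true" }, "database": "db1", "column": [ "k1", "k2", "k3", "k4", "k5" ], 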
"username": "admin", "password": "SelectDB2022", "postSql": [], "preSql": [], "connection": [ { "jdbcUrl": "jdbc:mysql://xxx:32386/cl_test", "table": [ "test_selectdb" ], "selectedDatabase": "cl_test" } ], "maxBatchRows": 200000, "batchSize": 53687091200 } } } ], "setting": { "errorLimit": { "percentage": 0.02, "record": 0 }, "speed": { "byte": 10485760 } } } } ================================================ FILE: selectdbwriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 selectdbwriter selectdbwriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java ${mysql.driver.version} org.apache.httpcomponents httpclient 4.5.13 com.fasterxml.jackson.core jackson-annotations 2.13.3 com.fasterxml.jackson.core jackson-core 2.13.3 com.fasterxml.jackson.core jackson-databind 2.13.3 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: selectdbwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/selectdbwriter target/ selectdbwriter-0.0.1-SNAPSHOT.jar plugin/writer/selectdbwriter false plugin/writer/selectdbwriter/libs runtime ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/BaseResponse.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.fasterxml.jackson.annotation.JsonIgnoreProperties; @JsonIgnoreProperties(ignoreUnknown = true) public class BaseResponse { private int code; private String msg; private T data; private int count; 
public int getCode() { return code; } public String getMsg() { return msg; } public T getData(){ return data; } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopyIntoResp.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.fasterxml.jackson.annotation.JsonIgnoreProperties; import java.util.Map; @JsonIgnoreProperties(ignoreUnknown = true) public class CopyIntoResp extends BaseResponse{ private String code; private String exception; private Map result; public String getDataCode() { return code; } public String getException() { return exception; } public Map getResult() { return result; } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/CopySQLBuilder.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import java.util.Map; import java.util.StringJoiner; public class CopySQLBuilder { private final static String COPY_SYNC = "copy.async"; private final String fileName; private final Keys options; private Map properties; public CopySQLBuilder(Keys options, String fileName) { this.options=options; this.fileName=fileName; this.properties=options.getLoadProps(); } public String buildCopySQL(){ StringBuilder sb = new StringBuilder(); sb.append("COPY INTO ") .append(options.getDatabase() + "." 
+ options.getTable()) .append(" FROM @~('").append(fileName).append("') ") .append("PROPERTIES ("); //copy into must be sync properties.put(COPY_SYNC,false); StringJoiner props = new StringJoiner(","); for(Map.Entry entry : properties.entrySet()){ String key = String.valueOf(entry.getKey()); String value = String.valueOf(entry.getValue()); String prop = String.format("'%s'='%s'",key,value); props.add(prop); } sb.append(props).append(" )"); return sb.toString(); } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/DelimiterParser.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.google.common.base.Strings; import java.io.StringWriter; public class DelimiterParser { private static final String HEX_STRING = "0123456789ABCDEF"; public static String parse(String sp, String dSp) throws RuntimeException { if ( Strings.isNullOrEmpty(sp)) { return dSp; } if (!sp.toUpperCase().startsWith("\\X")) { return sp; } String hexStr = sp.substring(2); // check hex str if (hexStr.isEmpty()) { throw new RuntimeException("Failed to parse delimiter: Hex str is empty"); } if (hexStr.length() % 2 != 0) { throw new RuntimeException("Failed to parse delimiter: Hex str length error"); } for (char hexChar : hexStr.toUpperCase().toCharArray()) { if (HEX_STRING.indexOf(hexChar) == -1) { throw new RuntimeException("Failed to parse delimiter: Hex str format error"); } } // transform to separator StringWriter writer = new StringWriter(); for (byte b : hexStrToBytes(hexStr)) { writer.append((char) b); } return writer.toString(); } private static byte[] hexStrToBytes(String hexStr) { String upperHexStr = hexStr.toUpperCase(); int length = upperHexStr.length() / 2; char[] hexChars = upperHexStr.toCharArray(); byte[] bytes = new byte[length]; for (int i = 0; i < length; i++) { int pos = i * 2; bytes[i] = (byte) (charToByte(hexChars[pos]) << 4 | 
charToByte(hexChars[pos + 1])); } return bytes; } private static byte charToByte(char c) { return (byte) HEX_STRING.indexOf(c); } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPostBuilder.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import org.apache.commons.codec.binary.Base64; import org.apache.http.HttpEntity; import org.apache.http.HttpHeaders; import org.apache.http.client.methods.HttpPost; import java.nio.charset.StandardCharsets; import java.util.HashMap; import java.util.Map; public class HttpPostBuilder { String url; Map header; HttpEntity httpEntity; public HttpPostBuilder() { header = new HashMap<>(); } public HttpPostBuilder setUrl(String url) { this.url = url; return this; } public HttpPostBuilder addCommonHeader() { header.put(HttpHeaders.EXPECT, "100-continue"); return this; } public HttpPostBuilder baseAuth(String user, String password) { final String authInfo = user + ":" + password; byte[] encoded = Base64.encodeBase64(authInfo.getBytes(StandardCharsets.UTF_8)); header.put(HttpHeaders.AUTHORIZATION, "Basic " + new String(encoded)); return this; } public HttpPostBuilder setEntity(HttpEntity httpEntity) { this.httpEntity = httpEntity; return this; } public HttpPost build() { SelectdbUtil.checkNotNull(url); SelectdbUtil.checkNotNull(httpEntity); HttpPost put = new HttpPost(url); header.forEach(put::setHeader); put.setEntity(httpEntity); return put; } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/HttpPutBuilder.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import org.apache.commons.codec.binary.Base64; import org.apache.http.HttpEntity; import org.apache.http.HttpHeaders; import org.apache.http.client.methods.HttpPut; import 
org.apache.http.entity.StringEntity; import java.nio.charset.StandardCharsets; import java.util.HashMap; import java.util.Map; public class HttpPutBuilder { String url; Map header; HttpEntity httpEntity; public HttpPutBuilder() { header = new HashMap<>(); } public HttpPutBuilder setUrl(String url) { this.url = url; return this; } public HttpPutBuilder addFileName(String fileName){ header.put("fileName", fileName); return this; } public HttpPutBuilder setEmptyEntity() { try { this.httpEntity = new StringEntity(""); } catch (Exception e) { throw new IllegalArgumentException(e); } return this; } public HttpPutBuilder addCommonHeader() { header.put(HttpHeaders.EXPECT, "100-continue"); return this; } public HttpPutBuilder baseAuth(String user, String password) { final String authInfo = user + ":" + password; byte[] encoded = Base64.encodeBase64(authInfo.getBytes(StandardCharsets.UTF_8)); header.put(HttpHeaders.AUTHORIZATION, "Basic " + new String(encoded)); return this; } public HttpPutBuilder setEntity(HttpEntity httpEntity) { this.httpEntity = httpEntity; return this; } public HttpPut build() { SelectdbUtil.checkNotNull(url); SelectdbUtil.checkNotNull(httpEntity); HttpPut put = new HttpPut(url); header.forEach(put::setHeader); put.setEntity(httpEntity); return put; } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/Keys.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import java.io.Serializable; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class Keys implements Serializable { private static final long serialVersionUID = 1l; private static final int DEFAULT_MAX_RETRIES = 3; private static final int BATCH_ROWS = 
500000; private static final long DEFAULT_FLUSH_INTERVAL = 30000; private static final String LOAD_PROPS_FORMAT = "file.type"; public enum StreamLoadFormat { CSV, JSON; } private static final String USERNAME = "username"; private static final String PASSWORD = "password"; private static final String DATABASE = "connection[0].selectedDatabase"; private static final String TABLE = "connection[0].table[0]"; private static final String COLUMN = "column"; private static final String PRE_SQL = "preSql"; private static final String POST_SQL = "postSql"; private static final String JDBC_URL = "connection[0].jdbcUrl"; private static final String LABEL_PREFIX = "labelPrefix"; private static final String MAX_BATCH_ROWS = "maxBatchRows"; private static final String MAX_BATCH_SIZE = "batchSize"; private static final String FLUSH_INTERVAL = "flushInterval"; private static final String LOAD_URL = "loadUrl"; private static final String FLUSH_QUEUE_LENGTH = "flushQueueLength"; private static final String LOAD_PROPS = "loadProps"; private static final String DEFAULT_LABEL_PREFIX = "datax_selectdb_writer_"; private static final long DEFAULT_MAX_BATCH_SIZE = 90 * 1024 * 1024; //default 90M private static final String CLUSTER_NAME = "clusterName"; private static final String MAX_RETRIES = "maxRetries"; private final Configuration options; private List infoSchemaColumns; private List userSetColumns; private boolean isWildcardColumn; public Keys ( Configuration options) { this.options = options; this.userSetColumns = options.getList(COLUMN, String.class).stream().map(str -> str.replace("`", "")).collect(Collectors.toList()); if (1 == options.getList(COLUMN, String.class).size() && "*".equals(options.getList(COLUMN, String.class).get(0).trim())) { this.isWildcardColumn = true; } } public void doPretreatment() { validateRequired(); validateStreamLoadUrl(); } public String getJdbcUrl() { return options.getString(JDBC_URL); } public String getDatabase() { return options.getString(DATABASE); 
} public String getTable() { return options.getString(TABLE); } public String getUsername() { return options.getString(USERNAME); } public String getPassword() { return options.getString(PASSWORD); } public String getClusterName(){ return options.getString(CLUSTER_NAME); } public String getLabelPrefix() { String label = options.getString(LABEL_PREFIX); return null == label ? DEFAULT_LABEL_PREFIX : label; } public List getLoadUrlList() { return options.getList(LOAD_URL, String.class); } public List getColumns() { if (isWildcardColumn) { return this.infoSchemaColumns; } return this.userSetColumns; } public boolean isWildcardColumn() { return this.isWildcardColumn; } public void setInfoCchemaColumns(List cols) { this.infoSchemaColumns = cols; } public List getPreSqlList() { return options.getList(PRE_SQL, String.class); } public List getPostSqlList() { return options.getList(POST_SQL, String.class); } public Map getLoadProps() { return options.getMap(LOAD_PROPS); } public int getMaxRetries() { Integer retries = options.getInt(MAX_RETRIES); return null == retries ? DEFAULT_MAX_RETRIES : retries; } public int getBatchRows() { Integer rows = options.getInt(MAX_BATCH_ROWS); return null == rows ? BATCH_ROWS : rows; } public long getBatchSize() { Long size = options.getLong(MAX_BATCH_SIZE); return null == size ? DEFAULT_MAX_BATCH_SIZE : size; } public long getFlushInterval() { Long interval = options.getLong(FLUSH_INTERVAL); return null == interval ? DEFAULT_FLUSH_INTERVAL : interval; } public int getFlushQueueLength() { Integer len = options.getInt(FLUSH_QUEUE_LENGTH); return null == len ? 
1 : len; } public StreamLoadFormat getStreamLoadFormat() { Map loadProps = getLoadProps(); if (null == loadProps) { return StreamLoadFormat.CSV; } if (loadProps.containsKey(LOAD_PROPS_FORMAT) && StreamLoadFormat.JSON.name().equalsIgnoreCase(String.valueOf(loadProps.get(LOAD_PROPS_FORMAT)))) { return StreamLoadFormat.JSON; } return StreamLoadFormat.CSV; } private void validateStreamLoadUrl() { List urlList = getLoadUrlList(); for (String host : urlList) { if (host.split(":").length < 2) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, "The format of loadUrl is incorrect; each entry must be in the form `fe_ip:fe_http_port`."); } } } private void validateRequired() { final String[] requiredOptionKeys = new String[]{ USERNAME, DATABASE, TABLE, COLUMN, LOAD_URL }; for (String optionKey : requiredOptionKeys) { options.getNecessaryValue(optionKey, DBUtilErrorCode.REQUIRED_VALUE); } } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbBaseCodec.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.alibaba.datax.common.element.Column; public class SelectdbBaseCodec { protected String convertionField( Column col) { if (null == col.getRawData() || Column.Type.NULL == col.getType()) { return null; } if ( Column.Type.BOOL == col.getType()) { return String.valueOf(col.asLong()); } if ( Column.Type.BYTES == col.getType()) { byte[] bts = (byte[])col.getRawData(); long value = 0; for (int i = 0; i < bts.length; i++) { value += (bts[bts.length - i - 1] & 0xffL) << (8 * i); } return String.valueOf(value); } return col.asString(); } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodec.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import 
com.alibaba.datax.common.element.Record; import java.io.Serializable; public interface SelectdbCodec extends Serializable { String codec( Record row); } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCodecFactory.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import java.util.Map; public class SelectdbCodecFactory { public SelectdbCodecFactory (){ } public static SelectdbCodec createCodec( Keys writerOptions) { if ( Keys.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { Map props = writerOptions.getLoadProps(); return new SelectdbCsvCodec (null == props || !props.containsKey("file.column_separator") ? null : String.valueOf(props.get("file.column_separator"))); } if ( Keys.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) { return new SelectdbJsonCodec (writerOptions.getColumns()); } throw new RuntimeException("Failed to create row serializer, unsupported `format` from stream load properties."); } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCopyIntoObserver.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.fasterxml.jackson.core.type.TypeReference; import com.fasterxml.jackson.databind.ObjectMapper; import org.apache.commons.lang3.StringUtils; import org.apache.http.Header; import org.apache.http.HttpEntity; import org.apache.http.client.methods.CloseableHttpResponse; import org.apache.http.entity.InputStreamEntity; import org.apache.http.entity.StringEntity; import org.apache.http.impl.client.CloseableHttpClient; import org.apache.http.impl.client.HttpClientBuilder; import org.apache.http.impl.client.HttpClients; import org.apache.http.util.EntityUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; 
import java.io.ByteArrayInputStream; import java.io.IOException; import java.net.HttpURLConnection; import java.net.URL; import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.regex.Pattern; public class SelectdbCopyIntoObserver { private static final Logger LOG = LoggerFactory.getLogger(SelectdbCopyIntoObserver.class); private final Keys options; private long pos; public static final int SUCCESS = 0; public static final String FAIL = "1"; private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(); private final HttpClientBuilder httpClientBuilder = HttpClients .custom() .disableRedirectHandling(); private final CloseableHttpClient httpClient; private static final String UPLOAD_URL_PATTERN = "%s/copy/upload"; private static final String COMMIT_PATTERN = "%s/copy/query"; private static final Pattern COMMITTED_PATTERN = Pattern.compile("errCode = 2, detailMessage = No files can be copied, matched (\\d+) files, " + "filtered (\\d+) files because files may be loading or loaded"); public SelectdbCopyIntoObserver(Keys options) { this.options = options; this.httpClient = httpClientBuilder.build(); } public void streamLoad(WriterTuple data) throws Exception { String host = getLoadHost(); if (host == null) { throw new RuntimeException("loadUrl cannot be empty, or none of its hosts can be connected. Please check your configuration."); } String loadUrl = String.format(UPLOAD_URL_PATTERN, host); String uploadAddress = getUploadAddress(loadUrl, data.getLabel()); put(uploadAddress, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue())); executeCopy(host, data.getLabel()); } private String getUploadAddress(String loadUrl, String fileName) throws IOException { HttpPutBuilder putBuilder = new HttpPutBuilder(); putBuilder.setUrl(loadUrl) .addFileName(fileName) .addCommonHeader() .setEmptyEntity() .baseAuth(options.getUsername(), options.getPassword()); /* reuse the shared client (instead of building and leaking a new one per batch) and close the response so its pooled connection is released */ try (CloseableHttpResponse
execute = httpClient.execute(putBuilder.build())) { int statusCode = execute.getStatusLine().getStatusCode(); String reason = execute.getStatusLine().getReasonPhrase(); if (statusCode == 307) { Header location = execute.getFirstHeader("location"); String uploadAddress = location.getValue(); LOG.info("redirect to s3:{}", uploadAddress); return uploadAddress; } else { HttpEntity entity = execute.getEntity(); String result = entity == null ? null : EntityUtils.toString(entity); LOG.error("Failed to get the redirected address, status {}, reason {}, response {}", statusCode, reason, result); throw new RuntimeException("Could not get the redirected address."); } } } private byte[] addRows(List<byte[]> rows, int totalBytes) { if (Keys.StreamLoadFormat.CSV.equals(options.getStreamLoadFormat())) { Map<String, Object> props = (options.getLoadProps() == null ? new HashMap<>() : options.getLoadProps()); byte[] lineDelimiter = DelimiterParser.parse((String) props.get("file.line_delimiter"), "\n").getBytes(StandardCharsets.UTF_8); ByteBuffer bos = ByteBuffer.allocate(totalBytes + rows.size() * lineDelimiter.length); for (byte[] row : rows) { bos.put(row); bos.put(lineDelimiter); } return bos.array(); } if (Keys.StreamLoadFormat.JSON.equals(options.getStreamLoadFormat())) { ByteBuffer bos = ByteBuffer.allocate(totalBytes + (rows.isEmpty() ?
2 : rows.size() + 1)); bos.put("[".getBytes(StandardCharsets.UTF_8)); byte[] jsonDelimiter = ",".getBytes(StandardCharsets.UTF_8); boolean isFirstElement = true; for (byte[] row : rows) { if (!isFirstElement) { bos.put(jsonDelimiter); } bos.put(row); isFirstElement = false; } bos.put("]".getBytes(StandardCharsets.UTF_8)); return bos.array(); } throw new RuntimeException("Failed to join rows data, unsupported `file.type` from copy into properties."); } public void put(String loadUrl, String fileName, byte[] data) throws IOException { LOG.info("Executing upload file to: '{}', size: '{}'", loadUrl, data.length); HttpPutBuilder putBuilder = new HttpPutBuilder(); putBuilder.setUrl(loadUrl) .addCommonHeader() .setEntity(new InputStreamEntity(new ByteArrayInputStream(data))); try (CloseableHttpResponse response = httpClient.execute(putBuilder.build())) { final int statusCode = response.getStatusLine().getStatusCode(); if (statusCode != 200) { String result = response.getEntity() == null ?
null : EntityUtils.toString(response.getEntity()); LOG.error("upload file {} error, response {}", fileName, result); throw new SelectdbWriterException("upload file error: " + fileName, true); } } } private String getLoadHost() { List<String> hostList = options.getLoadUrlList(); long tmp = pos + hostList.size(); for (; pos < tmp; pos++) { String host = new StringBuilder("http://").append(hostList.get((int) (pos % hostList.size()))).toString(); if (checkConnection(host)) { return host; } } return null; } private boolean checkConnection(String host) { try { URL url = new URL(host); HttpURLConnection co = (HttpURLConnection) url.openConnection(); co.setConnectTimeout(5000); co.connect(); co.disconnect(); return true; } catch (Exception e1) { LOG.warn("Cannot connect to load host {}", host, e1); return false; } } /** * execute copy into */ public void executeCopy(String hostPort, String fileName) throws IOException{ long start = System.currentTimeMillis(); CopySQLBuilder copySQLBuilder = new CopySQLBuilder(options, fileName); String copySQL = copySQLBuilder.buildCopySQL(); LOG.info("build copy SQL is {}", copySQL); Map<String, String> params = new HashMap<>(); params.put("sql", copySQL); if(StringUtils.isNotBlank(options.getClusterName())){ params.put("cluster",options.getClusterName()); } HttpPostBuilder postBuilder = new HttpPostBuilder(); postBuilder.setUrl(String.format(COMMIT_PATTERN, hostPort)) .baseAuth(options.getUsername(), options.getPassword()) .setEntity(new StringEntity(OBJECT_MAPPER.writeValueAsString(params))); try (CloseableHttpResponse response = httpClient.execute(postBuilder.build())) { final int statusCode = response.getStatusLine().getStatusCode(); final String reasonPhrase = response.getStatusLine().getReasonPhrase(); String loadResult = ""; if (statusCode != 200) { LOG.warn("commit failed with status {} {}, reason {}", statusCode, hostPort, reasonPhrase); throw new SelectdbWriterException("commit error with file: " + fileName, true); } else if (response.getEntity() != null){ loadResult =
EntityUtils.toString(response.getEntity()); boolean success = handleCommitResponse(loadResult); if(success){ LOG.info("commit success cost {}ms, response is {}", System.currentTimeMillis() - start, loadResult); }else{ throw new SelectdbWriterException("commit fail", true); } } } } public boolean handleCommitResponse(String loadResult) throws IOException { BaseResponse<CopyIntoResp> baseResponse = OBJECT_MAPPER.readValue(loadResult, new TypeReference<BaseResponse<CopyIntoResp>>(){}); if(baseResponse.getCode() == SUCCESS){ CopyIntoResp dataResp = baseResponse.getData(); if(FAIL.equals(dataResp.getDataCode())){ LOG.error("copy into execute failed, reason:{}", loadResult); return false; }else{ Map<String, String> result = dataResp.getResult(); if(!"FINISHED".equals(result.get("state")) && !isCommitted(result.get("msg"))){ LOG.error("copy into load failed, reason:{}", loadResult); return false; }else{ return true; } } }else{ LOG.error("commit failed, reason:{}", loadResult); return false; } } public static boolean isCommitted(String msg) { return msg != null && COMMITTED_PATTERN.matcher(msg).matches(); } public void close() throws IOException { if (null != httpClient) { try { httpClient.close(); } catch (IOException e) { LOG.error("Closing httpClient failed.", e); throw new RuntimeException("Closing httpClient failed.", e); } } } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbCsvCodec.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.alibaba.datax.common.element.Record; public class SelectdbCsvCodec extends SelectdbBaseCodec implements SelectdbCodec { private static final long serialVersionUID = 1L; private final String columnSeparator; public SelectdbCsvCodec ( String sp) { this.columnSeparator = DelimiterParser.parse(sp, "\t"); } @Override public String codec( Record row) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < row.getColumnNumber(); i++) { String value =
convertionField(row.getColumn(i)); sb.append(null == value ? "\\N" : value); if (i < row.getColumnNumber() - 1) { sb.append(columnSeparator); } } return sb.toString(); } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbJsonCodec.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.alibaba.datax.common.element.Record; import com.alibaba.fastjson2.JSON; import java.util.HashMap; import java.util.List; import java.util.Map; public class SelectdbJsonCodec extends SelectdbBaseCodec implements SelectdbCodec { private static final long serialVersionUID = 1L; private final List<String> fieldNames; public SelectdbJsonCodec ( List<String> fieldNames) { this.fieldNames = fieldNames; } @Override public String codec( Record row) { if (null == fieldNames) { return ""; } Map<String, Object> rowMap = new HashMap<> (fieldNames.size()); int idx = 0; for (String fieldName : fieldNames) { rowMap.put(fieldName, convertionField(row.getColumn(idx))); idx++; } return JSON.toJSONString(rowMap); } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbUtil.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.druid.sql.parser.ParserException; import com.google.common.base.Strings; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.ResultSet; import java.sql.Statement; import java.util.ArrayList; import java.util.Collections; import java.util.List; /** * jdbc util */ public class SelectdbUtil { private static final Logger LOG =
LoggerFactory.getLogger(SelectdbUtil.class); private SelectdbUtil() {} public static List<String> getDorisTableColumns( Connection conn, String databaseName, String tableName) { String currentSql = String.format("SELECT COLUMN_NAME FROM `information_schema`.`COLUMNS` WHERE `TABLE_SCHEMA` = '%s' AND `TABLE_NAME` = '%s' ORDER BY `ORDINAL_POSITION` ASC;", databaseName, tableName); List<String> columns = new ArrayList<> (); ResultSet rs = null; try { rs = DBUtil.query(conn, currentSql); while (DBUtil.asyncResultSetNext(rs)) { String colName = rs.getString("COLUMN_NAME"); columns.add(colName); } return columns; } catch (Exception e) { throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); } finally { DBUtil.closeDBResources(rs, null, null); } } public static List<String> renderPreOrPostSqls(List<String> preOrPostSqls, String tableName) { if (null == preOrPostSqls) { return Collections.emptyList(); } List<String> renderedSqls = new ArrayList<>(); for (String sql : preOrPostSqls) { if (! Strings.isNullOrEmpty(sql)) { renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName)); } } return renderedSqls; } public static void executeSqls(Connection conn, List<String> sqls) { Statement stmt = null; String currentSql = null; try { stmt = conn.createStatement(); for (String sql : sqls) { currentSql = sql; DBUtil.executeSqlWithoutResultSet(stmt, sql); } } catch (Exception e) { throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); } finally { DBUtil.closeDBResources(null, stmt, null); } } public static void preCheckPrePareSQL( Keys options) { String table = options.getTable(); List<String> preSqls = options.getPreSqlList(); List<String> renderedPreSqls = SelectdbUtil.renderPreOrPostSqls(preSqls, table); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { LOG.info("Begin to preCheck preSqls:[{}].", String.join(";", renderedPreSqls)); for (String sql : renderedPreSqls) { try { DBUtil.sqlValid(sql, DataBaseType.MySql); } catch ( ParserException e) { throw
RdbmsException.asPreSQLParserException(DataBaseType.MySql,e,sql); } } } } public static void preCheckPostSQL( Keys options) { String table = options.getTable(); List<String> postSqls = options.getPostSqlList(); List<String> renderedPostSqls = SelectdbUtil.renderPreOrPostSqls(postSqls, table); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { LOG.info("Begin to preCheck postSqls:[{}].", String.join(";", renderedPostSqls)); for(String sql : renderedPostSqls) { try { DBUtil.sqlValid(sql, DataBaseType.MySql); } catch (ParserException e){ throw RdbmsException.asPostSQLParserException(DataBaseType.MySql,e,sql); } } } } public static <T> T checkNotNull(T reference) { if (reference == null) { throw new NullPointerException(); } else { return reference; } } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriter.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.List; /** * selectdb data writer */ public class SelectdbWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig = null; private Keys options; @Override public void init() { this.originalConfig = super.getPluginJobConf(); options = new Keys (super.getPluginJobConf()); options.doPretreatment(); } @Override public
void preCheck(){ this.init(); SelectdbUtil.preCheckPrePareSQL(options); SelectdbUtil.preCheckPostSQL(options); } @Override public void prepare() { String username = options.getUsername(); String password = options.getPassword(); String jdbcUrl = options.getJdbcUrl(); List<String> renderedPreSqls = SelectdbUtil.renderPreOrPostSqls(options.getPreSqlList(), options.getTable()); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); try { LOG.info("Begin to execute preSqls:[{}]. context info:{}.", String.join(";", renderedPreSqls), jdbcUrl); SelectdbUtil.executeSqls(conn, renderedPreSqls); } finally { DBUtil.closeDBResources(null, null, conn); } } } @Override public List<Configuration> split(int mandatoryNumber) { List<Configuration> configurations = new ArrayList<>(mandatoryNumber); for (int i = 0; i < mandatoryNumber; i++) { configurations.add(originalConfig); } return configurations; } @Override public void post() { String username = options.getUsername(); String password = options.getPassword(); String jdbcUrl = options.getJdbcUrl(); List<String> renderedPostSqls = SelectdbUtil.renderPreOrPostSqls(options.getPostSqlList(), options.getTable()); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); try { LOG.info("Begin to execute postSqls:[{}].
context info:{}.", String.join(";", renderedPostSqls), jdbcUrl); SelectdbUtil.executeSqls(conn, renderedPostSqls); } finally { DBUtil.closeDBResources(null, null, conn); } } } @Override public void destroy() { } } public static class Task extends Writer.Task { private SelectdbWriterManager writerManager; private Keys options; private SelectdbCodec rowCodec; @Override public void init() { options = new Keys (super.getPluginJobConf()); if (options.isWildcardColumn()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, options.getJdbcUrl(), options.getUsername(), options.getPassword()); try { List<String> columns = SelectdbUtil.getDorisTableColumns(conn, options.getDatabase(), options.getTable()); options.setInfoCchemaColumns(columns); } finally { DBUtil.closeDBResources(null, null, conn); } } writerManager = new SelectdbWriterManager(options); rowCodec = SelectdbCodecFactory.createCodec(options); } @Override public void prepare() { } @Override public void startWrite(RecordReceiver recordReceiver) { try { Record record; while ((record = recordReceiver.getFromReader()) != null) { if (record.getColumnNumber() != options.getColumns().size()) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "There is an error in the column configuration information. " + "This is because you have configured a task where the number of fields to be read from the source:%s " + "is not equal to the number of fields to be written to the destination table:%s. 
" + "Please check your configuration and make changes.", record.getColumnNumber(), options.getColumns().size())); } writerManager.writeRecord(rowCodec.codec(record)); } } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } } @Override public void post() { try { writerManager.close(); } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } } @Override public void destroy() {} @Override public boolean supportFailOver(){ return false; } } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterException.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; public class SelectdbWriterException extends RuntimeException { private boolean reCreateLabel; public SelectdbWriterException() { super(); } public SelectdbWriterException(String message) { super(message); } public SelectdbWriterException(String message, boolean reCreateLabel) { super(message); this.reCreateLabel = reCreateLabel; } public SelectdbWriterException(String message, Throwable cause) { super(message, cause); } public SelectdbWriterException(Throwable cause) { super(cause); } protected SelectdbWriterException(String message, Throwable cause, boolean enableSuppression, boolean writableStackTrace) { super(message, cause, enableSuppression, writableStackTrace); } public boolean needReCreateLabel() { return reCreateLabel; } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/SelectdbWriterManager.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import com.google.common.base.Strings; import org.apache.commons.lang3.concurrent.BasicThreadFactory; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import 
java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.List; import java.util.UUID; import java.util.concurrent.Executors; import java.util.concurrent.LinkedBlockingDeque; import java.util.concurrent.ScheduledExecutorService; import java.util.concurrent.ScheduledFuture; import java.util.concurrent.TimeUnit; public class SelectdbWriterManager { private static final Logger LOG = LoggerFactory.getLogger(SelectdbWriterManager.class); private final SelectdbCopyIntoObserver visitor; private final Keys options; private final List<byte[]> buffer = new ArrayList<>(); private int batchCount = 0; private long batchSize = 0; private volatile boolean closed = false; private volatile Exception flushException; private final LinkedBlockingDeque<WriterTuple> flushQueue; private ScheduledExecutorService scheduler; private ScheduledFuture<?> scheduledFuture; public SelectdbWriterManager(Keys options) { this.options = options; this.visitor = new SelectdbCopyIntoObserver(options); flushQueue = new LinkedBlockingDeque<>(options.getFlushQueueLength()); this.startScheduler(); this.startAsyncFlushing(); } public void startScheduler() { stopScheduler(); this.scheduler = Executors.newScheduledThreadPool(1, new BasicThreadFactory.Builder().namingPattern("Doris-interval-flush").daemon(true).build()); this.scheduledFuture = this.scheduler.schedule(() -> { synchronized (SelectdbWriterManager.this) { if (!closed) { try { String label = createBatchLabel(); LOG.info("Selectdb interval sinking triggered: label[{}].", label); if (batchCount == 0) { startScheduler(); } flush(label, false); } catch (Exception e) { flushException = e; } } } }, options.getFlushInterval(), TimeUnit.MILLISECONDS); } public void stopScheduler() { if (this.scheduledFuture != null) { scheduledFuture.cancel(false); this.scheduler.shutdown(); } } public final synchronized void writeRecord(String record) throws IOException { checkFlushException(); try { byte[] bts = record.getBytes(StandardCharsets.UTF_8);
buffer.add(bts); batchCount++; batchSize += bts.length; if (batchCount >= options.getBatchRows() || batchSize >= options.getBatchSize()) { String label = createBatchLabel(); if(LOG.isDebugEnabled()){ LOG.debug("buffer sinking triggered: rows[{}] label[{}].", batchCount, label); } flush(label, false); } } catch (Exception e) { throw new SelectdbWriterException("Writing records to selectdb failed.", e); } } public synchronized void flush(String label, boolean waitUntilDone) throws Exception { checkFlushException(); if (batchCount == 0) { if (waitUntilDone) { waitAsyncFlushingDone(); } return; } flushQueue.put(new WriterTuple(label, batchSize, new ArrayList<>(buffer))); if (waitUntilDone) { // wait the last flush waitAsyncFlushingDone(); } buffer.clear(); batchCount = 0; batchSize = 0; } public synchronized void close() throws IOException { if (!closed) { closed = true; try { String label = createBatchLabel(); if (batchCount > 0) { if (LOG.isDebugEnabled()) { LOG.debug("Selectdb Sink is about to close: label[{}].", label); } } flush(label, true); } catch (Exception e) { throw new RuntimeException("Writing records to Selectdb failed.", e); } } checkFlushException(); } public String createBatchLabel() { StringBuilder sb = new StringBuilder(); if (!Strings.isNullOrEmpty(options.getLabelPrefix())) { sb.append(options.getLabelPrefix()); } return sb.append(UUID.randomUUID().toString()) .toString(); } private void startAsyncFlushing() { // start flush thread Thread flushThread = new Thread(new Runnable() { public void run() { while (true) { try { asyncFlush(); } catch (Exception e) { flushException = e; } } } }); flushThread.setDaemon(true); flushThread.start(); } private void waitAsyncFlushingDone() throws InterruptedException { // wait previous flushings for (int i = 0; i <= options.getFlushQueueLength(); i++) { flushQueue.put(new WriterTuple("", 0L, null)); } checkFlushException(); } private void asyncFlush() throws Exception { WriterTuple
flushData = flushQueue.take(); if (Strings.isNullOrEmpty(flushData.getLabel())) { return; } stopScheduler(); for (int i = 0; i <= options.getMaxRetries(); i++) { try { // copy into visitor.streamLoad(flushData); startScheduler(); break; } catch (Exception e) { LOG.warn("Failed to flush batch data to selectdb, retry times = {}", i, e); if (i >= options.getMaxRetries()) { throw new RuntimeException(e); } if (e instanceof SelectdbWriterException && ((SelectdbWriterException)e).needReCreateLabel()) { String newLabel = createBatchLabel(); LOG.warn("Batch label changed from [{}] to [{}]", flushData.getLabel(), newLabel); flushData.setLabel(newLabel); } try { Thread.sleep(1000L * Math.min(i + 1, 100)); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); throw new RuntimeException("Unable to flush, interrupted while doing another attempt", e); } } } } private void checkFlushException() { if (flushException != null) { throw new RuntimeException("Writing records to selectdb failed.", flushException); } } } ================================================ FILE: selectdbwriter/src/main/java/com/alibaba/datax/plugin/writer/selectdbwriter/WriterTuple.java ================================================ package com.alibaba.datax.plugin.writer.selectdbwriter; import java.util.List; public class WriterTuple { private String label; private final Long bytes; private final List<byte[]> rows; public WriterTuple ( String label, Long bytes, List<byte[]> rows){ this.label = label; this.rows = rows; this.bytes = bytes; } public String getLabel() { return label; } public void setLabel(String label) { this.label = label; } public Long getBytes() { return bytes; } public List<byte[]> getRows() { return rows; } } ================================================ FILE: selectdbwriter/src/main/resources/plugin.json ================================================ { "name": "selectdbwriter", "class": "com.alibaba.datax.plugin.writer.selectdbwriter.SelectdbWriter", "description": "selectdb writer
plugin", "developer": "selectdb" } ================================================ FILE: selectdbwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "selectdbwriter", "parameter": { "username": "", "password": "", "column": [], "preSql": [], "postSql": [], "loadUrl": [], "loadProps": {}, "connection": [ { "jdbcUrl": "", "selectedDatabase": "", "table": [] } ] } } ================================================ FILE: sqlserverreader/doc/sqlserverreader.md ================================================ # SqlServerReader Plugin Documentation ___ ## 1 Quick Introduction The SqlServerReader plugin reads data from SqlServer. Under the hood, SqlServerReader connects to a remote SqlServer database over JDBC and executes the corresponding SQL statements to SELECT the data out of the SqlServer database. ## 2 How It Works In short, SqlServerReader connects to a remote SqlServer database through a JDBC connector, generates a SELECT statement from the user's configuration, and sends it to the remote SqlServer database. It then assembles the result set returned by that SQL into an abstract dataset using DataX's own data types and passes it to the downstream Writer. For user-configured table, column and where settings, SqlServerReader concatenates them into an SQL statement and sends it to SqlServer; for a user-configured querySql, SqlServerReader sends it to SqlServer as-is. ## 3 Features ### 3.1 Sample Configurations * A job that extracts data from a SqlServer database and syncs it locally: ``` { "job": { "setting": { "speed": { "byte": 1048576 } }, "content": [ { "reader": { "name": "sqlserverreader", "parameter": { // database username "username": "root", // database password "password": "root", "column": [ "id" ], "splitPk": "db_id", "connection": [ { "table": [ "table" ], "jdbcUrl": [ "jdbc:sqlserver://localhost:3433;DatabaseName=dbname" ] } ] } }, "writer": { "name": "streamwriter", "parameter": { "print": true, "encoding": "UTF-8" } } } ] } } ``` * A job that syncs the result of a custom SQL query locally: ``` { "job": { "setting": { "speed": { "byte": 1048576 } }, "content": [ { "reader": { "name": "sqlserverreader", "parameter": { "username": "root", "password": "root", "where": "", "connection": [ { "querySql": [ "select db_id,on_line_flag from db_info where db_id < 10;" ], "jdbcUrl": [ "jdbc:sqlserver://bad_ip:3433;DatabaseName=dbname", "jdbc:sqlserver://127.0.0.1:bad_port;DatabaseName=dbname", "jdbc:sqlserver://127.0.0.1:3306;DatabaseName=dbname" ] } ] } }, "writer": {
"name": "streamwriter", "parameter": { "print": false, "encoding": "UTF-8" } } } ] } } ``` ### 3.2 Parameters * **jdbcUrl** * Description: the JDBC connection information for the source database, given as a JSON array; several connection addresses may be listed for one database. A JSON array is used because Alibaba Group internally supports probing multiple IPs: if several are configured, SqlServerReader probes each address in turn until it finds one it can connect to. If all connections fail, SqlServerReader reports an error. Note that jdbcUrl must be placed inside the connection configuration block. Outside Alibaba Group, a single JDBC URL in the array is sufficient. jdbcUrl follows the official SqlServer format and may include additional connection properties. For details, see the [SqlServer official documentation](http://technet.microsoft.com/zh-cn/library/ms378749(v=SQL.110).aspx). * Required: yes
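    * Default: none
    * **username** * Description: the username for the data source
    * Required: yes
    * Default: none
    * **password** * Description: the password for the specified username of the data source
    * Required: yes
    * Default: none
    * **table** * Description: the table(s) to synchronize, given as a JSON array, so several tables can be extracted at the same time. When multiple tables are configured, the user must ensure they share the same schema; SqlServerReader does not check whether they form one logical table. Note that table must be placed inside the connection configuration block.
    * Required: yes
    * Default: none
    * **column** * Description: the names of the columns to synchronize from the configured table, given as a JSON array. Use \* to select all columns, e.g. ["\*"]. Column pruning is supported: you can export only a subset of the columns. Column reordering is supported: columns need not follow the order of the table schema. Constants are supported, written in JSON as: ["id", "[table]", "1", "'bazhen.csy'", "null", "COUNT(*)", "2.3" , "true"] Here id is an ordinary column name, [table] is a column name that is a reserved word (quoted with brackets), 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, COUNT(*) is an expression, 2.3 is a floating-point number, and true is a boolean. column must explicitly list the columns to synchronize and must not be empty! * Required: yes
    * Default: none
    * **splitPk** * Description: if splitPk is specified, SqlServerReader shards the data by the field that splitPk names, and DataX then launches concurrent tasks to synchronize it, which can greatly improve synchronization throughput. Using the table's primary key as splitPk is recommended, because primary keys are usually evenly distributed, so the resulting shards are less prone to data hot spots. Currently splitPk supports only integer columns; `floating-point, string, date and other types are not supported`. If another type is specified, SqlServerReader reports an error! If splitPk is empty, the table is treated as not splittable and is extracted over a single channel. * Required: no
    * Default: none
    * **where** * Description: filter condition. SqlServerReader builds the SQL from the specified column, table and where settings and extracts data with it. In practice you often synchronize only the current day's data, in which case you can set where to gmt_create > $bizdate. Note: where cannot be set to limit 10, because limit is not a valid WHERE clause.
    A where condition enables efficient incremental synchronization. If it is empty, the whole table is synchronized. * Required: no
    * Default: none
    * **querySql** * Description: in some scenarios the where setting is not expressive enough for the required filter, so you can define a custom filtering SQL with this setting. Once it is set, DataX ignores table and column and filters data with this SQL directly, for example when data must be synchronized after a multi-table join: select a,b from table_a join table_b on table_a.id = table_b.id
    `When querySql is configured, SqlServerReader ignores the table, column and where settings`. * Required: no
    * Default: none
    * **fetchSize** * Description: the number of rows the plugin fetches from the database server per batch. This value determines the number of network round trips between DataX and the server and can substantially improve extraction performance.
    `Note: a value that is too large (>2048) may cause the DataX process to run out of memory (OOM).` * Required: no
    * Default: 1024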
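The interplay of column, table, where and splitPk described above can be sketched as follows. This is a minimal, hypothetical illustration: the class `SplitPkSketch` and its methods are not part of DataX, and this is not the code DataX actually runs. It assumes the minimum and maximum of the integer splitPk column are already known (for example from a `SELECT MIN(...), MAX(...)` query), builds the base SELECT from column, table and where, and cuts `[min, max]` into at most `parts` contiguous ranges, plus one extra query for rows whose split key is NULL. When querySql is configured none of this applies, because the user's SQL is sent as-is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of splitPk-based query splitting; not DataX's actual implementation. */
public class SplitPkSketch {

    /** Builds "SELECT <columns> FROM <table> [WHERE <where>]". */
    public static String buildQuery(List<String> columns, String table, String where) {
        String sql = "SELECT " + String.join(",", columns) + " FROM " + table;
        return isBlank(where) ? sql : sql + " WHERE " + where;
    }

    /**
     * Splits the integer key range [min, max] into at most `parts` contiguous ranges and
     * returns one query per range, followed by one query for rows whose key IS NULL.
     * Assumes max - min does not overflow a long.
     */
    public static List<String> splitQueries(List<String> columns, String table, String where,
                                            String splitPk, long min, long max, int parts) {
        if (parts < 1 || min > max) {
            throw new IllegalArgumentException("require parts >= 1 and min <= max");
        }
        List<String> queries = new ArrayList<>();
        // step * parts >= max - min + 1, so this yields at most `parts` ranges
        long step = (max - min) / parts + 1;
        long lo = min;
        while (true) {
            // written this way so that lo + step - 1 is only computed when it cannot exceed max
            long hi = (max - lo < step) ? max : lo + step - 1;
            String range = splitPk + " >= " + lo + " AND " + splitPk + " <= " + hi;
            queries.add(buildQuery(columns, table, combine(where, range)));
            if (hi == max) {
                break;
            }
            lo = hi + 1;
        }
        // NULL keys match none of the ranges above, so they need a query of their own.
        queries.add(buildQuery(columns, table, combine(where, splitPk + " IS NULL")));
        return queries;
    }

    /** Combines the user's where clause with an extra predicate. */
    private static String combine(String where, String extra) {
        return isBlank(where) ? extra : "(" + where + ") AND " + extra;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        List<String> queries = splitQueries(Arrays.asList("id", "name"), "dbo.t",
                "gmt_create > '2024-01-01'", "id", 1, 100, 4);
        queries.forEach(System.out::println);
    }
}
```

Each generated range query would run as a separate task in its own read transaction, which is also why concurrent extraction cannot give a consistent snapshot, as discussed in section 5.2.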
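    ### 3.3 Type Conversion SqlServerReader currently supports most SqlServer types, but a few types are not supported; please check your column types. The type mapping SqlServerReader applies to SqlServer types is listed below: | DataX internal type | SqlServer data type | | -------- | ----- | | Long |bigint, int, smallint, tinyint| | Double |float, decimal, real, numeric| |String |char,nchar,ntext,nvarchar,text,varchar,nvarchar(MAX),varchar(MAX)| | Date |date, datetime, time | | Boolean |bit| | Bytes |binary,varbinary,varbinary(MAX),timestamp| Note: * `Field types not listed above are not supported`. * `timestamp is treated as a binary type`. ## 4 Performance Report Not available yet. ## 5 Constraints and Limitations ### 5.1 Data Recovery in Primary/Standby Setups This refers to SqlServer deployments that use a primary/standby pair for disaster recovery, where the standby continuously restores data from the primary's transaction log. Because replication always lags somewhat behind the primary, and particularly under conditions such as network delay, the data restored on the standby can differ significantly from the primary, so data synchronized from the standby is not a complete, up-to-date image. To address this, a preSql feature is provided; its documentation is still to be written. ### 5.2 Consistency Constraints SqlServer is an RDBMS and offers a strongly consistent query interface. For example, if other writers write to the database while a synchronization task is running, SqlServerReader does not see those writes at all; this follows from the database's own snapshot semantics. For more on database snapshots, see [MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) The above describes consistency under SqlServerReader's single-threaded model. Because SqlServerReader can extract data concurrently based on the user's configuration, strict consistency cannot be guaranteed: after SqlServerReader splits the data by splitPk, it launches several concurrent tasks one after another. These tasks do not share a single read transaction, and there are time gaps between them, so the resulting data is not a `complete`, `consistent` snapshot. A consistent multi-threaded snapshot cannot currently be achieved technically; it can only be approached through engineering measures, each with trade-offs. Here are a few options for users to choose from: 1. Use single-threaded synchronization, i.e. do not split the data. This is slower but preserves consistency well. 2.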
Stop other writers so that the data is static, e.g. by locking tables or pausing standby replication. The downside is possible impact on online business. ### 5.3 Database Encoding SqlServerReader extracts data via JDBC, which natively handles all kinds of encodings and performs encoding conversion internally. SqlServerReader therefore does not require the user to specify an encoding; it detects and converts encodings automatically. ### 5.4 Incremental Synchronization SqlServerReader extracts data with JDBC SELECT statements, so SELECT...WHERE... can be used for incremental extraction in several ways: * When the online application writes to the database, it fills a modify field with the change timestamp for inserts, updates and (logical) deletes. For such applications, SqlServerReader only needs a WHERE condition on the timestamp of the previous synchronization run. * For append-only, log-style data, SqlServerReader can use a WHERE condition on the largest auto-increment ID from the previous run. If the business data has no field that distinguishes new or modified rows, SqlServerReader cannot perform incremental synchronization and can only synchronize the full data set. ### 5.5 SQL Security SqlServerReader lets users supply their own SELECT statement through querySql and performs no security checks on it. Ensuring its safety is the responsibility of the DataX user. ## 6 FAQ ================================================ FILE: sqlserverreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT sqlserverreader com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.microsoft.sqlserver sqljdbc4 4.0 com.alibaba.datax plugin-rdbms-util ${datax-project-version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: sqlserverreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/sqlserverreader target/ sqlserverreader-0.0.1-SNAPSHOT.jar plugin/reader/sqlserverreader false plugin/reader/sqlserverreader/libs runtime ================================================ FILE: sqlserverreader/src/main/java/com/alibaba/datax/plugin/reader/sqlserverreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.sqlserverreader; public class Constant { public static final int DEFAULT_FETCH_SIZE = 1024; } ================================================ FILE: sqlserverreader/src/main/java/com/alibaba/datax/plugin/reader/sqlserverreader/Key.java
================================================ package com.alibaba.datax.plugin.reader.sqlserverreader; public class Key { public static final String FETCH_SIZE = "fetchSize"; } ================================================ FILE: sqlserverreader/src/main/java/com/alibaba/datax/plugin/reader/sqlserverreader/SqlServerReader.java ================================================ package com.alibaba.datax.plugin.reader.sqlserverreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import java.util.List; public class SqlServerReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.SQLServer; public static class Job extends Reader.Job { private Configuration originalConfig = null; private CommonRdbmsReader.Job commonRdbmsReaderJob; @Override public void init() { this.originalConfig = super.getPluginJobConf(); int fetchSize = this.originalConfig.getInt( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, Constant.DEFAULT_FETCH_SIZE); if (fetchSize < 1) { throw DataXException .asDataXException(DBUtilErrorCode.REQUIRED_VALUE, String.format("Invalid fetchSize: by DataX design, fetchSize [%d] must not be less than 1.", fetchSize)); } this.originalConfig.set( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); this.commonRdbmsReaderJob = new CommonRdbmsReader.Job( DATABASE_TYPE); this.commonRdbmsReaderJob.init(this.originalConfig); } @Override public List<Configuration> split(int adviceNumber) { return this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber); } @Override public void post() { this.commonRdbmsReaderJob.post(this.originalConfig); } @Override public void destroy() {
this.commonRdbmsReaderJob.destroy(this.originalConfig); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderTask; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderTask = new CommonRdbmsReader.Task( DATABASE_TYPE ,super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderTask.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig .getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderTask.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); } } } ================================================ FILE: sqlserverreader/src/main/java/com/alibaba/datax/plugin/reader/sqlserverreader/SqlServerReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.sqlserverreader; import com.alibaba.datax.common.spi.ErrorCode; public enum SqlServerReaderErrorCode implements ErrorCode { ; private String code; private String description; private SqlServerReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } } ================================================ FILE: sqlserverreader/src/main/resources/plugin.json ================================================ { "name": "sqlserverreader", "class": "com.alibaba.datax.plugin.reader.sqlserverreader.SqlServerReader", "description": "useScene: test. mechanism: use datax framework to transport data from SQL Server. 
warn: The more you know about the data, the fewer problems you encounter.", "developer": "alibaba" } ================================================ FILE: sqlserverreader/src/main/resources/plugin_job_template.json ================================================ { "name": "sqlserverreader", "parameter": { "username": "", "password": "", "connection": [ { "table": [], "jdbcUrl": [] } ] } } ================================================ FILE: sqlserverwriter/doc/sqlserverwriter.md ================================================ # DataX SqlServerWriter --- ## 1 Quick introduction The SqlServerWriter plugin writes data into a destination table in a SqlServer database. Under the hood, SqlServerWriter connects to the remote SqlServer database through JDBC and executes the corresponding insert into ... SQL statements to write the data into SqlServer, committing in batches internally. SqlServerWriter is aimed at ETL developers, who use it to load data from a data warehouse into SqlServer. It can also serve DBAs and other users as a data migration tool. ## 2 How it works SqlServerWriter receives the protocol data produced by the Reader through the DataX framework and generates the corresponding SQL statement from your configuration: * `insert into...` (rows that conflict with a primary key or unique index cannot be written)
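To illustrate the kind of statement described above, here is a minimal sketch, assuming a JDBC `PreparedStatement` with one `?` placeholder per configured column. The class and method names are hypothetical: in DataX the SQL is actually generated by the shared `CommonRdbmsWriter` from plugin-rdbms-util, whose template may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; DataX's CommonRdbmsWriter builds its own INSERT template.
public class InsertSqlSketch {

    // Builds "INSERT INTO <table> (c1,c2,...) VALUES (?,?,...)" for use with a JDBC PreparedStatement.
    static String buildInsertSql(String table, List<String> columns) {
        if (columns == null || columns.isEmpty()) {
            throw new IllegalArgumentException("column must be specified and must not be empty");
        }
        // One "?" placeholder per configured column.
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(",", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        String sql = buildInsertSql("db_info_for_writer", Arrays.asList("db_id", "db_type", "db_ip"));
        System.out.println(sql); // INSERT INTO db_info_for_writer (db_id,db_type,db_ip) VALUES (?,?,?)
    }
}
```

Because the values are bound to placeholders rather than concatenated into the SQL text, a row whose key collides with an existing primary key or unique index simply fails its insert, which matches the conflict behavior noted above.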
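    Notes: 1. The database holding the destination table must be the primary database, otherwise data cannot be written. The job needs at least the insert into... privilege; whether it needs other privileges depends on the statements you specify in preSql and postSql. 2. Unlike MysqlWriter, SqlServerWriter does not support the writeMode parameter. ## 3 Features ### 3.1 Sample configuration * This sample imports data generated in memory into SqlServer. ``` { "job": { "setting": { "speed": { "channel": 5 } }, "content": [ { "reader": {}, "writer": { "name": "sqlserverwriter", "parameter": { "username": "root", "password": "root", "column": [ "db_id", "db_type", "db_ip", "db_port", "db_role", "db_name", "db_username", "db_password", "db_modify_time", "db_modify_user", "db_description", "db_tddl_info" ], "connection": [ { "table": [ "db_info_for_writer" ], "jdbcUrl": "jdbc:sqlserver://[HOST_NAME]:[PORT];DatabaseName=[DATABASE_NAME]" } ], "session": ["SET IDENTITY_INSERT TABLE_NAME ON"], "preSql": [ "delete from @table where db_id = -1;" ], "postSql": [ "update @table set db_modify_time = GETDATE() where db_id = 1;" ] } } } ] } } ``` ### 3.2 Parameters * **jdbcUrl** * Description: JDBC connection information for the destination database; jdbcUrl must be placed inside the connection block. Notes: 1. Only one value can be configured per database. This differs from SqlServerReader, which supports probing multiple standby databases, because writing to multiple primaries of the same database (dual-primary import) is not supported. 2. jdbcUrl follows the official SqlServer format and may carry additional connection parameters. For details, see the official SqlServer documentation or consult your DBA. * Required: yes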
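    * Default: none
    * **username** * Description: user name for the destination database
    * Required: yes
    * Default: none
    * **password** * Description: password for the destination database
    * Required: yes
    * Default: none
    * **table** * Description: name of the destination table. One or more tables can be written; when several tables are configured, all of them must have the same structure. Note: table and jdbcUrl must be placed inside the connection block. * Required: yes
    * Default: none
    * **column** * Description: the destination table columns to write, separated by commas, for example: "column": ["id","name","age"]. To write all columns in order, use *, for example: "column": ["\*"]. **The column option is mandatory and must not be empty!** Notes: 1. Using * is strongly discouraged, because the job may run incorrectly or fail when the number or types of the destination table's columns change. 2. column cannot contain constant values. * Required: yes
    * Default: none
    * **session** * Description: SQL statements that DataX executes when it obtains a SqlServer connection, used to modify properties of the current connection session
    * Required: no
    * Default: none
    * **preSql** * Description: standard statements executed before data is written to the destination table. If a statement needs to reference the table name, write it as `@table`; when the statement is executed, the variable is replaced with the actual table name.
    * Required: no
    * Default: none
    * **postSql** * Description: standard statements executed after data is written to the destination table (same mechanism as preSql).
    * Required: no
    * Default: none
    * **batchSize** * Description: number of records committed in one batch. A larger value greatly reduces the number of network round trips between DataX and SqlServer and improves overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
    * Required: no
    * Default: 1024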
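To make the `batchSize` trade-off concrete, here is a minimal sketch (not DataX's actual code) of batched submission: rows are buffered and handed to a flush callback every `batchSize` rows, with a final partial flush at the end. In a JDBC writer, each flush would typically correspond to one `PreparedStatement.executeBatch()` round trip; the `BatchBuffer` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer rows and hand them to `flush` every `batchSize` rows.
public class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    public BatchBuffer(int batchSize, Consumer<List<T>> flush) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.flush = flush;
    }

    public void add(T row) {
        buffer.add(row); // memory use grows with batchSize, hence the OOM risk
        if (buffer.size() >= batchSize) {
            flushNow();
        }
    }

    // Call once after the last row so that a partial batch is not lost.
    public void close() {
        if (!buffer.isEmpty()) {
            flushNow();
        }
    }

    private void flushNow() {
        // In a JDBC writer this would typically be one executeBatch() round trip.
        flush.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchBuffer<String> buf = new BatchBuffer<String>(1024, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            buf.add("row-" + i);
        }
        buf.close();
        System.out.println(batchSizes); // [1024, 1024, 452]
    }
}
```

With the default of 1024, 2500 rows cost three round trips instead of 2500; raising the value reduces round trips further but keeps more rows in memory at once.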
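    ### 3.3 Type conversion Like SqlServerReader, SqlServerWriter currently supports most SqlServer types, but a few types are not supported, so please check your types. The SqlServerWriter type mapping for SqlServer is listed below:

| DataX internal type | SqlServer data type |
| -------- | ----- |
| Long | |
| Double | |
| String | |
| Date | |
| Boolean | |
| Bytes | |

## 4 Performance report ### 4.1 Environment #### 4.1.1 Data characteristics Table DDL: ``` ``` A single row looks like: ``` ``` #### 4.1.2 Machine specifications * Machine running DataX: 1. cpu: 24 Core Intel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz 2. mem: 94GB 3. net: dual gigabit NICs 4. disc: DataX does not write data to disk, so this item is not measured * SqlServer database machine: 1. cpu: 4 Core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz 2. mem: 7GB #### 4.1.3 DataX JVM parameters -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError #### 4.1.4 Performance test job configuration ``` ``` ### 4.2 Test report #### 4.2.1 Test report ## 5 Constraints and limitations ## FAQ *** **Q: SqlServerWriter fails while executing postSql. Has the data been imported into the destination database?** A: A DataX import consists of three stages: the pre operation, the import itself, and the post operation. If any of them fails, the DataX job fails. Because DataX cannot guarantee that these operations complete in a single transaction, the data may already have been written to the destination. *** **Q: In that case some dirty data may have been imported. What if it affects the online database?** A: There are currently two solutions. First, configure a pre statement that cleans up the data imported that day, so that each DataX run removes the previous partial import and then imports the complete data. Second, import the data into a temporary table and rename it to the online table after the import completes. *** **Q: The second approach avoids affecting online data. How exactly do I set it up?** A: Configure the job to import into a temporary table, then rename that table to the online table once the import has finished. ================================================ FILE: sqlserverwriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT sqlserverwriter sqlserverwriter jar writer data into sqlserver database com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 
org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.microsoft.sqlserver sqljdbc4 4.0 com.alibaba.datax plugin-rdbms-util ${datax-project-version} maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: sqlserverwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/sqlserverwriter target/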
sqlserverwriter-0.0.1-SNAPSHOT.jar plugin/writer/sqlserverwriter false plugin/writer/sqlserverwriter/libs runtime ================================================ FILE: sqlserverwriter/src/main/java/com/alibaba/datax/plugin/writer/sqlserverwriter/SqlServerWriter.java ================================================ package com.alibaba.datax.plugin.writer.sqlserverwriter; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import java.util.List; public class SqlServerWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.SQLServer; public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterJob; @Override public void init() { this.originalConfig = super.getPluginJobConf(); // warn: unlike mysql, sqlserver only supports insert mode String writeMode = this.originalConfig.getString(Key.WRITE_MODE); if (null != writeMode) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "Invalid write mode (writeMode) configuration: sqlserver does not support the writeMode option (got: %s) and can only insert data with insert SQL. 
Please check and correct your configuration.", writeMode)); }
this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE); this.commonRdbmsWriterJob.init(this.originalConfig); } @Override public void prepare() { this.commonRdbmsWriterJob.prepare(this.originalConfig); } @Override public List<Configuration> split(int mandatoryNumber) { return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber); } @Override public void post() { this.commonRdbmsWriterJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterJob.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterTask; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task( DATABASE_TYPE); this.commonRdbmsWriterTask.init(this.writerSliceConfig); } @Override public void prepare() { this.commonRdbmsWriterTask.prepare(this.writerSliceConfig); } public void startWrite(RecordReceiver recordReceiver) { this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); } @Override public void post() { this.commonRdbmsWriterTask.post(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterTask.destroy(this.writerSliceConfig); } } } ================================================ FILE: sqlserverwriter/src/main/java/com/alibaba/datax/plugin/writer/sqlserverwriter/SqlServerWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.sqlserverwriter; import com.alibaba.datax.common.spi.ErrorCode; public enum SqlServerWriterErrorCode implements ErrorCode { ; private final String code; private final String describe; private SqlServerWriterErrorCode(String code, String describe) { this.code = code; this.describe = describe; } @Override public String getCode() { return this.code; } @Override public String
getDescription() { return this.describe; } @Override public String toString() { return String.format("Code:[%s], Describe:[%s]. ", this.code, this.describe); } } ================================================ FILE: sqlserverwriter/src/main/resources/plugin.json ================================================ { "name": "sqlserverwriter", "class": "com.alibaba.datax.plugin.writer.sqlserverwriter.SqlServerWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert sql. warn: The more you know about the database, the fewer problems you encounter.", "developer": "alibaba" } ================================================ FILE: sqlserverwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "sqlserverwriter", "parameter": { "username": "", "password": "", "column": [], "connection": [ { "jdbcUrl": "", "table": [] } ], "preSql": [], "postSql": [] } } ================================================ FILE: starrocksreader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 starrocksreader starrocksreader jar 8 8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} mysql mysql-connector-java 5.1.46 src/main/java **/*.properties maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: starrocksreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/starrocksreader target/ starrocksreader-0.0.1-SNAPSHOT.jar plugin/reader/starrocksreader false plugin/reader/starrocksreader/libs runtime
================================================ FILE: starrocksreader/src/main/java/com/alibaba/datax/plugin/reader/starrocksreader/StarRocksReader.java ================================================ package com.alibaba.datax.plugin.reader.starrocksreader; import java.util.List; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.reader.Constant; import com.alibaba.datax.plugin.rdbms.reader.Key; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class StarRocksReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.StarRocks; public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private Configuration originalConfig = null; private CommonRdbmsReader.Job commonRdbmsReaderJob; @Override public void init() { this.originalConfig = super.getPluginJobConf(); int fetchSize = this.originalConfig.getInt(Constant.FETCH_SIZE, Integer.MIN_VALUE); this.originalConfig.set(Constant.FETCH_SIZE, fetchSize); this.commonRdbmsReaderJob = new CommonRdbmsReader.Job(DATABASE_TYPE); this.commonRdbmsReaderJob.init(this.originalConfig); } @Override public void preCheck(){ init(); this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE); } @Override public void prepare() { } @Override public List<Configuration> split(int adviceNumber) { LOG.info("split() begin..."); List<Configuration> splitResult = this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber); /** * Tell the user in the log why the number of channels DataX actually runs after splitting can be smaller than the number of channels the user configured. 
*/ if(splitResult.size() < adviceNumber){ // If the user has not configured the split key splitPk if(StringUtils.isBlank(this.originalConfig.getString(Key.SPLIT_PK, null))){ LOG.info("User has not configured splitPk.");
}else{ // The user configured splitPk, but its values may repeat too often or the table may have too few records to split into adviceNumber tasks LOG.info("User has configured splitPk, but the final number of split tasks is smaller than the number the user configured. " + "The possible reasons are: 1) too many repeated splitPk values, 2) too few records."); } } LOG.info("split() ok and end..."); return splitResult; } @Override public void post() { this.commonRdbmsReaderJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderJob.destroy(this.originalConfig); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderTask; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderTask = new CommonRdbmsReader.Task(DATABASE_TYPE, super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderTask.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig.getInt(Constant.FETCH_SIZE); this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderTask.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); } } } ================================================ FILE: starrocksreader/src/main/resources/plugin.json ================================================ { "name": "starrocksreader", "class": "com.alibaba.datax.plugin.reader.starrocksreader.StarRocksReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet.
warn: The more you know about the database, the fewer problems you encounter.", "developer": "alibaba" } ================================================ FILE: starrockswriter/doc/starrockswriter.md ================================================ # DataX StarRocksWriter --- ## 1 Quick introduction The StarRocksWriter plugin writes data into a destination table in StarRocks. Under the hood, StarRocksWriter imports the data into StarRocks through Stream Load, in CSV format by default. ## 2 How it works StarRocksWriter imports data into StarRocks through Stream Load in CSV format. Internally it buffers the data read by the `reader` and imports it into StarRocks in batches to improve write performance. ## 3 Features ### 3.1 Sample configuration * This sample reads data from MySQL and imports it into StarRocks. ```json { "job": { "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0, "percentage": 0 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "xxxx", "password": "xxxx", "column": [ "k1", "k2", "v1", "v2" ], "connection": [ { "table": [ "table1", "table2" ], "jdbcUrl": [ "jdbc:mysql://127.0.0.1:3306/datax_test1" ] }, { "table": [ "table3", "table4" ], "jdbcUrl": [ "jdbc:mysql://127.0.0.1:3306/datax_test2" ] } ] } }, "writer": { "name": "starrockswriter", "parameter": { "username": "xxxx", "password": "xxxx", "column": ["k1", "k2", "v1", "v2"], "preSql": [], "postSql": [], "connection": [ { "table": ["xxx"], "jdbcUrl": "jdbc:mysql://172.28.17.100:9030/", "selectedDatabase": "xxxx" } ], "loadUrl": ["172.28.17.100:8030", "172.28.17.100:8030"], "loadProps": {} } } } ] } } ``` ### 3.2 Parameters * **username** * Description: user name for the StarRocks database
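    * Required: yes
    * Default: none
    * **password** * Description: password for the StarRocks database
    * Required: yes
    * Default: none
    * **selectedDatabase** * Description: name of the database that contains the StarRocks table. * Required: yes
    * Default: none
    * **table** * Description: name of the StarRocks table. * Required: yes
    * Default: none
    * **loadUrl** * Description: StarRocks FE addresses used for Stream Load, each in the form `fe_ip:fe_http_port`; multiple FE addresses can be configured. * Required: yes
    * Default: none
    * **column** * Description: the destination table columns to write, separated by commas, for example: "column": ["id","name","age"]. **The column option is mandatory and must not be empty!** Note: `"column": ["*"]` is also accepted, in which case the writer looks up all columns of the destination table through jdbcUrl; this is strongly discouraged, because the job may run incorrectly or fail when the number or types of the destination table's columns change. * Required: yes
    * Default: none
    * **preSql** * Description: standard statements executed before data is written to the destination table.
    * Required: no
    * Default: none
    * **postSql** * Description: standard statements executed after data is written to the destination table.
    * Required: no
    * Default: none
    * **jdbcUrl** * Description: JDBC connection information for the destination database, used to execute `preSql` and `postSql` (and to look up the table's columns when column is `*`).
    * Required: no
    * Default: none
    * **maxBatchRows** * Description: maximum number of rows in a single Stream Load import
    * Required: no
    * Default: 500000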
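    * **maxBatchSize** * Description: maximum number of bytes in a single Stream Load import.
    * Required: no
    * Default: 5242880 (5 MB)
    * **flushInterval** * Description: interval between the end of one Stream Load and the start of the next (unit: ms).
    * Required: no
    * Default: 300000 (ms)
    * **loadProps** * Description: request parameters for Stream Load; see the StarRocks Stream Load documentation for details.
    * Required: no
    * Default: none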
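The three flush parameters can be pictured with the minimal sketch below. It assumes that a batch is sent as one Stream Load request once either the row limit (`maxBatchRows`) or the byte limit (`maxBatchSize`) is reached, or once `flushInterval` has passed with rows still pending. `StarRocksWriterManager` is not shown in this excerpt, so `FlushPolicySketch` is hypothetical and may not match its exact logic. The defaults in `main` come from the constants in `StarRocksWriterOptions` (`BATCH_ROWS`, `BATCH_BYTES`, `FLUSH_INTERVAL`).

```java
// Hypothetical sketch of Stream Load flush triggers; not StarRocksWriterManager's actual code.
public class FlushPolicySketch {
    private final int maxBatchRows;
    private final long maxBatchSize;
    private final long flushIntervalMs;
    private int bufferedRows;
    private long bufferedBytes;
    private long lastFlushEndMs;

    public FlushPolicySketch(int maxBatchRows, long maxBatchSize, long flushIntervalMs, long nowMs) {
        this.maxBatchRows = maxBatchRows;
        this.maxBatchSize = maxBatchSize;
        this.flushIntervalMs = flushIntervalMs;
        this.lastFlushEndMs = nowMs;
    }

    // Buffers one serialized row; returns true once the row limit or the byte limit is reached.
    public boolean addRow(byte[] serializedRow) {
        bufferedRows++;
        bufferedBytes += serializedRow.length;
        return bufferedRows >= maxBatchRows || bufferedBytes >= maxBatchSize;
    }

    // True if rows are pending and flushIntervalMs has passed since the last flush ended.
    public boolean intervalElapsed(long nowMs) {
        return bufferedRows > 0 && nowMs - lastFlushEndMs >= flushIntervalMs;
    }

    // Call after a Stream Load request completes to start a new batch.
    public void onFlushed(long nowMs) {
        bufferedRows = 0;
        bufferedBytes = 0;
        lastFlushEndMs = nowMs;
    }

    public static void main(String[] args) {
        // Defaults from StarRocksWriterOptions: 500000 rows, 5 MB, 300000 ms.
        FlushPolicySketch policy = new FlushPolicySketch(500_000, 5L * 1024 * 1024, 300_000L, 0L);
        byte[] row = new byte[1024]; // pretend every serialized CSV row is 1 KB
        int rows = 0;
        boolean full = false;
        while (!full) {
            rows++;
            full = policy.addRow(row);
        }
        System.out.println(rows); // 5120: the 5 MB byte limit is hit long before 500000 rows
    }
}
```

With rows of about 1 KB, the byte limit dominates. Only very small rows would reach `maxBatchRows` first, so it is worth tuning the two limits together.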
    ### 3.3 Type conversion By default, all incoming data is converted to strings and assembled into a `csv` payload with `\t` as the column separator and `\n` as the row delimiter, which is then imported through Stream Load. To change the separators, configure `loadProps` accordingly: ```json "loadProps": { "column_separator": "\\x01", "row_delimiter": "\\x02" } ``` To change the import format to `json`, configure `loadProps` accordingly: ```json "loadProps": { "format": "json", "strip_outer_array": true } ``` ## 4 Performance report ## 5 Constraints and limitations ## FAQ ================================================ FILE: starrockswriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT starrockswriter starrockswriter 1.1.0 jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} commons-codec commons-codec 1.9 org.apache.commons commons-lang3 3.12.0 commons-logging commons-logging 1.1.1 org.apache.httpcomponents httpcore 4.4.6 org.apache.httpcomponents httpclient 4.5.3 com.alibaba.fastjson2 fastjson2 mysql mysql-connector-java 5.1.46 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} org.apache.maven.plugins maven-shade-plugin 3.0.0 package shade true org.apache.http com.starrocks.shade.org.apache.http org.apache.commons com.starrocks.shade.org.apache.commons org.apache.commons:commons-lang3 commons-codec:commons-codec commons-logging:* org.apache.httpcomponents:httpclient org.apache.httpcomponents:httpcore *:* META-INF/*.SF META-INF/*.DSA META-INF/*.RSA maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: starrockswriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/starrockswriter target/ starrockswriter-1.1.0.jar plugin/writer/starrockswriter 
false plugin/writer/starrockswriter/libs runtime ================================================ FILE: 
starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriter.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.starrocks.connector.datax.plugin.writer.starrockswriter.manager.StarRocksWriterManager; import com.starrocks.connector.datax.plugin.writer.starrockswriter.row.StarRocksISerializer; import com.starrocks.connector.datax.plugin.writer.starrockswriter.row.StarRocksSerializerFactory; import com.starrocks.connector.datax.plugin.writer.starrockswriter.util.StarRocksWriterUtil; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.util.ArrayList; import java.util.List; public class StarRocksWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig = null; private StarRocksWriterOptions options; @Override public void init() { this.originalConfig = super.getPluginJobConf(); String selectedDatabase = super.getPluginJobConf().getString(StarRocksWriterOptions.KEY_SELECTED_DATABASE); if(StringUtils.isBlank(this.originalConfig.getString(StarRocksWriterOptions.KEY_DATABASE)) && StringUtils.isNotBlank(selectedDatabase)){ this.originalConfig.set(StarRocksWriterOptions.KEY_DATABASE, selectedDatabase); } options = new StarRocksWriterOptions(super.getPluginJobConf()); options.doPretreatment(); } @Override public void preCheck(){ 
this.init(); StarRocksWriterUtil.preCheckPrePareSQL(options); StarRocksWriterUtil.preCheckPostSQL(options); } @Override public void prepare() { String username = options.getUsername(); String password = options.getPassword(); String jdbcUrl = options.getJdbcUrl(); List renderedPreSqls = StarRocksWriterUtil.renderPreOrPostSqls(options.getPreSqlList(), options.getTable()); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); LOG.info("Begin to execute preSqls:[{}]. context info:{}.", String.join(";", renderedPreSqls), jdbcUrl); StarRocksWriterUtil.executeSqls(conn, renderedPreSqls); DBUtil.closeDBResources(null, null, conn); } } @Override public List split(int mandatoryNumber) { List configurations = new ArrayList<>(mandatoryNumber); for (int i = 0; i < mandatoryNumber; i++) { configurations.add(originalConfig); } return configurations; } @Override public void post() { String username = options.getUsername(); String password = options.getPassword(); String jdbcUrl = options.getJdbcUrl(); List renderedPostSqls = StarRocksWriterUtil.renderPreOrPostSqls(options.getPostSqlList(), options.getTable()); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, jdbcUrl, username, password); LOG.info("Begin to execute postSqls:[{}]. 
context info:{}.", String.join(";", renderedPostSqls), jdbcUrl); StarRocksWriterUtil.executeSqls(conn, renderedPostSqls); DBUtil.closeDBResources(null, null, conn); } } @Override public void destroy() { } } public static class Task extends Writer.Task { private StarRocksWriterManager writerManager; private StarRocksWriterOptions options; private StarRocksISerializer rowSerializer; @Override public void init() { options = new StarRocksWriterOptions(super.getPluginJobConf()); if (options.isWildcardColumn()) { Connection conn = DBUtil.getConnection(DataBaseType.MySql, options.getJdbcUrl(), options.getUsername(), options.getPassword()); List columns = StarRocksWriterUtil.getStarRocksColumns(conn, options.getDatabase(), options.getTable()); options.setInfoCchemaColumns(columns); } writerManager = new StarRocksWriterManager(options); rowSerializer = StarRocksSerializerFactory.createSerializer(options); } @Override public void prepare() { } public void startWrite(RecordReceiver recordReceiver) { try { Record record; while ((record = recordReceiver.getFromReader()) != null) { if (record.getColumnNumber() != options.getColumns().size()) { throw DataXException .asDataXException( DBUtilErrorCode.CONF_ERROR, String.format( "Column configuration error. 
The number of reader columns %d and the number of writer columns %d are not equal.", record.getColumnNumber(), options.getColumns().size())); } writerManager.writeRecord(rowSerializer.serialize(record)); } } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } } @Override public void post() { try { writerManager.close(); } catch (Exception e) { throw DataXException.asDataXException(DBUtilErrorCode.WRITE_DATA_ERROR, e); } } @Override public void destroy() {} @Override public boolean supportFailOver(){ return false; } } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/StarRocksWriterOptions.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter; import java.io.Serializable; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DBUtilErrorCode; import org.apache.commons.lang3.StringUtils; import java.util.List; import java.util.Map; import java.util.stream.Collectors; public class StarRocksWriterOptions implements Serializable { private static final long serialVersionUID = 1l; private static final long KILO_BYTES_SCALE = 1024l; private static final long MEGA_BYTES_SCALE = KILO_BYTES_SCALE * KILO_BYTES_SCALE; private static final int MAX_RETRIES = 1; private static final int BATCH_ROWS = 500000; private static final long BATCH_BYTES = 5 * MEGA_BYTES_SCALE; private static final long FLUSH_INTERVAL = 300000; private static final String KEY_LOAD_PROPS_FORMAT = "format"; public enum StreamLoadFormat { CSV, JSON; } public static final String KEY_USERNAME = "username"; public static final String KEY_PASSWORD = "password"; public static final String KEY_DATABASE = "database"; public static final String KEY_SELECTED_DATABASE = "selectedDatabase"; public static final 
String KEY_TABLE = "table"; public static final String KEY_COLUMN = "column"; public static final String KEY_PRE_SQL = "preSql"; public static final String KEY_POST_SQL = "postSql"; public static final String KEY_JDBC_URL = "jdbcUrl"; public static final String KEY_LABEL_PREFIX = "labelPrefix"; public static final String KEY_MAX_BATCH_ROWS = "maxBatchRows"; public static final String KEY_MAX_BATCH_SIZE = "maxBatchSize"; public static final String KEY_FLUSH_INTERVAL = "flushInterval"; public static final String KEY_LOAD_URL = "loadUrl"; public static final String KEY_FLUSH_QUEUE_LENGTH = "flushQueueLength"; public static final String KEY_LOAD_PROPS = "loadProps"; public static final String CONNECTION_JDBC_URL = "connection[0].jdbcUrl"; public static final String CONNECTION_TABLE_NAME = "connection[0].table[0]"; public static final String CONNECTION_SELECTED_DATABASE = "connection[0].selectedDatabase"; private final Configuration options; private List infoCchemaColumns; private List userSetColumns; private boolean isWildcardColumn; public StarRocksWriterOptions(Configuration options) { this.options = options; // database String database = this.options.getString(CONNECTION_SELECTED_DATABASE); if (StringUtils.isBlank(database)) { database = this.options.getString(KEY_SELECTED_DATABASE); } if (StringUtils.isNotBlank(database)) { this.options.set(KEY_DATABASE, database); } // jdbcUrl String jdbcUrl = this.options.getString(CONNECTION_JDBC_URL); if (StringUtils.isNotBlank(jdbcUrl)) { this.options.set(KEY_JDBC_URL, jdbcUrl); } // table String table = this.options.getString(CONNECTION_TABLE_NAME); if (StringUtils.isNotBlank(table)) { this.options.set(KEY_TABLE, table); } // column this.userSetColumns = options.getList(KEY_COLUMN, String.class).stream().map(str -> str.replace("`", "")).collect(Collectors.toList()); if (1 == options.getList(KEY_COLUMN, String.class).size() && "*".trim().equals(options.getList(KEY_COLUMN, String.class).get(0))) { this.isWildcardColumn = true; 
} } public void doPretreatment() { validateRequired(); validateStreamLoadUrl(); } public String getJdbcUrl() { return options.getString(KEY_JDBC_URL); } public String getDatabase() { return options.getString(KEY_DATABASE); } public String getTable() { return options.getString(KEY_TABLE); } public String getUsername() { return options.getString(KEY_USERNAME); } public String getPassword() { return options.getString(KEY_PASSWORD); } public String getLabelPrefix() { return options.getString(KEY_LABEL_PREFIX); } public List getLoadUrlList() { return options.getList(KEY_LOAD_URL, String.class); } public List getColumns() { if (isWildcardColumn) { return this.infoCchemaColumns; } return this.userSetColumns; } public boolean isWildcardColumn() { return this.isWildcardColumn; } public void setInfoCchemaColumns(List cols) { this.infoCchemaColumns = cols; } public List getPreSqlList() { return options.getList(KEY_PRE_SQL, String.class); } public List getPostSqlList() { return options.getList(KEY_POST_SQL, String.class); } public Map getLoadProps() { return options.getMap(KEY_LOAD_PROPS); } public int getMaxRetries() { return MAX_RETRIES; } public int getBatchRows() { Integer rows = options.getInt(KEY_MAX_BATCH_ROWS); return null == rows ? BATCH_ROWS : rows; } public long getBatchSize() { Long size = options.getLong(KEY_MAX_BATCH_SIZE); return null == size ? BATCH_BYTES : size; } public long getFlushInterval() { Long interval = options.getLong(KEY_FLUSH_INTERVAL); return null == interval ? FLUSH_INTERVAL : interval; } public int getFlushQueueLength() { Integer len = options.getInt(KEY_FLUSH_QUEUE_LENGTH); return null == len ? 
1 : len; } public StreamLoadFormat getStreamLoadFormat() { Map loadProps = getLoadProps(); if (null == loadProps) { return StreamLoadFormat.CSV; } if (loadProps.containsKey(KEY_LOAD_PROPS_FORMAT) && StreamLoadFormat.JSON.name().equalsIgnoreCase(String.valueOf(loadProps.get(KEY_LOAD_PROPS_FORMAT)))) { return StreamLoadFormat.JSON; } return StreamLoadFormat.CSV; } private void validateStreamLoadUrl() { List<String> urlList = getLoadUrlList(); for (String host : urlList) { if (host.split(":").length < 2) { throw DataXException.asDataXException(DBUtilErrorCode.CONF_ERROR, "The format of loadUrl is illegal, please input `fe_ip:fe_http_port;fe_ip:fe_http_port`."); } } } private void validateRequired() { final String[] requiredOptionKeys = new String[]{ KEY_USERNAME, KEY_DATABASE, KEY_TABLE, KEY_COLUMN, KEY_LOAD_URL }; for (String optionKey : requiredOptionKeys) { options.getNecessaryValue(optionKey, DBUtilErrorCode.REQUIRED_VALUE); } } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksFlushTuple.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; import java.util.List; public class StarRocksFlushTuple { private String label; private Long bytes; private List rows; public StarRocksFlushTuple(String label, Long bytes, List rows) { this.label = label; this.bytes = bytes; this.rows = rows; } public String getLabel() { return label; } public void setLabel(String label) { this.label = label; } public Long getBytes() { return bytes; } public List getRows() { return rows; } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadFailedException.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; 
import
java.io.IOException; import java.util.Map; public class StarRocksStreamLoadFailedException extends IOException { static final long serialVersionUID = 1L; private final Map response; private boolean reCreateLabel; public StarRocksStreamLoadFailedException(String message, Map response) { super(message); this.response = response; } public StarRocksStreamLoadFailedException(String message, Map response, boolean reCreateLabel) { super(message); this.response = response; this.reCreateLabel = reCreateLabel; } public Map getFailedResponse() { return response; } public boolean needReCreateLabel() { return reCreateLabel; } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksStreamLoadVisitor.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; import java.io.IOException; import java.net.HttpURLConnection; import java.net.URL; import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets; import com.alibaba.fastjson2.JSON; import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; import com.starrocks.connector.datax.plugin.writer.starrockswriter.row.StarRocksDelimiterParser; import org.apache.commons.codec.binary.Base64; import org.apache.http.HttpEntity; import org.apache.http.client.config.RequestConfig; import org.apache.http.client.methods.CloseableHttpResponse; import org.apache.http.client.methods.HttpGet; import org.apache.http.client.methods.HttpPut; import org.apache.http.entity.ByteArrayEntity; import org.apache.http.impl.client.CloseableHttpClient; import org.apache.http.impl.client.DefaultRedirectStrategy; import org.apache.http.impl.client.HttpClientBuilder; import org.apache.http.impl.client.HttpClients; import org.apache.http.util.EntityUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.HashMap; import 
java.util.List; import java.util.Map; import java.util.concurrent.TimeUnit; import java.util.stream.Collectors; public class StarRocksStreamLoadVisitor { private static final Logger LOG = LoggerFactory.getLogger(StarRocksStreamLoadVisitor.class); private final StarRocksWriterOptions writerOptions; private long pos; private static final String RESULT_FAILED = "Fail"; private static final String RESULT_LABEL_EXISTED = "Label Already Exists"; private static final String LAEBL_STATE_VISIBLE = "VISIBLE"; private static final String LAEBL_STATE_COMMITTED = "COMMITTED"; private static final String RESULT_LABEL_PREPARE = "PREPARE"; private static final String RESULT_LABEL_ABORTED = "ABORTED"; private static final String RESULT_LABEL_UNKNOWN = "UNKNOWN"; public StarRocksStreamLoadVisitor(StarRocksWriterOptions writerOptions) { this.writerOptions = writerOptions; } public void doStreamLoad(StarRocksFlushTuple flushData) throws IOException { String host = getAvailableHost(); if (null == host) { throw new IOException("None of the hosts in `loadUrl` could be connected."); } String loadUrl = new StringBuilder(host) .append("/api/") .append(writerOptions.getDatabase()) .append("/") .append(writerOptions.getTable()) .append("/_stream_load") .toString(); if (LOG.isDebugEnabled()) { LOG.debug(String.format("Start to join batch data: rows[%d] bytes[%d] label[%s].", flushData.getRows().size(), flushData.getBytes(), flushData.getLabel())); } Map loadResult = doHttpPut(loadUrl, flushData.getLabel(), joinRows(flushData.getRows(), flushData.getBytes().intValue())); final String keyStatus = "Status"; if (null == loadResult || !loadResult.containsKey(keyStatus)) { LOG.error("unknown result status. {}", loadResult); throw new IOException("Unable to flush data to StarRocks: unknown result status.
" + loadResult); } if (LOG.isDebugEnabled()) { LOG.debug(new StringBuilder("StreamLoad response:\n").append(JSON.toJSONString(loadResult)).toString()); } if (RESULT_FAILED.equals(loadResult.get(keyStatus))) { StringBuilder errorBuilder = new StringBuilder("Failed to flush data to StarRocks.\n"); if (loadResult.containsKey("Message")) { errorBuilder.append(loadResult.get("Message")); errorBuilder.append('\n'); } if (loadResult.containsKey("ErrorURL")) { LOG.error("StreamLoad response: {}", loadResult); try { errorBuilder.append(doHttpGet(loadResult.get("ErrorURL").toString())); errorBuilder.append('\n'); } catch (IOException e) { LOG.warn("Get Error URL failed. {} ", loadResult.get("ErrorURL"), e); } } else { errorBuilder.append(JSON.toJSONString(loadResult)); errorBuilder.append('\n'); } throw new IOException(errorBuilder.toString()); } else if (RESULT_LABEL_EXISTED.equals(loadResult.get(keyStatus))) { LOG.debug(new StringBuilder("StreamLoad response:\n").append(JSON.toJSONString(loadResult)).toString()); // has to block-checking the state to get the final result checkLabelState(host, flushData.getLabel()); } } private String getAvailableHost() { List hostList = writerOptions.getLoadUrlList(); long tmp = pos + hostList.size(); for (; pos < tmp; pos++) { String host = new StringBuilder("http://").append(hostList.get((int) (pos % hostList.size()))).toString(); if (tryHttpConnection(host)) { return host; } } return null; } private boolean tryHttpConnection(String host) { try { URL url = new URL(host); HttpURLConnection co = (HttpURLConnection) url.openConnection(); co.setConnectTimeout(1000); co.connect(); co.disconnect(); return true; } catch (Exception e1) { LOG.warn("Failed to connect to address:{}", host, e1); return false; } } private byte[] joinRows(List rows, int totalBytes) { if (StarRocksWriterOptions.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { Map props = (writerOptions.getLoadProps() == null ? 
new HashMap<>() : writerOptions.getLoadProps()); byte[] lineDelimiter = StarRocksDelimiterParser.parse((String)props.get("row_delimiter"), "\n").getBytes(StandardCharsets.UTF_8); ByteBuffer bos = ByteBuffer.allocate(totalBytes + rows.size() * lineDelimiter.length); for (byte[] row : rows) { bos.put(row); bos.put(lineDelimiter); } return bos.array(); } if (StarRocksWriterOptions.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) { ByteBuffer bos = ByteBuffer.allocate(totalBytes + (rows.isEmpty() ? 2 : rows.size() + 1)); bos.put("[".getBytes(StandardCharsets.UTF_8)); byte[] jsonDelimiter = ",".getBytes(StandardCharsets.UTF_8); boolean isFirstElement = true; for (byte[] row : rows) { if (!isFirstElement) { bos.put(jsonDelimiter); } bos.put(row); isFirstElement = false; } bos.put("]".getBytes(StandardCharsets.UTF_8)); return bos.array(); } throw new RuntimeException("Failed to join rows data, unsupported `format` from stream load properties."); } @SuppressWarnings("unchecked") private void checkLabelState(String host, String label) throws IOException { int idx = 0; while(true) { try { TimeUnit.SECONDS.sleep(Math.min(++idx, 5)); } catch (InterruptedException ex) { /* restore the interrupt flag and fail instead of silently treating the label as loaded */ Thread.currentThread().interrupt(); throw new IOException(String.format("Interrupted while checking the state of label[%s].", label), ex); } try (CloseableHttpClient httpclient = HttpClients.createDefault()) { HttpGet httpGet = new HttpGet(new StringBuilder(host).append("/api/").append(writerOptions.getDatabase()).append("/get_load_state?label=").append(label).toString()); httpGet.setHeader("Authorization", getBasicAuthHeader(writerOptions.getUsername(), writerOptions.getPassword())); httpGet.setHeader("Connection", "close"); try (CloseableHttpResponse resp = httpclient.execute(httpGet)) { HttpEntity respEntity = getHttpEntity(resp); if (respEntity == null) { throw new IOException(String.format("Failed to flush data to StarRocks, Error " + "could not get the final state of label[%s].\n", label), null); } String respBody = EntityUtils.toString(respEntity); Map result = (Map)JSON.parse(respBody); String labelState = (String)result.get("state"); if (null
== labelState) { throw new IOException(String.format("Failed to flush data to StarRocks, Error " + "could not get the final state of label[%s]. response[%s]\n", label, respBody), null); } LOG.info(String.format("Checking label[%s] state[%s]\n", label, labelState)); switch(labelState) { case LAEBL_STATE_VISIBLE: case LAEBL_STATE_COMMITTED: return; case RESULT_LABEL_PREPARE: continue; case RESULT_LABEL_ABORTED: throw new StarRocksStreamLoadFailedException(String.format("Failed to flush data to StarRocks, Error " + "label[%s] state[%s]\n", label, labelState), null, true); case RESULT_LABEL_UNKNOWN: default: throw new IOException(String.format("Failed to flush data to StarRocks, Error " + "label[%s] state[%s]\n", label, labelState), null); } } } } } @SuppressWarnings("unchecked") private Map doHttpPut(String loadUrl, String label, byte[] data) throws IOException { LOG.info(String.format("Executing stream load to: '%s', size: '%s'", loadUrl, data.length)); final HttpClientBuilder httpClientBuilder = HttpClients.custom() .setRedirectStrategy(new DefaultRedirectStrategy() { @Override protected boolean isRedirectable(String method) { return true; } }); try (CloseableHttpClient httpclient = httpClientBuilder.build()) { HttpPut httpPut = new HttpPut(loadUrl); List cols = writerOptions.getColumns(); if (null != cols && !cols.isEmpty() && StarRocksWriterOptions.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { httpPut.setHeader("columns", String.join(",", cols.stream().map(f -> String.format("`%s`", f)).collect(Collectors.toList()))); } if (null != writerOptions.getLoadProps()) { for (Map.Entry entry : writerOptions.getLoadProps().entrySet()) { httpPut.setHeader(entry.getKey(), String.valueOf(entry.getValue())); } } httpPut.setHeader("Expect", "100-continue"); httpPut.setHeader("label", label); httpPut.setHeader("Content-Type", "application/x-www-form-urlencoded"); httpPut.setHeader("Authorization",
getBasicAuthHeader(writerOptions.getUsername(), writerOptions.getPassword())); httpPut.setEntity(new ByteArrayEntity(data)); httpPut.setConfig(RequestConfig.custom().setRedirectsEnabled(true).build()); try (CloseableHttpResponse resp = httpclient.execute(httpPut)) { int code = resp.getStatusLine().getStatusCode(); if (200 != code) { String errorText; try { HttpEntity respEntity = resp.getEntity(); errorText = EntityUtils.toString(respEntity); } catch (Exception err) { errorText = "find errorText failed: " + err.getMessage(); } LOG.warn("Request failed with code:{}, err:{}", code, errorText); Map errorMap = new HashMap<>(); errorMap.put("Status", "Fail"); errorMap.put("Message", errorText); return errorMap; } HttpEntity respEntity = resp.getEntity(); if (null == respEntity) { LOG.warn("Request failed with empty response."); return null; } return (Map)JSON.parse(EntityUtils.toString(respEntity)); } } } private String getBasicAuthHeader(String username, String password) { String auth = username + ":" + password; byte[] encodedAuth = Base64.encodeBase64(auth.getBytes(StandardCharsets.UTF_8)); return new StringBuilder("Basic ").append(new String(encodedAuth)).toString(); } private HttpEntity getHttpEntity(CloseableHttpResponse resp) { int code = resp.getStatusLine().getStatusCode(); if (200 != code) { LOG.warn("Request failed with code:{}", code); return null; } HttpEntity respEntity = resp.getEntity(); if (null == respEntity) { LOG.warn("Request failed with empty response."); return null; } return respEntity; } private String doHttpGet(String getUrl) throws IOException { LOG.info("Executing GET from {}.", getUrl); try (CloseableHttpClient httpclient = buildHttpClient()) { HttpGet httpGet = new HttpGet(getUrl); try (CloseableHttpResponse resp = httpclient.execute(httpGet)) { HttpEntity respEntity = resp.getEntity(); if (null == respEntity) { LOG.warn("Request failed with empty response."); return null; } return EntityUtils.toString(respEntity); } } } private 
CloseableHttpClient buildHttpClient(){ final HttpClientBuilder httpClientBuilder = HttpClients.custom() .setRedirectStrategy(new DefaultRedirectStrategy() { @Override protected boolean isRedirectable(String method) { return true; } }); return httpClientBuilder.build(); } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/manager/StarRocksWriterManager.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.manager; import org.apache.commons.lang3.concurrent.BasicThreadFactory; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.List; import java.util.UUID; import java.util.concurrent.Executors; import java.util.concurrent.LinkedBlockingDeque; import java.util.concurrent.ScheduledExecutorService; import java.util.concurrent.ScheduledFuture; import java.util.concurrent.TimeUnit; import com.google.common.base.Strings; import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; public class StarRocksWriterManager { private static final Logger LOG = LoggerFactory.getLogger(StarRocksWriterManager.class); private final StarRocksStreamLoadVisitor starrocksStreamLoadVisitor; private final StarRocksWriterOptions writerOptions; private final List buffer = new ArrayList<>(); private int batchCount = 0; private long batchSize = 0; private volatile boolean closed = false; private volatile Exception flushException; private final LinkedBlockingDeque flushQueue; private ScheduledExecutorService scheduler; private ScheduledFuture scheduledFuture; public StarRocksWriterManager(StarRocksWriterOptions writerOptions) { this.writerOptions = writerOptions; this.starrocksStreamLoadVisitor = new StarRocksStreamLoadVisitor(writerOptions); flushQueue = new 
LinkedBlockingDeque<>(writerOptions.getFlushQueueLength()); this.startScheduler(); this.startAsyncFlushing(); } public void startScheduler() { stopScheduler(); this.scheduler = Executors.newScheduledThreadPool(1, new BasicThreadFactory.Builder().namingPattern("starrocks-interval-flush").daemon(true).build()); this.scheduledFuture = this.scheduler.schedule(() -> { synchronized (StarRocksWriterManager.this) { if (!closed) { try { String label = createBatchLabel(); LOG.info(String.format("StarRocks interval Sinking triggered: label[%s].", label)); if (batchCount == 0) { startScheduler(); } flush(label, false); } catch (Exception e) { flushException = e; } } } }, writerOptions.getFlushInterval(), TimeUnit.MILLISECONDS); } public void stopScheduler() { if (this.scheduledFuture != null) { scheduledFuture.cancel(false); this.scheduler.shutdown(); } } public final synchronized void writeRecord(String record) throws IOException { checkFlushException(); try { byte[] bts = record.getBytes(StandardCharsets.UTF_8); buffer.add(bts); batchCount++; batchSize += bts.length; if (batchCount >= writerOptions.getBatchRows() || batchSize >= writerOptions.getBatchSize()) { String label = createBatchLabel(); if (LOG.isDebugEnabled()) { LOG.debug(String.format("StarRocks buffer Sinking triggered: rows[%d] label[%s].", batchCount, label)); } flush(label, false); } } catch (Exception e) { throw new IOException("Writing records to StarRocks failed.", e); } } public synchronized void flush(String label, boolean waitUtilDone) throws Exception { checkFlushException(); if (batchCount == 0) { if (waitUtilDone) { waitAsyncFlushingDone(); } return; } flushQueue.put(new StarRocksFlushTuple(label, batchSize, new ArrayList<>(buffer))); if (waitUtilDone) { // wait the last flush waitAsyncFlushingDone(); } buffer.clear(); batchCount = 0; batchSize = 0; } public synchronized void close() { if (!closed) { closed = true; try { String label = createBatchLabel(); if (batchCount > 0) { if 
(LOG.isDebugEnabled()) { LOG.debug(String.format("StarRocks Sink is about to close: label[%s].", label)); } } flush(label, true); } catch (Exception e) { throw new RuntimeException("Writing records to StarRocks failed.", e); } } checkFlushException(); } public String createBatchLabel() { StringBuilder sb = new StringBuilder(); if (!Strings.isNullOrEmpty(writerOptions.getLabelPrefix())) { sb.append(writerOptions.getLabelPrefix()); } return sb.append(UUID.randomUUID().toString()) .toString(); } private void startAsyncFlushing() { // start flush thread Thread flushThread = new Thread(new Runnable(){ public void run() { while(true) { try { asyncFlush(); } catch (Exception e) { flushException = e; } } } }); flushThread.setDaemon(true); flushThread.start(); } private void waitAsyncFlushingDone() throws InterruptedException { // wait previous flushings for (int i = 0; i <= writerOptions.getFlushQueueLength(); i++) { flushQueue.put(new StarRocksFlushTuple("", 0l, null)); } checkFlushException(); } private void asyncFlush() throws Exception { StarRocksFlushTuple flushData = flushQueue.take(); if (Strings.isNullOrEmpty(flushData.getLabel())) { return; } stopScheduler(); if (LOG.isDebugEnabled()) { LOG.debug(String.format("Async stream load: rows[%d] bytes[%d] label[%s].", flushData.getRows().size(), flushData.getBytes(), flushData.getLabel())); } for (int i = 0; i <= writerOptions.getMaxRetries(); i++) { try { // flush to StarRocks with stream load starrocksStreamLoadVisitor.doStreamLoad(flushData); LOG.info(String.format("Async stream load finished: label[%s].", flushData.getLabel())); startScheduler(); break; } catch (Exception e) { LOG.warn("Failed to flush batch data to StarRocks, retry times = {}", i, e); if (i >= writerOptions.getMaxRetries()) { throw new IOException(e); } if (e instanceof StarRocksStreamLoadFailedException && ((StarRocksStreamLoadFailedException)e).needReCreateLabel()) { String newLabel = createBatchLabel(); LOG.warn(String.format("Batch label changed 
from [%s] to [%s]", flushData.getLabel(), newLabel)); flushData.setLabel(newLabel); } try { Thread.sleep(1000l * Math.min(i + 1, 10)); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); throw new IOException("Unable to flush, interrupted while doing another attempt", e); } } } } private void checkFlushException() { if (flushException != null) { throw new RuntimeException("Writing records to StarRocks failed.", flushException); } } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksBaseSerializer.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Column.Type; public class StarRocksBaseSerializer { protected String fieldConvertion(Column col) { if (null == col.getRawData() || Type.NULL == col.getType()) { return null; } if (Type.BOOL == col.getType()) { return String.valueOf(col.asLong()); } if (Type.BYTES == col.getType()) { byte[] bts = (byte[])col.getRawData(); long value = 0; for (int i = 0; i < bts.length; i++) { value += (bts[bts.length - i - 1] & 0xffL) << (8 * i); } return String.valueOf(value); } return col.asString(); } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksCsvSerializer.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; import java.io.StringWriter; import com.alibaba.datax.common.element.Record; import com.google.common.base.Strings; public class StarRocksCsvSerializer extends StarRocksBaseSerializer implements StarRocksISerializer { private static final long serialVersionUID = 1L; private final String columnSeparator; public StarRocksCsvSerializer(String sp) { 
this.columnSeparator = StarRocksDelimiterParser.parse(sp, "\t"); } @Override public String serialize(Record row) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < row.getColumnNumber(); i++) { String value = fieldConvertion(row.getColumn(i)); sb.append(null == value ? "\\N" : value); if (i < row.getColumnNumber() - 1) { sb.append(columnSeparator); } } return sb.toString(); } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksDelimiterParser.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; import java.io.StringWriter; import com.google.common.base.Strings; public class StarRocksDelimiterParser { private static final String HEX_STRING = "0123456789ABCDEF"; public static String parse(String sp, String dSp) throws RuntimeException { if (Strings.isNullOrEmpty(sp)) { return dSp; } if (!sp.toUpperCase().startsWith("\\X")) { return sp; } String hexStr = sp.substring(2); // check hex str if (hexStr.isEmpty()) { throw new RuntimeException("Failed to parse delimiter: `Hex str is empty`"); } if (hexStr.length() % 2 != 0) { throw new RuntimeException("Failed to parse delimiter: `Hex str length error`"); } for (char hexChar : hexStr.toUpperCase().toCharArray()) { if (HEX_STRING.indexOf(hexChar) == -1) { throw new RuntimeException("Failed to parse delimiter: `Hex str format error`"); } } // transform to separator StringWriter writer = new StringWriter(); for (byte b : hexStrToBytes(hexStr)) { writer.append((char) b); } return writer.toString(); } private static byte[] hexStrToBytes(String hexStr) { String upperHexStr = hexStr.toUpperCase(); int length = upperHexStr.length() / 2; char[] hexChars = upperHexStr.toCharArray(); byte[] bytes = new byte[length]; for (int i = 0; i < length; i++) { int pos = i * 2; bytes[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 
1])); } return bytes; } private static byte charToByte(char c) { return (byte) HEX_STRING.indexOf(c); } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksISerializer.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; import java.io.Serializable; import com.alibaba.datax.common.element.Record; public interface StarRocksISerializer extends Serializable { String serialize(Record row); } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksJsonSerializer.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; import java.util.HashMap; import java.util.List; import java.util.Map; import com.alibaba.datax.common.element.Record; import com.alibaba.fastjson2.JSON; public class StarRocksJsonSerializer extends StarRocksBaseSerializer implements StarRocksISerializer { private static final long serialVersionUID = 1L; private final List fieldNames; public StarRocksJsonSerializer(List fieldNames) { this.fieldNames = fieldNames; } @Override public String serialize(Record row) { if (null == fieldNames) { return ""; } Map rowMap = new HashMap<>(fieldNames.size()); int idx = 0; for (String fieldName : fieldNames) { rowMap.put(fieldName, fieldConvertion(row.getColumn(idx))); idx++; } return JSON.toJSONString(rowMap); } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/row/StarRocksSerializerFactory.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.row; import java.util.Map; import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; 
public class StarRocksSerializerFactory { private StarRocksSerializerFactory() {} public static StarRocksISerializer createSerializer(StarRocksWriterOptions writerOptions) { if (StarRocksWriterOptions.StreamLoadFormat.CSV.equals(writerOptions.getStreamLoadFormat())) { Map props = writerOptions.getLoadProps(); return new StarRocksCsvSerializer(null == props || !props.containsKey("column_separator") ? null : String.valueOf(props.get("column_separator"))); } if (StarRocksWriterOptions.StreamLoadFormat.JSON.equals(writerOptions.getStreamLoadFormat())) { return new StarRocksJsonSerializer(writerOptions.getColumns()); } throw new RuntimeException("Failed to create row serializer, unsupported `format` from stream load properties."); } } ================================================ FILE: starrockswriter/src/main/java/com/starrocks/connector/datax/plugin/writer/starrockswriter/util/StarRocksWriterUtil.java ================================================ package com.starrocks.connector.datax.plugin.writer.starrockswriter.util; import com.alibaba.datax.plugin.rdbms.util.DBUtil; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.util.RdbmsException; import com.alibaba.datax.plugin.rdbms.writer.Constant; import com.alibaba.druid.sql.parser.ParserException; import com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriterOptions; import com.google.common.base.Strings; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.Connection; import java.sql.ResultSet; import java.sql.Statement; import java.util.*; public final class StarRocksWriterUtil { private static final Logger LOG = LoggerFactory.getLogger(StarRocksWriterUtil.class); private StarRocksWriterUtil() {} public static List getStarRocksColumns(Connection conn, String databaseName, String tableName) { String currentSql = String.format("SELECT COLUMN_NAME FROM `information_schema`.`COLUMNS` WHERE `TABLE_SCHEMA` = '%s' AND `TABLE_NAME` = 
'%s' ORDER BY `ORDINAL_POSITION` ASC;", databaseName, tableName); List columns = new ArrayList<>(); ResultSet rs = null; try { rs = DBUtil.query(conn, currentSql); while (DBUtil.asyncResultSetNext(rs)) { String colName = rs.getString("COLUMN_NAME"); columns.add(colName); } return columns; } catch (Exception e) { throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); } finally { DBUtil.closeDBResources(rs, null, null); } } public static List renderPreOrPostSqls(List preOrPostSqls, String tableName) { if (null == preOrPostSqls) { return Collections.emptyList(); } List renderedSqls = new ArrayList<>(); for (String sql : preOrPostSqls) { if (!Strings.isNullOrEmpty(sql)) { renderedSqls.add(sql.replace(Constant.TABLE_NAME_PLACEHOLDER, tableName)); } } return renderedSqls; } public static void executeSqls(Connection conn, List sqls) { Statement stmt = null; String currentSql = null; try { stmt = conn.createStatement(); for (String sql : sqls) { currentSql = sql; DBUtil.executeSqlWithoutResultSet(stmt, sql); } } catch (Exception e) { throw RdbmsException.asQueryException(DataBaseType.MySql, e, currentSql, null, null); } finally { DBUtil.closeDBResources(null, stmt, null); } } public static void preCheckPrePareSQL(StarRocksWriterOptions options) { String table = options.getTable(); List preSqls = options.getPreSqlList(); List renderedPreSqls = StarRocksWriterUtil.renderPreOrPostSqls(preSqls, table); if (null != renderedPreSqls && !renderedPreSqls.isEmpty()) { LOG.info("Begin to preCheck preSqls:[{}].", String.join(";", renderedPreSqls)); for (String sql : renderedPreSqls) { try { DBUtil.sqlValid(sql, DataBaseType.MySql); } catch (ParserException e) { throw RdbmsException.asPreSQLParserException(DataBaseType.MySql,e,sql); } } } } public static void preCheckPostSQL(StarRocksWriterOptions options) { String table = options.getTable(); List postSqls = options.getPostSqlList(); List renderedPostSqls = 
StarRocksWriterUtil.renderPreOrPostSqls(postSqls, table); if (null != renderedPostSqls && !renderedPostSqls.isEmpty()) { LOG.info("Begin to preCheck postSqls:[{}].", String.join(";", renderedPostSqls)); for(String sql : renderedPostSqls) { try { DBUtil.sqlValid(sql, DataBaseType.MySql); } catch (ParserException e){ throw RdbmsException.asPostSQLParserException(DataBaseType.MySql,e,sql); } } } } } ================================================ FILE: starrockswriter/src/main/resources/plugin.json ================================================ { "name": "starrockswriter", "class": "com.starrocks.connector.datax.plugin.writer.starrockswriter.StarRocksWriter", "description": "useScene: prod. mechanism: StarRocksStreamLoad. warn: The more you know about the database, the less problems you encounter.", "developer": "starrocks" } ================================================ FILE: starrockswriter/src/main/resources/plugin_job_template.json ================================================ { "name": "starrockswriter", "parameter": { "username": "", "password": "", "column": [], "preSql": [], "postSql": [], "loadUrl": [], "connection": [ { "jdbcUrl": "", "selectedDatabase": "", "table": [] } ] } } ================================================ FILE: streamreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT streamreader streamreader jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 src/main/resources **/*.* true maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: streamreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json 
plugin/reader/streamreader target/ streamreader-0.0.1-SNAPSHOT.jar plugin/reader/streamreader false plugin/reader/streamreader/libs runtime ================================================ FILE: streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.streamreader; public class Constant { public static final String TYPE = "type"; public static final String VALUE = "value"; public static final String RANDOM = "random"; public static final String DATE_FORMAT_MARK = "dateFormat"; public static final String DEFAULT_DATE_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static final String HAVE_MIXUP_FUNCTION = "hasMixupFunction"; public static final String MIXUP_FUNCTION_PATTERN = "\\s*(.*)\\s*,\\s*(.*)\\s*"; public static final String MIXUP_FUNCTION_PARAM1 = "mixupParam1"; public static final String MIXUP_FUNCTION_PARAM2 = "mixupParam2"; } ================================================ FILE: streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/Key.java ================================================ package com.alibaba.datax.plugin.reader.streamreader; public class Key { /** * should look like:[{"value":"123","type":"int"},{"value":"hello","type":"string"}] */ public static final String COLUMN = "column"; public static final String SLICE_RECORD_COUNT = "sliceRecordCount"; } ================================================ FILE: streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/StreamReader.java ================================================ package com.alibaba.datax.plugin.reader.streamreader; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.fastjson2.JSONObject; import 
org.apache.commons.lang3.RandomStringUtils; import org.apache.commons.lang3.RandomUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Date; import java.util.List; import java.util.regex.Matcher; import java.util.regex.Pattern; public class StreamReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private Pattern mixupFunctionPattern; private Configuration originalConfig; @Override public void init() { this.originalConfig = super.getPluginJobConf(); // warn: matching is case-insensitive this.mixupFunctionPattern = Pattern.compile(Constant.MIXUP_FUNCTION_PATTERN, Pattern.CASE_INSENSITIVE); dealColumn(this.originalConfig); Long sliceRecordCount = this.originalConfig .getLong(Key.SLICE_RECORD_COUNT); if (null == sliceRecordCount) { throw DataXException.asDataXException(StreamReaderErrorCode.REQUIRED_VALUE, "Parameter [sliceRecordCount] is not set."); } else if (sliceRecordCount < 1) { throw DataXException.asDataXException(StreamReaderErrorCode.ILLEGAL_VALUE, "Parameter [sliceRecordCount] must not be less than 1."); } } private void dealColumn(Configuration originalConfig) { List columns = originalConfig.getList(Key.COLUMN, JSONObject.class); if (null == columns || columns.isEmpty()) { throw DataXException.asDataXException(StreamReaderErrorCode.REQUIRED_VALUE, "Parameter [column] is not set."); } List dealedColumns = new ArrayList(); for (JSONObject eachColumn : columns) { Configuration eachColumnConfig = Configuration.from(eachColumn); try { this.parseMixupFunctions(eachColumnConfig); } catch (Exception e) { throw DataXException.asDataXException(StreamReaderErrorCode.NOT_SUPPORT_TYPE, String.format("Failed to parse mixup function [%s]", e.getMessage()), e); } String typeName = eachColumnConfig.getString(Constant.TYPE); if (StringUtils.isBlank(typeName)) { // empty typeName will be set to default type: string
eachColumnConfig.set(Constant.TYPE, Type.STRING); } else { if (Type.DATE.name().equalsIgnoreCase(typeName)) { boolean notAssignDateFormat = StringUtils .isBlank(eachColumnConfig .getString(Constant.DATE_FORMAT_MARK)); if (notAssignDateFormat) { eachColumnConfig.set(Constant.DATE_FORMAT_MARK, Constant.DEFAULT_DATE_FORMAT); } } if (!Type.isTypeIllegal(typeName)) { throw DataXException.asDataXException( StreamReaderErrorCode.NOT_SUPPORT_TYPE, String.format("Unsupported type [%s]", typeName)); } } dealedColumns.add(eachColumnConfig.toJSON()); } originalConfig.set(Key.COLUMN, dealedColumns); } private void parseMixupFunctions(Configuration eachColumnConfig) throws Exception{ // Random (mixup) functions are supported, for example: // LONG: random 0, 10 a random number between 0 and 10 // STRING: random 0, 10 a random string whose length is between 0 and 10 // BOOL: random 0, 10 the ratio of false to true // DOUBLE: random 0, 10 a random floating-point number between 0 and 10 // DATE: random 2014-07-07 00:00:00, 2016-07-07 00:00:00 a random time between the start time and the end time; the date format defaults to yyyy-MM-dd HH:mm:ss (commas are not supported) // BYTES: random 0, 10 the UTF-8 encoded bytes of a random string whose length is between 0 and 10 // Once a mixup function is configured, value may be omitted // If random is not configured, value is required String columnValue = eachColumnConfig.getString(Constant.VALUE); String columnMixup = eachColumnConfig.getString(Constant.RANDOM); if (StringUtils.isBlank(columnMixup)) { eachColumnConfig.getNecessaryValue(Constant.VALUE, StreamReaderErrorCode.REQUIRED_VALUE); } // Both are configured: the constant value wins, so the random setting is dropped if (StringUtils.isNotBlank(columnMixup) && StringUtils.isNotBlank(columnValue)) { LOG.warn(String.format("The streamreader column has both a constant value (value:%s) and a random mixup (random:%s); the constant value takes precedence", columnValue, columnMixup)); 
eachColumnConfig.remove(Constant.RANDOM); columnMixup = null; } if (StringUtils.isNotBlank(columnMixup)) { Matcher matcher= this.mixupFunctionPattern.matcher(columnMixup); if (matcher.matches()) { String param1 = matcher.group(1).trim(); long param1Int = 0; String param2 = matcher.group(2).trim(); long param2Int = 0; if (StringUtils.isBlank(param1) || StringUtils.isBlank(param2)) { throw DataXException.asDataXException( StreamReaderErrorCode.ILLEGAL_VALUE, String.format("Invalid random mixup function [%s]: the parameters of random must not be empty: %s, %s", columnMixup,
param1, param2)); } String typeName = eachColumnConfig.getString(Constant.TYPE); if (Type.DATE.name().equalsIgnoreCase(typeName)) { String dateFormat = eachColumnConfig.getString(Constant.DATE_FORMAT_MARK, Constant.DEFAULT_DATE_FORMAT); try{ SimpleDateFormat format = new SimpleDateFormat(dateFormat); // warn: both bounds are kept as long epoch milliseconds param1Int = format.parse(param1).getTime();//milliseconds param2Int = format.parse(param2).getTime();//milliseconds }catch (ParseException e) { throw DataXException.asDataXException( StreamReaderErrorCode.ILLEGAL_VALUE, String.format("The dateFormat [%s] does not match the parameters of the random mixup function; parse error: %s, %s", dateFormat, param1, param2), e); } } else { param1Int = Long.parseLong(param1); param2Int = Long.parseLong(param2); } if (param1Int < 0 || param2Int < 0) { throw DataXException.asDataXException( StreamReaderErrorCode.ILLEGAL_VALUE, String.format("Invalid random mixup function [%s]: the parameters of random must not be negative: %s, %s", columnMixup, param1, param2)); } if (!Type.BOOL.name().equalsIgnoreCase(typeName)) { if (param1Int > param2Int) { throw DataXException.asDataXException( StreamReaderErrorCode.ILLEGAL_VALUE, String.format("Invalid random mixup function [%s]: the first parameter of random must be less than or equal to the second: %s, %s", columnMixup, param1, param2)); } } eachColumnConfig.set(Constant.MIXUP_FUNCTION_PARAM1, param1Int); eachColumnConfig.set(Constant.MIXUP_FUNCTION_PARAM2, param2Int); } else { throw DataXException.asDataXException( StreamReaderErrorCode.ILLEGAL_VALUE, String.format("Invalid random mixup function [%s]: expected the form param1, param2", columnMixup)); } this.originalConfig.set(Constant.HAVE_MIXUP_FUNCTION, true); } } @Override public void prepare() { } @Override public List<Configuration> split(int adviceNumber) { List<Configuration> configurations = new ArrayList<Configuration>(); for (int i = 0; i < adviceNumber; i++) { 
configurations.add(this.originalConfig.clone()); } return configurations; } @Override public void post() { } @Override public void destroy() { } } public static class Task extends Reader.Task {
private Configuration readerSliceConfig; private List<String> columns; private long sliceRecordCount; private boolean haveMixupFunction; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.columns = this.readerSliceConfig.getList(Key.COLUMN, String.class); this.sliceRecordCount = this.readerSliceConfig .getLong(Key.SLICE_RECORD_COUNT); this.haveMixupFunction = this.readerSliceConfig.getBool( Constant.HAVE_MIXUP_FUNCTION, false); } @Override public void prepare() { } @Override public void startRead(RecordSender recordSender) { Record oneRecord = buildOneRecord(recordSender, this.columns); while (this.sliceRecordCount > 0) { if (this.haveMixupFunction) { oneRecord = buildOneRecord(recordSender, this.columns); } recordSender.sendToWriter(oneRecord); this.sliceRecordCount--; } } @Override public void post() { } @Override public void destroy() { } private Column buildOneColumn(Configuration eachColumnConfig) throws Exception { String columnValue = eachColumnConfig .getString(Constant.VALUE); Type columnType = Type.valueOf(eachColumnConfig.getString( Constant.TYPE).toUpperCase()); String columnMixup = eachColumnConfig.getString(Constant.RANDOM); long param1Int = eachColumnConfig.getLong(Constant.MIXUP_FUNCTION_PARAM1, 0L); long param2Int = eachColumnConfig.getLong(Constant.MIXUP_FUNCTION_PARAM2, 1L); boolean isColumnMixup = StringUtils.isNotBlank(columnMixup); switch (columnType) { case STRING: if (isColumnMixup) { return new StringColumn(RandomStringUtils.randomAlphanumeric((int)RandomUtils.nextLong(param1Int, param2Int + 1))); } else { return new StringColumn(columnValue); } case LONG: if (isColumnMixup) { return new LongColumn(RandomUtils.nextLong(param1Int, param2Int + 1)); } else { return new LongColumn(columnValue); } case DOUBLE: if (isColumnMixup) { return new DoubleColumn(RandomUtils.nextDouble(param1Int, param2Int)); } else { return new DoubleColumn(columnValue); } case DATE: SimpleDateFormat format = new SimpleDateFormat(
eachColumnConfig.getString(Constant.DATE_FORMAT_MARK, Constant.DEFAULT_DATE_FORMAT)); if (isColumnMixup) { return new DateColumn(new Date(RandomUtils.nextLong(param1Int, param2Int + 1))); } else { return new DateColumn(format.parse(columnValue)); } case BOOL: if (isColumnMixup) { // warn: no concern -10 etc..., how about (0, 0)(0, 1)(1,2) if (param1Int == param2Int) { param1Int = 0; param2Int = 1; } if (param1Int == 0) { return new BoolColumn(true); } else if (param2Int == 0) { return new BoolColumn(false); } else { long randomInt = RandomUtils.nextLong(0, param1Int + param2Int + 1); return new BoolColumn(randomInt > param1Int); } } else { return new BoolColumn("true".equalsIgnoreCase(columnValue)); } case BYTES: if (isColumnMixup) { return new BytesColumn(RandomStringUtils.randomAlphanumeric((int)RandomUtils.nextLong(param1Int, param2Int + 1)).getBytes()); } else { return new BytesColumn(columnValue.getBytes()); } default: // in fact,never to be here throw new Exception(String.format("Unsupported type [%s]", columnType.name())); } } private Record buildOneRecord(RecordSender recordSender, List<String> columns) { if (null == recordSender) { throw new IllegalArgumentException( "Parameter [recordSender] must not be null."); } if (null == columns || columns.isEmpty()) { throw new IllegalArgumentException( "Parameter [column] must not be empty."); } Record record = recordSender.createRecord(); try { for (String eachColumn : columns) { Configuration eachColumnConfig = Configuration.from(eachColumn); record.addColumn(this.buildOneColumn(eachColumnConfig)); } } catch (Exception e) { throw DataXException.asDataXException(StreamReaderErrorCode.ILLEGAL_VALUE, "Failed to build a record.", e); } return record; } } private enum Type { STRING, LONG, BOOL, DOUBLE, DATE, BYTES, ; /* Note: despite its name, this returns true when typeString names a supported type. */ private static boolean isTypeIllegal(String typeString) { try { Type.valueOf(typeString.toUpperCase()); } catch (Exception e) { return false; } 
return true; } } } ================================================ FILE: 
streamreader/src/main/java/com/alibaba/datax/plugin/reader/streamreader/StreamReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.streamreader; import com.alibaba.datax.common.spi.ErrorCode; public enum StreamReaderErrorCode implements ErrorCode { REQUIRED_VALUE("StreamReader-00", "Missing required value"), ILLEGAL_VALUE("StreamReader-01", "Illegal value"), NOT_SUPPORT_TYPE("StreamReader-02", "Unsupported column type"),; private final String code; private final String description; private StreamReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: streamreader/src/main/resources/plugin.json ================================================ { "name": "streamreader", "class": "com.alibaba.datax.plugin.reader.streamreader.StreamReader", "description": { "useScene": "only for developer test.", "mechanism": "use datax framework to transport data from stream.", "warn": "Never use it in your real job."
}, "developer": "alibaba" } ================================================ FILE: streamreader/src/main/resources/plugin_job_template.json ================================================ { "name": "streamreader", "parameter": { "sliceRecordCount": "", "column": [] } } ================================================ FILE: streamwriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT streamwriter streamwriter jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic src/main/resources **/*.* true maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: streamwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/streamwriter target/ streamwriter-0.0.1-SNAPSHOT.jar plugin/writer/streamwriter false plugin/writer/streamwriter/libs runtime ================================================ FILE: streamwriter/src/main/java/com/alibaba/datax/plugin/writer/streamwriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.streamwriter; public class Key { public static final String FIELD_DELIMITER = "fieldDelimiter"; public static final String PRINT = "print"; public static final String PATH = "path"; public static final String FILE_NAME = "fileName"; public static final String RECORD_NUM_BEFORE_SLEEP = "recordNumBeforeSleep"; public static final String SLEEP_TIME = "sleepTime"; } ================================================ FILE: streamwriter/src/main/java/com/alibaba/datax/plugin/writer/streamwriter/StreamWriter.java ================================================ package com.alibaba.datax.plugin.writer.streamwriter; import 
com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import org.apache.commons.io.FileUtils; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.*; import java.util.ArrayList; import java.util.List; public class StreamWriter extends Writer { public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory .getLogger(Job.class); private Configuration originalConfig; @Override public void init() { this.originalConfig = super.getPluginJobConf(); String path = this.originalConfig.getString(Key.PATH, null); String fileName = this.originalConfig.getString(Key.FILE_NAME, null); if(StringUtils.isNoneBlank(path) && StringUtils.isNoneBlank(fileName)) { validateParameter(path, fileName); } } private void validateParameter(String path, String fileName) { try { // warn: the user must configure a directory here File dir = new File(path); if (dir.isFile()) { throw DataXException .asDataXException( StreamWriterErrorCode.ILLEGAL_VALUE, String.format( "The configured path [%s] is not a valid directory; check for a file with the same name or an invalid directory name.", path)); } if (!dir.exists()) { boolean createdOk = dir.mkdirs(); if (!createdOk) { throw DataXException .asDataXException( StreamWriterErrorCode.CONFIG_INVALID_EXCEPTION, String.format("Failed to create the specified file path [%s].", path)); } } String fileFullPath = buildFilePath(path, fileName); File newFile = new File(fileFullPath); if(newFile.exists()) { try { FileUtils.forceDelete(newFile); } catch (IOException e) { throw DataXException.asDataXException( StreamWriterErrorCode.RUNTIME_EXCEPTION, String.format("Failed to delete file [%s]", fileFullPath), e); 
} } } catch (SecurityException se) { throw DataXException.asDataXException( StreamWriterErrorCode.SECURITY_NOT_ENOUGH,
String.format("You do not have permission to create the file path [%s]", path), se); } } @Override public void prepare() { } @Override public List<Configuration> split(int mandatoryNumber) { List<Configuration> writerSplitConfigs = new ArrayList<Configuration>(); for (int i = 0; i < mandatoryNumber; i++) { writerSplitConfigs.add(this.originalConfig.clone()); } return writerSplitConfigs; } @Override public void post() { } @Override public void destroy() { } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory .getLogger(Task.class); private static final String NEWLINE_FLAG = System.getProperty("line.separator", "\n"); private Configuration writerSliceConfig; private String fieldDelimiter; private boolean print; private String path; private String fileName; private long recordNumBeforSleep; private long sleepTime; @Override public void init() { this.writerSliceConfig = getPluginJobConf(); this.fieldDelimiter = this.writerSliceConfig.getString( Key.FIELD_DELIMITER, "\t"); this.print = this.writerSliceConfig.getBool(Key.PRINT, true); this.path = this.writerSliceConfig.getString(Key.PATH, null); this.fileName = this.writerSliceConfig.getString(Key.FILE_NAME, null); this.recordNumBeforSleep = this.writerSliceConfig.getLong(Key.RECORD_NUM_BEFORE_SLEEP, 0); this.sleepTime = this.writerSliceConfig.getLong(Key.SLEEP_TIME, 0); if(recordNumBeforSleep < 0) { throw DataXException.asDataXException(StreamWriterErrorCode.CONFIG_INVALID_EXCEPTION, "recordNumBeforeSleep must not be negative"); } if(sleepTime <0) { throw DataXException.asDataXException(StreamWriterErrorCode.CONFIG_INVALID_EXCEPTION, "sleepTime must not be negative"); } } @Override public void prepare() { } @Override public void startWrite(RecordReceiver recordReceiver) { if(StringUtils.isNoneBlank(path) && StringUtils.isNoneBlank(fileName)) { writeToFile(recordReceiver,path, fileName, recordNumBeforSleep, sleepTime); } else { try { BufferedWriter writer = new BufferedWriter( new 
OutputStreamWriter(System.out, "UTF-8")); Record record; while ((record = recordReceiver.getFromReader()) != null) { if
(this.print) { writer.write(recordToString(record)); } else { /* do nothing */ } } writer.flush(); } catch (Exception e) { throw DataXException.asDataXException(StreamWriterErrorCode.RUNTIME_EXCEPTION, e); } } } private void writeToFile(RecordReceiver recordReceiver, String path, String fileName, long recordNumBeforSleep, long sleepTime) { LOG.info("begin do write..."); String fileFullPath = buildFilePath(path, fileName); LOG.info(String.format("write to file : [%s]", fileFullPath)); BufferedWriter writer = null; try { File newFile = new File(fileFullPath); newFile.createNewFile(); writer = new BufferedWriter( new OutputStreamWriter(new FileOutputStream(newFile, true), "UTF-8")); Record record; int count =0; while ((record = recordReceiver.getFromReader()) != null) { if(recordNumBeforSleep > 0 && sleepTime >0 &&count == recordNumBeforSleep) { LOG.info("StreamWriter start to sleep ... recordNumBeforSleep={},sleepTime={}",recordNumBeforSleep,sleepTime); try { Thread.sleep(sleepTime * 1000L); } catch (InterruptedException e) { /* restore the interrupt flag instead of swallowing it */ Thread.currentThread().interrupt(); } } writer.write(recordToString(record)); count++; } writer.flush(); } catch (Exception e) { throw DataXException.asDataXException(StreamWriterErrorCode.RUNTIME_EXCEPTION, e); } finally { IOUtils.closeQuietly(writer); } } @Override public void post() { } @Override public void destroy() { } private String recordToString(Record record) { int recordLength = record.getColumnNumber(); if (0 == recordLength) { return NEWLINE_FLAG; } Column column; StringBuilder sb = new StringBuilder(); for (int i = 0; i < recordLength; i++) { column = record.getColumn(i); sb.append(column.asString()).append(fieldDelimiter); } /* drop the trailing delimiter, which may be longer than one character */ sb.setLength(sb.length() - fieldDelimiter.length()); sb.append(NEWLINE_FLAG); return sb.toString(); } } private static String buildFilePath(String path, String fileName) { boolean isEndWithSeparator = 
false; switch (IOUtils.DIR_SEPARATOR) { case IOUtils.DIR_SEPARATOR_UNIX: isEndWithSeparator = path.endsWith(String .valueOf(IOUtils.DIR_SEPARATOR)); break; case
IOUtils.DIR_SEPARATOR_WINDOWS: isEndWithSeparator = path.endsWith(String .valueOf(IOUtils.DIR_SEPARATOR_WINDOWS)); break; default: break; } if (!isEndWithSeparator) { path = path + IOUtils.DIR_SEPARATOR; } return String.format("%s%s", path, fileName); } } ================================================ FILE: streamwriter/src/main/java/com/alibaba/datax/plugin/writer/streamwriter/StreamWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.streamwriter; import com.alibaba.datax.common.spi.ErrorCode; public enum StreamWriterErrorCode implements ErrorCode { RUNTIME_EXCEPTION("StreamWriter-00", "Runtime exception"), ILLEGAL_VALUE("StreamWriter-01", "The parameter value you provided is invalid."), CONFIG_INVALID_EXCEPTION("StreamWriter-02", "Your parameter configuration is incorrect."), SECURITY_NOT_ENOUGH("StreamWriter-03", "You lack the permission to perform the file write operation."); private final String code; private final String description; private StreamWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: streamwriter/src/main/resources/plugin.json ================================================ { "name": "streamwriter", "class": "com.alibaba.datax.plugin.writer.streamwriter.StreamWriter", "description": { "useScene": "only for developer test.", "mechanism": "use datax framework to transport data to stream.", "warn": "Never use it in your real job."
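}, "developer": "alibaba" } ================================================ FILE: streamwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "streamwriter", "parameter": { "encoding": "", "print": true } } ================================================ FILE: sybasereader/doc/sybasereader.md ================================================
# SybaseReader Plugin Documentation

___

## 1 Quick Introduction

The SybaseReader plugin reads data from Sybase. Under the hood, SybaseReader connects to a remote Sybase database over JDBC and executes the corresponding SQL statements to SELECT data out of the Sybase database.

## 2 How It Works

In short, SybaseReader connects to the remote Sybase database through a JDBC connector, generates a SELECT statement from the user's configuration, and sends it to the remote database. It then assembles the returned results into an abstract dataset using DataX's own data types and passes it to the downstream Writer.

For table, column, and where settings, SybaseReader concatenates them into a SQL statement and sends it to the Sybase database; a querySql setting is sent to the Sybase database by SybaseReader as-is.

## 3 Features

### 3.1 Configuration Examples

* A job that extracts data from a Sybase database and syncs it to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                // Transfer speed in bytes/s; DataX tries to approach this rate without exceeding it.
                // channel is the number of channels and byte is the channel speed; if one channel runs at 1 MB/s, setting byte to 1048576 corresponds to one channel
                "byte": 1048576
            },
            // Error limits
            "errorLimit": {
                // maximum number of error records, considered first
                "record": 0,
                // maximum error-record ratio; 1 means 100%
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "sybasereader",
                    "parameter": {
                        // database username
                        "username": "root",
                        // database password
                        "password": "root",
                        "column": [
                            "id", "name"
                        ],
                        // split key
                        "splitPk": "db_id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
                                    "jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    // writer type
                    "name": "streamwriter",
                    // whether to print the records
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```

* A job that syncs the result of a custom SQL query to local output:

```
{
    "job": {
        "setting": {
            "speed": {
                "channel": 5
            }
        },
        "content": [
            {
                "reader": {
                    "name": "sybasereader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select db_id,on_line_flag from db_info where db_id < 10"
                                ],
                                "jdbcUrl": [
                                    "jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **jdbcUrl**
    * Description: JDBC connection information for the source database, given as a JSON array; several connection addresses may be listed for one database. A JSON array is used because Alibaba's internal deployment supports probing multiple IPs: if several are configured, SybaseReader probes them in order until it finds a reachable one, and reports an error if all connections fail. Note that jdbcUrl must be placed inside the connection block. Outside Alibaba, a single JDBC URL in the array is enough. jdbcUrl follows the official Sybase specification and may carry additional connection control properties; see the [Sybase official documentation](http://www.Sybase.com/technetwork/database/enterprise-edition/documentation/index.html) for details.
    * Required: yes
    * Default: none
* **username**
    * Description: the username for the data source
    * Required: yes
    * Default: none
* **password**
    * Description: the password for the specified username
    * Required: yes
    * Default: none
* **table**
    * Description: the tables to sync, given as a JSON array, so several tables can be extracted at once. When several tables are configured, you must make sure they share the same schema; SybaseReader does not check whether they form one logical table. Note that table must be placed inside the connection block.
    * Required: yes
    * Default: none
* **column**
    * Description: the set of column names to sync from the configured table, given as a JSON array. Use \* to select all columns, e.g. ['\*']. Column pruning is supported: you can export only some of the columns. Column reordering is supported: columns need not follow the order of the table schema. Constants are supported, written in JSON form: ``["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3", "true"]``, where id is an ordinary column name, \`table\` is a column name that is a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean. column must be specified explicitly and must not be empty!
    * Required: yes
    * Default: none
* **splitPk**
    * Description: if splitPk is specified, SybaseReader shards the data on that column and DataX starts concurrent tasks to sync it, which can greatly improve sync throughput. Using the table's primary key as splitPk is recommended, because primary keys are usually evenly distributed and the resulting shards are less likely to have hot spots. Currently splitPk only supports integer and string columns; `floating-point, date, and other types are not supported`. If another type is specified, SybaseReader reports an error. If splitPk is omitted, the table is not split and SybaseReader syncs all data over a single channel.
    * Required: no
    * Default: none
* **where**
    * Description: filter condition. SybaseReader builds the SQL from the specified column, table, and where settings and extracts data with it. In practice it is common to sync only the current day's data, for example by setting where to gmt_create > $bizdate. Note: where cannot be set to limit 10, because limit is not a valid part of a SQL WHERE clause. The where condition is an effective way to perform incremental syncs.
    * Required: no
    * Default: none
* **querySql**
    * Description: in some scenarios where is not expressive enough to describe the filter, and you can supply a custom filtering SQL through this setting. When it is configured, DataX ignores the table and column settings and filters the data with this statement directly, for example to sync the result of a multi-table join: select a,b from table_a join table_b on table_a.id = table_b.id. `When querySql is configured, SybaseReader ignores the table, column, and where settings`.
    * Required: no
    * Default: none
* **fetchSize**
    * Description: the number of rows the plugin fetches from the database server in each batch. This value determines the number of network round trips between DataX and the server and can significantly improve extraction performance. `Note: a value that is too large (>2048) may cause the DataX process to run out of memory (OOM).`
    * Required: no
    * Default: 1024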
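
The interplay of table, column, where and splitPk described above can be illustrated with a small Java sketch. It is a hypothetical illustration, not DataX's actual splitting code: the class and method names (`SplitPkSketch`, `buildQuerySql`, `splitLongRange`) are made up. It assumes an integer splitPk whose minimum and maximum are already known, and it ignores string keys and rows whose splitPk is NULL, both of which a real splitter must also handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not DataX's own splitter): turns table/column/where plus
// an integer splitPk range into one SELECT statement per concurrent task.
public class SplitPkSketch {

    // Builds "select <columns> from <table>", then ANDs the optional user
    // where condition with the optional per-slice range condition.
    static String buildQuerySql(List<String> columns, String table,
                                String where, String rangeCondition) {
        StringBuilder sql = new StringBuilder("select ")
                .append(String.join(",", columns))
                .append(" from ")
                .append(table);
        List<String> conditions = new ArrayList<>();
        if (where != null && !where.trim().isEmpty()) {
            conditions.add("(" + where + ")");
        }
        if (rangeCondition != null) {
            conditions.add(rangeCondition);
        }
        if (!conditions.isEmpty()) {
            sql.append(" where ").append(String.join(" and ", conditions));
        }
        return sql.toString();
    }

    // Splits [min, max] into at most `slices` contiguous, inclusive ranges of
    // nearly equal size. Assumes max - min + 1 fits in a long.
    // Rows whose splitPk is NULL are not covered and need a separate slice.
    static List<String> splitLongRange(String splitPk, long min, long max, int slices) {
        List<String> ranges = new ArrayList<>();
        if (min > max) {
            return ranges; // empty table or invalid bounds: nothing to split
        }
        long span = max - min + 1;
        long n = Math.max(1, Math.min(slices, span));
        long step = span / n;
        long remainder = span % n;
        long lower = min;
        for (long i = 0; i < n; i++) {
            // the first `remainder` ranges take one extra key value each
            long size = step + (i < remainder ? 1 : 0);
            long upper = lower + size - 1;
            ranges.add(String.format("(%s >= %d and %s <= %d)", splitPk, lower, splitPk, upper));
            lower = upper + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name");
        for (String range : splitLongRange("id", 1, 10, 3)) {
            System.out.println(buildQuerySql(columns, "t_user", "gmt_create > '2024-01-01'", range));
        }
    }
}
```

Running `main` prints three statements, one per task: `... where (gmt_create > '2024-01-01') and (id >= 1 and id <= 4)`, then the same query with `(id >= 5 and id <= 7)` and `(id >= 8 and id <= 10)`. Because the ranges are disjoint and every task runs its own query, the tasks do not share a transaction, which is the consistency caveat discussed in section 5.1.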
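### 3.3 Type Conversion

SybaseReader supports most Sybase types, but a few individual types are not supported, so please check the types you use.

The following table lists how SybaseReader converts Sybase types:

| DataX internal type | Sybase data type |
| -------- | ----- |
| Long | Tinyint, Smallint, Int, Money, Smallmoney |
| Double | Float, Real, Numeric, Decimal |
| String | Char, Varchar, Nchar, Nvarchar, Text |
| Date | Timestamp, Datetime, Smalldatetime |
| Boolean | bit, bool |
| Bytes | Binary, Varbinary, Image |

Please note:

* `Types not listed above are not supported`.

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

To simulate real production data, we designed two Sybase tables:

#### 4.1.2 Machine Specifications

* Machine running DataX:

* Sybase database machine:

### 4.2 Test Report

#### 4.2.1 Table 1 Test Report

| Concurrent tasks | DataX speed (Rec/s) | DataX throughput | NIC traffic | DataX load | DB load |
|--------| --------|--------|--------|--------|--------|
|1| DataX measured speed (Rec/s)|DataX measured throughput|NIC traffic|DataX load|DB load|

## 5 Constraints and Limitations

### 5.1 Consistency

Sybase is an RDBMS and offers a strongly consistent query interface. For example, if other writers write to the database while a sync task is running, SybaseReader does not see those writes at all; this follows from the database's snapshot behavior. For details on database snapshots, see [MVCC Wikipedia](https://en.wikipedia.org/wiki/Multiversion_concurrency_control).

The above describes consistency when SybaseReader runs single-threaded. Because SybaseReader can extract data concurrently based on the user's configuration, strict consistency cannot be guaranteed: after SybaseReader splits the data on splitPk, it starts several concurrent tasks one after another. These tasks do not share a read transaction and start at different times, so the combined data is not a `complete`, `consistent` snapshot.

A consistent multi-threaded snapshot is currently not technically achievable and can only be approximated through engineering trade-offs. We offer a few options; choose whichever suits you:

1. Sync with a single thread, i.e. do not shard the data. This is slower but preserves consistency well.
2. Stop all other writers so that the data is static, for example by locking tables or pausing replica synchronization. The downside is possible impact on online business.

### 5.2 Database Encoding

SybaseReader extracts data through JDBC, which handles various encodings and converts them internally. SybaseReader therefore does not require the user to specify an encoding; it detects and converts encodings automatically.

If the encoding actually used to write data into Sybase does not match the encoding Sybase is configured with, SybaseReader cannot detect this and offers no remedy; in such cases `the export may be garbled`.

### 5.3 Incremental Data Sync

SybaseReader extracts data with JDBC SELECT statements, so incremental extraction can be done with SELECT...WHERE... in several ways:

* If the online application fills a modify column with the change timestamp on every insert, update, and (logical) delete, SybaseReader only needs a WHERE condition on the timestamp of the previous sync run.
* For append-only (log-style) data, SybaseReader can use a WHERE condition on the largest auto-increment ID from the previous run.

If the business data has no column that distinguishes new or modified rows, SybaseReader cannot perform incremental syncs and can only sync the full data set.

### 5.4 SQL Security

SybaseReader lets users supply their own SELECT statement through querySql and performs no security checks on it. Ensuring its safety is the responsibility of the DataX user.

## 6 FAQ

***

**Q: Which Sybase versions have been verified?**

A: Sybase ASE 16/15.7

**Q: SybaseReader fails with error message XXX**

A: This is a network or permission problem. Test with the Sybase command-line client or a GUI tool:

If the command above also fails, the problem lies in the environment; contact your DBA.

**Q: SybaseReader extraction is slow. What can I do?**

A: Extraction time is mainly affected by the following:

1. An abnormal SQL plan causing long extraction times; during extraction, prefer a full table scan over an index scan where possible.
2. SQL concurrency: choose a reasonable degree of concurrency based on the table size to reduce extraction time.
3. fetchSize: set a reasonable fetchsize to reduce network IO.

================================================ FILE: sybasereader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 sybasereader sybasereader jar 8 8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.oracle ojdbc6 11.2.0.3 com.alibaba.datax datax-common 0.0.1-SNAPSHOT compile com.sybase.jconnect jconn4 16.0 system ${basedir}/src/main/libs/jconn4-16.0.jar junit junit 4.13.2 test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: sybasereader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/sybasereader src/main/libs *.* plugin/reader/sybasereader/libs target/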
sybasereader-0.0.1-SNAPSHOT.jar plugin/reader/sybasereader false plugin/reader/sybasereader/libs runtime ================================================ FILE: sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/Constants.java ================================================ package com.alibaba.datax.plugin.reader.sybasereader; public class Constants { public static final int DEFAULT_FETCH_SIZE = 1024; } ================================================ FILE: sybasereader/src/main/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseReader.java ================================================ package com.alibaba.datax.plugin.reader.sybasereader; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; import com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader; import com.alibaba.datax.plugin.rdbms.reader.Constant; public class SybaseReader extends Reader { private static final DataBaseType DATABASE_TYPE = DataBaseType.Sybase; public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory .getLogger(SybaseReader.Job.class); private Configuration originalConfig = null; private CommonRdbmsReader.Job commonRdbmsReaderJob; @Override public void init() { this.originalConfig = super.getPluginJobConf(); dealFetchSize(this.originalConfig); this.commonRdbmsReaderJob = new CommonRdbmsReader.Job( DATABASE_TYPE); this.commonRdbmsReaderJob.init(this.originalConfig); } @Override public void preCheck(){ init(); this.commonRdbmsReaderJob.preCheck(this.originalConfig,DATABASE_TYPE); } @Override public List<Configuration> split(int adviceNumber) { return this.commonRdbmsReaderJob.split(this.originalConfig, adviceNumber); } @Override public void post() {
this.commonRdbmsReaderJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsReaderJob.destroy(this.originalConfig); } private void dealFetchSize(Configuration originalConfig) { int fetchSize = originalConfig.getInt( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, Constants.DEFAULT_FETCH_SIZE); if (fetchSize < 1) { LOG.warn("sybasereader requires a valid fetchSize (>= 1), which has a large impact on performance; please configure fetchSize."); } originalConfig.set( com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE, fetchSize); } } public static class Task extends Reader.Task { private Configuration readerSliceConfig; private CommonRdbmsReader.Task commonRdbmsReaderTask; @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); this.commonRdbmsReaderTask = new CommonRdbmsReader.Task( DATABASE_TYPE ,super.getTaskGroupId(), super.getTaskId()); this.commonRdbmsReaderTask.init(this.readerSliceConfig); } @Override public void startRead(RecordSender recordSender) { int fetchSize = this.readerSliceConfig .getInt(com.alibaba.datax.plugin.rdbms.reader.Constant.FETCH_SIZE); this.commonRdbmsReaderTask.startRead(this.readerSliceConfig, recordSender, super.getTaskPluginCollector(), fetchSize); } @Override public void post() { this.commonRdbmsReaderTask.post(this.readerSliceConfig); } @Override public void destroy() { this.commonRdbmsReaderTask.destroy(this.readerSliceConfig); } } } ================================================ FILE: sybasereader/src/main/resources/plugin.json ================================================ { "name": "sybasereader", "class": "com.alibaba.datax.plugin.reader.sybasereader.SybaseReader", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute select sql, retrieve data from the ResultSet.
warn: The more you know about the database, the fewer problems you encounter.", "developer": "alibaba" } ================================================ FILE: sybasereader/src/main/resources/plugin_job_template.json ================================================ { "name": "sybasereader", "parameter": { "username": "", "password": "", "column": [], "connection": [ { "table": [], "jdbcUrl": [] } ] } } ================================================ FILE: sybasereader/src/test/java/com/alibaba/datax/plugin/reader/sybasereader/SybaseDatabaseUnitTest.java ================================================ package com.alibaba.datax.plugin.reader.sybasereader; import org.junit.After; import org.junit.Before; import org.junit.Test; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; import static org.junit.Assert.assertEquals; public class SybaseDatabaseUnitTest { private Connection connection; @Before public void setUp() { // Connect to the Sybase database String jdbcUrl = "jdbc:sybase:Tds:192.172.172.80:1680/database"; String username = "admin"; String password = "admin123"; try { connection = DriverManager.getConnection(jdbcUrl, username, password); } catch (SQLException e) { e.printStackTrace(); } } @After public void tearDown() { if (connection != null) { try { connection.close(); } catch (SQLException e) { e.printStackTrace(); } } } @Test public void testDatabaseQuery() throws SQLException { String query = "SELECT COUNT(*) FROM your_table"; int expectedRowCount = 10; // Assume the expected row count is 10 try (Statement statement = connection.createStatement(); ResultSet resultSet = statement.executeQuery(query)) { resultSet.next(); int rowCount = resultSet.getInt(1); assertEquals(expectedRowCount, rowCount); } } } ================================================ FILE: sybasewriter/doc/sybasewriter.md ================================================
# DataX SybaseWriter

---

## 1 Quick Introduction

The SybaseWriter plugin writes data into a target table in the Sybase primary database. Under the hood, SybaseWriter connects to the remote Sybase database over JDBC and executes `insert into ...` (or `replace into ...`) statements to write the data into Sybase, committing in batches internally.

SybaseWriter is aimed at ETL developers, who use it to import data from a data warehouse into Sybase. SybaseWriter can also serve DBAs and other users as a data migration tool.

## 2 How It Works

SybaseWriter obtains the protocol data generated by the Reader through the DataX framework and, depending on the configured `writeMode`, generates

* `insert into...` (rows that conflict with a primary key or unique index are not written)

or

* `replace into...` (behaves like insert into when there is no primary key/unique index conflict; on conflict, all fields of the existing row are replaced by the new row)

statements to write data into Sybase. For performance, it uses `PreparedStatement + Batch` and sets `rewriteBatchedStatements=true`, buffering data in a thread-local buffer and issuing a write request only when the buffer reaches a preset threshold.
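
The buffer-then-flush behavior described above can be sketched in Java as follows. This is a hypothetical illustration rather than SybaseWriter's code: `BatchBufferSketch` and its threshold are assumptions, and the flush step is a plain callback so the logic can run without a database. In a real JDBC writer that callback would bind each buffered row on a `PreparedStatement`, call `addBatch()` per row, and then call `executeBatch()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "buffer rows, write when the buffer reaches a threshold".
// The flusher stands in for the JDBC step (addBatch() per row, then executeBatch()).
public class BatchBufferSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> buffer = new ArrayList<>();

    public BatchBufferSketch(int batchSize, Consumer<List<T>> flusher) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Buffers one row and writes the whole batch once the threshold is reached.
    public void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Writes any buffered rows; call once more after the last row so the tail is not lost.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer)); // pass a copy, then reuse the buffer
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> writtenBatchSizes = new ArrayList<>();
        BatchBufferSketch<String> writer =
                new BatchBufferSketch<>(3, batch -> writtenBatchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            writer.add("row-" + i);
        }
        writer.flush(); // write the remaining row
        System.out.println(writtenBatchSizes); // prints [3, 3, 1]
    }
}
```

With a threshold of 3 and 7 rows, the writes are sized 3, 3 and 1. The final explicit `flush()` matters: without it the last partial batch would never reach the database.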
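    Note: data can be written only if the database hosting the target table is the primary; the job needs at least insert/replace into... privileges, and whether it needs others depends on the statements you specify in preSql and postSql. ## 3 Features ### 3.1 Sample Configuration * This example loads data generated in memory into Sybase. ```json { "job": { "setting": { "speed": { "channel": 1 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column" : [ { "value": "DataX", "type": "string" }, { "value": 19880808, "type": "long" }, { "value": "1988-08-08 08:08:08", "type": "date" }, { "value": true, "type": "bool" }, { "value": "test", "type": "bytes" } ], "sliceRecordCount": 1000 } }, "writer": { "name": "sybasewriter", "parameter": { "writeMode": "insert", "username": "root", "password": "root", "column": [ "id", "name" ], "preSql": [ "delete from test" ], "connection": [ { "jdbcUrl":"jdbc:sybase:Tds:192.168.1.92:5000/tempdb?charset=cp936", "table": [ "test" ] } ] } } } ] } } ``` ### 3.2 Parameters * **jdbcUrl** * Description: JDBC connection information for the target database. At run time, DataX appends the following properties to the jdbcUrl you provide: yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true Notes: 1. Only one jdbcUrl value can be configured per database. Unlike SybaseReader, which supports probing multiple standby databases, SybaseWriter does not support multiple primaries for the same database (dual-primary imports). 2. The jdbcUrl follows Sybase's official format and can carry additional connection properties; for example, the sample above appends `charset=cp936` to set the connection character set. For details, see the official Sybase documentation or consult your DBA. * Required: yes
    * Default: none
    * **username** * Description: username for the target database
    * Required: yes
    * Default: none
    * **password** * Description: password for the target database
    * Required: yes
    * Default: none
    * **table** * Description: name of the target table. One or more tables can be written; when several tables are configured, they must all have the same schema. Note: table and jdbcUrl must be placed inside the connection block. * Required: yes
    * Default: none
    * **column** * Description: the target-table columns to write, separated by commas, e.g. "column": ["id","name","age"]. To write all columns in order, use `*`, e.g. `"column": ["*"]`. **column must be specified and cannot be left empty!** Notes: 1. We strongly advise against using `*`, because if the number or types of the target table's columns change, your job may produce incorrect results or fail. 2. column cannot contain constant values. * Required: yes
    * Default: none
    * **preSql** * Description: standard SQL statements executed before data is written to the target table. If a statement needs to refer to the table being written, use `@table`; when the statement is actually executed, the placeholder is replaced with the real table name. For example, if your job writes to 100 identically structured shard tables on the target side (named datax_00, datax_01, ... datax_98, datax_99) and you want to delete their existing data before the import, configure `"preSql":["delete from @table"]`; before writing to each table, the corresponding `delete from datax_00`, `delete from datax_01`, and so on is executed.
    * Required: no
    * Default: none
    * **postSql** * Description: standard SQL statements executed after data is written to the target table (same mechanism as preSql).
    * Required: no
    * Default: none
    * **writeMode** * Description: controls whether data is written to the target table with `insert into`, `replace into`, or `ON DUPLICATE KEY UPDATE` statements
    * Required: no
    * Options: insert/replace/update
    * Default: insert
    * **batchSize** * Description: number of records committed in one batch. A larger value greatly reduces the number of network round trips between DataX and Sybase and improves overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
    * Required: no
    * Default: 1024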
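The `@table` substitution described under preSql can be sketched as a plain string replacement performed once per target table. This is a hypothetical illustration, not the plugin's own code: `PreSqlRenderer` and `render` are made-up names, and only the `@table` token comes from the documentation above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the `@table` placeholder substitution described for
 * preSql/postSql: each statement is rendered once per target table.
 */
public class PreSqlRenderer {
    static final String TABLE_PLACEHOLDER = "@table";

    /** Returns the statements to run before writing to the given table. */
    static List<String> render(List<String> statements, String table) {
        List<String> rendered = new ArrayList<>(statements.size());
        for (String sql : statements) {
            // String.replace substitutes every literal occurrence (no regex involved).
            rendered.add(sql.replace(TABLE_PLACEHOLDER, table));
        }
        return rendered;
    }

    public static void main(String[] args) {
        List<String> preSql = Arrays.asList("delete from @table");
        for (String table : Arrays.asList("datax_00", "datax_01")) {
            System.out.println(render(preSql, table));
        }
        // prints:
        // [delete from datax_00]
        // [delete from datax_01]
    }
}
```

Because the replacement is literal, any other identifier that starts with `@table` would also be rewritten, so the placeholder is best kept as a standalone token.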
    ### 3.3 Type Conversion Like SybaseReader, SybaseWriter currently supports most Sybase types, but a few individual types are not supported, so please check your types. The type mapping used by SybaseWriter for Sybase is listed below: | DataX internal type | Sybase data type | | -------- | ----- | | Long |Tinyint,Smallint,Int,Money,Smallmoney| | Double |Float,Real,Numeric,Decimal| | String |Char,Varchar,Nchar,Nvarchar,Text| | Date |Timestamp,Datetime,Smalldatetime| | Boolean |bit, bool| | Bytes |Binary,Varbinary,Image| ## 4 Performance Report ## 5 Limitations ## FAQ *** **Q: Which Sybase versions have been verified?** A: Sybase ASE 16/15.7 **Q: SybaseWriter sync fails with error message XXX** A: This is a network or permission problem. Test with the Sybase command-line client or a GUI tool: If the above command also fails, that confirms an environment problem; please contact your DBA. ================================================ FILE: sybasewriter/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 sybasewriter 8 8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j org.slf4j slf4j-api ch.qos.logback logback-classic com.alibaba.datax plugin-rdbms-util ${datax-project-version} com.oracle ojdbc6 11.2.0.3 com.alibaba.datax datax-common 0.0.1-SNAPSHOT compile com.sybase.jconnect jconn4 16.0 system ${basedir}/src/main/libs/jconn4-16.0.jar maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: sybasewriter/src/main/assembly/package.xml ================================================ dir false src/main/java/resources plugin.json plugin_job_template.json plugin/writer/sybasewriter src/main/libs *.* plugin/writer/sybasewriter/libs target/ sybasewriter-0.0.1-SNAPSHOT.jar plugin/writer/sybasewriter false 
plugin/writer/sybasewriter/libs runtime ================================================ FILE: sybasewriter/src/main/java/com/alibaba/datax/plugin/writer/sybasewriter/SybaseWriter.java ================================================ package com.alibaba.datax.plugin.writer.sybasewriter; import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.rdbms.util.DataBaseType; import com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter; import com.alibaba.datax.plugin.rdbms.writer.Key; import java.util.List; public class SybaseWriter extends Writer { private static final DataBaseType DATABASE_TYPE = DataBaseType.Sybase; public static class Job extends Writer.Job { private Configuration originalConfig = null; private CommonRdbmsWriter.Job commonRdbmsWriterJob; @Override public void preCheck(){ this.init(); this.commonRdbmsWriterJob.writerPreCheck(this.originalConfig, DATABASE_TYPE); } @Override public void init() { this.originalConfig = super.getPluginJobConf(); this.commonRdbmsWriterJob = new CommonRdbmsWriter.Job(DATABASE_TYPE); this.commonRdbmsWriterJob.init(this.originalConfig); } // In general, pre statements should be deferred to the tasks (the single-table case is an exception) @Override public void prepare() { // Privilege validation is not supported in actual runs yet //this.commonRdbmsWriterJob.privilegeValid(this.originalConfig, DATABASE_TYPE); this.commonRdbmsWriterJob.prepare(this.originalConfig); } @Override public List<Configuration> split(int mandatoryNumber) { return this.commonRdbmsWriterJob.split(this.originalConfig, mandatoryNumber); } // In general, post statements should be deferred to the tasks (the single-table case is an exception) @Override public void post() { this.commonRdbmsWriterJob.post(this.originalConfig); } @Override public void destroy() { this.commonRdbmsWriterJob.destroy(this.originalConfig); } } public static class Task extends Writer.Task { private Configuration writerSliceConfig; private CommonRdbmsWriter.Task commonRdbmsWriterTask; @Override public void init() { this.writerSliceConfig = super.getPluginJobConf(); this.commonRdbmsWriterTask = new CommonRdbmsWriter.Task(DATABASE_TYPE); 
this.commonRdbmsWriterTask.init(this.writerSliceConfig); } @Override public void prepare() { this.commonRdbmsWriterTask.prepare(this.writerSliceConfig); } @Override public void startWrite(RecordReceiver
recordReceiver) { this.commonRdbmsWriterTask.startWrite(recordReceiver, this.writerSliceConfig, super.getTaskPluginCollector()); } @Override public void post() { this.commonRdbmsWriterTask.post(this.writerSliceConfig); } @Override public void destroy() { this.commonRdbmsWriterTask.destroy(this.writerSliceConfig); } @Override public boolean supportFailOver(){ String writeMode = writerSliceConfig.getString(Key.WRITE_MODE); return "replace".equalsIgnoreCase(writeMode); } } } ================================================ FILE: sybasewriter/src/main/java/resources/plugin.json ================================================ { "name": "sybasewriter", "class": "com.alibaba.datax.plugin.writer.sybasewriter.SybaseWriter", "description": "useScene: prod. mechanism: Jdbc connection using the database, execute insert/replace sql, write data in batches through PreparedStatement. warn: The more you know about the database, the fewer problems you encounter.", "developer": "alibaba" } ================================================ FILE: sybasewriter/src/main/java/resources/plugin_job_template.json ================================================ { "name": "sybasewriter", "parameter": { "username": "", "password": "", "column": [], "connection": [ { "table": [], "jdbcUrl": [] } ] } } ================================================ FILE: tdenginereader/doc/tdenginereader-CN.md ================================================ # DataX TDengineReader ## 1 Quick Introduction The TDengineReader plugin reads data from TDengine. ## 2 How It Works TDengineReader queries data through TDengine's JDBC driver. 
## 3 Features ### 3.1 Sample Configurations * A job that extracts data from TDengine: ```json { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "connection": [ { "table": [ "meters" ], "jdbcUrl": [ "jdbc:TAOS-RS://192.168.56.105:6041/test?timestampFormat=TIMESTAMP" ] } ], "column": [ "ts", "current", "voltage", "phase" ], "where": "ts>=0", "beginDateTime": "2017-07-14 10:40:00", "endDateTime": "2017-08-14 10:40:00" }
}, "writer": { "name": "streamwriter", "parameter": { "encoding": "UTF-8", "print": true } } } ], "setting": { "speed": { "channel": 1 } } } } ``` * A job that extracts data with a custom SQL query: ```json { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "connection": [ { "querySql": [ "select * from test.meters" ], "jdbcUrl": [ "jdbc:TAOS-RS://192.168.56.105:6041/test?timestampFormat=TIMESTAMP" ] } ] } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "UTF-8", "print": true } } } ], "setting": { "speed": { "channel": 1 } } } } ``` ### 3.2 Parameters * **username** * Description: username for the TDengine instance
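    * Required: yes
    * Default: none
    * **password** * Description: password for the TDengine instance
    * Required: yes
    * Default: none
    * **jdbcUrl** * Description: JDBC connection information for the TDengine database. Note that jdbcUrl must be placed inside the connection block. For the jdbcUrl format, see the official TDengine documentation. * Required: yes
    * Default: none
    * **querySql** * Description: in some scenarios the where option is not expressive enough to describe the filter conditions, so this option lets you supply a custom query SQL. When querySql is configured, TDengineReader ignores table, column, where, beginDateTime, and endDateTime and uses this statement directly to select data. For example, to synchronize data after a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id
    * Required: no
    * Default: none
    * **table** * Description: the tables to synchronize, given as a JSON array, so several tables can be extracted at once. When several tables are configured, you must make sure they share the same schema; TDengineReader does not check whether they belong to the same logical table. Note that table must be placed inside the connection block.
    * Required: yes (unless querySql is configured)
    * Default: none
    * **where** * Description: the where clause of the filter. TDengineReader builds the SQL from the configured column, table, where, beginDateTime, and endDateTime and extracts data with it.
    * Required: no
    * Default: none (when omitted, the reader uses `_c0 > -9223372036854775808`, which matches all rows)
    * **beginDateTime** * Description: start time of the data; the job migrates data from beginDateTime (inclusive) to endDateTime (exclusive). Format: yyyy-MM-dd HH:mm:ss
    * Required: no
    * Default: none
    * **endDateTime** * Description: end time of the data; the job migrates data from beginDateTime (inclusive) to endDateTime (exclusive), and endDateTime must be later than beginDateTime. Format: yyyy-MM-dd HH:mm:ss
    * Required: no
    * Default: none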
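The way table, column, where, beginDateTime, and endDateTime are combined into one query per table can be sketched as follows. This mirrors the string assembly in `TDengineReader.Task#startRead` in this repository, but uses only the JDK instead of `StringUtils`; the class and method names are illustrative, not part of the plugin.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch of how TDengineReader assembles one query per table
 * from the column/table/where/beginDateTime/endDateTime options.
 */
public class TDengineQuerySketch {
    // Used by the reader when `where` is not configured: matches every row.
    static final String DEFAULT_WHERE = "_c0 > " + Long.MIN_VALUE;

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static String buildSql(List<String> columns, String table, String where,
                           String beginDateTime, String endDateTime) {
        StringBuilder sb = new StringBuilder();
        sb.append("select ").append(String.join(",", columns))
          .append(" from ").append(table)
          .append(" where ").append(where == null ? DEFAULT_WHERE : where);
        // Same time-range filter on _c0 as the reader: half-open [begin, end).
        if (!isBlank(beginDateTime)) {
            sb.append(" and _c0 >= '").append(beginDateTime).append("'");
        }
        if (!isBlank(endDateTime)) {
            sb.append(" and _c0 < '").append(endDateTime).append("'");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Uses the values from the first sample configuration.
        System.out.println(buildSql(Arrays.asList("ts", "current", "voltage", "phase"),
                "meters", "ts>=0", "2017-07-14 10:40:00", "2017-08-14 10:40:00"));
    }
}
```

Because the time bounds are concatenated as quoted literals, they should come only from trusted job configuration, never from end-user input.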
    ### 3.3 Type Conversion | TDengine data type | DataX internal type | | --------------- | ------------- | | TINYINT | Long | | SMALLINT | Long | | INTEGER | Long | | BIGINT | Long | | FLOAT | Double | | DOUBLE | Double | | BOOLEAN | Bool | | TIMESTAMP | Date | | BINARY | Bytes | | NCHAR | String | ## 4 Performance Report ### 4.1 Environment #### 4.1.1 Data Characteristics #### 4.1.2 Machine Specifications #### 4.1.3 DataX JVM Options -Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError ### 4.2 Test Results #### 4.2.1 Single-Table Results | Channels | DataX speed (Rec/s) | DataX throughput (MB/s) | DataX NIC outbound traffic (MB/s) | DataX machine load | DB NIC inbound traffic (MB/s) | DB load | DB TPS | |--------| --------|--------|--------|--------|--------|--------|--------| |1| | | | | | | | |4| | | | | | | | |8| | | | | | | | |16| | | | | | | | |32| | | | | | | | Notes: #### 4.2.2 Performance Test Summary 1. 2. ## 5 Limitations ## FAQ ================================================ FILE: tdenginereader/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 tdenginereader 8 8 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax.tdenginewriter tdenginewriter 0.0.1-SNAPSHOT compile com.taosdata.jdbc taos-jdbcdriver 2.0.39 junit junit ${junit-version} test com.alibaba.datax plugin-rdbms-util 0.0.1-SNAPSHOT compile com.alibaba.datax datax-core 0.0.1-SNAPSHOT test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single org.apache.maven.plugins maven-surefire-plugin 2.12.4 **/*Test.java true ================================================ FILE: tdenginereader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/tdenginereader target/ 
tdenginereader-0.0.1-SNAPSHOT.jar plugin/reader/tdenginereader false plugin/reader/tdenginereader/libs runtime ================================================ FILE:
tdenginereader/src/main/java/com/alibaba/datax/plugin/reader/TDengineReader.java ================================================ package com.alibaba.datax.plugin.reader; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.tdenginewriter.Key; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.UnsupportedEncodingException; import java.sql.*; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.List; public class TDengineReader extends Reader { private static final String DATETIME_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig; @Override public void init() { this.originalConfig = super.getPluginJobConf(); // check username String username = this.originalConfig.getString(Key.USERNAME); if (StringUtils.isBlank(username)) throw DataXException.asDataXException(TDengineReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.USERNAME + "] is not set."); // check password String password = this.originalConfig.getString(Key.PASSWORD); if (StringUtils.isBlank(password)) throw DataXException.asDataXException(TDengineReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.PASSWORD + "] is not set."); // check connection List<Configuration> connectionList = this.originalConfig.getListConfiguration(Key.CONNECTION); if (connectionList == null || connectionList.isEmpty()) throw DataXException.asDataXException(TDengineReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.CONNECTION + "] is not set."); for (int i = 0; i < connectionList.size(); i++) {
Configuration conn = connectionList.get(i); // check jdbcUrl List<Object> jdbcUrlList = conn.getList(Key.JDBC_URL); if (jdbcUrlList == null || jdbcUrlList.isEmpty()) { throw DataXException.asDataXException(TDengineReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.JDBC_URL + "] of connection[" + (i + 1) + "] is not set."); } // check table/querySql List<Object> querySqlList = conn.getList(Key.QUERY_SQL); if (querySqlList == null || querySqlList.isEmpty()) { String querySql = conn.getString(Key.QUERY_SQL); if (StringUtils.isBlank(querySql)) { List<Object> table = conn.getList(Key.TABLE); if (table == null || table.isEmpty()) throw DataXException.asDataXException(TDengineReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.TABLE + "] of connection[" + (i + 1) + "] is not set."); } } } SimpleDateFormat format = new SimpleDateFormat(DATETIME_FORMAT); // check beginDateTime String beginDatetime = this.originalConfig.getString(Key.BEGIN_DATETIME); long start = Long.MIN_VALUE; if (!StringUtils.isBlank(beginDatetime)) { try { start = format.parse(beginDatetime).getTime(); } catch (ParseException e) { throw DataXException.asDataXException(TDengineReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.BEGIN_DATETIME + "] needs to conform to the [" + DATETIME_FORMAT + "] format."); } } // check endDateTime String endDatetime = this.originalConfig.getString(Key.END_DATETIME); long end = Long.MAX_VALUE; if (!StringUtils.isBlank(endDatetime)) { try { end = format.parse(endDatetime).getTime(); } catch (ParseException e) { throw DataXException.asDataXException(TDengineReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.END_DATETIME + "] needs to conform to the [" + DATETIME_FORMAT + "] format."); } } if (start >= end) throw DataXException.asDataXException(TDengineReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.BEGIN_DATETIME + "] should be less than the parameter [" + Key.END_DATETIME + "]."); } @Override public void destroy() { } @Override public List<Configuration> split(int adviceNumber) { List<Configuration>
configurations = new ArrayList<>(); List<Configuration> connectionList = this.originalConfig.getListConfiguration(Key.CONNECTION); for (Configuration conn : connectionList) { List<String> jdbcUrlList = conn.getList(Key.JDBC_URL, String.class); for (String jdbcUrl : jdbcUrlList) { Configuration clone = this.originalConfig.clone(); clone.set(Key.JDBC_URL, jdbcUrl); clone.set(Key.TABLE, conn.getList(Key.TABLE)); clone.set(Key.QUERY_SQL, conn.getList(Key.QUERY_SQL)); clone.remove(Key.CONNECTION); configurations.add(clone); } } LOG.info("Configuration: {}", configurations); return configurations; } } public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration readerSliceConfig; private String mandatoryEncoding; private Connection conn; private List<String> tables; private List<String> columns; private String startTime; private String endTime; private String where; private List<String> querySql; static { // load each driver independently so that one missing class does not skip the other for (String driver : new String[]{"com.taosdata.jdbc.TSDBDriver", "com.taosdata.jdbc.rs.RestfulDriver"}) { try { Class.forName(driver); } catch (ClassNotFoundException e) { LOG.warn(e.getMessage(), e); } } } @Override public void init() { this.readerSliceConfig = super.getPluginJobConf(); String user = readerSliceConfig.getString(Key.USERNAME); String password = readerSliceConfig.getString(Key.PASSWORD); String url = readerSliceConfig.getString(Key.JDBC_URL); try { this.conn = DriverManager.getConnection(url, user, password); } catch (SQLException e) { throw DataXException.asDataXException(TDengineReaderErrorCode.CONNECTION_FAILED, "The parameter [" + Key.JDBC_URL + "] : " + url + " failed to connect since: " + e.getMessage(), e); } this.tables = readerSliceConfig.getList(Key.TABLE, String.class); this.columns = readerSliceConfig.getList(Key.COLUMN, String.class); this.startTime = readerSliceConfig.getString(Key.BEGIN_DATETIME); 
this.endTime = readerSliceConfig.getString(Key.END_DATETIME); this.where = readerSliceConfig.getString(Key.WHERE, "_c0 > " +
Long.MIN_VALUE); this.querySql = readerSliceConfig.getList(Key.QUERY_SQL, String.class); this.mandatoryEncoding = readerSliceConfig.getString(Key.MANDATORY_ENCODING, "UTF-8"); } @Override public void destroy() { try { if (conn != null) conn.close(); } catch (SQLException e) { LOG.error(e.getMessage(), e); } } @Override public void startRead(RecordSender recordSender) { List<String> sqlList = new ArrayList<>(); if (querySql == null || querySql.isEmpty()) { for (String table : tables) { StringBuilder sb = new StringBuilder(); sb.append("select ").append(StringUtils.join(columns, ",")).append(" from ").append(table).append(" "); sb.append("where ").append(where); if (!StringUtils.isBlank(startTime)) { sb.append(" and _c0 >= '").append(startTime).append("'"); } if (!StringUtils.isBlank(endTime)) { sb.append(" and _c0 < '").append(endTime).append("'"); } String sql = sb.toString().trim(); sqlList.add(sql); } } else { sqlList.addAll(querySql); } for (String sql : sqlList) { // close both the Statement and the ResultSet; fail the task instead of silently skipping a failed query try (Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery(sql)) { while (rs.next()) { Record record = buildRecord(recordSender, rs, mandatoryEncoding); recordSender.sendToWriter(record); } } catch (SQLException e) { throw DataXException.asDataXException(TDengineReaderErrorCode.RUNTIME_EXCEPTION, "failed to execute sql: " + sql, e); } } } private Record buildRecord(RecordSender recordSender, ResultSet rs, String mandatoryEncoding) { Record record = recordSender.createRecord(); try { ResultSetMetaData metaData = rs.getMetaData(); for (int i = 1; i <= metaData.getColumnCount(); i++) { int columnType = metaData.getColumnType(i); switch (columnType) { case Types.SMALLINT: case Types.TINYINT: case Types.INTEGER: case Types.BIGINT: record.addColumn(new LongColumn(rs.getString(i))); break; case Types.FLOAT: case Types.DOUBLE: record.addColumn(new DoubleColumn(rs.getString(i))); break; case 
Types.BOOLEAN: record.addColumn(new BoolColumn(rs.getBoolean(i))); break; case Types.TIMESTAMP: record.addColumn(new DateColumn(rs.getTimestamp(i))); break; case Types.BINARY: record.addColumn(new
BytesColumn(rs.getBytes(i))); break; case Types.NCHAR: String rawData; if (StringUtils.isBlank(mandatoryEncoding)) { rawData = rs.getString(i); } else { byte[] bytes = rs.getBytes(i); rawData = new String(bytes == null ? new byte[0] : bytes, mandatoryEncoding); } record.addColumn(new StringColumn(rawData)); break; } } } catch (SQLException e) { throw DataXException.asDataXException(TDengineReaderErrorCode.ILLEGAL_VALUE, "database query error!", e); } catch (UnsupportedEncodingException e) { throw DataXException.asDataXException(TDengineReaderErrorCode.ILLEGAL_VALUE, "illegal mandatoryEncoding", e); } return record; } } } ================================================ FILE: tdenginereader/src/main/java/com/alibaba/datax/plugin/reader/TDengineReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader; import com.alibaba.datax.common.spi.ErrorCode; public enum TDengineReaderErrorCode implements ErrorCode { REQUIRED_VALUE("TDengineReader-00", "parameter value is missing"), ILLEGAL_VALUE("TDengineReader-01", "invalid parameter value"), CONNECTION_FAILED("TDengineReader-02", "connection error"), RUNTIME_EXCEPTION("TDengineReader-03", "runtime exception"); private final String code; private final String description; TDengineReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: tdenginereader/src/main/resources/plugin.json ================================================ { "name": "tdenginereader", "class": "com.alibaba.datax.plugin.reader.TDengineReader", "description": { "useScene": "data migration from tdengine", "mechanism": "use JDBC to read data from tdengine."
}, "developer": "zyyang-taosdata" } ================================================ FILE: tdenginereader/src/main/resources/plugin_job_template.json ================================================ { "name": "tdenginereader", "parameter": { "username": "", "password": "", "connection": [ { "table": [ "" ], "jdbcUrl": [ "" ] } ], "column": [ "" ], "beginDateTime": "", "endDateTime": "", "where": "" } } ================================================ FILE: tdenginereader/src/test/java/com/alibaba/datax/plugin/reader/TDengine2DMTest.java ================================================ package com.alibaba.datax.plugin.reader; import com.alibaba.datax.core.Engine; import org.junit.Ignore; import org.junit.Test; import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; import java.sql.Statement; import java.util.Random; @Ignore public class TDengine2DMTest { private static final String host1 = "192.168.56.105"; private static final String host2 = "192.168.0.72"; private final Random random = new Random(System.currentTimeMillis()); @Test public void t2dm_case01() throws Throwable { // given createSupTable("ms"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2dm.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void t2dm_case02() throws Throwable { // given createSupTable("us"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2dm.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void t2dm_case03() throws Throwable { // given createSupTable("ns"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2dm.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } private void createSupTable(String precision) throws SQLException { final String url = 
"jdbc:TAOS-RS://" + host1
+ ":6041/"; try (Connection conn = DriverManager.getConnection(url, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db1"); stmt.execute("create database if not exists db1 precision '" + precision + "'"); stmt.execute("create table db1.stb1(ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint, f5 float, " + "f6 double, f7 bool, f8 binary(100), f9 nchar(100)) tags(t1 timestamp, t2 tinyint, t3 smallint, " + "t4 int, t5 bigint, t6 float, t7 double, t8 bool, t9 binary(100), t10 nchar(100))"); for (int i = 1; i <= 10; i++) { stmt.execute("insert into db1.tb" + i + " using db1.stb1 tags(now, " + random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextFloat() + "," + random.nextDouble() + "," + random.nextBoolean() + ",'abcABC123'," + "'北京朝阳望京') values(now+" + i + "s, " + random.nextInt(10) + "," + random.nextInt(10) + "," + +random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextFloat() + "," + random.nextDouble() + "," + random.nextBoolean() + ",'abcABC123','北京朝阳望京')"); } stmt.close(); } final String url2 = "jdbc:dm://" + host2 + ":5236"; try (Connection conn = DriverManager.getConnection(url2, "TESTUSER", "test123456")) { conn.setAutoCommit(true); Statement stmt = conn.createStatement(); stmt.execute("drop table if exists stb2"); stmt.execute("create table stb2(ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint, f5 float, " + "f6 double, f7 BIT, f8 VARCHAR(100), f9 VARCHAR2(200), t1 timestamp, t2 tinyint, t3 smallint, " + "t4 int, t5 bigint, t6 float, t7 double, t8 BIT, t9 VARCHAR(100), t10 VARCHAR2(200))"); } } } ================================================ FILE: tdenginereader/src/test/java/com/alibaba/datax/plugin/reader/TDengine2StreamTest.java ================================================ package com.alibaba.datax.plugin.reader; import com.alibaba.datax.core.Engine; import org.junit.Ignore; import org.junit.Test; import 
java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; import java.sql.Statement; import java.util.Random; @Ignore public class TDengine2StreamTest { private static final String host = "192.168.56.105"; private static final Random random = new Random(System.currentTimeMillis()); @Test public void case01() throws Throwable { // given prepare("ms"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2stream-1.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void case02() throws Throwable { // given prepare("ms"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2stream-2.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } private void prepare(String precision) throws SQLException { final String url = "jdbc:TAOS-RS://" + host + ":6041/"; try (Connection conn = DriverManager.getConnection(url, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db1"); stmt.execute("create database if not exists db1 precision '" + precision + "'"); stmt.execute("create table db1.stb1(ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint, f5 float, " + "f6 double, f7 bool, f8 binary(100), f9 nchar(100)) tags(t1 timestamp, t2 tinyint, t3 smallint, " + "t4 int, t5 bigint, t6 float, t7 double, t8 bool, t9 binary(100), t10 nchar(100))"); for (int i = 1; i <= 10; i++) { stmt.execute("insert into db1.tb" + i + " using db1.stb1 tags(now, " + random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextFloat() + "," + random.nextDouble() + "," + random.nextBoolean() + ",'abcABC123'," + "'北京朝阳望京') values(now+" + i + "s, " + random.nextInt(10) + "," + random.nextInt(10) + "," + +random.nextInt(10) + "," + random.nextInt(10) + "," + random.nextFloat() + "," + random.nextDouble() + 
"," + random.nextBoolean() + ",'abcABC123','北京朝阳望京')"); } stmt.close(); } } } ================================================ FILE: tdenginereader/src/test/java/com/alibaba/datax/plugin/reader/TDengineReaderTest.java ================================================ package com.alibaba.datax.plugin.reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.writer.tdenginewriter.Key; import org.junit.Assert; import org.junit.Test; import java.util.List; public class TDengineReaderTest { @Test public void jobInit_case01() { // given TDengineReader.Job job = new TDengineReader.Job(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"connection\": [{\"table\":[\"weather\"],\"jdbcUrl\":[\"jdbc:TAOS-RS://master:6041/test\"]}]," + "\"column\": [\"ts\",\"current\",\"voltage\",\"phase\"]," + "\"where\":\"_c0 > 0\"," + "\"beginDateTime\": \"2021-01-01 00:00:00\"," + "\"endDateTime\": \"2021-01-01 12:00:00\"" + "}"); job.setPluginJobConf(configuration); // when job.init(); // assert Configuration conf = job.getPluginJobConf(); Assert.assertEquals("root", conf.getString(Key.USERNAME)); Assert.assertEquals("taosdata", conf.getString("password")); Assert.assertEquals("weather", conf.getString("connection[0].table[0]")); Assert.assertEquals("jdbc:TAOS-RS://master:6041/test", conf.getString("connection[0].jdbcUrl[0]")); Assert.assertEquals("2021-01-01 00:00:00", conf.getString("beginDateTime")); Assert.assertEquals("2021-01-01 12:00:00", conf.getString("endDateTime")); Assert.assertEquals("_c0 > 0", conf.getString("where")); } @Test public void jobInit_case02() { // given TDengineReader.Job job = new TDengineReader.Job(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"connection\": [{\"querySql\":[\"select * from weather\"],\"jdbcUrl\":[\"jdbc:TAOS-RS://master:6041/test\"]}]," + "}"); 
job.setPluginJobConf(configuration); // when job.init(); // assert Configuration conf = job.getPluginJobConf(); Assert.assertEquals("root", conf.getString(Key.USERNAME)); Assert.assertEquals("taosdata", conf.getString("password")); Assert.assertEquals("jdbc:TAOS-RS://master:6041/test", conf.getString("connection[0].jdbcUrl[0]")); Assert.assertEquals("select * from weather", conf.getString("connection[0].querySql[0]")); } @Test public void jobSplit_case01() { // given TDengineReader.Job job = new TDengineReader.Job(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"connection\": [{\"table\":[\"weather\"],\"jdbcUrl\":[\"jdbc:TAOS-RS://master:6041/test\"]}]," + "\"column\": [\"ts\",\"current\",\"voltage\",\"phase\"]," + "\"where\":\"_c0 > 0\"," + "\"beginDateTime\": \"2021-01-01 00:00:00\"," + "\"endDateTime\": \"2021-01-01 12:00:00\"" + "}"); job.setPluginJobConf(configuration); // when job.init(); List<Configuration> configurationList = job.split(1); // assert Assert.assertEquals(1, configurationList.size()); Configuration conf = configurationList.get(0); Assert.assertEquals("root", conf.getString("username")); Assert.assertEquals("taosdata", conf.getString("password")); Assert.assertEquals("_c0 > 0", conf.getString("where")); Assert.assertEquals("weather", conf.getString("table[0]")); Assert.assertEquals("jdbc:TAOS-RS://master:6041/test", conf.getString("jdbcUrl")); } @Test public void jobSplit_case02() { // given TDengineReader.Job job = new TDengineReader.Job(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"connection\": [{\"querySql\":[\"select * from weather\"],\"jdbcUrl\":[\"jdbc:TAOS-RS://master:6041/test\"]}]," + "\"column\": [\"ts\",\"current\",\"voltage\",\"phase\"]," + "}"); job.setPluginJobConf(configuration); // when job.init(); List<Configuration> configurationList = job.split(1); // assert Assert.assertEquals(1, 
configurationList.size());
Configuration conf = configurationList.get(0); Assert.assertEquals("root", conf.getString("username")); Assert.assertEquals("taosdata", conf.getString("password")); Assert.assertEquals("select * from weather", conf.getString("querySql[0]")); Assert.assertEquals("jdbc:TAOS-RS://master:6041/test", conf.getString("jdbcUrl")); } @Test public void jobSplit_case03() { // given TDengineReader.Job job = new TDengineReader.Job(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"connection\": [{\"querySql\":[\"select * from weather\",\"select * from test.meters\"],\"jdbcUrl\":[\"jdbc:TAOS-RS://master:6041/test\", \"jdbc:TAOS://master:6030/test\"]}]," + "\"column\": [\"ts\",\"current\",\"voltage\",\"phase\"]," + "}"); job.setPluginJobConf(configuration); // when job.init(); List<Configuration> configurationList = job.split(1); // assert Assert.assertEquals(2, configurationList.size()); Configuration conf = configurationList.get(0); Assert.assertEquals("root", conf.getString("username")); Assert.assertEquals("taosdata", conf.getString("password")); Assert.assertEquals("select * from weather", conf.getString("querySql[0]")); Assert.assertEquals("jdbc:TAOS-RS://master:6041/test", conf.getString("jdbcUrl")); Configuration conf1 = configurationList.get(1); Assert.assertEquals("root", conf1.getString("username")); Assert.assertEquals("taosdata", conf1.getString("password")); Assert.assertEquals("select * from weather", conf1.getString("querySql[0]")); Assert.assertEquals("select * from test.meters", conf1.getString("querySql[1]")); Assert.assertEquals("jdbc:TAOS://master:6030/test", conf1.getString("jdbcUrl")); } } ================================================ FILE: tdenginereader/src/test/resources/t2dm.json ================================================ { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "column": [ "*" ], "connection": [ { "table": [
"stb1" ], "jdbcUrl": [ "jdbc:TAOS-RS://192.168.56.105:6041/db1" ] } ] } }, "writer": { "name": "rdbmswriter", "parameter": { "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:dm://192.168.0.72:5236" } ], "username": "TESTUSER", "password": "test123456", "table": "stb2", "column": [ "*" ] } } } ], "setting": { "speed": { "channel": 1 } } } }
================================================
FILE: tdenginereader/src/test/resources/t2stream-1.json
================================================
{ "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "f1", "f2", "t1", "t2" ], "connection": [ { "table": [ "stb1" ], "jdbcUrl": [ "jdbc:TAOS-RS://192.168.56.105:6041/db1" ] } ], "where": "t10 = '北京朝阳望京'", "beginDateTime": "2022-03-07 12:00:00", "endDateTime": "2022-03-07 19:00:00" } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "UTF-8", "print": true } } } ], "setting": { "speed": { "channel": 1 } } } }
================================================
FILE: tdenginereader/src/test/resources/t2stream-2.json
================================================
{ "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "connection": [ { "querySql": [ "select * from stb1 where t10 = '北京朝阳望京' and _c0 >= '2022-03-07 12:00:00' and _c0 < '2022-03-07 19:00:00'" ], "jdbcUrl": [ "jdbc:TAOS-RS://192.168.56.105:6041/db1" ] } ] } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "UTF-8", "print": true } } } ], "setting": { "speed": { "channel": 1 } } } }
================================================
FILE: tdenginewriter/doc/tdenginewriter-CN.md
================================================
# DataX TDengineWriter

Simplified Chinese | [English](./tdenginewriter.md)

## 1 Quick Introduction

The TDengineWriter plugin writes data to a target table in a TDengine database. Under the hood, TDengineWriter connects to TDengine via JDBC and, following TDengine's SQL syntax, executes insert statements or schemaless statements to write the data into TDengine.
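The two statement shapes mentioned above can be sketched in plain Java. The class below is hypothetical and is not part of TDengineWriter; it only builds strings modeled on what `DefaultDataHandler` generates for a super table: an `insert into <sub-table> using <super table> tags(...)` statement that auto-creates the sub-table, and one line-protocol record for schemaless writing. It assumes the values are already formatted (quoted SQL literals for the insert; type-suffixed values such as `5i32` for the line) and that the timestamp is in epoch milliseconds. It opens no connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of TDengineWriter: builds statement strings
// shaped like the ones DefaultDataHandler generates for a super table.
public class TDengineStatementSketch {

    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, String> ordered(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    // "insert into <subTable> using <superTable> tags(...) (cols) values(...)":
    // TDengine creates the sub-table from the super table if it does not exist yet.
    // Tag and field values must already be SQL literals (strings quoted).
    static String autoCreateInsert(String subTable, String superTable,
                                   Map<String, String> tags, Map<String, String> fields) {
        return "insert into " + subTable + " using " + superTable
                + " tags" + tags.values().stream().collect(Collectors.joining(",", "(", ")"))
                + " " + fields.keySet().stream().collect(Collectors.joining(",", "(", ")"))
                + " values" + fields.values().stream().collect(Collectors.joining(",", "(", ")"));
    }

    // One line-protocol record: "<superTable>,<tag>=<v>,... <field>=<v>,... <timestamp>".
    // Spaces in tag values are backslash-escaped; field values must already carry
    // their type suffix (for example 5i32 or 3.5f64).
    static String schemalessLine(String superTable, Map<String, String> tags,
                                 Map<String, String> fields, long epochMillis) {
        String tagPart = tags.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue().replace(" ", "\\ "))
                .collect(Collectors.joining(","));
        String fieldPart = fields.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
        return superTable + "," + tagPart + " " + fieldPart + " " + epochMillis;
    }

    public static void main(String[] args) {
        // Illustrative row for the sample super table test.weather.
        System.out.println(autoCreateInsert("tb1", "weather",
                ordered("is_normal", "true", "device_id", "'abcABC123'", "address", "'Chaoyang Wangjing'"),
                ordered("ts", "'2022-02-20 12:00:01'", "temperature", "5", "humidity", "3.5")));
        System.out.println(schemalessLine("weather",
                ordered("is_normal", "true", "device_id", "abcABC123", "address", "Chaoyang Wangjing"),
                ordered("temperature", "5i32", "humidity", "3.5f64"),
                1645329601000L));
    }
}
```

Running `main` prints `insert into tb1 using weather tags(true,'abcABC123','Chaoyang Wangjing') (ts,temperature,humidity) values('2022-02-20 12:00:01',5,3.5)` and `weather,is_normal=true,device_id=abcABC123,address=Chaoyang\ Wangjing temperature=5i32,humidity=3.5f64 1645329601000` (1645329601000 is 2022-02-20 12:00:01 in UTC+8). Escaping spaces in tag values mirrors what the writer does; the sketch does not handle the other escaping rules of the line protocol.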
TDengineWriter can be used as a data migration tool for DBAs to import data from other databases into TDengine.

## 2 Implementation

TDengineWriter obtains the protocol data generated by the Reader through the DataX framework, connects to TDengine through the JDBC driver, and executes insert statements or schemaless statements to write the data into TDengine.

In TDengine, tables are divided into three types: super tables, sub-tables, and ordinary tables. Super tables and sub-tables contain both columns and tags, and the tag values of a sub-table are fixed; an ordinary table is the same concept as a table in a relational database. (For details, see: [Data Model](https://www.taosdata.com/docs/cn/v2.0/architecture#model))

TDengineWriter supports writing to super tables, sub-tables, and ordinary tables. Depending on the table type and on whether the column parameter contains tbname, it writes as follows:

1. table is a super table and column specifies tbname: use an insert statement that creates the sub-table automatically, with tbname as the sub-table name.
2. table is a super table and column does not specify tbname: use schemaless writing; TDengine computes a sub-table name from the super table name and the tag values.
3. table is a sub-table: write with insert statements; when the ignoreTagsUnmatched parameter is true, records whose tag values do not match the tag values of table are ignored.
4. table is an ordinary table: write with insert statements.

## 3 Features

### 3.1 Sample Configuration

Configure a job that writes to TDengine.

First create a super table on TDengine:

```sql
create database if not exists test;
create table test.weather (ts timestamp, temperature int, humidity double) tags(is_normal bool, device_id binary(100), address nchar(100));
```

Use the following job configuration to write data to TDengine:

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "type": "string", "value": "tb1" },
              { "type": "date", "value": "2022-02-20 12:00:01" },
              { "type": "long", "random": "0, 10" },
              { "type": "double", "random": "0, 10" },
              { "type": "bool", "random": "0, 50" },
              { "type": "bytes", "value": "abcABC123" },
              { "type": "string", "value": "北京朝阳望京" }
            ],
            "sliceRecordCount": 1
          }
        },
        "writer": {
          "name": "tdenginewriter",
          "parameter": {
            "username": "root",
            "password": "taosdata",
            "column": [
              "tbname",
              "ts",
              "temperature",
              "humidity",
              "is_normal",
              "device_id",
              "address"
            ],
            "connection": [
              {
                "table": [
                  "weather"
                ],
                "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/test"
              }
            ],
            "batchSize": 100,
            "ignoreTagsUnmatched": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

### 3.2 Parameters

* jdbcUrl
    * Description: JDBC connection information of the data source. For TDengine JDBC URLs, see [Using the Java Connector](https://www.taosdata.com/docs/cn/v2.0/connector/java#url)
    * Required: yes
    * Default: none
* username
    * Description: user name
    * Required: yes
    * Default: none
* password
    * Description: password of the user
    * Required: yes
    * Default: none
* table
    * Description: a list of table names. The tables should contain all columns in the column parameter except tbname. Note that tbname in column is used as the TDengine sub-table name.
    * Required: yes
    * Default: none
* column
    * Description: a list of field names. The order of the fields should match the order of the columns in the record.
    * Required: yes
    * Default: none
* batchSize
    * Description: number of records written per batch
    * Required: no
    * Default: 1
* ignoreTagsUnmatched
    * Description: applies when table is a TDengine sub-table, which has fixed tag values. When true, records whose tag values differ from the tag values of table are not written to table.
    * Required: no
    * Default: false

### 3.3 Type Conversion

DataX data types can be mapped to TDengine data types as follows:

| DataX Internal Type | TDengine Data Type                        |
| ------------------- | ----------------------------------------- |
| INT                 | TINYINT, SMALLINT, INT                    |
| LONG                | TIMESTAMP, TINYINT, SMALLINT, INT, BIGINT |
| DOUBLE              | FLOAT, DOUBLE                             |
| STRING              | TIMESTAMP, BINARY, NCHAR                  |
| BOOL                | BOOL                                      |
| DATE                | TIMESTAMP                                 |
| BYTES               | BINARY                                    |

### 3.4 Reference Examples from Various Data Sources to TDengine

The following are examples of migrating data from various data sources to TDengine:

| Migration            | Sample Configuration                                         |
| -------------------- | ------------------------------------------------------------ |
| TDengine to TDengine | [super table to super table, with tbname](../src/test/resources/t2t-1.json) |
| TDengine to TDengine | [super table to super table, without tbname](../src/test/resources/t2t-2.json) |
| TDengine to TDengine | [super table to sub-table](../src/test/resources/t2t-3.json) |
| TDengine to TDengine | [ordinary table to ordinary table](../src/test/resources/t2t-4.json) |
| RDBMS to TDengine    | [ordinary table to super table, with tbname](../src/test/resources/dm2t-1.json) |
| RDBMS to TDengine    | [ordinary table to super table, without tbname](../src/test/resources/dm2t-3.json) |
| RDBMS to TDengine    | [ordinary table to sub-table](../src/test/resources/dm2t-2.json) |
| RDBMS to TDengine    | [ordinary table to ordinary table](../src/test/resources/dm2t-4.json) |
| OpenTSDB to TDengine | [metric to ordinary table](../src/test/resources/o2t-1.json) |

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

Table creation statement:

A single record looks like:

#### 4.1.2 Machine Specifications

* Machine running DataX:
    1. cpu:
    2. mem:
    3. net: dual gigabit NICs
    4. disc: DataX data is not written to disk, so this item is not measured
* TDengine database machine:
    1. cpu:
    2. mem:
    3. net: dual gigabit NICs
    4. disc:

#### 4.1.3 DataX JVM Parameters

-Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError

### 4.2 Test Report

#### 4.2.1 Single-Table Test Report

| Channels | DataX Speed (Rec/s) | DataX Throughput (MB/s) | DataX Machine NIC Outbound (MB/s) | DataX Machine Load | DB NIC Inbound (MB/s) | DB Load | DB TPS |
| -------- | ------------------- | ----------------------- | --------------------------------- | ------------------ | --------------------- | ------- | ------ |
| 1        |                     |                         |                                   |                    |                       |         |        |
| 4        |                     |                         |                                   |                    |                       |         |        |
| 8        |                     |                         |                                   |                    |                       |         |        |
| 16       |                     |                         |                                   |                    |                       |         |        |
| 32       |                     |                         |                                   |                    |                       |         |        |

Notes:

1.

#### 4.2.4 Performance Test Summary

## 5 Limitations

1.

## FAQ

### Do the fields of the source table and the target table need to be in the same order?

Yes. TDengineWriter parses the data coming from DataX in the order of the fields in column.
================================================
FILE: tdenginewriter/doc/tdenginewriter.md
================================================
# DataX TDengineWriter

[简体中文](./tdenginewriter-CN.md) | English

## 1 Quick Introduction

The TDengineWriter plugin writes data to a target table in a TDengine database. Under the hood, TDengineWriter connects to TDengine via JDBC and, following TDengine SQL syntax, executes insert statements or schemaless statements to write the data into TDengine.

TDengineWriter can be used as a data migration tool for DBAs to import data from other databases into TDengine.

## 2 Implementation

TDengineWriter obtains the protocol data generated by the Reader through the DataX framework, connects to TDengine through the JDBC driver, and executes insert statements or schemaless statements to write the data into TDengine.

In TDengine, tables are divided into three types: super tables, sub-tables, and ordinary tables. Super tables and sub-tables contain both columns and tags, and the tag values of a sub-table are fixed. (For details, see: [data model](https://www.taosdata.com/docs/cn/v2.0/architecture#model))

TDengineWriter can write data to super tables, sub-tables, and ordinary tables. Depending on the type of the table and on whether the column parameter contains tbname, it writes as follows:

1. Table is a super table and column specifies tbname: use an insert statement that creates the sub-table automatically, with tbname as the sub-table name.
2. Table is a super table and column does not contain tbname: use schemaless writing. TDengine computes a sub-table name from the super table name and the tag values.
3. Table is a sub-table: write with insert statements. When the ignoreTagsUnmatched parameter is true, records whose tag values do not match those of the table are ignored.
4. Table is an ordinary table: write with insert statements.

## 3 Features

### 3.1 Sample

Configure a job that writes to TDengine.

Create a super table on TDengine:

```sql
create database if not exists test;
create table test.weather (ts timestamp, temperature int, humidity double) tags(is_normal bool, device_id binary(100), address nchar(100));
```

Write data to TDengine using the following job configuration:

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "type": "string", "value": "tb1" },
              { "type": "date", "value": "2022-02-20 12:00:01" },
              { "type": "long", "random": "0, 10" },
              { "type": "double", "random": "0, 10" },
              { "type": "bool", "random": "0, 50" },
              { "type": "bytes", "value": "abcABC123" },
              { "type": "string", "value": "北京朝阳望京" }
            ],
            "sliceRecordCount": 1
          }
        },
        "writer": {
          "name": "tdenginewriter",
          "parameter": {
            "username": "root",
            "password": "taosdata",
            "column": [
              "tbname",
              "ts",
              "temperature",
              "humidity",
              "is_normal",
              "device_id",
              "address"
            ],
            "connection": [
              {
                "table": [
                  "weather"
                ],
                "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/test"
              }
            ],
            "batchSize": 100,
            "ignoreTagsUnmatched": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

### 3.2 Configuration

* jdbcUrl
    * Description: JDBC connection information of the data source. For TDengine JDBC URLs, see [Java connector](https://www.taosdata.com/docs/cn/v2.0/connector/java#url)
    * Required: yes
    * Default: none
* username
    * Description: user name
    * Required: yes
    * Default: none
* password
    * Description: password of the user
    * Required: yes
    * Default: none
* table
    * Description: A list of table names. The tables should contain all of the columns in the column parameter except tbname. Note that tbname in column is used as the TDengine sub-table name.
    * Required: yes
    * Default: none
* column
    * Description: A list of field names. The order of the fields should match the order of the columns in the record.
    * Required: yes
    * Default: none
* batchSize
    * Description: Number of records written per batch.
    * Required: no
    * Default: 1
* ignoreTagsUnmatched
    * Description: Applies when table is a TDengine sub-table, which has fixed tag values. When true, records whose tag values differ from the tag values of the table are not written to the table.
    * Required: no
    * Default: false

### 3.3 Type Conversion

Data types in DataX that can be mapped to data types in TDengine:

| DataX Type | TDengine Type                             |
| ---------- | ----------------------------------------- |
| INT        | TINYINT, SMALLINT, INT                    |
| LONG       | TIMESTAMP, TINYINT, SMALLINT, INT, BIGINT |
| DOUBLE     | FLOAT, DOUBLE                             |
| STRING     | TIMESTAMP, BINARY, NCHAR                  |
| BOOL       | BOOL                                      |
| DATE       | TIMESTAMP                                 |
| BYTES      | BINARY                                    |

### 3.4 Examples of Migrating from Various Data Sources to TDengine

Here are some examples of migrating data from various data sources to TDengine:

| Sample               | Configuration                                                |
| -------------------- | ------------------------------------------------------------ |
| TDengine to TDengine | [super table to super table with tbname](../src/test/resources/t2t-1.json) |
| TDengine to TDengine | [super table to super table without tbname](../src/test/resources/t2t-2.json) |
| TDengine to TDengine | [super table to sub-table](../src/test/resources/t2t-3.json) |
| TDengine to TDengine | [table to table](../src/test/resources/t2t-4.json) |
| RDBMS to TDengine    | [table to super table with tbname](../src/test/resources/dm2t-1.json) |
| RDBMS to TDengine    | [table to super table without tbname](../src/test/resources/dm2t-2.json) |
| RDBMS to TDengine    | [table to sub-table](../src/test/resources/dm2t-3.json) |
| RDBMS to TDengine    | [table to table](../src/test/resources/dm2t-4.json) |
| OpenTSDB to TDengine | [metric to table](../src/test/resources/o2t-1.json) |

## 4 Restrictions

## FAQ

### Must the columns in the source table and the target table be in the same order?

Yes. TDengineWriter parses the data coming from DataX in the order of the fields in column.
================================================
FILE: tdenginewriter/pom.xml
================================================
datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 com.alibaba.datax.tdenginewriter tdenginewriter 0.0.1-SNAPSHOT 8 8 com.taosdata.jdbc taos-jdbcdriver 2.0.39 org.apache.commons commons-lang3 ${commons-lang3-version} com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j junit junit ${junit-version} test com.alibaba.datax datax-core 0.0.1-SNAPSHOT test mysql mysql-connector-java 5.1.49 test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single org.apache.maven.plugins maven-surefire-plugin 2.12.4 **/*Test.java true
================================================
FILE: tdenginewriter/src/main/assembly/package.xml
================================================
dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/tdenginewriter target/ tdenginewriter-0.0.1-SNAPSHOT.jar plugin/writer/tdenginewriter false plugin/writer/tdenginewriter/libs runtime
================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/ColumnMeta.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

public class ColumnMeta {
    String field;
    String type;
    int length;
    String note;
    boolean isTag;
    boolean isPrimaryKey;
    Object value;

    @Override
    public String toString() {
        return "ColumnMeta{" +
"field='" + field + '\'' + ", type='" + type + '\'' + ", length=" + length + ", note='" + note + '\'' + ", isTag=" + isTag + ", isPrimaryKey=" + isPrimaryKey + ", value=" + value + '}'; } } ================================================ FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/Constants.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; public class Constants { public static final String DEFAULT_USERNAME = "root"; public static final String DEFAULT_PASSWORD = "taosdata"; public static final int DEFAULT_BATCH_SIZE = 1; public static final boolean DEFAULT_IGNORE_TAGS_UNMATCHED = false; } ================================================ FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/DataHandler.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; public interface DataHandler { int handle(RecordReceiver lineReceiver, TaskPluginCollector collector); } ================================================ FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/DefaultDataHandler.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.plugin.TaskPluginCollector; import com.alibaba.datax.common.util.Configuration; import com.taosdata.jdbc.SchemalessWriter; import com.taosdata.jdbc.enums.SchemalessProtocolType; import com.taosdata.jdbc.enums.SchemalessTimestampType; import com.taosdata.jdbc.utils.Utils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; 
import java.sql.*;
import java.util.*;
import java.util.Date;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DefaultDataHandler implements DataHandler {
    private static final Logger LOG = LoggerFactory.getLogger(DefaultDataHandler.class);

    static {
        try {
            Class.forName("com.taosdata.jdbc.TSDBDriver");
            Class.forName("com.taosdata.jdbc.rs.RestfulDriver");
        } catch (ClassNotFoundException e) {
            LOG.error(e.getMessage(), e);
        }
    }

    private final TaskPluginCollector taskPluginCollector;
    private String username;
    private String password;
    private String jdbcUrl;
    private int batchSize;
    private boolean ignoreTagsUnmatched;

    private List<String> tables;
    private List<String> columns;

    private Map<String, TableMeta> tableMetas;
    private SchemaManager schemaManager;

    public void setTableMetas(Map<String, TableMeta> tableMetas) {
        this.tableMetas = tableMetas;
    }

    public void setTbnameColumnMetasMap(Map<String, List<ColumnMeta>> tbnameColumnMetasMap) {
        this.tbnameColumnMetasMap = tbnameColumnMetasMap;
    }

    public void setSchemaManager(SchemaManager schemaManager) {
        this.schemaManager = schemaManager;
    }

    private Map<String, List<ColumnMeta>> tbnameColumnMetasMap;

    public DefaultDataHandler(Configuration configuration, TaskPluginCollector taskPluginCollector) {
        this.username = configuration.getString(Key.USERNAME, Constants.DEFAULT_USERNAME);
        this.password = configuration.getString(Key.PASSWORD, Constants.DEFAULT_PASSWORD);
        this.jdbcUrl = configuration.getString(Key.JDBC_URL);
        this.batchSize = configuration.getInt(Key.BATCH_SIZE, Constants.DEFAULT_BATCH_SIZE);
        this.tables = configuration.getList(Key.TABLE, String.class);
        this.columns = configuration.getList(Key.COLUMN, String.class);
        this.ignoreTagsUnmatched = configuration.getBool(Key.IGNORE_TAGS_UNMATCHED, Constants.DEFAULT_IGNORE_TAGS_UNMATCHED);
        this.taskPluginCollector = taskPluginCollector;
    }

    @Override
    public int handle(RecordReceiver lineReceiver, TaskPluginCollector collector) {
        int count = 0;
        int affectedRows = 0;

        try (Connection conn = DriverManager.getConnection(jdbcUrl, username, password)) {
            LOG.info("connection[ jdbcUrl: " + jdbcUrl + ", username: " + username + "] established.");
            // prepare table_name -> table_meta
            this.schemaManager = new SchemaManager(conn);
            this.tableMetas = schemaManager.loadTableMeta(tables);
            // prepare table_name -> column_meta
            this.tbnameColumnMetasMap = schemaManager.loadColumnMetas(tables);

            List<Record> recordBatch = new ArrayList<>();
            Record record;
            for (int i = 1; (record = lineReceiver.getFromReader()) != null; i++) {
                if (i % batchSize != 0) {
                    recordBatch.add(record);
                } else {
                    // the batch is full: write it, falling back to row-by-row writes on failure
                    try {
                        recordBatch.add(record);
                        affectedRows += writeBatch(conn, recordBatch);
                    } catch (SQLException e) {
                        LOG.warn("use one row insert. because:" + e.getMessage());
                        affectedRows += writeEachRow(conn, recordBatch);
                    }
                    recordBatch.clear();
                }
                count++;
            }

            // flush the last, partially filled batch
            if (!recordBatch.isEmpty()) {
                try {
                    affectedRows += writeBatch(conn, recordBatch);
                } catch (SQLException e) {
                    LOG.warn("use one row insert. because:" + e.getMessage());
                    affectedRows += writeEachRow(conn, recordBatch);
                }
                recordBatch.clear();
            }
        } catch (SQLException e) {
            throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION, e.getMessage());
        }

        if (affectedRows != count) {
            LOG.error("write record missing or incorrect happened, affectedRows: " + affectedRows + ", total: " + count);
        }

        return affectedRows;
    }

    private int writeEachRow(Connection conn, List<Record> recordBatch) {
        int affectedRows = 0;
        for (Record record : recordBatch) {
            List<Record> recordList = new ArrayList<>();
            recordList.add(record);
            try {
                affectedRows += writeBatch(conn, recordList);
            } catch (SQLException e) {
                LOG.error(e.getMessage());
                this.taskPluginCollector.collectDirtyRecord(record, e);
            }
        }
        return affectedRows;
    }

    /**
     * table: [ "stb1", "stb2", "tb1", "tb2", "t1" ]
     * stb1[ts,f1,f2] tags:[t1]
     * stb2[ts,f1,f2,f3] tags:[t1,t2]
     * 1. Tables fall into three types: stb (super table), tb (sub-table), and t (ordinary table)
     * 2. For stb, use auto-create-table inserts or schemaless writing
     * 2.1: data contains a tbname field, e.g. data: [ts, f1, f2, f3, t1, t2, tbname] tbColumn: [ts, f1, f2, t1] => insert into tbname using stb1 tags(t1) values(ts, f1, f2)
     * 2.2: data has no tbname field, e.g. data: [ts, f1, f2, f3, t1, t2] tbColumn: [ts, f1, f2, t1] => schemaless: stb1,t1=t1 f1=f1,f2=f2 ts, no batched write
     * 3. For tb, build SQL, e.g. data: [ts, f1, f2, f3, t1, t2] tbColumn: [ts, f1, f2, t1] => insert into tb(ts, f1, f2) values(ts, f1, f2)
     * 4. For t, build SQL, e.g. data: [ts, f1, f2, f3, t1, t2] tbColumn: [ts, f1, f2, f3, t1, t2] insert into t(ts, f1, f2, f3, t1, t2) values(ts, f1, f2, f3, t1, t2)
     */
    public int writeBatch(Connection conn, List<Record> recordBatch) throws SQLException {
        int affectedRows = 0;
        for (String table : tables) {
            TableMeta tableMeta = tableMetas.get(table);
            switch (tableMeta.tableType) {
                case SUP_TABLE: {
                    if (columns.contains("tbname")) {
                        affectedRows += writeBatchToSupTableBySQL(conn, table, recordBatch);
                    } else {
                        Map<String, String> tag2Tbname = schemaManager.loadTagTableNameMap(table);
                        affectedRows += writeBatchToSupTableWithoutTbname(conn, table, recordBatch, tag2Tbname);
                    }
                }
                break;
                case SUB_TABLE:
                    affectedRows += writeBatchToSubTable(conn, table, recordBatch);
                    break;
                case NML_TABLE:
                default:
                    affectedRows += writeBatchToNormalTable(conn, table, recordBatch);
            }
        }
        return affectedRows;
    }

    private int writeBatchToSupTableWithoutTbname(Connection conn, String table, List<Record> recordBatch, Map<String, String> tag2Tbname) throws SQLException {
        List<ColumnMeta> columnMetas = tbnameColumnMetasMap.get(table);
        List<Record> subTableExist = filterSubTableExistRecords(recordBatch, columnMetas, tag2Tbname);
        List<Record> subTableNotExist = filterSubTableNotExistRecords(recordBatch, columnMetas, tag2Tbname);

        int affectedRows = 0;
        Map<String, List<Record>> subTableRecordsMap = splitRecords(subTableExist, columnMetas, tag2Tbname);

        List<String> subTables = new ArrayList<>(subTableRecordsMap.keySet());
        this.tbnameColumnMetasMap.putAll(schemaManager.loadColumnMetas(subTables));

        // records whose tags match an existing sub-table are inserted into that sub-table directly
        for (String subTable : subTableRecordsMap.keySet()) {
            List<Record> subTableRecords = subTableRecordsMap.get(subTable);
            affectedRows += writeBatchToNormalTable(conn, subTable, subTableRecords);
        }
        // the remaining records go through schemaless writing
        if (!subTableNotExist.isEmpty())
            affectedRows += writeBatchToSupTableBySchemaless(conn, table, subTableNotExist);
        return affectedRows;
    }

    private List<Record> filterSubTableExistRecords(List<Record> recordBatch, List<ColumnMeta> columnMetas, Map<String, String> tag2Tbname) {
        return recordBatch.stream().filter(record -> {
            String tagStr = getTagString(columnMetas, record);
            return tag2Tbname.containsKey(tagStr);
        }).collect(Collectors.toList());
    }

    private List<Record> filterSubTableNotExistRecords(List<Record> recordBatch, List<ColumnMeta> columnMetas, Map<String, String> tag2Tbname) {
        return recordBatch.stream().filter(record -> {
            String tagStr = getTagString(columnMetas, record);
            return !tag2Tbname.containsKey(tagStr);
        }).collect(Collectors.toList());
    }

    private Map<String, List<Record>> splitRecords(List<Record> subTableExist, List<ColumnMeta> columnMetas, Map<String, String> tag2Tbname) {
        Map<String, List<Record>> ret = new HashMap<>();
        for (Record record : subTableExist) {
            String tagstr = getTagString(columnMetas, record);
            String tbname = tag2Tbname.get(tagstr);
            if (ret.containsKey(tbname)) {
                ret.get(tbname).add(record);
            } else {
                List<Record> list = new ArrayList<>();
                list.add(record);
                ret.put(tbname, list);
            }
        }
        return ret;
    }

    private String getTagString(List<ColumnMeta> columnMetas, Record record) {
        return IntStream.range(0, columnMetas.size()).mapToObj(colIndex -> {
            ColumnMeta columnMeta = columnMetas.get(colIndex);
            if (columnMeta.isTag) {
                Column column = record.getColumn(colIndex);
                switch (columnMeta.type) {
                    case "TINYINT":
                    case "SMALLINT":
                    case "INT":
                    case "BIGINT":
                        return column.asLong().toString();
                    default:
                        return column.asString();
                }
            }
            return "";
        }).collect(Collectors.joining());
    }

    /**
     * insert into record[idx(tbname)] using table tags(record[idx(t1)]) (ts, f1, f2, f3) values(record[idx(ts)], record[idx(f1)], )
     * record[idx(tbname)] using table tags(record[idx(t1)]) (ts, f1, f2, f3) values(record[idx(ts)], record[idx(f1)], )
     * record[idx(tbname)] using table tags(record[idx(t1)]) (ts, f1, f2, f3) values(record[idx(ts)], record[idx(f1)], )
     */
    private int writeBatchToSupTableBySQL(Connection conn, String table, List<Record> recordBatch) throws SQLException {
        List<ColumnMeta> columnMetas = this.tbnameColumnMetasMap.get(table);

        StringBuilder sb = new StringBuilder("insert into");
        for (Record record : recordBatch) {
            sb.append(" ").append(record.getColumn(indexOf("tbname")).asString())
                    .append(" using ").append(table)
                    .append(" tags")
                    .append(columnMetas.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                        return colMeta.isTag;
                    }).map(colMeta -> {
                        return buildColumnValue(colMeta, record);
                    }).collect(Collectors.joining(",", "(", ")")))
                    .append(" ")
                    .append(columnMetas.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                        return !colMeta.isTag;
                    }).map(colMeta -> {
                        return colMeta.field;
                    }).collect(Collectors.joining(",", "(", ")")))
                    .append(" values")
                    .append(columnMetas.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                        return !colMeta.isTag;
                    }).map(colMeta -> {
                        return buildColumnValue(colMeta, record);
                    }).collect(Collectors.joining(",", "(", ")")));
        }
        String sql = sb.toString();

        return executeUpdate(conn, sql);
    }

    private int executeUpdate(Connection conn, String sql) throws SQLException {
        int count;
        try (Statement stmt = conn.createStatement()) {
            LOG.debug(">>> " + sql);
            count = stmt.executeUpdate(sql);
        }
        return count;
    }

    private String buildColumnValue(ColumnMeta colMeta, Record record) {
        Column column = record.getColumn(indexOf(colMeta.field));
        TimestampPrecision timestampPrecision = schemaManager.loadDatabasePrecision();
        switch (column.getType()) {
            case DATE: {
                // convert to an integer timestamp in the database's precision
                Date value = column.asDate();
                switch (timestampPrecision) {
                    case MILLISEC:
                        return "" + (value.getTime());
                    case MICROSEC:
                        return "" + (value.getTime() * 1000);
                    case NANOSEC:
                        return "" + (value.getTime() * 1000_000);
                    default:
                        return "'" + column.asString() + "'";
                }
            }
            case BYTES:
            case STRING:
                if (colMeta.type.equals("TIMESTAMP"))
                    return "\"" + column.asString() + "\"";
                String value = column.asString();
                if (value == null)
                    return "NULL";
                return "\'" + Utils.escapeSingleQuota(value) + "\'";
            case NULL:
            case BAD:
                return "NULL";
            case BOOL:
            case DOUBLE:
            case INT:
            case LONG:
            default:
                return column.asString();
        }
    }

    /**
     * table: ["stb1"], column: ["ts", "f1", "f2", "t1"]
     * data: [ts, f1, f2, f3, t1, t2] tbColumn: [ts, f1, f2, t1] => schemaless: stb1,t1=t1 f1=f1,f2=f2 ts
     */
    private int writeBatchToSupTableBySchemaless(Connection conn, String table, List<Record> recordBatch) throws SQLException {
        int count = 0;
        TimestampPrecision timestampPrecision = schemaManager.loadDatabasePrecision();

        List<ColumnMeta> columnMetaList = this.tbnameColumnMetasMap.get(table);
        ColumnMeta ts = columnMetaList.stream().filter(colMeta -> colMeta.isPrimaryKey).findFirst().get();

        List<String> lines = new ArrayList<>();
        for (Record record : recordBatch) {
            StringBuilder sb = new StringBuilder();
            sb.append(table).append(",")
                    .append(columnMetaList.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                        return colMeta.isTag;
                    }).map(colMeta -> {
                        String value = record.getColumn(indexOf(colMeta.field)).asString();
                        if (value.contains(" "))
                            value = value.replace(" ", "\\ ");
                        return colMeta.field + "=" + value;
                    }).collect(Collectors.joining(",")))
                    .append(" ")
                    .append(columnMetaList.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                        return !colMeta.isTag && !colMeta.isPrimaryKey;
                    }).map(colMeta -> {
                        return colMeta.field + "=" + buildSchemalessColumnValue(colMeta, record);
                        // return colMeta.field + "=" + record.getColumn(indexOf(colMeta.field)).asString();
                    }).collect(Collectors.joining(",")))
                    .append(" ");
            // timestamp
            Column column = record.getColumn(indexOf(ts.field));
            Object tsValue = column.getRawData();
            if (column.getType() == Column.Type.DATE && tsValue instanceof Date) {
                long time = column.asDate().getTime();
                switch (timestampPrecision) {
                    case NANOSEC:
                        sb.append(time * 1000000);
                        break;
                    case MICROSEC:
                        sb.append(time * 1000);
                        break;
                    case MILLISEC:
                    default:
                        sb.append(time);
                }
            } else if (column.getType() == Column.Type.STRING) {
                sb.append(Utils.parseTimestamp(column.asString()));
            } else {
                sb.append(column.asLong());
            }
            String line = sb.toString();
            LOG.debug(">>> " + line);
            lines.add(line);
            count++;
        }

        SchemalessWriter writer = new SchemalessWriter(conn);
        SchemalessTimestampType timestampType;
        switch (timestampPrecision) {
            case NANOSEC:
                timestampType = SchemalessTimestampType.NANO_SECONDS;
                break;
            case MICROSEC:
                timestampType = SchemalessTimestampType.MICRO_SECONDS;
                break;
            case MILLISEC:
                timestampType = SchemalessTimestampType.MILLI_SECONDS;
                break;
            default:
                timestampType = SchemalessTimestampType.NOT_CONFIGURED;
        }

        writer.write(lines, SchemalessProtocolType.LINE, timestampType);

        LOG.warn("schemalessWriter does not return affected rows!");
        return count;
    }

    private long dateAsLong(Column column) {
        TimestampPrecision timestampPrecision = schemaManager.loadDatabasePrecision();
        long time = column.asDate().getTime();
        switch (timestampPrecision) {
            case NANOSEC:
                return time * 1000000;
            case MICROSEC:
                return time * 1000;
            case MILLISEC:
            default:
                return time;
        }
    }

    private String buildSchemalessColumnValue(ColumnMeta colMeta, Record record) {
        Column column = record.getColumn(indexOf(colMeta.field));
        switch (column.getType()) {
            case DATE:
                if (colMeta.type.equals("TIMESTAMP"))
                    return dateAsLong(column) + "i64";
                return "L'" + column.asString() + "'";
            case NULL:
            case BAD:
                return "NULL";
            case DOUBLE: {
                if (colMeta.type.equals("FLOAT"))
                    return column.asString() + "f32";
                if (colMeta.type.equals("DOUBLE"))
                    return column.asString() + "f64";
            }
            // falls through: a value whose target column is not FLOAT/DOUBLE is handled by the checks below
            case INT:
            case LONG: {
                if (colMeta.type.equals("TINYINT"))
                    return column.asString() + "i8";
                if (colMeta.type.equals("SMALLINT"))
                    return column.asString() + "i16";
                if (colMeta.type.equals("INT"))
                    return column.asString() + "i32";
                if (colMeta.type.equals("BIGINT"))
                    return column.asString() + "i64";
            }
            // falls through: no integer type matched
            case BYTES:
            case STRING:
                if (colMeta.type.equals("TIMESTAMP"))
                    return column.asString() + "i64";
                String value = column.asString();
                value = value.replace("\"", "\\\"");
                if (colMeta.type.startsWith("BINARY"))
                    return "\"" + value + "\"";
                if (colMeta.type.startsWith("NCHAR"))
                    return "L\"" + value + "\"";
            // falls through: any other type is written as its plain string form
            case BOOL:
            default:
                return column.asString();
        }
    }

    /**
     * table: ["tb1"], column: [tbname, ts, f1, f2, t1]
     * if contains("tbname") and tbname != tb1 continue;
     * else if t1 != record[idx(t1)] or t2 != record[idx(t2)]... continue;
     * else
     * insert into tb1 (ts, f1, f2) values( record[idx(ts)], record[idx(f1)], record[idx(f2)])
     */
    private int writeBatchToSubTable(Connection conn, String table, List<Record> recordBatch) throws SQLException {
        List<ColumnMeta> columnMetas = this.tbnameColumnMetasMap.get(table);

        StringBuilder sb = new StringBuilder();
        sb.append("insert into ").append(table).append(" ")
                .append(columnMetas.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                    return !colMeta.isTag;
                }).map(colMeta -> {
                    return colMeta.field;
                }).collect(Collectors.joining(",", "(", ")")))
                .append(" values");
        int validRecords = 0;
        for (Record record : recordBatch) {
            if (columns.contains("tbname") && !table.equals(record.getColumn(indexOf("tbname")).asString()))
                continue;

            boolean tagsAllMatch = columnMetas.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                return colMeta.isTag;
            }).allMatch(colMeta -> {
                Column column = record.getColumn(indexOf(colMeta.field));
                boolean equals = equals(column, colMeta);
                return equals;
            });

            if (ignoreTagsUnmatched && !tagsAllMatch)
                continue;

            sb.append(columnMetas.stream().filter(colMeta -> columns.contains(colMeta.field)).filter(colMeta -> {
                return !colMeta.isTag;
            }).map(colMeta -> {
                return buildColumnValue(colMeta, record);
            }).collect(Collectors.joining(", ", "(", ") ")));
            validRecords++;
        }

        if (validRecords == 0) {
            LOG.warn("no valid records in this batch");
            return 0;
        }

        String sql = sb.toString();
        return executeUpdate(conn, sql);
    }

    private boolean equals(Column column, ColumnMeta colMeta) {
        switch (column.getType()) {
            case BOOL:
                return column.asBoolean().equals(Boolean.valueOf(colMeta.value.toString()));
            case INT:
            case LONG:
                return column.asLong().equals(Long.valueOf(colMeta.value.toString()));
            case DOUBLE:
                return column.asDouble().equals(Double.valueOf(colMeta.value.toString()));
            case NULL:
                return colMeta.value == null;
            case DATE:
                return column.asDate().getTime() == ((Timestamp) colMeta.value).getTime();
            case BAD:
            case BYTES:
                return Arrays.equals(column.asBytes(), (byte[]) colMeta.value);
            case STRING:
            default:
                return column.asString().equals(colMeta.value.toString());
        }
    }

    /**
     * table: ["weather"], column: ["ts, f1, f2, f3, t1, t2"]
     * sql: insert into weather (ts, f1, f2, f3, t1, t2) values( record[idx(ts)], record[idx(f1)], ...)
     */
    private int writeBatchToNormalTable(Connection conn, String table, List<Record> recordBatch) throws SQLException {
        List<ColumnMeta> columnMetas = this.tbnameColumnMetasMap.get(table);

        StringBuilder sb = new StringBuilder();
        sb.append("insert into ").append(table)
                .append(" ")
                .append(columnMetas.stream().filter(colMeta -> !colMeta.isTag).filter(colMeta -> columns.contains(colMeta.field)).map(colMeta -> {
                    return colMeta.field;
                }).collect(Collectors.joining(",", "(", ")")))
                .append(" values ");

        for (Record record : recordBatch) {
            sb.append(columnMetas.stream().filter(colMeta -> !colMeta.isTag).filter(colMeta -> columns.contains(colMeta.field)).map(colMeta -> {
                return buildColumnValue(colMeta, record);
            }).collect(Collectors.joining(",", "(", ")")));
        }

        String sql = sb.toString();
        return executeUpdate(conn, sql);
    }

    private int indexOf(String colName) throws DataXException {
        for (int i = 0; i < columns.size(); i++) {
            if (columns.get(i).equals(colName))
                return i;
        }
        throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION,
                "cannot find col: " + colName + " in columns: " + columns);
    }
}
================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

public class Key {
    public static final String USERNAME = "username";
    public static final String PASSWORD = "password";
    public static final String CONNECTION = "connection";
    public static final String BATCH_SIZE = "batchSize";
    public static final String TABLE = "table";
    public static final String JDBC_URL = "jdbcUrl";
    public static final String COLUMN = "column";
    public static final String IGNORE_TAGS_UNMATCHED = "ignoreTagsUnmatched";

    public static final String BEGIN_DATETIME = "beginDateTime";
    public static final String END_DATETIME = "endDateTime";
    public static final String WHERE = "where";
    public static final String QUERY_SQL = "querySql";
    public static final String MANDATORY_ENCODING = "mandatoryEncoding";
}

================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/OpentsdbDataHandler.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

import com.alibaba.datax.common.element.Column;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.taosdata.jdbc.SchemalessWriter;
import com.taosdata.jdbc.enums.SchemalessProtocolType;
import com.taosdata.jdbc.enums.SchemalessTimestampType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class OpentsdbDataHandler implements DataHandler {
    private static final Logger LOG = LoggerFactory.getLogger(OpentsdbDataHandler.class);
    private SchemalessWriter writer;

    private String jdbcUrl;
    private String user;
    private String password;
    int batchSize;

    public OpentsdbDataHandler(Configuration config) {
        // the OpenTSDB JSON protocol uses JNI and the schemaless API to write
        this.jdbcUrl = config.getString(Key.JDBC_URL);
        this.user = config.getString(Key.USERNAME, "root");
        this.password = config.getString(Key.PASSWORD, "taosdata");
        this.batchSize = config.getInt(Key.BATCH_SIZE, Constants.DEFAULT_BATCH_SIZE);
    }

    @Override
    public int handle(RecordReceiver lineReceiver, TaskPluginCollector collector) {
        int count = 0;
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password)) {
            LOG.info("connection[ jdbcUrl: " + jdbcUrl + ", username: " + user + "] established.");
            writer = new SchemalessWriter(conn);
            count = write(lineReceiver, batchSize);
        } catch (Exception e) {
            throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION, e);
        }

        return count;
    }

    private int write(RecordReceiver lineReceiver, int batchSize) throws DataXException {
        int recordIndex = 1;
        try {
            Record record;
            StringBuilder sb = new StringBuilder();
            while ((record = lineReceiver.getFromReader()) != null) {
                if (batchSize == 1) {
                    String jsonData = recordToString(record);
                    LOG.debug(">>> " + jsonData);
                    writer.write(jsonData, SchemalessProtocolType.JSON, SchemalessTimestampType.NOT_CONFIGURED);
                } else if (recordIndex % batchSize == 1) {
                    // first record of a batch: open the JSON array
                    sb.append("[").append(recordToString(record)).append(",");
                } else if (recordIndex % batchSize == 0) {
                    // last record of a batch: close the JSON array and flush it
                    sb.append(recordToString(record)).append("]");
                    String jsonData = sb.toString();
                    LOG.debug(">>> " + jsonData);
                    writer.write(jsonData, SchemalessProtocolType.JSON, SchemalessTimestampType.NOT_CONFIGURED);
                    sb.delete(0, sb.length());
                } else {
                    sb.append(recordToString(record)).append(",");
                }
                recordIndex++;
            }
            // flush a trailing partial batch: replace the last ',' with ']'
            if (sb.length() != 0 && sb.charAt(0) == '[') {
                String jsonData = sb.deleteCharAt(sb.length() - 1).append("]").toString();
                LOG.debug(">>> " + jsonData);
                writer.write(jsonData, SchemalessProtocolType.JSON, SchemalessTimestampType.NOT_CONFIGURED);
            }
        } catch (Exception e) {
            throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION, e);
        }
        return recordIndex - 1;
    }

    private String recordToString(Record record) {
        int recordLength = record.getColumnNumber();
        if (0 == recordLength) {
            return "";
        }
        Column column;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < recordLength; i++) {
            column = record.getColumn(i);
            sb.append(column.asString()).append("\t");
        }
        sb.setLength(sb.length() - 1);
        return sb.toString();
    }
}

================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/SchemaManager.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

import com.alibaba.datax.common.exception.DataXException;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.*;
import java.util.stream.Collectors;

public class SchemaManager {
    private static final Logger LOG = LoggerFactory.getLogger(SchemaManager.class);
//    private static final String TAG_TABLE_NAME_MAP_KEY_SPLITTER = "_";
    private static final String TAG_TABLE_NAME_MAP_KEY_SPLITTER = "";

    private final Connection conn;
    private TimestampPrecision precision;
    private Map<String, Map<String, String>> tags2tbnameMaps = new HashMap<>();

    public SchemaManager(Connection conn) {
        this.conn = conn;
    }

    public TimestampPrecision loadDatabasePrecision() throws DataXException {
        if (this.precision != null)
            return this.precision;

        try (Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery("select database()");
            String dbname = null;
            while (rs.next()) {
                dbname = rs.getString("database()");
            }
            if (dbname == null)
                throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION,
                        "Database not specified or available");

            rs = stmt.executeQuery("show databases");
            while (rs.next()) {
                String name = rs.getString("name");
                if (!name.equalsIgnoreCase(dbname))
                    continue;
                String precision = rs.getString("precision");
                switch (precision) {
                    case "ns":
                        this.precision = TimestampPrecision.NANOSEC;
                        break;
                    case "us":
                        this.precision = TimestampPrecision.MICROSEC;
                        break;
                    case "ms":
                    default:
                        this.precision = TimestampPrecision.MILLISEC;
                }
            }
        } catch (SQLException e) {
            throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION, e.getMessage());
        }
        return this.precision;
    }

    public Map<String, TableMeta> loadTableMeta(List<String> tables) throws DataXException {
        Map<String, TableMeta> tableMetas = new HashMap<>();

        try (Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery("show stables");
            while (rs.next()) {
                TableMeta tableMeta = buildSupTableMeta(rs);
                if (!tables.contains(tableMeta.tbname))
                    continue;
                tableMetas.put(tableMeta.tbname, tableMeta);
            }

            rs = stmt.executeQuery("show tables");
            while (rs.next()) {
                TableMeta tableMeta = buildSubTableMeta(rs);
                if (!tables.contains(tableMeta.tbname))
                    continue;
                tableMetas.put(tableMeta.tbname, tableMeta);
            }

            for (String tbname : tables) {
                if (!tableMetas.containsKey(tbname)) {
                    throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION,
                            "table metadata of " + tbname + " is empty!");
                }
            }
        } catch (SQLException e) {
            throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION, e.getMessage());
        }
        return tableMetas;
    }

    public Map<String, List<ColumnMeta>> loadColumnMetas(List<String> tables) throws DataXException {
        Map<String, List<ColumnMeta>> ret = new HashMap<>();

        for (String table : tables) {
            List<ColumnMeta> columnMetaList = new ArrayList<>();
            try (Statement stmt = conn.createStatement()) {
                ResultSet rs = stmt.executeQuery("describe " + table);
                for (int i = 0; rs.next(); i++) {
                    ColumnMeta columnMeta = buildColumnMeta(rs, i == 0);
                    columnMetaList.add(columnMeta);
                }
            } catch (SQLException e) {
                throw DataXException.asDataXException(TDengineWriterErrorCode.RUNTIME_EXCEPTION, e.getMessage());
            }

            if (columnMetaList.isEmpty()) {
                LOG.error("column metadata of " + table + " is empty!");
                continue;
            }

            columnMetaList.stream().filter(colMeta -> colMeta.isTag).forEach(colMeta -> {
                String sql = "select " + colMeta.field + " from " + table;
                Object value = null;
                try (Statement stmt = conn.createStatement()) {
                    ResultSet rs = stmt.executeQuery(sql);
                    for (int i = 0; rs.next(); i++) {
                        value = rs.getObject(colMeta.field);
                        if (i > 0) {
                            // more than one row returned: no single tag value can be assumed, so leave it null
                            value = null;
                            break;
                        }
                    }
                } catch (SQLException e) {
                    LOG.error(e.getMessage(), e);
                }
                colMeta.value = value;
            });

            LOG.debug("load column metadata of " + table + ": " + Arrays.toString(columnMetaList.toArray()));
            ret.put(table, columnMetaList);
        }
        return ret;
    }

    private TableMeta buildSupTableMeta(ResultSet rs) throws SQLException {
        TableMeta tableMeta = new TableMeta();
        tableMeta.tableType = TableType.SUP_TABLE;
        tableMeta.tbname = rs.getString("name");
        tableMeta.columns = rs.getInt("columns");
        tableMeta.tags = rs.getInt("tags");
        tableMeta.tables = rs.getInt("tables");

        LOG.debug("load table metadata of " + tableMeta.tbname + ": " + tableMeta);
        return tableMeta;
    }

    private TableMeta buildSubTableMeta(ResultSet rs) throws SQLException {
        TableMeta tableMeta = new TableMeta();
        String stable_name = rs.getString("stable_name");
        tableMeta.tableType = StringUtils.isBlank(stable_name) ? TableType.NML_TABLE : TableType.SUB_TABLE;
        tableMeta.tbname = rs.getString("table_name");
        tableMeta.columns = rs.getInt("columns");
        tableMeta.stable_name = StringUtils.isBlank(stable_name) ? null : stable_name;

        LOG.debug("load table metadata of " + tableMeta.tbname + ": " + tableMeta);
        return tableMeta;
    }

    private ColumnMeta buildColumnMeta(ResultSet rs, boolean isPrimaryKey) throws SQLException {
        ColumnMeta columnMeta = new ColumnMeta();
        columnMeta.field = rs.getString("Field");
        columnMeta.type = rs.getString("Type");
        columnMeta.length = rs.getInt("Length");
        columnMeta.note = rs.getString("Note");
        columnMeta.isTag = columnMeta.note != null && columnMeta.note.equals("TAG");
        columnMeta.isPrimaryKey = isPrimaryKey;
        return columnMeta;
    }

    public Map<String, String> loadTagTableNameMap(String table) throws SQLException {
        if (tags2tbnameMaps.containsKey(table))
            return tags2tbnameMaps.get(table);
        Map<String, String> tags2tbname = new HashMap<>();
        try (Statement stmt = conn.createStatement()) {
            // describe table
            List<String> tags = new ArrayList<>();
            ResultSet rs = stmt.executeQuery("describe " + table);
            while (rs.next()) {
                String note = rs.getString("Note");
                if ("TAG".equals(note)) {
                    tags.add(rs.getString("Field"));
                }
            }
            // select distinct tbname, t1, t2 from stb
            rs = stmt.executeQuery("select distinct " + String.join(",", tags) + ",tbname from " + table);
            while (rs.next()) {
                ResultSet finalRs = rs;
                String tagStr = tags.stream().map(t -> {
                    try {
                        return finalRs.getString(t);
                    } catch (SQLException e) {
                        LOG.error(e.getMessage(), e);
                    }
                    return "NULL";
                }).collect(Collectors.joining(TAG_TABLE_NAME_MAP_KEY_SPLITTER));
                String tbname = rs.getString("tbname");
                tags2tbname.put(tagStr, tbname);
            }
        }
        tags2tbnameMaps.put(table, tags2tbname);
        return tags2tbname;
    }
}

================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/TDengineWriter.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

public class TDengineWriter extends Writer {

    private static final String PEER_PLUGIN_NAME = "peerPluginName";

    public static class Job extends Writer.Job {

        private Configuration originalConfig;
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);

        @Override
        public void init() {
            this.originalConfig = super.getPluginJobConf();
            this.originalConfig.set(PEER_PLUGIN_NAME, getPeerPluginName());

            // check username
            String user = this.originalConfig.getString(Key.USERNAME);
            if (StringUtils.isBlank(user))
                throw DataXException.asDataXException(TDengineWriterErrorCode.REQUIRED_VALUE, "The parameter ["
                        + Key.USERNAME + "] is not set.");

            // check password
            String password = this.originalConfig.getString(Key.PASSWORD);
            if (StringUtils.isBlank(password))
                throw DataXException.asDataXException(TDengineWriterErrorCode.REQUIRED_VALUE, "The parameter ["
                        + Key.PASSWORD + "] is not set.");

            // check connection
            List<Object> connection = this.originalConfig.getList(Key.CONNECTION);
            if (connection == null || connection.isEmpty())
                throw DataXException.asDataXException(TDengineWriterErrorCode.REQUIRED_VALUE, "The parameter ["
                        + Key.CONNECTION + "] is not set.");
            if (connection.size() > 1)
                LOG.warn("connection.size is " + connection.size() + " and only connection[0] will be used.");
            Configuration conn = Configuration.from(connection.get(0).toString());
            String jdbcUrl = conn.getString(Key.JDBC_URL);
            if (StringUtils.isBlank(jdbcUrl))
                throw DataXException.asDataXException(TDengineWriterErrorCode.REQUIRED_VALUE, "The parameter ["
                        + Key.JDBC_URL + "] of connection is not set.");

            // check column
        }

        @Override
        public void destroy() {

        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            List<Configuration> writerSplitConfigs = new ArrayList<>();

            List<Object> conns = this.originalConfig.getList(Key.CONNECTION);
            for (int i = 0; i < mandatoryNumber; i++) {
                Configuration clone = this.originalConfig.clone();
                Configuration conf = Configuration.from(conns.get(0).toString());
                String jdbcUrl = conf.getString(Key.JDBC_URL);
                clone.set(Key.JDBC_URL, jdbcUrl);
                clone.set(Key.TABLE, conf.getList(Key.TABLE));
                clone.remove(Key.CONNECTION);
                writerSplitConfigs.add(clone);
            }

            return writerSplitConfigs;
        }
    }

    public static class Task extends Writer.Task {
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        private Configuration writerSliceConfig;
        private TaskPluginCollector taskPluginCollector;

        @Override
        public void init() {
            this.writerSliceConfig = getPluginJobConf();
            this.taskPluginCollector = super.getTaskPluginCollector();
        }

        @Override
        public void destroy() {

        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            String peerPluginName = this.writerSliceConfig.getString(PEER_PLUGIN_NAME);
            LOG.debug("start to handle record from: " + peerPluginName);

            DataHandler handler;
            if (peerPluginName.equals("opentsdbreader"))
                handler = new OpentsdbDataHandler(this.writerSliceConfig);
            else
                handler = new DefaultDataHandler(this.writerSliceConfig, this.taskPluginCollector);

            long records = handler.handle(lineReceiver, getTaskPluginCollector());
            LOG.debug("handle data finished, records: " + records);
        }

    }
}

================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/TDengineWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

import com.alibaba.datax.common.spi.ErrorCode;

public enum TDengineWriterErrorCode implements ErrorCode {

    REQUIRED_VALUE("TDengineWriter-00", "Missing required value"),
    ILLEGAL_VALUE("TDengineWriter-01", "Illegal value"),
    RUNTIME_EXCEPTION("TDengineWriter-02", "Runtime exception"),
    TYPE_ERROR("TDengineWriter-03", "DataX type cannot be mapped to a TDengine type");

    private final String code;
    private final String description;

    TDengineWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s]. ", this.code, this.description);
    }
}

================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/TableMeta.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

public class TableMeta {
    TableType tableType;
    String tbname;
    int columns;
    int tags;
    int tables;
    String stable_name;

    @Override
    public String toString() {
        return "TableMeta{" +
                "tableType=" + tableType +
                ", tbname='" + tbname + '\'' +
                ", columns=" + columns +
                ", tags=" + tags +
                ", tables=" + tables +
                ", stable_name='" + stable_name + '\'' +
                '}';
    }
}

================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/TableType.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

public enum TableType {
    SUP_TABLE, SUB_TABLE, NML_TABLE
}

================================================
FILE: tdenginewriter/src/main/java/com/alibaba/datax/plugin/writer/tdenginewriter/TimestampPrecision.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

public enum TimestampPrecision {
    MILLISEC, MICROSEC, NANOSEC
}

================================================
FILE: tdenginewriter/src/main/resources/plugin.json
================================================
{
  "name": "tdenginewriter",
  "class": "com.alibaba.datax.plugin.writer.tdenginewriter.TDengineWriter",
  "description": {
    "useScene": "data migration to tdengine",
    "mechanism": "use taos-jdbcdriver to write data."
  },
  "developer": "support@taosdata.com"
}

================================================
FILE: tdenginewriter/src/main/resources/plugin_job_template.json
================================================
{
  "name": "tdenginewriter",
  "parameter": {
    "username": "root",
    "password": "taosdata",
    "column": [
      ""
    ],
    "connection": [
      {
        "table": [
          ""
        ],
        "jdbcUrl": ""
      }
    ],
    "batchSize": 1000,
    "ignoreTagsUnmatched": true
  }
}

================================================
FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/Csv2TDengineTest.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

import com.alibaba.datax.core.Engine;
import org.junit.Ignore;
import org.junit.Test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

@Ignore
public class Csv2TDengineTest {

    private static final String host = "192.168.56.105";

    @Test
    public void case01() throws Throwable {
        // given
        prepareTable();

        // when
        String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/csv2t.json"};
        System.setProperty("datax.home", "../target/datax/datax");
        Engine.entry(params);
    }

    public void prepareTable() throws SQLException {
        final String url = "jdbc:TAOS-RS://" + host + ":6041";
        try (Connection conn = DriverManager.getConnection(url, "root", "taosdata")) {
            Statement stmt = conn.createStatement();

            stmt.execute("drop database if exists test");
            stmt.execute("create database if not exists test");
            stmt.execute("create table test.weather (ts timestamp, temperature bigint, humidity double, is_normal bool) " +
                    "tags(device_id binary(10),address nchar(10))");
        }
    }
}

================================================
FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/DM2TDengineTest.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

import com.alibaba.datax.core.Engine;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

public class DM2TDengineTest {

    private String host1 = "192.168.0.72";
    private String host2 = "192.168.1.93";
    private final Random random = new Random(System.currentTimeMillis());

    @Test
    public void dm2t_case01() throws Throwable {
        // given
        createSupTable();

        // when
        String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/dm2t-1.json"};
        System.setProperty("datax.home", "../target/datax/datax");
        Engine.entry(params);
    }

    @Test
    public void dm2t_case02() throws Throwable {
        // given
        createSupAndSubTable();

        // when
        String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/dm2t-2.json"};
        System.setProperty("datax.home", "../target/datax/datax");
        Engine.entry(params);
    }

    @Test
    public void dm2t_case03() throws Throwable {
        // given
        createTable();

        // when
        String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/dm2t-3.json"};
        System.setProperty("datax.home", "../target/datax/datax");
        Engine.entry(params);
    }

    @Test
    public void dm2t_case04() throws Throwable {
        // given
        createSupTable();

        // when
        String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/dm2t-4.json"};
        System.setProperty("datax.home", "../target/datax/datax");
        Engine.entry(params);
    }

    private void createSupTable() throws SQLException {
        final String url2 = "jdbc:TAOS-RS://" + host2 + ":6041";
        try (Connection conn = DriverManager.getConnection(url2, "root", "taosdata")) {
            Statement stmt = conn.createStatement();
            stmt.execute("drop database if exists db2");
            stmt.execute("create database if not exists db2");
            stmt.execute("create table db2.stb2(ts timestamp, f2 smallint, f4 bigint,f5 float, " +
                    "f6 double, f7 double, f8 bool, f9 nchar(100), f10 nchar(200)) tags(f1 tinyint,f3 int)");
            stmt.close();
        }
    }

    private void createSupAndSubTable() throws SQLException {
        final String url2 = "jdbc:TAOS-RS://" + host2 + ":6041";
        try (Connection conn = DriverManager.getConnection(url2, "root", "taosdata")) {
            Statement stmt = conn.createStatement();
            stmt.execute("drop database if exists db2");
            stmt.execute("create database if not exists db2");
            stmt.execute("create table db2.stb2(ts timestamp, f2 smallint, f4 bigint,f5 float, " +
                    "f6 double, f7 double, f8 bool, f9 nchar(100), f10 nchar(200)) tags(f1 tinyint,f3 int)");
            for (int i = 0; i < 10; i++) {
                stmt.execute("create table db2.t" + (i + 1) + "_" + i + " using db2.stb2 tags(" + (i + 1) + "," + i + ")");
            }
            stmt.close();
        }
    }

    private void createTable() throws SQLException {
        final String url2 = "jdbc:TAOS-RS://" + host2 + ":6041";
        try (Connection conn = DriverManager.getConnection(url2, "root", "taosdata")) {
            Statement stmt = conn.createStatement();
            stmt.execute("drop database if exists db2");
            stmt.execute("create database if not exists db2");
            stmt.execute("create table db2.stb2(ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint,f5 float, " +
                    "f6 double, f7 double, f8 bool, f9 nchar(100), f10 nchar(200))");
            stmt.close();
        }
    }

    @Before
    public void before() throws SQLException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        long ts = System.currentTimeMillis();

        final String url = "jdbc:dm://" + host1 + ":5236";
        try (Connection conn = DriverManager.getConnection(url, "TESTUSER", "test123456")) {
            conn.setAutoCommit(true);
            Statement stmt = conn.createStatement();
            stmt.execute("drop table if exists stb1");
            stmt.execute("create table stb1(ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint, f5 float, " +
                    "f6 double, f7 NUMERIC(10,2), f8 BIT, f9 VARCHAR(100), f10 VARCHAR2(200))");
            for (int i = 0; i < 10; i++) {
                String sql = "insert into stb1 values('" + sdf.format(new Date(ts + i * 1000)) + "'," + (i + 1) + "," +
                        random.nextInt(100) + "," + i + ",4,5.55,6.666,7.77," + (random.nextBoolean() ? 1 : 0) +
                        ",'abcABC123','北京朝阳望京DM')";
                stmt.execute(sql);
            }
        }
    }
}

================================================
FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/DefaultDataHandlerTest.java
================================================
package com.alibaba.datax.plugin.writer.tdenginewriter;

import com.alibaba.datax.common.element.DateColumn;
import com.alibaba.datax.common.element.LongColumn;
import com.alibaba.datax.common.element.Record;
import com.alibaba.datax.common.element.StringColumn;
import com.alibaba.datax.common.plugin.TaskPluginCollector;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.core.transport.record.DefaultRecord;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DefaultDataHandlerTest {

    private static final String host = "192.168.1.93";
    private static Connection conn;

    private final TaskPluginCollector taskPluginCollector = new TDengineWriter.Task().getTaskPluginCollector();

    @Test
    public void writeSupTableBySQL() throws SQLException {
        // given
        createSupAndSubTable();
        Configuration configuration = Configuration.from("{" +
                "\"username\": \"root\"," +
                "\"password\": \"taosdata\"," +
                "\"column\": [\"tbname\", \"ts\", \"f1\", \"f2\", \"t1\"]," +
                "\"table\":[\"stb1\"]," +
                "\"jdbcUrl\":\"jdbc:TAOS-RS://" + host + ":6041/test\"," +
                "\"batchSize\": \"1000\"" +
                "}");
        long current = System.currentTimeMillis();
        List<Record> recordList = IntStream.range(1, 11).mapToObj(i -> {
            Record record = new DefaultRecord();
            record.addColumn(new StringColumn("tb" + (i + 10)));
            record.addColumn(new DateColumn(current + 1000 * i));
            record.addColumn(new LongColumn(1));
            record.addColumn(new LongColumn(2));
            record.addColumn(new LongColumn(i));
            return record;
        }).collect(Collectors.toList());

        // when
        DefaultDataHandler handler = new DefaultDataHandler(configuration, taskPluginCollector);
        List<String> tables = configuration.getList("table", String.class);
        SchemaManager schemaManager = new SchemaManager(conn);
        Map<String, TableMeta> tableMetas = schemaManager.loadTableMeta(tables);
        Map<String, List<ColumnMeta>> columnMetas = schemaManager.loadColumnMetas(tables);
        handler.setTableMetas(tableMetas);
        handler.setTbnameColumnMetasMap(columnMetas);
        handler.setSchemaManager(schemaManager);

        int count = handler.writeBatch(conn, recordList);

        // then
        Assert.assertEquals(10, count);
    }

    @Test
    public void writeSupTableBySQL_2() throws SQLException {
        // given
        createSupAndSubTable();
        Configuration configuration = Configuration.from("{" +
                "\"username\": \"root\"," +
                "\"password\": \"taosdata\"," +
                "\"column\": [\"tbname\", \"ts\", \"f1\", \"t1\"]," +
                "\"table\":[\"stb1\"]," +
                "\"jdbcUrl\":\"jdbc:TAOS-RS://" + host + ":6041/test\"," +
                "\"batchSize\": \"1000\"" +
                "}");
        long current = System.currentTimeMillis();
        List<Record> recordList = IntStream.range(1, 11).mapToObj(i -> {
            Record record = new DefaultRecord();
            record.addColumn(new StringColumn("tb" + (i + 10)));
            record.addColumn(new DateColumn(current + 1000 * i));
            record.addColumn(new LongColumn(1));
            record.addColumn(new LongColumn(i));
            return record;
        }).collect(Collectors.toList());

        // when
        DefaultDataHandler handler = new DefaultDataHandler(configuration, taskPluginCollector);
        List<String> tables = configuration.getList("table", String.class);
        SchemaManager schemaManager = new SchemaManager(conn);
        Map<String, TableMeta> tableMetas = schemaManager.loadTableMeta(tables);
        Map<String, List<ColumnMeta>> columnMetas = schemaManager.loadColumnMetas(tables);
        handler.setTableMetas(tableMetas);
        handler.setTbnameColumnMetasMap(columnMetas);
        handler.setSchemaManager(schemaManager);

        int count = handler.writeBatch(conn, recordList);

        // then
        Assert.assertEquals(10, count);
    }

    @Test
    public void writeSupTableBySchemaless() throws SQLException {
        // given
        createSupTable();
        Configuration configuration = Configuration.from("{" +
                "\"username\": \"root\"," +
                "\"password\": \"taosdata\"," +
                "\"column\": [\"ts\", \"f1\", \"f2\", \"t1\"]," +
                "\"table\":[\"stb1\"]," +
                "\"jdbcUrl\":\"jdbc:TAOS://" + host + ":6030/scm_test\"," +
                "\"batchSize\": \"1000\"" +
                "}");
        String jdbcUrl = configuration.getString("jdbcUrl");
        Connection connection = DriverManager.getConnection(jdbcUrl, "root", "taosdata");
        long current = System.currentTimeMillis();
        List<Record> recordList = IntStream.range(1, 11).mapToObj(i -> {
            Record record = new DefaultRecord();
            record.addColumn(new DateColumn(current + 1000 * i));
            record.addColumn(new LongColumn(1));
            record.addColumn(new LongColumn(2));
            record.addColumn(new StringColumn("t" + i + " 22"));
            return record;
        }).collect(Collectors.toList());

        // when
        DefaultDataHandler handler = new DefaultDataHandler(configuration, taskPluginCollector);
        List<String> tables = configuration.getList("table", String.class);
        SchemaManager schemaManager = new SchemaManager(connection);
        Map<String, TableMeta> tableMetas = schemaManager.loadTableMeta(tables);
        Map<String, List<ColumnMeta>> columnMetas = schemaManager.loadColumnMetas(tables);
        handler.setTableMetas(tableMetas);
        handler.setTbnameColumnMetasMap(columnMetas);
        handler.setSchemaManager(schemaManager);

        int count = handler.writeBatch(connection, recordList);

        // then
        Assert.assertEquals(10, count);
    }

    @Test
    public void writeSubTableWithTableName() throws SQLException {
        // given
        createSupAndSubTable();
        Configuration configuration = Configuration.from("{" +
                "\"username\": \"root\"," +
                "\"password\": \"taosdata\"," +
                "\"column\": [\"tbname\", \"ts\", \"f1\", \"f2\", \"t1\"]," +
                "\"table\":[\"tb1\"]," +
                "\"jdbcUrl\":\"jdbc:TAOS-RS://" + host + ":6041/test\"," +
                "\"batchSize\": \"1000\"" +
                "}");
        long current = System.currentTimeMillis();
        List<Record> recordList = IntStream.range(1, 11).mapToObj(i -> {
            Record record = new DefaultRecord();
            record.addColumn(new StringColumn("tb" + i));
            record.addColumn(new DateColumn(current + 1000 * i));
record.addColumn(new LongColumn(1)); record.addColumn(new LongColumn(2)); record.addColumn(new LongColumn(i)); return record; }).collect(Collectors.toList()); // when DefaultDataHandler handler = new DefaultDataHandler(configuration, taskPluginCollector); List tables = configuration.getList("table", String.class); SchemaManager schemaManager = new SchemaManager(conn); Map tableMetas = schemaManager.loadTableMeta(tables); Map> columnMetas = schemaManager.loadColumnMetas(tables); handler.setTableMetas(tableMetas); handler.setTbnameColumnMetasMap(columnMetas); handler.setSchemaManager(schemaManager); int count = handler.writeBatch(conn, recordList); // then Assert.assertEquals(1, count); } @Test public void writeSubTableWithoutTableName() throws SQLException { // given createSupAndSubTable(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"column\": [\"ts\", \"f1\", \"f2\", \"t1\"]," + "\"table\":[\"tb1\"]," + "\"jdbcUrl\":\"jdbc:TAOS-RS://" + host + ":6041/test\"," + "\"batchSize\": \"1000\"," + "\"ignoreTagsUnmatched\": \"true\"" + "}"); long current = System.currentTimeMillis(); List recordList = IntStream.range(1, 11).mapToObj(i -> { Record record = new DefaultRecord(); record.addColumn(new DateColumn(current + 1000 * i)); record.addColumn(new LongColumn(1)); record.addColumn(new LongColumn(2)); record.addColumn(new LongColumn(i)); return record; }).collect(Collectors.toList()); // when DefaultDataHandler handler = new DefaultDataHandler(configuration, taskPluginCollector); List tables = configuration.getList("table", String.class); SchemaManager schemaManager = new SchemaManager(conn); Map tableMetas = schemaManager.loadTableMeta(tables); Map> columnMetas = schemaManager.loadColumnMetas(tables); handler.setTableMetas(tableMetas); handler.setTbnameColumnMetasMap(columnMetas); handler.setSchemaManager(schemaManager); int count = handler.writeBatch(conn, recordList); // then Assert.assertEquals(1, 
count); } @Test public void writeNormalTable() throws SQLException { // given createSupAndSubTable(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"column\": [\"ts\", \"f1\", \"f2\", \"t1\"]," + "\"table\":[\"weather\"]," + "\"jdbcUrl\":\"jdbc:TAOS-RS://" + host + ":6041/test\"," + "\"batchSize\": \"1000\"," + "\"ignoreTagsUnmatched\": \"true\"" + "}"); long current = System.currentTimeMillis(); List<Record> recordList = IntStream.range(1, 11).mapToObj(i -> { Record record = new DefaultRecord(); record.addColumn(new DateColumn(current + 1000 * i)); record.addColumn(new LongColumn(1)); record.addColumn(new LongColumn(2)); record.addColumn(new LongColumn(i)); return record; }).collect(Collectors.toList()); // when DefaultDataHandler handler = new DefaultDataHandler(configuration, taskPluginCollector); List<String> tables = configuration.getList("table", String.class); SchemaManager schemaManager = new SchemaManager(conn); Map<String, TableMeta> tableMetas = schemaManager.loadTableMeta(tables); Map<String, List<ColumnMeta>> columnMetas = schemaManager.loadColumnMetas(tables); handler.setTableMetas(tableMetas); handler.setTbnameColumnMetasMap(columnMetas); handler.setSchemaManager(schemaManager); int count = handler.writeBatch(conn, recordList); // then Assert.assertEquals(10, count); } private void createSupAndSubTable() throws SQLException { try (Statement stmt = conn.createStatement()) { stmt.execute("drop database if exists scm_test"); stmt.execute("create database if not exists scm_test"); stmt.execute("use scm_test"); stmt.execute("create table stb1(ts timestamp, f1 int, f2 int) tags(t1 nchar(32))"); stmt.execute("create table stb2(ts timestamp, f1 int, f2 int, f3 int) tags(t1 int, t2 int)"); stmt.execute("create table tb1 using stb1 tags(1)"); stmt.execute("create table tb2 using stb1 tags(2)"); stmt.execute("create table tb3 using stb2 tags(1,1)"); stmt.execute("create 
table tb4 using stb2 tags(2,2)"); stmt.execute("create table weather(ts
timestamp, f1 int, f2 int, f3 int, t1 int, t2 int)"); } } private void createSupTable() throws SQLException { try (Statement stmt = conn.createStatement()) { stmt.execute("drop database if exists scm_test"); stmt.execute("create database if not exists scm_test"); stmt.execute("use scm_test"); stmt.execute("create table stb1(ts timestamp, f1 int, f2 int) tags(t1 nchar(32))"); } } @BeforeClass public static void beforeClass() throws SQLException { conn = DriverManager.getConnection("jdbc:TAOS-RS://" + host + ":6041", "root", "taosdata"); } @AfterClass public static void afterClass() throws SQLException { if (conn != null) { conn.close(); } } } ================================================ FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/Mongo2TDengineTest.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.core.Engine; import org.junit.Test; public class Mongo2TDengineTest { @Test public void case01() throws Throwable { // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/mongo2t.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } } ================================================ FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/Mysql2TDengineTest.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.core.Engine; import org.junit.Before; import org.junit.Test; import java.sql.*; import java.text.SimpleDateFormat; import java.util.Random; public class Mysql2TDengineTest { private static final String host1 = "192.168.56.105"; private static final String host2 = "192.168.1.93"; private static final Random random = new Random(System.currentTimeMillis()); @Test public void mysql2tdengine() throws Throwable { String[] params = {"-mode", "standalone", "-jobid", "-1", 
"-job", "src/test/resources/m2t-1.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Before public void before() throws SQLException { SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"); String ts = sdf.format(new Date(System.currentTimeMillis())); final String url = "jdbc:mysql://" + host1 + ":3306/?useSSL=false&useUnicode=true&charset=UTF-8&generateSimpleParameterMetadata=true"; try (Connection conn = DriverManager.getConnection(url, "root", "123456")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db1"); stmt.execute("create database if not exists db1"); stmt.execute("use db1"); stmt.execute("create table stb1(id int primary key AUTO_INCREMENT, " + "f1 tinyint, f2 smallint, f3 int, f4 bigint, " + "f5 float, f6 double, " + "ts timestamp, dt datetime," + "f7 nchar(100), f8 varchar(100))"); for (int i = 1; i <= 10; i++) { String sql = "insert into stb1(f1, f2, f3, f4, f5, f6, ts, dt, f7, f8) values(" + i + "," + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextFloat() + "," + random.nextDouble() + ", " + "'" + ts + "', '" + ts + "', " + "'中国北京朝阳望京abc', '中国北京朝阳望京adc')"; stmt.execute(sql); } stmt.close(); } final String url2 = "jdbc:TAOS-RS://" + host2 + ":6041/"; try (Connection conn = DriverManager.getConnection(url2, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db2"); stmt.execute("create database if not exists db2"); stmt.execute("create table db2.stb2(" + "ts timestamp, dt timestamp, " + "f1 tinyint, f2 smallint, f3 int, f4 bigint, " + "f5 float, f6 double, " + "f7 nchar(100), f8 nchar(100))"); stmt.close(); } } } ================================================ FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/Opentsdb2TDengineTest.java ================================================ package 
com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.core.Engine; import org.junit.Assert; import org.junit.Test; import java.sql.*; public class Opentsdb2TDengineTest { @Test public void opentsdb2tdengine() throws SQLException { // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/o2t-1.json"}; System.setProperty("datax.home", "../target/datax/datax"); try { Engine.entry(params); } catch (Throwable e) { e.printStackTrace(); } // assert String jdbcUrl = "jdbc:TAOS://192.168.56.105:6030/test?timestampFormat=TIMESTAMP"; try (Connection conn = DriverManager.getConnection(jdbcUrl, "root", "taosdata")) { Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery("select count(*) from weather_temperature"); int rows = 0; while (rs.next()) { rows = rs.getInt("count(*)"); } Assert.assertEquals(5, rows); stmt.close(); } } } ================================================ FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/SchemaManagerTest.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import org.junit.AfterClass; import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; import java.sql.Statement; import java.util.Arrays; import java.util.List; import java.util.Map; public class SchemaManagerTest { private static Connection conn; @Test public void loadTableMeta() throws SQLException { // given SchemaManager schemaManager = new SchemaManager(conn); List<String> tables = Arrays.asList("stb1", "stb2", "tb1", "tb3", "weather"); // when Map<String, TableMeta> tableMetaMap = schemaManager.loadTableMeta(tables); // then TableMeta stb1 = tableMetaMap.get("stb1"); Assert.assertEquals(TableType.SUP_TABLE, stb1.tableType); Assert.assertEquals("stb1", stb1.tbname); Assert.assertEquals(3, stb1.columns); 
Assert.assertEquals(1, stb1.tags);
Assert.assertEquals(2, stb1.tables); TableMeta tb3 = tableMetaMap.get("tb3"); Assert.assertEquals(TableType.SUB_TABLE, tb3.tableType); Assert.assertEquals("tb3", tb3.tbname); Assert.assertEquals(4, tb3.columns); Assert.assertEquals("stb2", tb3.stable_name); TableMeta weather = tableMetaMap.get("weather"); Assert.assertEquals(TableType.NML_TABLE, weather.tableType); Assert.assertEquals("weather", weather.tbname); Assert.assertEquals(6, weather.columns); Assert.assertNull(weather.stable_name); } @Test public void loadColumnMetas() { // given SchemaManager schemaManager = new SchemaManager(conn); List<String> tables = Arrays.asList("stb1", "stb2", "tb1", "tb3", "weather"); // when Map<String, List<ColumnMeta>> columnMetaMap = schemaManager.loadColumnMetas(tables); // then List<ColumnMeta> stb1 = columnMetaMap.get("stb1"); Assert.assertEquals(4, stb1.size()); } @Test public void loadTagTableNameMap() throws SQLException { // given SchemaManager schemaManager = new SchemaManager(conn); String table = "stb3"; // when Map<String, String> tagTableMap = schemaManager.loadTagTableNameMap(table); // then Assert.assertEquals(2, tagTableMap.keySet().size()); Assert.assertTrue(tagTableMap.containsKey("11.1abc")); Assert.assertTrue(tagTableMap.containsKey("22.2defg")); Assert.assertEquals("tb5", tagTableMap.get("11.1abc")); Assert.assertEquals("tb6", tagTableMap.get("22.2defg")); } @BeforeClass public static void beforeClass() throws SQLException { conn = DriverManager.getConnection("jdbc:TAOS-RS://192.168.56.105:6041", "root", "taosdata"); try (Statement stmt = conn.createStatement()) { stmt.execute("drop database if exists scm_test"); stmt.execute("create database if not exists scm_test"); stmt.execute("use scm_test"); stmt.execute("create table stb1(ts timestamp, f1 int, f2 int) tags(t1 int)"); stmt.execute("create table stb2(ts timestamp, f1 int, f2 int, f3 int) tags(t1 int, t2 int)"); stmt.execute("insert into tb1 using stb1 
tags(1) values(now, 1, 2)"); stmt.execute("insert into tb2 using stb1 tags(2) values(now, 1, 2)");
stmt.execute("insert into tb3 using stb2 tags(1,1) values(now, 1, 2, 3)"); stmt.execute("insert into tb4 using stb2 tags(2,2) values(now, 1, 2, 3)"); stmt.execute("create table weather(ts timestamp, f1 int, f2 int, f3 int, t1 int, t2 int)"); stmt.execute("create table stb3(ts timestamp, f1 int) tags(t1 int, t2 float, t3 nchar(32))"); stmt.execute("insert into tb5 using stb3 tags(1,1.1,'abc') values(now, 1)"); stmt.execute("insert into tb6 using stb3 tags(2,2.2,'defg') values(now, 2)"); } } @AfterClass public static void afterClass() throws SQLException { if (conn != null) { conn.close(); } } } ================================================ FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/Stream2TDengineTest.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.core.Engine; import org.junit.Before; import org.junit.Test; import java.sql.Connection; import java.sql.DriverManager; import java.sql.SQLException; import java.sql.Statement; public class Stream2TDengineTest { private String host2 = "192.168.56.105"; @Test public void s2t_case1() throws Throwable { // given createSupTable("ms"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/defaultJob.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void s2t_case2() throws Throwable { // given createSupTable("us"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/defaultJob.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void s2t_case3() throws Throwable { // given createSupTable("ns"); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/defaultJob.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } void createSupTable(String 
precision) throws SQLException { final String url = "jdbc:TAOS-RS://" + host2 + ":6041/"; try (Connection conn = DriverManager.getConnection(url, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db2"); stmt.execute("create database if not exists db2 precision '" + precision + "'"); stmt.execute("create table db2.stb2(ts1 timestamp, ts2 timestamp,ts3 timestamp,ts4 timestamp,ts5 timestamp," + "ts6 timestamp,ts7 timestamp, ts8 timestamp, ts9 timestamp, ts10 timestamp, f1 tinyint, f2 smallint," + "f3 int, f4 bigint, f5 float, f6 double," + "f7 bool, f8 binary(100), f9 nchar(100)) tags(t1 timestamp,t2 timestamp,t3 timestamp,t4 timestamp," + "t5 timestamp,t6 timestamp,t7 timestamp, t8 tinyint, t9 smallint, t10 int, t11 bigint, t12 float," + "t13 double, t14 bool, t15 binary(100), t16 nchar(100))"); stmt.close(); } } } ================================================ FILE: tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/TDengine2TDengineTest.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.core.Engine; import org.junit.Before; import org.junit.Test; import java.sql.*; import java.text.SimpleDateFormat; import java.util.Random; public class TDengine2TDengineTest { private static final String host1 = "192.168.56.105"; private static final String host2 = "192.168.1.93"; private static final Random random = new Random(System.currentTimeMillis()); @Test public void case_01() throws Throwable { // given createSupTable(); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2t-1.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void case_02() throws Throwable { // given createSupTable(); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2t-2.json"}; 
System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void case_03() throws Throwable { // given createSupAndSubTable(); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2t-3.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } @Test public void case_04() throws Throwable { // given createTable(); // when String[] params = {"-mode", "standalone", "-jobid", "-1", "-job", "src/test/resources/t2t-4.json"}; System.setProperty("datax.home", "../target/datax/datax"); Engine.entry(params); } private void createTable() throws SQLException { final String url2 = "jdbc:TAOS-RS://" + host2 + ":6041"; try (Connection conn = DriverManager.getConnection(url2, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db2"); stmt.execute("create database if not exists db2"); stmt.execute("create table db2.weather (ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint, " + "f5 float, f6 double, f7 bool, f8 binary(100), f9 nchar(100))"); stmt.close(); } } private void createSupTable() throws SQLException { final String url2 = "jdbc:TAOS-RS://" + host2 + ":6041"; try (Connection conn = DriverManager.getConnection(url2, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db2"); stmt.execute("create database if not exists db2"); stmt.execute("create table db2.stb2 (ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint," + " f5 float, f6 double, f7 bool, f8 binary(100), f9 nchar(100)) tags(t1 timestamp, t2 tinyint, " + "t3 smallint, t4 int, t5 bigint, t6 float, t7 double, t8 bool, t9 binary(100), t10 nchar(1000))"); stmt.close(); } } private void createSupAndSubTable() throws SQLException { SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"); final String ts = sdf.format(new Date(System.currentTimeMillis())); final String url2 = "jdbc:TAOS-RS://" 
+ host2 + ":6041"; try (Connection conn = DriverManager.getConnection(url2, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db2"); stmt.execute("create database if not exists db2"); stmt.execute("create table db2.stb2 (ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint," + " f5 float, f6 double, f7 bool, f8 binary(100), f9 nchar(100)) tags(t1 timestamp, t2 tinyint, " + "t3 smallint, t4 int, t5 bigint, t6 float, t7 double, t8 bool, t9 binary(100), t10 nchar(1000))"); stmt.execute("create table db2.t1 using db2.stb2 tags('" + ts + "',1,2,3,4,5.0,6.0,true,'abc123ABC','北京朝阳望京')"); stmt.close(); } } @Before public void before() throws SQLException { final String url = "jdbc:TAOS-RS://" + host1 + ":6041"; try (Connection conn = DriverManager.getConnection(url, "root", "taosdata")) { Statement stmt = conn.createStatement(); stmt.execute("drop database if exists db1"); stmt.execute("create database if not exists db1"); stmt.execute("create table db1.stb1 (ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint," + " f5 float, f6 double, f7 bool, f8 binary(100), f9 nchar(100)) tags(t1 timestamp, t2 tinyint, " + "t3 smallint, t4 int, t5 bigint, t6 float, t7 double, t8 bool, t9 binary(100), t10 nchar(1000))"); for (int i = 0; i < 10; i++) { String sql = "insert into db1.t" + (i + 1) + " using db1.stb1 tags(now+" + i + "s," + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextFloat() + "," + random.nextDouble() + "," + random.nextBoolean() + ",'abc123ABC','北京朝阳望京') values(now+" + i + "s, " + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextInt(100) + "," + random.nextFloat() + "," + random.nextDouble() + "," + random.nextBoolean() + ",'abc123ABC','北京朝阳望京')"; stmt.execute(sql); } } } } ================================================ FILE: 
tdenginewriter/src/test/java/com/alibaba/datax/plugin/writer/tdenginewriter/TDengineWriterTest.java ================================================ package com.alibaba.datax.plugin.writer.tdenginewriter; import com.alibaba.datax.common.util.Configuration; import org.junit.Assert; import org.junit.Before; import org.junit.Test; import java.util.List; public class TDengineWriterTest { TDengineWriter.Job job; @Before public void before() { job = new TDengineWriter.Job(); Configuration configuration = Configuration.from("{" + "\"username\": \"root\"," + "\"password\": \"taosdata\"," + "\"column\": [\"ts\", \"f1\", \"f2\", \"t1\"]," + "\"connection\": [{\"table\":[\"weather\"],\"jdbcUrl\":\"jdbc:TAOS-RS://master:6041/test\"}]," + "\"batchSize\": \"1000\"" + "}"); job.setPluginJobConf(configuration); } @Test public void jobInit() { // when job.init(); // assert Configuration conf = job.getPluginJobConf(); Assert.assertEquals("root", conf.getString("username")); Assert.assertEquals("taosdata", conf.getString("password")); Assert.assertEquals("jdbc:TAOS-RS://master:6041/test", conf.getString("connection[0].jdbcUrl")); Assert.assertEquals(new Integer(1000), conf.getInt("batchSize")); Assert.assertEquals("ts", conf.getString("column[0]")); Assert.assertEquals("f2", conf.getString("column[2]")); } @Test public void jobSplit() { // when job.init(); List<Configuration> configurationList = job.split(10); // assert Assert.assertEquals(10, configurationList.size()); for (Configuration conf : configurationList) { Assert.assertEquals("root", conf.getString("username")); Assert.assertEquals("taosdata", conf.getString("password")); Assert.assertEquals("jdbc:TAOS-RS://master:6041/test", conf.getString("jdbcUrl")); Assert.assertEquals(new Integer(1000), conf.getInt("batchSize")); Assert.assertEquals("ts", conf.getString("column[0]")); Assert.assertEquals("f2", conf.getString("column[2]")); } } } ================================================ FILE: 
tdenginewriter/src/test/resources/csv2t.json
================================================ { "job": { "content": [ { "reader": { "name": "txtfilereader", "parameter": { "path": [ "/Users/yangzy/IdeaProjects/DataX/tdenginewriter/src/test/resources/weather.csv" ], "encoding": "UTF-8", "column": [ { "index": 0, "type": "string" }, { "index": 1, "type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS" }, { "index": 2, "type": "long" }, { "index": 3, "type": "double" }, { "index": 4, "type": "long" }, { "index": 5, "type": "string" }, { "index": 6, "type": "string" } ], "fieldDelimiter": "," } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts", "temperature", "humidity", "is_normal", "device_id", "address" ], "connection": [ { "table": [ "weather" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/test" } ], "batchSize": 100, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/defaultJob.json ================================================ { "job": { "content": [ { "reader": { "name": "streamreader", "parameter": { "column": [ { "type": "string", "value": "tb1" }, { "type": "date", "value": "2022-02-20 12:00:01" }, { "type": "date", "value": "2022-02-20 12:00:02.123", "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS" }, { "type": "date", "value": "2022-02-20 12:00:03.123456", "dateFormat": "yyyy-MM-dd HH:mm:ss.SSSSSS" }, { "type": "date", "value": "2022-02-20 12:00:04.123456789", "dateFormat": "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" }, { "type": "string", "value": "2022-02-20 12:00:05.123" }, { "type": "string", "value": "2022-02-20 12:00:06.123456" }, { "type": "string", "value": "2022-02-20 12:00:07.123456789" }, { "type": "long", "value": 1645329608000 }, { "type": "long", "value": 1645329609000000 }, { "type": "long", "value": 1645329610000000000 }, { "type": "long", "random": "0, 10" }, { "type": "long", "random": "0, 100" }, { "type":
"long", "random": "0, 1000" }, { "type": "long", "random": "0, 10000" }, { "type": "double", "random": "0, 10" }, { "type": "double", "random": "10, 20" }, { "type": "bool", "random": "0, 50" }, { "type": "bytes", "random": "0, 10" }, { "type": "string", "random": "10, 50" }, { "type": "date", "value": "2022-02-20 12:00:01" }, { "type": "date", "value": "2022-02-20 12:00:02.123", "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS" }, { "type": "date", "value": "2022-02-20 12:00:03.123456", "dateFormat": "yyyy-MM-dd HH:mm:ss.SSSSSS" }, { "type": "date", "value": "2022-02-20 12:00:04.123456789", "dateFormat": "yyyy-MM-dd HH:mm:ss.SSSSSSSSS" }, { "type": "string", "value": "2022-02-20 12:00:05.123" }, { "type": "string", "value": "2022-02-20 12:00:06.123456" }, { "type": "string", "value": "2022-02-20 12:00:07.123456789" }, { "type": "long", "value": 1 }, { "type": "long", "value": 2 }, { "type": "long", "value": 3 }, { "type": "long", "value": 4 }, { "type": "double", "value": 5.55 }, { "type": "double", "value": 6.666666 }, { "type": "bool", "value": true }, { "type": "bytes", "value": "abcABC123" }, { "type": "string", "value": "北京朝阳望京" } ], "sliceRecordCount": 10 } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts1", "ts2", "ts3", "ts4", "ts5", "ts6", "ts7", "ts8", "ts9", "ts10", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10", "t11", "t12", "t13", "t14", "t15", "t16" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/db2" } ], "batchSize": 100, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/dm-schema.sql ================================================ select tablespace_name from dba_data_files; create tablespace test datafile '/home/dmdba/dmdbms/data/DAMENG/test.dbf' size 32 
autoextend on next 1 maxsize 1024; create user TESTUSER identified by test123456 default tablespace test; grant dba to TESTUSER; select * from user_tables; drop table if exists stb1; create table stb1 ( ts timestamp, f1 tinyint, f2 smallint, f3 int, f4 bigint, f5 float, f6 double, f7 NUMERIC(10, 2), f8 BIT, f9 VARCHAR(100), f10 VARCHAR2(200) ); ================================================ FILE: tdenginewriter/src/test/resources/dm2t-1.json ================================================ { "job": { "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "TESTUSER", "password": "test123456", "connection": [ { "querySql": [ "select concat(concat(concat('t', f1), '_'),f3) as tbname,* from stb1;" ], "jdbcUrl": [ "jdbc:dm://192.168.0.72:5236" ] } ], "fetchSize": 1024 } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/dm2t-2.json ================================================ { "job": { "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "TESTUSER", "password": "test123456", "connection": [ { "querySql": [ "select concat(concat(concat('t', f1), '_'),f3) as tbname,* from stb1;" ], "jdbcUrl": [ "jdbc:dm://192.168.0.72:5236" ] } ], "fetchSize": 1024 } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "connection": [ { "table": [ "t1_0" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": {
"speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/dm2t-3.json ================================================ { "job": { "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "TESTUSER", "password": "test123456", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "splitPk": "f1", "connection": [ { "table": [ "stb1" ], "jdbcUrl": [ "jdbc:dm://192.168.0.72:5236" ] } ], "fetchSize": 1024, "where": "1 = 1" } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/dm2t-4.json ================================================ { "job": { "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "TESTUSER", "password": "test123456", "connection": [ { "querySql": [ "select * from stb1" ], "jdbcUrl": [ "jdbc:dm://192.168.0.72:5236" ] } ], "fetchSize": 1024 } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS://192.168.1.93:6030/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/clean_env.sh ================================================ #!/bin/bash datax_home_dir=$(dirname $(readlink -f "$0")) curl -H 'Authorization: Basic cm9vdDp0YW9zZGF0YQ==' -d 'drop table if exists db2.stb2;' 
192.168.1.93:6041/rest/sql curl -H 'Authorization: Basic cm9vdDp0YW9zZGF0YQ==' -d 'create table if not exists db2.stb2 (`ts` TIMESTAMP,`f2` SMALLINT,`f4` BIGINT,`f5` FLOAT,`f6` DOUBLE,`f7` DOUBLE,`f8` BOOL,`f9` NCHAR(100),`f10` NCHAR(200)) TAGS (`f1` TINYINT,`f3` INT);' 192.168.1.93:6041/rest/sql rm -f ${datax_home_dir}/log/* rm -f ${datax_home_dir}/job/*.csv ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/csv2t-jni.json ================================================ { "job": { "content": [ { "reader": { "name": "txtfilereader", "parameter": { "path": [ "/root/workspace/tmp/a.txt" ], "encoding": "UTF-8", "column": [ { "index": 0, "type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS" }, { "index": 1, "type": "long" }, { "index": 2, "type": "long" }, { "index": 3, "type": "long" }, { "index": 4, "type": "long" }, { "index": 5, "type": "double" }, { "index": 6, "type": "double" }, { "index": 7, "type": "boolean" }, { "index": 8, "type": "string" }, { "index": 9, "type": "string" }, { "index": 10, "type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS" }, { "index": 11, "type": "string" } ], "fieldDelimiter": "," } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "tbname" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS://192.168.1.93:6030/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/csv2t-restful.json ================================================ { "job": { "content": [ { "reader": { "name": "txtfilereader", "parameter": { "path": [ "/root/workspace/tmp/a.txt" ], "encoding": "UTF-8", "column": [ "*" ], "fieldDelimiter": "," } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", 
"password": "taosdata", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "tbname" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/dm2t-jni.json ================================================ { "job": { "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "TESTUSER", "password": "test123456", "connection": [ { "querySql": [ "select concat(concat(concat('t', f1), '_'),f3) as tbname,* from stb1;" ], "jdbcUrl": [ "jdbc:dm://192.168.0.72:5236" ] } ], "fetchSize": 1024 } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS://192.168.1.93:6030/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/dm2t-restful.json ================================================ { "job": { "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "TESTUSER", "password": "test123456", "connection": [ { "querySql": [ "select concat(concat(concat('t', f1), '_'),f3) as tbname,* from stb1;" ], "jdbcUrl": [ "jdbc:dm://192.168.0.72:5236" ] } ], "fetchSize": 1024 } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { 
"speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/dm2t-update.json ================================================ { "job": { "content": [ { "reader": { "name": "rdbmsreader", "parameter": { "username": "TESTUSER", "password": "test123456", "connection": [ { "querySql": [ "select concat(concat(concat('t', f1), '_'),f3) as tbname,* from stb1 where 1=1" ], "jdbcUrl": [ "jdbc:dm://192.168.0.72:5236" ] } ], "fetchSize": 1024 } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/dm2t_sync.sh ================================================ #!/bin/bash set -e #set -x datax_home_dir=$(dirname $(readlink -f "$0")) table_name="stb1" update_key="ts" while getopts "hd:t:k:" arg; do case $arg in d) datax_home_dir=$OPTARG ;; t) table_name=$OPTARG ;; k) update_key=$OPTARG ;; h) echo "Usage: $(basename $0) -d [datax_home_dir] -t [table_name] -k [update_key]" echo " -h help" exit 0 ;; ?)
#unknown option echo "unknown argument" exit 1 ;; esac done if [[ -e ${datax_home_dir}/job/${table_name}.csv ]]; then MAX_TIME=$(cat ${datax_home_dir}/job/${table_name}.csv) else MAX_TIME="null" fi current_datetime=$(date +"%Y-%m-%d %H:%M:%S") current_timestamp=$(date +%s) job_file=${datax_home_dir}/job/dm2t-update.json if [ "$MAX_TIME" != "null" ]; then WHERE="${update_key} >= '$MAX_TIME' and ${update_key} < '$current_datetime'" job_file=${datax_home_dir}/job/dm2t_${current_timestamp}.json sed "s/1=1/$WHERE/g" ${datax_home_dir}/job/dm2t-update.json >${job_file} echo "incremental data synchronization, from '${MAX_TIME}' to '${current_datetime}'" else echo "full data synchronization, to '${current_datetime}'" fi if python ${datax_home_dir}/bin/datax.py ${job_file} 1> /dev/null 2>&1; then echo "${current_datetime}" >${datax_home_dir}/job/${table_name}.csv echo "datax migration job success" else echo "datax migration job failed" fi rm -f ${datax_home_dir}/job/dm2t_${current_timestamp}.json #while true; do ./dm2t_sync.sh; sleep 5s; done ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/t2dm-jni.json ================================================ { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "column": [ "*" ], "connection": [ { "table": [ "stb1" ], "jdbcUrl": "jdbc:TAOS://192.168.56.105:6030/db1" } ] } }, "writer": { "name": "rdbmswriter", "parameter": { "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:dm://192.168.0.72:5236" } ], "username": "TESTUSER", "password": "test123456", "table": "stb2", "column": [ "*" ] } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/t2dm-restful.json ================================================ { "job": {
"content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "column": [ "*" ], "connection": [ { "table": [ "stb1" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/db1" } ] } }, "writer": { "name": "rdbmswriter", "parameter": { "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:dm://192.168.0.72:5236" } ], "username": "TESTUSER", "password": "test123456", "table": "stb2", "column": [ "*" ] } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/incremental_sync/upload.sh ================================================ #!/bin/bash scp t2dm-restful.json root@192.168.56.105:/root/workspace/tmp/datax/job scp t2dm-jni.json root@192.168.56.105:/root/workspace/tmp/datax/job scp dm2t-restful.json root@192.168.56.105:/root/workspace/tmp/datax/job scp dm2t-jni.json root@192.168.56.105:/root/workspace/tmp/datax/job scp dm2t-update.json root@192.168.56.105:/root/workspace/tmp/datax/job scp csv2t-restful.json root@192.168.56.105:/root/workspace/tmp/datax/job scp csv2t-jni.json root@192.168.56.105:/root/workspace/tmp/datax/job scp dm2t_sync.sh root@192.168.56.105:/root/workspace/tmp/datax scp clean_env.sh root@192.168.56.105:/root/workspace/tmp/datax ================================================ FILE: tdenginewriter/src/test/resources/m2t-1.json ================================================ { "job": { "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "root", "password": "123456", "column": [ "ts", "dt", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8" ], "splitPk": "id", "connection": [ { "table": [ "stb1" ], "jdbcUrl": [ "jdbc:mysql://192.168.56.105:3306/db1?useSSL=false&useUnicode=true&characterEncoding=utf8" ] } ] } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "dt", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8" ], "connection": [ 
{ "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/mongo2t.json ================================================ { "job": { "content": [ { "reader": { "name": "mongodbreader", "parameter": { "address": [ "192.168.1.213:27017" ], "userName": "", "userPassword": "", "dbName": "testdb", "collectionName": "monitor_data", "column": [ { "name": "ct", "type": "date" }, { "name": "pv", "type": "float" }, { "name": "tv", "type": "float" }, { "name": "pid", "type": "float" } ] } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "hmdata", "column": [ "ts", "pressure", "temperature", "position_id" ], "connection": [ { "table": [ "pipeline_data" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.213:6041/mongo3040" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/o2t-1.json ================================================ { "job":{ "content":[{ "reader": { "name": "opentsdbreader", "parameter": { "endpoint": "http://192.168.56.105:4242", "column": ["weather_temperature"], "beginDateTime": "2021-01-01 00:00:00", "endDateTime": "2021-01-01 01:00:00" } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "connection": [ { "table": [ "meters" ], "jdbcUrl": "jdbc:TAOS://192.168.56.105:6030/test?timestampFormat=TIMESTAMP" } ], "batchSize": 1000 } } }], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/t2t-1.json ================================================ { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": 
"taosdata", "connection": [ { "table": [ "stb1" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/db1?timestampFormat=TIMESTAMP" } ], "column": [ "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10" ], "beginDateTime": "2022-02-15 00:00:00", "endDateTime": "2022-02-16 00:00:00", "splitInterval": "1d" } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "tbname", "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2?timestampFormat=TIMESTAMP" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/t2t-2.json ================================================ { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "connection": [ { "table": [ "stb1" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/db1?timestampFormat=TIMESTAMP" } ], "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10" ], "beginDateTime": "2022-02-15 00:00:00", "endDateTime": "2022-02-16 00:00:00", "splitInterval": "1d" } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10" ], "connection": [ { "table": [ "stb2" ], "jdbcUrl": "jdbc:TAOS://192.168.1.93:6030/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: 
tdenginewriter/src/test/resources/t2t-3.json ================================================ { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "connection": [ { "table": [ "stb1" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/db1?timestampFormat=TIMESTAMP" } ], "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10" ], "beginDateTime": "2022-02-15 00:00:00", "endDateTime": "2022-02-16 00:00:00", "splitInterval": "1d" } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10" ], "connection": [ { "table": [ "t1" ], "jdbcUrl": "jdbc:TAOS://192.168.1.93:6030/db2?timestampFormat=TIMESTAMP" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } ================================================ FILE: tdenginewriter/src/test/resources/t2t-4.json ================================================ { "job": { "content": [ { "reader": { "name": "tdenginereader", "parameter": { "username": "root", "password": "taosdata", "connection": [ { "table": [ "stb1" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.56.105:6041/db1?timestampFormat=TIMESTAMP" } ], "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9" ], "beginDateTime": "2022-02-15 00:00:00", "endDateTime": "2022-02-16 00:00:00", "splitInterval": "1d" } }, "writer": { "name": "tdenginewriter", "parameter": { "username": "root", "password": "taosdata", "column": [ "ts", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9" ], "connection": [ { "table": [ "weather" ], "jdbcUrl": "jdbc:TAOS-RS://192.168.1.93:6041/db2" } ], "batchSize": 1000, "ignoreTagsUnmatched": true } } } ], "setting": { "speed": { "channel": 1 } } } } 
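The incremental-sync script `dm2t_sync.sh` (under `incremental_sync/`) keeps a per-table watermark file. It builds a half-open window `update_key >= last_watermark and update_key < now`, substitutes that window for the `1=1` placeholder in the job template, and advances the watermark only after DataX exits successfully. The following is a minimal Java sketch of the same windowing logic, for illustration only. The class `IncrementalWindow`, its methods and the sample template string are hypothetical and not part of DataX, and launching `datax.py` itself is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper mirroring the watermark logic of dm2t_sync.sh; not part of DataX.
public class IncrementalWindow {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Last successful sync time from the watermark file, or null on the first (full) run.
    static String readWatermark(Path file) throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        String s = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return s.isEmpty() ? null : s;
    }

    // Half-open window [maxTime, now) on updateKey; null means "no filter" (full sync).
    static String buildWhere(String updateKey, String maxTime, String now) {
        if (maxTime == null) {
            return null;
        }
        return updateKey + " >= '" + maxTime + "' and " + updateKey + " < '" + now + "'";
    }

    // Same effect as: sed "s/1=1/$WHERE/g" template; a null window keeps the template unchanged.
    static String renderJob(String template, String where) {
        return where == null ? template : template.replace("1=1", where);
    }

    // Advance the watermark; call this only after the DataX job has succeeded.
    static void writeWatermark(Path file, String now) throws IOException {
        Files.write(file, (now + "\n").getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path watermark = Paths.get(args.length > 0 ? args[0] : "stb1.csv");
        String now = LocalDateTime.now().format(FMT);
        String where = buildWhere("ts", readWatermark(watermark), now);
        String template = "select * from stb1 where 1=1"; // hypothetical template text
        System.out.println(renderJob(template, where));
        // A real driver would run datax.py here and, on exit status 0, call:
        // writeWatermark(watermark, now);
    }
}
```

The half-open interval means a row whose timestamp equals the stored watermark is picked up by exactly one run: it falls just outside the previous window and at the start of the next one.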
================================================ FILE: tdenginewriter/src/test/resources/weather.csv ================================================ tb1,2022-02-20 04:05:59.255,5,8.591868744,1,abcABC123,北京朝阳望京 tb1,2022-02-20 04:58:47.068,3,1.489693641,1,abcABC123,北京朝阳望京 tb1,2022-02-20 06:31:09.408,1,4.026500719,1,abcABC123,北京朝阳望京 tb1,2022-02-20 08:08:00.336,1,9.606400360,1,abcABC123,北京朝阳望京 tb1,2022-02-20 08:28:58.053,9,7.872178184,1,abcABC123123,北京朝阳望京 tb1,2022-02-20 10:23:20.836,9,2.699478524,1,abcABC123,北京朝阳望京 tb1,2022-02-20 11:09:59.739,7,7.906723716,1,abcABC123,北京朝阳望京 tb1,2022-02-20 19:08:29.315,1,5.852338895,1,abcABC123,北京朝阳望京 tb1,2022-02-20 22:10:06.243,10,5.535007901,1,abcABC123,北京朝阳望京 tb1,2022-02-20 23:52:43.683,10,10.642013185,1,abcABC123,北京朝阳望京 ================================================ FILE: transformer/doc/.gitkeep ================================================ ================================================ FILE: transformer/doc/transformer.md ================================================ # DataX Transformer ## Transformer Definition During data synchronization and transfer, users sometimes need to customize the data in transit, for example by trimming or converting columns. This is handled by the T step of ETL (the Transformer). DataX provides full support for E (Extract), T (Transformer), and L (Load). ## Execution Model ![image](http://git.cn-hangzhou.oss.aliyun-inc.com/uploads/datax/datax/b5652c0492c394684958272219ce327c/image.png) ## UDF Manual 1. dx_substr * Parameters: 3 * First parameter: field index, i.e. which field of the record to operate on. * Second parameter: start position within the field value. * Third parameter: target field length. * Returns: the substring of the given length starting at the given position (inclusive). An exception is thrown if the start position is invalid. If the field is null, it is returned unchanged (i.e. this transformer does not apply). * Examples: ``` dx_substr(1,"2","5") value of column 1 is "dataxTest" => "taxTe" dx_substr(1,"5","10") value of column 1 is "dataxTest" => "Test" ``` 2. dx_pad * Parameters: 4 * First parameter: field index, i.e. which field of the record to operate on. * Second parameter: "l" or "r", indicating whether to pad at the head or at the tail. * Third parameter: target field length. * Fourth parameter: the padding character. * Returns: if the source string is shorter than the target length, it is padded on the specified side and returned. If it is longer, it is truncated (always on the right). If the field is null, it is converted to an empty string before padding, so the result consists entirely of padding characters. * Examples: ``` dx_pad(1,"l","4","A"), if the value of column 1 is xyz => Axyz; if it is xyzzzzz => xyzz dx_pad(1,"r","4","A"), if the value of column 1 is xyz => xyzA; if it is xyzzzzz => xyzz ``` 3.
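dx_replace * Parameters: 4 * First parameter: field index, i.e. which field of the record to operate on. * Second parameter: start position within the field value. * Third parameter: length of the substring to replace. * Fourth parameter: the replacement string. * Returns: the string with the substring of the given length, starting at the given position (inclusive), replaced. An exception is thrown if the start position is invalid. If the field is null, it is returned unchanged (i.e. this transformer does not apply). * Examples: ``` dx_replace(1,"2","4","****") value of column 1 is "dataxTest" => "da****est" dx_replace(1,"5","10","****") value of column 1 is "dataxTest" => "datax****" ``` 4. dx_filter (Correlated filters, i.e. conditions combining several fields, are not supported yet: the function parameters would be too complex for users.) * Parameters: * First parameter: field index, i.e. which field of the record to operate on. * Second parameter: operator; the supported operators are: like, not like, >, =, <, >=, !=, <= * Third parameter: a regular expression (Java regex) or a value. * Returns: * If the condition matches, Null is returned, meaning the row is filtered out; if it does not match, the row is kept. (Note that this applies to the whole row.) For >, =, < and similar operators, the result is a direct comparison of the field value. * like and not like convert the field to a String and perform a full match against the target regular expression. * For >, =, <, >=, !=, <=: DoubleColumn compares double values, LongColumn and DateColumn compare long values, and the remaining StringColumn, BoolColumn and BytesColumn compare their string values. * If the target column is null: the condition = null is satisfied, so the row is filtered out; the condition != null is not satisfied, so the row is kept. For like, a null field does not satisfy the condition and the row is kept; for not like, a null field satisfies the condition and the row is filtered out. * Examples: ``` dx_filter(1,"like","dataTest") dx_filter(1,">=","10") ``` 5. dx_digest * Parameters: 3 * First parameter: field index, i.e. which field of the record to operate on. * Second parameter: hash type, md5 or sha1. * Third parameter: case of the hash value, toUpperCase (upper case) or toLowerCase (lower case). * Returns: the hex digest of the given type. If the field is null, it is converted to an empty string and the corresponding hex digest is returned. * Examples: ``` dx_digest(1,"md5","toUpperCase"), value of column 1 is xyzzzzz => 9CDFFC4FA4E45A99DB8BBCD762ACFFA2 ``` 6.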
dx_groovy * Parameters: * First parameter: groovy code * Second parameter (a list, or empty): extraPackage * Notes: * dx_groovy can be invoked only once; it cannot be invoked multiple times. * The groovy code may use the java.lang and java.util packages, and can directly reference record as well as the column classes under element (BoolColumn.class, BytesColumn.class, DateColumn.class, DoubleColumn.class, LongColumn.class, StringColumn.class). Other packages are not supported; if you need them, set extraPackage. Note that extraPackage does not support third-party jars. * The groovy code returns either the updated Record (e.g. record.setColumn(columnIndex, new StringColumn(newValue));) or null. Returning null filters out the row. * You can call the static utility class GroovyTransformerStaticUtil directly. It currently provides these methods: * md5(String):String * sha1(String):String * Examples: ``` subStr implemented in groovy: String code = "Column column = record.getColumn(1);\n" + " String oriValue = column.asString();\n" + " String newValue = oriValue.substring(0, 3);\n" + " record.setColumn(1, new StringColumn(newValue));\n" + " return record;"; dx_groovy(record); ``` ``` Replace implemented in groovy: String code2 = "Column column = record.getColumn(1);\n" + " String oriValue = column.asString();\n" + " String newValue = \"****\" + oriValue.substring(3, oriValue.length());\n" + " record.setColumn(1, new StringColumn(newValue));\n" + " return record;"; ``` ``` Pad implemented in groovy: String code3 = "Column column = record.getColumn(1);\n" + " String oriValue = column.asString();\n" + " String padString = \"12345\";\n" + " String finalPad = \"\";\n" + " int NeedLength = 8 - oriValue.length();\n" + " while (NeedLength > 0) {\n" + "\n" + " if (NeedLength >= padString.length()) {\n" + " finalPad += padString;\n" + " NeedLength -= padString.length();\n" + " } else {\n" + " finalPad += padString.substring(0, NeedLength);\n" + " NeedLength = 0;\n" + " }\n" + " }\n" + " String newValue= finalPad + oriValue;\n" + " record.setColumn(1, new StringColumn(newValue));\n" + " return record;"; ``` ## Job Definition * This example configures 4 UDFs. ``` { "job": { "setting": { "speed": { "channel": 1 }, "errorLimit": { "record": 0 } }, "content": [ { "reader": { "name": "streamreader", "parameter": { "column": [ { "value": "DataX", "type": "string" },
{ "value": 1724154616370, "type": "long" }, { "value": "2024-01-01 00:00:00", "type": "date" }, { "value": true, "type": "bool" }, { "value": "TestRawData", "type": "bytes" } ], "sliceRecordCount": 100 } }, "writer": { "name": "streamwriter", "parameter": { "print": false, "encoding": "UTF-8" } }, "transformer": [ { "name": "dx_substr", "parameter": { "columnIndex": 5, "paras": [ "1", "3" ] } }, { "name": "dx_replace", "parameter": { "columnIndex": 4, "paras": [ "3", "4", "****" ] } }, { "name": "dx_digest", "parameter": { "columnIndex": 3, "paras": [ "md5", "toLowerCase" ] } }, { "name": "dx_groovy", "parameter": { "code": "//groovy code//", "extraPackage": [ "import somePackage1;", "import somePackage2;" ] } } ] } ] } } ``` ## Metering and Dirty Data The Transform step converts data and may increase or decrease the number of records, so it needs especially precise metering, including: * The number of Records and bytes passed into the Transform. * The number of Records and bytes output by the Transform. * The number of dirty-data Records and bytes produced by the Transform. * When multiple Transforms are chained and one of them produces dirty data, the subsequent transforms are skipped and the record is counted directly as dirty data. * Currently only aggregate metering across all Transforms is provided (counts of successful, failed, and filtered records, plus the time spent in transforms). The metering data shown while the job is running is defined as follows: ``` Total 1000000 records, 22000000 bytes | Transform 100000 records(in), 10000 records(out) | Speed 2.10MB/s, 100000 records/s | Error 0 records, 0 bytes | Percentage 100.00% ``` **Note: this mainly records the transform input and output; watch for changes between the number of input and output records.** The metering data shown for the finished job is defined as follows: ``` Job start time : 2015-03-10 17:34:21 Job end time : 2015-03-10 17:34:31 Total elapsed time : 10s Average throughput : 2.10MB/s Record write speed : 100000rec/s Transform input total : 1000000 Transform output total : 1000000 Records read total : 1000000 Failed records total : 0 ``` **Note: this mainly records the transform input and output; watch for changes between the number of input and output records.** ================================================ FILE: transformer/pom.xml ================================================ datax-all com.alibaba.datax 0.0.1-SNAPSHOT 4.0.0 datax-transformer jar datax-transformer UTF-8 0.0.1-SNAPSHOT com.alibaba.datax datax-common ${datax-version} slf4j-log4j12 org.slf4j maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE:
transformer/src/main/assembly/package.xml ================================================ dir false target/ datax-transformer-0.0.1-SNAPSHOT.jar /lib false /lib runtime ================================================ FILE: transformer/src/main/java/com/alibaba/datax/transformer/ComplexTransformer.java ================================================ package com.alibaba.datax.transformer; import com.alibaba.datax.common.element.Record; import java.util.Map; /** * no comments. * Created by liqiang on 16/3/3. */ public abstract class ComplexTransformer { // The uniqueness of transformerName is checked within DataX, or when the transformer is submitted to the plugin center. private String transformerName; public String getTransformerName() { return transformerName; } public void setTransformerName(String transformerName) { this.transformerName = transformerName; } /** * @param record the row record; after the UDF processes it, the record is updated accordingly * @param tContext configuration items for running the transformer * @param paras transformer function parameters */ abstract public Record evaluate(Record record, Map tContext, Object... paras); } ================================================ FILE: transformer/src/main/java/com/alibaba/datax/transformer/Transformer.java ================================================ package com.alibaba.datax.transformer; import com.alibaba.datax.common.element.Record; /** * no comments. * Created by liqiang on 16/3/3. */ public abstract class Transformer { // The uniqueness of transformerName is checked within DataX, or when the transformer is submitted to the plugin center. private String transformerName; public String getTransformerName() { return transformerName; } public void setTransformerName(String transformerName) { this.transformerName = transformerName; } /** * @param record the row record; after the UDF processes it, the record is updated accordingly * @param paras transformer function parameters */ abstract public Record evaluate(Record record, Object...
paras); } ================================================ FILE: tsdbreader/doc/tsdbreader.md ================================================ # TSDBReader Plugin Documentation ___ ## 1 Quick Introduction The TSDBReader plugin reads data from Alibaba Cloud TSDB. Alibaba Cloud Time Series Database (**T**ime **S**eries **D**ata**b**ase, TSDB for short) is a database service that combines efficient reading and writing of time-series data, compressed storage, and real-time computation. It is widely used in IoT and Internet scenarios for real-time monitoring of devices and business services, and for real-time prediction and alerting. See the Alibaba Cloud TSDB [official site](https://cn.aliyun.com/product/hitsdb) for details. ## 2 How It Works Under the hood, TSDBReader connects to an Alibaba Cloud TSDB instance over HTTP and scans data points through the `/api/query` or `/api/mquery` endpoint (for details, see [Time Series Database TSDB - HTTP API overview](https://help.aliyun.com/document_detail/63557.html)). The synchronization is split by time series and by query time range. ## 3 Features ### 3.1 Sample Configurations * A job that extracts data from an Alibaba Cloud TSDB database to the local machine and outputs it in **time-series** format: Sample time-series data: ```json {"metric":"m","tags":{"app":"a19","cluster":"c5","group":"g10","ip":"i999","zone":"z1"},"timestamp":1546272263,"value":1} ``` ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "TSDB", "endpoint": "http://localhost:8242", "column": [ "m" ], "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "UTF-8", "print": true } } } ], "setting": { "speed": { "channel": 3 } } } } ``` * A job that extracts data from an Alibaba Cloud TSDB database to the local machine and outputs it in **relational** format: Sample relational data: ```txt m 1546272125 a1 c1 g2 i3021 z4 1.0 ``` ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "RDB", "endpoint": "http://localhost:8242", "column": [ "__metric__", "__ts__", "app", "cluster", "group", "ip", "zone", "__value__" ], "metric": [ "m" ], "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": { "name": "streamwriter", "parameter": { "encoding": "UTF-8", "print": true } } } ], "setting": { "speed": { "channel": 3 } } } } ``` * A job that extracts **single-value** data from an Alibaba Cloud TSDB database into ADB: ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "RDB",
"endpoint": "http://localhost:8242", "column": [ "__metric__", "__ts__", "app", "cluster", "group", "ip", "zone", "__value__" ], "metric": [ "m" ], "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": { "name": "adswriter", "parameter": { "username": "******", "password": "******", "column": [ "`metric`", "`ts`", "`app`", "`cluster`", "`group`", "`ip`", "`zone`", "`value`" ], "url": "http://localhost:3306", "schema": "datax_test", "table": "datax_test", "writeMode": "insert", "opIndex": "0", "batchSize": "2" } } } ], "setting": { "speed": { "channel": 3 } } } } ``` * A job that extracts **multi-value** data from an Alibaba Cloud TSDB database into ADB: ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "RDB", "endpoint": "http://localhost:8242", "column": [ "__metric__", "__ts__", "app", "cluster", "group", "ip", "zone", "load", "memory", "cpu" ], "metric": [ "m_field" ], "field": { "m_field": [ "load", "memory", "cpu" ] }, "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": { "name": "adswriter", "parameter": { "username": "******", "password": "******", "column": [ "`metric`", "`ts`", "`app`", "`cluster`", "`group`", "`ip`", "`zone`", "`load`", "`memory`", "`cpu`" ], "url": "http://localhost:3306", "schema": "datax_test", "table": "datax_test_multi_field", "writeMode": "insert", "opIndex": "0", "batchSize": "2" } } } ], "setting": { "speed": { "channel": 3 } } } } ``` * A job that extracts **single-value** data from an Alibaba Cloud TSDB database into ADB, filtering to a subset of the time series: ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "RDB", "endpoint": "http://localhost:8242", "column": [ "__metric__", "__ts__", "app", "cluster", "group", "ip", "zone", "__value__" ], "metric": [ "m" ], "tag": { "m": { "app": "a1", "cluster": "c1" } }, "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": {
"name": "adswriter", "parameter": { "username": "******", "password": "******", "column": [ "`metric`", "`ts`", "`app`", "`cluster`", "`group`", "`ip`", "`zone`", "`value`" ], "url": "http://localhost:3306", "schema": "datax_test", "table": "datax_test", "writeMode": "insert", "opIndex": "0", "batchSize": "2" } } } ], "setting": { "speed": { "channel": 3 } } } } ``` * A job that extracts **multi-value** data from an Alibaba Cloud TSDB database into ADB, filtering to a subset of the time series: ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "RDB", "endpoint": "http://localhost:8242", "column": [ "__metric__", "__ts__", "app", "cluster", "group", "ip", "zone", "load", "memory", "cpu" ], "metric": [ "m_field" ], "field": { "m_field": [ "load", "memory", "cpu" ] }, "tag": { "m_field": { "ip": "i999" } }, "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": { "name": "adswriter", "parameter": { "username": "******", "password": "******", "column": [ "`metric`", "`ts`", "`app`", "`cluster`", "`group`", "`ip`", "`zone`", "`load`", "`memory`", "`cpu`" ], "url": "http://localhost:3306", "schema": "datax_test", "table": "datax_test_multi_field", "writeMode": "insert", "opIndex": "0", "batchSize": "2" } } } ], "setting": { "speed": { "channel": 3 } } } } ``` * A job that extracts **single-value** data from an Alibaba Cloud TSDB database into another Alibaba Cloud TSDB database: ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "TSDB", "endpoint": "http://localhost:8242", "column": [ "m" ], "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": { "name": "tsdbwriter", "parameter": { "endpoint": "http://localhost:8240" } } } ], "setting": { "speed": { "channel": 3 } } } } ``` * A job that extracts **multi-value** data from an Alibaba Cloud TSDB database into another Alibaba Cloud TSDB database: ```json { "job": { "content": [ { "reader": { "name": "tsdbreader", "parameter": { "sinkDbType": "TSDB", "endpoint": "http://localhost:8242", "column": [ "m_field" ],
"field": { "m_field": [ "load", "memory", "cpu" ] }, "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } }, "writer": { "name": "tsdbwriter", "parameter": { "multiField": true, "endpoint": "http://localhost:8240" } } } ], "setting": { "speed": { "channel": 3 } } } } ``` ### 3.2 Parameters * **name** * Description: name of this plugin * Required: yes * Default: tsdbreader * **parameter** * **sinkDbType** * Description: type of the target database * Required: no * Default: TSDB * Note: TSDB and RDB are currently supported. TSDB covers Alibaba Cloud TSDB, OpenTSDB, InfluxDB, Prometheus, and TimeScale; RDB covers ADB, MySQL, Oracle, PostgreSQL, DRDS, and similar databases. * **endpoint** * Description: HTTP address of the Alibaba Cloud TSDB instance * Required: yes * Format: http://IP:Port * Default: none * **column** * Description: in the TSDB scenario, the list of Metrics to migrate. In the RDB scenario, the table columns in the relational database, plus three special fields `__metric__`, `__ts__`, and `__value__`: `__metric__` maps the metric, `__ts__` maps the timestamp, and `__value__` maps the metric value. `__value__` applies only to the single-value scenario; in the multi-value scenario, list the field names directly instead. * Required: yes * Default: none * **metric** * Description: applies only to the RDB scenario; the list of Metrics to migrate * Required: no (but it must be set when sinkDbType is RDB) * Default: none * **field** * Description: applies only to the multi-value scenario; the list of Fields to migrate * Required: no * Default: none * **tag** * Description: the TagK and TagV pairs to migrate, used to further filter time series * Required: no * Default: none * **splitIntervalMs** * Description: used by DataX to split the job into Tasks internally; each Task queries only a small time range * Required: yes * Default: none * Note: the unit is milliseconds (ms) * **beginDateTime** * Description: used together with endDateTime to specify the time range of the data points to migrate * Required: yes * Format: `yyyy-MM-dd HH:mm:ss` * Default: none * Note: the minutes and seconds of the start and end times are ignored and both are converted to the hour; for example, [3:35, 4:55) on 2019-4-18 becomes [3:00, 4:00) * **endDateTime** * Description: used together with beginDateTime to specify the time range of the data points to migrate * Required: yes * Format: `yyyy-MM-dd HH:mm:ss` * Default: none * Note: the minutes and seconds of the start and end times are ignored and both are converted to the hour; for example, [3:35, 4:55) on 2019-4-18 becomes [3:00, 4:00) ### 3.3 Type Conversion | DataX internal type | TSDB data type | | -------------- | ------------------------------------------------------------ | | String | Serialized string of a TSDB data point, including timestamp, metric, tags, fields, and value | ## 4 Constraints ### 4.1 If a Metric has a very large amount of data within a single hour, you may need to adjust the JVM memory size with the `-j` parameter If the downstream Writer cannot write as fast as TSDB Reader queries data, a backlog may build up, so adjust the JVM parameters accordingly. Taking the job that extracts data from Alibaba Cloud TSDB to the local machine as an example, the launch command is: ```bash python datax/bin/datax.py tsdb2stream.json -j "-Xms4096m
-Xmx4096m" ``` ### 4.2 The start and end times are automatically converted to the hour The specified start and end times are automatically converted to the hour; for example, `[3:35, 3:55)` on 2019-4-18 becomes `[3:00, 4:00)` ================================================ FILE: tsdbreader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT tsdbreader tsdbreader jar UTF-8 3.3.2 4.5 2.4 4.13.1 2.9.9 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j commons-math3 org.apache.commons org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.commons commons-lang3 ${commons-lang3.version} org.apache.httpcomponents httpclient ${httpclient.version} commons-io commons-io ${commons-io.version} org.apache.httpcomponents fluent-hc ${httpclient.version} com.alibaba.fastjson2 fastjson2 joda-time joda-time ${joda-time.version} junit junit ${junit4.version} test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: tsdbreader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/tsdbreader target/ tsdbreader-0.0.1-SNAPSHOT.jar plugin/reader/tsdbreader false plugin/reader/tsdbreader/libs runtime ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader; import java.util.HashSet; import java.util.Set; /** * Copyright @ 2019 alibaba.com * All right reserved.
* Function:Constant * * @author Benedict Jin * @since 2019-10-21 */ public final class Constant { static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; public static final String METRIC_SPECIFY_KEY = "__metric__"; public static final String METRIC_SPECIFY_KEY_PREFIX = METRIC_SPECIFY_KEY + "."; public static final int METRIC_SPECIFY_KEY_PREFIX_LENGTH = METRIC_SPECIFY_KEY_PREFIX.length(); public static final String TS_SPECIFY_KEY = "__ts__"; public static final String VALUE_SPECIFY_KEY = "__value__"; static final Set MUST_CONTAINED_SPECIFY_KEYS = new HashSet<>(); static { MUST_CONTAINED_SPECIFY_KEYS.add(METRIC_SPECIFY_KEY); MUST_CONTAINED_SPECIFY_KEYS.add(TS_SPECIFY_KEY); // __value__ may be omitted in the multi-value scenario } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/Key.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader; import java.util.HashSet; import java.util.Set; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:Key * * @author Benedict Jin * @since 2019-10-21 */ public class Key { // TSDB for OpenTSDB / InfluxDB / TimeScale / Prometheus etc. // RDB for MySQL / ADB etc.
static final String SINK_DB_TYPE = "sinkDbType"; static final String ENDPOINT = "endpoint"; static final String USERNAME = "username"; static final String PASSWORD = "password"; static final String COLUMN = "column"; static final String METRIC = "metric"; static final String FIELD = "field"; static final String TAG = "tag"; static final String COMBINE = "combine"; static final String INTERVAL_DATE_TIME = "splitIntervalMs"; static final String BEGIN_DATE_TIME = "beginDateTime"; static final String END_DATE_TIME = "endDateTime"; static final String HINT = "hint"; static final Boolean COMBINE_DEFAULT_VALUE = false; static final Integer INTERVAL_DATE_TIME_DEFAULT_VALUE = 60; static final String TYPE_DEFAULT_VALUE = "TSDB"; static final Set TYPE_SET = new HashSet<>(); static { TYPE_SET.add("TSDB"); TYPE_SET.add("RDB"); } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/TSDBReader.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.reader.tsdbreader.conn.TSDBConnection; import com.alibaba.datax.plugin.reader.tsdbreader.util.TimeUtils; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.joda.time.DateTime; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Collections; import java.util.List; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Reader * * @author Benedict Jin * @since 2019-10-21 */ @SuppressWarnings("unused") public class TSDBReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig; @Override public void init() { this.originalConfig = super.getPluginJobConf(); String type = originalConfig.getString(Key.SINK_DB_TYPE, Key.TYPE_DEFAULT_VALUE); if (StringUtils.isBlank(type)) { throw DataXException.asDataXException( TSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.SINK_DB_TYPE + "] is not set."); } if (!Key.TYPE_SET.contains(type)) { throw DataXException.asDataXException( TSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.SINK_DB_TYPE + "] should be one of [" + JSON.toJSONString(Key.TYPE_SET) + "]."); } String address = originalConfig.getString(Key.ENDPOINT); if (StringUtils.isBlank(address)) { throw DataXException.asDataXException( TSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.ENDPOINT + "] is not set."); } String username = originalConfig.getString(Key.USERNAME, null); if (StringUtils.isBlank(username)) { LOG.warn("The parameter [" + Key.USERNAME + "] is blank."); } String password = originalConfig.getString(Key.PASSWORD, null); if (StringUtils.isBlank(password)) { LOG.warn("The parameter [" + Key.PASSWORD + "] is blank."); } // tagK / field could be empty if ("TSDB".equals(type)) { List columns = originalConfig.getList(Key.COLUMN, String.class); if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException( TSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.COLUMN + "] is not set."); } } else { List columns = originalConfig.getList(Key.COLUMN, String.class); if (columns == null || columns.isEmpty()) { throw DataXException.asDataXException( TSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.COLUMN + "] is not set."); } for (String specifyKey : Constant.MUST_CONTAINED_SPECIFY_KEYS) { 
boolean containSpecifyKey = false; for (String column : columns) { if (column.startsWith(specifyKey)) { containSpecifyKey = true; break; } } if (!containSpecifyKey) { throw DataXException.asDataXException( TSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.COLUMN + "] should contain " + JSON.toJSONString(Constant.MUST_CONTAINED_SPECIFY_KEYS) + "."); } } final List metrics = originalConfig.getList(Key.METRIC, String.class); if (metrics == null || metrics.isEmpty()) { throw DataXException.asDataXException( TSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.METRIC + "] is not set."); } } Integer splitIntervalMs = originalConfig.getInt(Key.INTERVAL_DATE_TIME, Key.INTERVAL_DATE_TIME_DEFAULT_VALUE); if (splitIntervalMs <= 0) { throw DataXException.asDataXException( TSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.INTERVAL_DATE_TIME + "] should be greater than zero."); } Boolean isCombine = originalConfig.getBool(Key.COMBINE, Key.COMBINE_DEFAULT_VALUE); SimpleDateFormat format = new SimpleDateFormat(Constant.DEFAULT_DATA_FORMAT); String startTime = originalConfig.getString(Key.BEGIN_DATE_TIME); Long startDate; if (startTime == null || startTime.trim().length() == 0) { throw DataXException.asDataXException( TSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.BEGIN_DATE_TIME + "] is not set."); } else { try { startDate = format.parse(startTime).getTime(); } catch (ParseException e) { throw DataXException.asDataXException(TSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.BEGIN_DATE_TIME + "] needs to conform to the [" + Constant.DEFAULT_DATA_FORMAT + "] format."); } } String endTime = originalConfig.getString(Key.END_DATE_TIME); Long endDate; if (endTime == null || endTime.trim().length() == 0) { throw DataXException.asDataXException( TSDBReaderErrorCode.REQUIRED_VALUE, "The parameter [" + Key.END_DATE_TIME + "] is not set."); } else { try { endDate = format.parse(endTime).getTime(); } catch (ParseException e) { throw
DataXException.asDataXException(TSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.END_DATE_TIME + "] needs to conform to the [" + Constant.DEFAULT_DATA_FORMAT + "] format."); } } if (startDate >= endDate) { throw DataXException.asDataXException(TSDBReaderErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.BEGIN_DATE_TIME + "] should be less than the parameter [" + Key.END_DATE_TIME + "]."); } } @Override public void prepare() { } @Override public List split(int adviceNumber) { List configurations = new ArrayList<>(); // get columns and metrics String type = originalConfig.getString(Key.SINK_DB_TYPE, Key.TYPE_DEFAULT_VALUE); List columns4TSDB = null; List columns4RDB = null; List metrics = null; if ("TSDB".equals(type)) { columns4TSDB = originalConfig.getList(Key.COLUMN, String.class); } else { columns4RDB = originalConfig.getList(Key.COLUMN, String.class); metrics = originalConfig.getList(Key.METRIC, String.class); } // get time interval Integer splitIntervalMs = originalConfig.getInt(Key.INTERVAL_DATE_TIME, Key.INTERVAL_DATE_TIME_DEFAULT_VALUE); // get time range SimpleDateFormat format = new SimpleDateFormat(Constant.DEFAULT_DATA_FORMAT); long startTime; try { startTime = format.parse(originalConfig.getString(Key.BEGIN_DATE_TIME)).getTime(); } catch (ParseException e) { throw DataXException.asDataXException( TSDBReaderErrorCode.ILLEGAL_VALUE, "Failed to parse [" + Key.BEGIN_DATE_TIME + "].", e); } long endTime; try { endTime = format.parse(originalConfig.getString(Key.END_DATE_TIME)).getTime(); } catch (ParseException e) { throw DataXException.asDataXException( TSDBReaderErrorCode.ILLEGAL_VALUE, "Failed to parse [" + Key.END_DATE_TIME + "].", e); } if (TimeUtils.isSecond(startTime)) { startTime *= 1000; } if (TimeUtils.isSecond(endTime)) { endTime *= 1000; } DateTime startDateTime = new DateTime(TimeUtils.getTimeInHour(startTime)); DateTime endDateTime = new DateTime(TimeUtils.getTimeInHour(endTime)); final Boolean isCombine = originalConfig.getBool(Key.COMBINE,
Key.COMBINE_DEFAULT_VALUE); if ("TSDB".equals(type)) { if (isCombine) { // split by time interval (splitIntervalMs) while (startDateTime.isBefore(endDateTime)) { Configuration clone = this.originalConfig.clone(); clone.set(Key.COLUMN, columns4TSDB); clone.set(Key.BEGIN_DATE_TIME, startDateTime.getMillis()); startDateTime = startDateTime.plusMillis(splitIntervalMs); // Make sure the time interval is [start, end). clone.set(Key.END_DATE_TIME, startDateTime.getMillis() - 1); configurations.add(clone); LOG.info("Configuration: {}", JSON.toJSONString(clone)); } } else { // split by time interval (splitIntervalMs) while (startDateTime.isBefore(endDateTime)) { final long windowStart = startDateTime.getMillis(); startDateTime = startDateTime.plusMillis(splitIntervalMs); // Make sure the time interval is [start, end). final long windowEnd = startDateTime.getMillis() - 1; // split by metric: every metric is read over the same time window for (String column : columns4TSDB) { Configuration clone = this.originalConfig.clone(); clone.set(Key.COLUMN, Collections.singletonList(column)); clone.set(Key.BEGIN_DATE_TIME, windowStart); clone.set(Key.END_DATE_TIME, windowEnd); configurations.add(clone); LOG.info("Configuration: {}", JSON.toJSONString(clone)); } } } } else { if (isCombine) { // split by time interval (splitIntervalMs) while (startDateTime.isBefore(endDateTime)) { Configuration clone = this.originalConfig.clone(); clone.set(Key.COLUMN, columns4RDB); clone.set(Key.METRIC, metrics); clone.set(Key.BEGIN_DATE_TIME, startDateTime.getMillis()); startDateTime = startDateTime.plusMillis(splitIntervalMs); // Make sure the time interval is [start, end).
clone.set(Key.END_DATE_TIME, startDateTime.getMillis() - 1); configurations.add(clone); LOG.info("Configuration: {}", JSON.toJSONString(clone)); } } else { // split by time interval (splitIntervalMs) while (startDateTime.isBefore(endDateTime)) { final long windowStart = startDateTime.getMillis(); startDateTime = startDateTime.plusMillis(splitIntervalMs); // Make sure the time interval is [start, end). final long windowEnd = startDateTime.getMillis() - 1; // split by metric: every metric is read over the same time window for (String metric : metrics) { Configuration clone = this.originalConfig.clone(); clone.set(Key.COLUMN, columns4RDB); clone.set(Key.METRIC, Collections.singletonList(metric)); clone.set(Key.BEGIN_DATE_TIME, windowStart); clone.set(Key.END_DATE_TIME, windowEnd); configurations.add(clone); LOG.info("Configuration: {}", JSON.toJSONString(clone)); } } } } return configurations; } @Override public void post() { } @Override public void destroy() { } } public static class Task extends Reader.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private String type; private List columns4TSDB = null; private List columns4RDB = null; private List metrics = null; private Map fields; private Map tags; private TSDBConnection conn; private Long startTime; private Long endTime; private Boolean isCombine; private Map hint; @Override public void init() { Configuration readerSliceConfig = super.getPluginJobConf(); LOG.info("getPluginJobConf: {}", JSON.toJSONString(readerSliceConfig)); this.type = readerSliceConfig.getString(Key.SINK_DB_TYPE, Key.TYPE_DEFAULT_VALUE); if ("TSDB".equals(type)) { columns4TSDB = readerSliceConfig.getList(Key.COLUMN, String.class); } else { columns4RDB = readerSliceConfig.getList(Key.COLUMN, String.class); metrics = readerSliceConfig.getList(Key.METRIC, String.class); } this.fields = readerSliceConfig.getMap(Key.FIELD); this.tags = readerSliceConfig.getMap(Key.TAG); String address = readerSliceConfig.getString(Key.ENDPOINT); String username = 
readerSliceConfig.getString(Key.USERNAME); String password = readerSliceConfig.getString(Key.PASSWORD); conn = new
TSDBConnection(address, username, password); this.startTime = readerSliceConfig.getLong(Key.BEGIN_DATE_TIME); this.endTime = readerSliceConfig.getLong(Key.END_DATE_TIME); this.isCombine = readerSliceConfig.getBool(Key.COMBINE, Key.COMBINE_DEFAULT_VALUE); this.hint = readerSliceConfig.getMap(Key.HINT); } @Override public void prepare() { } @Override @SuppressWarnings("unchecked") public void startRead(RecordSender recordSender) { try { if ("TSDB".equals(type)) { for (String metric : columns4TSDB) { final Map tags = this.tags == null ? null : (Map) this.tags.get(metric); if (fields == null || !fields.containsKey(metric)) { conn.sendDPs(metric, tags, this.startTime, this.endTime, recordSender, hint); } else { conn.sendDPs(metric, (List) fields.get(metric), tags, this.startTime, this.endTime, recordSender, hint); } } } else { if (isCombine) { final Map tags = this.tags == null ? null : (Map) this.tags.get(metrics.get(0)); conn.sendRecords(metrics, tags, startTime, endTime, columns4RDB, recordSender, hint); } else { for (String metric : metrics) { final Map tags = this.tags == null ? null : (Map) this.tags.get(metric); if (fields == null || !fields.containsKey(metric)) { conn.sendRecords(metric, tags, startTime, endTime, columns4RDB, isCombine, recordSender, hint); } else { conn.sendRecords(metric, (List) fields.get(metric), tags, startTime, endTime, columns4RDB, recordSender, hint); } } } } } catch (Exception e) { throw DataXException.asDataXException( TSDBReaderErrorCode.ILLEGAL_VALUE, "Error in getting or sending data point!", e); } } @Override public void post() { } @Override public void destroy() { } } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/TSDBReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader; import com.alibaba.datax.common.spi.ErrorCode; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Reader Error Code * * @author Benedict Jin * @since 2019-10-21 */ public enum TSDBReaderErrorCode implements ErrorCode { REQUIRED_VALUE("TSDBReader-00", "Missing required value"), ILLEGAL_VALUE("TSDBReader-01", "Illegal value"); private final String code; private final String description; TSDBReaderErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. ", this.code, this.description); } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/Connection4TSDB.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import com.alibaba.datax.common.plugin.RecordSender; import java.util.List; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:Connection for TSDB-like databases * * @author Benedict Jin * @since 2019-10-21 */ public interface Connection4TSDB { /** * Get the address of the database. * * @return the endpoint address (host and port) */ String address(); /** * Get the username used to connect to the database. * * @return username */ String username(); /** * Get the password used to connect to the database. * * @return password */ String password(); /** * Get the version of the database. * * @return version */ String version(); /** * Get the configuration of the database. * * @return configs */ String config(); /** * Get the prefixes of the supported versions. * * @return supported version prefixes */ String[] getSupportVersionPrefix(); /** * Send data points for TSDB with a single field. */ void sendDPs(String metric, Map tags, Long start, Long end, RecordSender recordSender, Map hint) throws Exception; /** * Send data points for TSDB with multiple fields.
*/ void sendDPs(String metric, List fields, Map tags, Long start, Long end, RecordSender recordSender, Map hint) throws Exception; /** * Send data points for RDB with a single field. */ void sendRecords(String metric, Map tags, Long start, Long end, List columns4RDB, Boolean isCombine, RecordSender recordSender, Map hint) throws Exception; /** * Send data points for RDB with multiple fields. */ void sendRecords(String metric, List fields, Map tags, Long start, Long end, List columns4RDB, RecordSender recordSender, Map hint) throws Exception; /** * Send data points for RDB with a single field per metric in combine mode. */ void sendRecords(List metrics, Map tags, Long start, Long end, List columns4RDB, RecordSender recordSender, Map hint) throws Exception; /** * Put a data point. * * @param dp data point * @return whether the data point is written successfully */ boolean put(DataPoint4TSDB dp); /** * Put data points. * * @param dps data points * @return whether the data points are written successfully */ boolean put(List dps); /** * Whether the current version is supported. * * @return true: supported; false: not supported */ boolean isSupported(); } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4MultiFieldsTSDB.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import com.alibaba.fastjson2.JSON; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved.
* Function:DataPoint for TSDB with Multi Fields * * @author Benedict Jin * @since 2019-10-21 */ public class DataPoint4MultiFieldsTSDB { private long timestamp; private String metric; private Map tags; private Map fields; public DataPoint4MultiFieldsTSDB() { } public DataPoint4MultiFieldsTSDB(long timestamp, String metric, Map tags, Map fields) { this.timestamp = timestamp; this.metric = metric; this.tags = tags; this.fields = fields; } public long getTimestamp() { return timestamp; } public void setTimestamp(long timestamp) { this.timestamp = timestamp; } public String getMetric() { return metric; } public void setMetric(String metric) { this.metric = metric; } public Map getTags() { return tags; } public void setTags(Map tags) { this.tags = tags; } public Map getFields() { return fields; } public void setFields(Map fields) { this.fields = fields; } @Override public String toString() { return JSON.toJSONString(this); } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/DataPoint4TSDB.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import com.alibaba.fastjson2.JSON; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:DataPoint for TSDB * * @author Benedict Jin * @since 2019-10-21 */ public class DataPoint4TSDB { private long timestamp; private String metric; private Map tags; private Object value; public DataPoint4TSDB() { } public DataPoint4TSDB(long timestamp, String metric, Map tags, Object value) { this.timestamp = timestamp; this.metric = metric; this.tags = tags; this.value = value; } public long getTimestamp() { return timestamp; } public void setTimestamp(long timestamp) { this.timestamp = timestamp; } public String getMetric() { return metric; } public void setMetric(String metric) { this.metric = metric; } public Map getTags() { return tags; } public void setTags(Map tags) { this.tags = tags; } public Object getValue() { return value; } public void setValue(Object value) { this.value = value; } @Override public String toString() { return JSON.toJSONString(this); } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/MultiFieldQueryResult.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import java.util.List; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:Multi Field Query Result * * @author Benedict Jin * @since 2019-10-22 */ public class MultiFieldQueryResult { private String metric; private Map tags; private List aggregatedTags; private List columns; private List<List<Object>> values; public MultiFieldQueryResult() { } public String getMetric() { return metric; } public void setMetric(String metric) { this.metric = metric; } public Map getTags() { return tags; } public void setTags(Map tags) { this.tags = tags; } public List getAggregatedTags() { return aggregatedTags; } public void setAggregatedTags(List aggregatedTags) { this.aggregatedTags = aggregatedTags; } public List getColumns() { return columns; } public void setColumns(List columns) { this.columns = columns; } public List<List<Object>> getValues() { return values; } public void setValues(List<List<Object>> values) { this.values = values; } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/QueryResult.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import java.util.List; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved.
* Function:Query Result * * @author Benedict Jin * @since 2019-09-19 */ public class QueryResult { private String metricName; private Map tags; private List groupByTags; private List aggregatedTags; private Map dps; public QueryResult() { } public String getMetricName() { return metricName; } public void setMetricName(String metricName) { this.metricName = metricName; } public Map getTags() { return tags; } public void setTags(Map tags) { this.tags = tags; } public List getGroupByTags() { return groupByTags; } public void setGroupByTags(List groupByTags) { this.groupByTags = groupByTags; } public List getAggregatedTags() { return aggregatedTags; } public void setAggregatedTags(List aggregatedTags) { this.aggregatedTags = aggregatedTags; } public Map getDps() { return dps; } public void setDps(Map dps) { this.dps = dps; } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBConnection.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.tsdbreader.util.TSDBUtils; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import java.util.List; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Connection * * @author Benedict Jin * @since 2019-10-21 */ public class TSDBConnection implements Connection4TSDB { private String address; private String username; private String password; public TSDBConnection(String address, String username, String password) { this.address = address; this.username = username; this.password = password; } @Override public String address() { return address; } @Override public String username() { return username; } @Override public String password() { return password; } @Override public String version() { return TSDBUtils.version(address, username, password); } @Override public String config() { return TSDBUtils.config(address, username, password); } @Override public String[] getSupportVersionPrefix() { return new String[]{"2.4", "2.5"}; } @Override public void sendDPs(String metric, Map tags, Long start, Long end, RecordSender recordSender, Map hint) throws Exception { TSDBDump.dump4TSDB(this, metric, tags, start, end, recordSender, hint); } @Override public void sendDPs(String metric, List fields, Map tags, Long start, Long end, RecordSender recordSender, Map hint) throws Exception { TSDBDump.dump4TSDB(this, metric, fields, tags, start, end, recordSender, hint); } @Override public void sendRecords(String metric, Map tags, Long start, Long end, List columns4RDB, Boolean isCombine, RecordSender recordSender, Map hint) throws Exception { TSDBDump.dump4RDB(this, metric, tags, start, end, columns4RDB, recordSender, hint); } @Override public void sendRecords(List metrics, Map tags, Long start, Long end, List columns4RDB, RecordSender recordSender, Map hint) throws Exception { TSDBDump.dump4RDB(this, metrics, tags, start, end, columns4RDB, recordSender, hint); } @Override public void sendRecords(String metric, List fields, Map tags, Long start, Long end, List columns4RDB, RecordSender recordSender, Map hint) throws Exception { TSDBDump.dump4RDB(this, metric, fields, tags, start, end, columns4RDB, recordSender, hint); } 
@Override public boolean put(DataPoint4TSDB dp) { return false; } @Override public boolean put(List dps) { return false; } @Override public boolean isSupported() { String versionJson = version(); if (StringUtils.isBlank(versionJson)) { throw new RuntimeException("Cannot get the version!"); } String version = JSON.parseObject(versionJson).getString("version"); if (StringUtils.isBlank(version)) { return false; } for (String prefix : getSupportVersionPrefix()) { if (version.startsWith(prefix)) { return true; } } return false; } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBDump.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import com.alibaba.datax.common.element.*; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.reader.tsdbreader.Constant; import com.alibaba.datax.plugin.reader.tsdbreader.util.HttpUtils; import com.alibaba.fastjson2.JSON; import com.alibaba.fastjson2.JSONReader; import com.alibaba.fastjson2.JSONReader.Feature; import com.alibaba.fastjson2.JSONWriter; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.*; import static com.alibaba.datax.plugin.reader.tsdbreader.Constant.METRIC_SPECIFY_KEY_PREFIX_LENGTH; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Dump * * @author Benedict Jin * @since 2019-10-21 */ final class TSDBDump { private static final Logger LOG = LoggerFactory.getLogger(TSDBDump.class); private static final String QUERY = "/api/query"; private static final String QUERY_MULTI_FIELD = "/api/mquery"; static { JSON.config(Feature.UseBigDecimalForDoubles); } private TSDBDump() { } static void dump4TSDB(TSDBConnection conn, String metric, Map tags, Long start, Long end, RecordSender sender, Map hint) throws Exception { LOG.info("conn address: {}, metric: {}, start: {}, end: {}", conn.address(), metric, start, end); String res = queryRange4SingleField(conn, metric, tags, start, end, hint); List dps = getDps4TSDB(metric, res); if (dps == null || dps.isEmpty()) { return; } sendTSDBDps(sender, dps); } static void dump4TSDB(TSDBConnection conn, String metric, List fields, Map tags, Long start, Long end, RecordSender sender, Map hint) throws Exception { LOG.info("conn address: {}, metric: {}, start: {}, end: {}", conn.address(), metric, start, end); String res = queryRange4MultiFields(conn, metric, fields, tags, start, end, hint); List dps = getDps4TSDB(metric, fields, res); if (dps == null || dps.isEmpty()) { return; } sendTSDBDps(sender, dps); } static void dump4RDB(TSDBConnection conn, String metric, Map tags, Long start, Long end, List columns4RDB, RecordSender sender, Map hint) throws Exception { LOG.info("conn address: {}, metric: {}, start: {}, end: {}", conn.address(), metric, start, end); String res = queryRange4SingleField(conn, metric, tags, start, end, hint); List dps = getDps4RDB(metric, res); if (dps == null || dps.isEmpty()) { return; } for (DataPoint4TSDB dp : dps) { final Record record = sender.createRecord(); final Map tagKV = dp.getTags(); for (String column : columns4RDB) { if (Constant.METRIC_SPECIFY_KEY.equals(column)) { record.addColumn(new StringColumn(dp.getMetric())); } else if (Constant.TS_SPECIFY_KEY.equals(column)) { record.addColumn(new 
LongColumn(dp.getTimestamp())); } else if (Constant.VALUE_SPECIFY_KEY.equals(column)) { record.addColumn(getColumn(dp.getValue())); } else { final Object tagv = tagKV.get(column); if (tagv == null) { continue; } record.addColumn(getColumn(tagv)); } } sender.sendToWriter(record); } } public static void dump4RDB(TSDBConnection conn, List metrics, Map tags, Long start, Long end, List columns4RDB, RecordSender sender, Map hint) throws Exception { LOG.info("conn address: {}, metric: {}, start: {}, end: {}", conn.address(), metrics, start, end); List dps = new LinkedList<>(); for (String metric : metrics) { String res = queryRange4SingleField(conn, metric, tags, start, end, hint); final List dpList = getDps4RDB(metric, res); if (dpList == null || dpList.isEmpty()) { continue; } dps.addAll(dpList); } if (dps.isEmpty()) { return; } Map<Long, Map<String, DataPoint4TSDB>> dpsCombinedByTs = new LinkedHashMap<>(); for (DataPoint4TSDB dp : dps) { final long ts = dp.getTimestamp(); final Map<String, DataPoint4TSDB> dpsWithSameTs = dpsCombinedByTs.computeIfAbsent(ts, k -> new LinkedHashMap<>()); dpsWithSameTs.put(dp.getMetric(), dp); } for (Map.Entry<Long, Map<String, DataPoint4TSDB>> entry : dpsCombinedByTs.entrySet()) { final Long ts = entry.getKey(); final Map<String, DataPoint4TSDB> metricAndDps = entry.getValue(); final Record record = sender.createRecord(); DataPoint4TSDB tmpDp = null; for (final String column : columns4RDB) { if (column.startsWith(Constant.METRIC_SPECIFY_KEY)) { final String m = column.substring(METRIC_SPECIFY_KEY_PREFIX_LENGTH); tmpDp = metricAndDps.get(m); if (tmpDp == null) { continue; } record.addColumn(getColumn(tmpDp.getValue())); } else if (Constant.TS_SPECIFY_KEY.equals(column)) { record.addColumn(new LongColumn(ts)); } else if (Constant.VALUE_SPECIFY_KEY.equals(column)) { // In combine mode, the __value__ column must not be defined, because each __metric__.xxx column already outputs the corresponding value throw new RuntimeException("The " + 
Constant.VALUE_SPECIFY_KEY + " column should not be specified in combine mode!"); } else { // In combine mode, the __metric__.xxx columns must be defined at the front of the column array so that a metric has been resolved before any tag column is read if (tmpDp == null) { throw
new RuntimeException("The " + Constant.METRIC_SPECIFY_KEY_PREFIX + " columns should be placed first in the column array in combine mode!"); } final Object tagv = tmpDp.getTags().get(column); if (tagv == null) { continue; } record.addColumn(getColumn(tagv)); } } sender.sendToWriter(record); } } static void dump4RDB(TSDBConnection conn, String metric, List fields, Map tags, Long start, Long end, List columns4RDB, RecordSender sender, Map hint) throws Exception { LOG.info("conn address: {}, metric: {}, start: {}, end: {}", conn.address(), metric, start, end); String res = queryRange4MultiFields(conn, metric, fields, tags, start, end, hint); List dps = getDps4RDB(metric, fields, res); if (dps == null || dps.isEmpty()) { return; } for (DataPoint4TSDB dp : dps) { final Record record = sender.createRecord(); final Map tagKV = dp.getTags(); for (String column : columns4RDB) { if (Constant.METRIC_SPECIFY_KEY.equals(column)) { record.addColumn(new StringColumn(dp.getMetric())); } else if (Constant.TS_SPECIFY_KEY.equals(column)) { record.addColumn(new LongColumn(dp.getTimestamp())); } else { final Object tagvOrField = tagKV.get(column); if (tagvOrField == null) { continue; } record.addColumn(getColumn(tagvOrField)); } } sender.sendToWriter(record); } } private static Column getColumn(Object value) throws Exception { Column valueColumn; if (value instanceof Double) { valueColumn = new DoubleColumn((Double) value); } else if (value instanceof Long) { valueColumn = new LongColumn((Long) value); } else if (value instanceof String) { valueColumn = new StringColumn((String) value); } else if (value instanceof Integer) { valueColumn = new LongColumn(((Integer) value).longValue()); } else { throw new Exception(String.format("Unsupported value type: [%s]", value.getClass().getSimpleName())); } return valueColumn; } private static String queryRange4SingleField(TSDBConnection conn, String metric, Map tags, Long start, Long end, Map hint) throws Exception { String tagKV =
getFilterByTags(tags); String body = "{\n" + " \"start\": " + start + ",\n" + " \"end\": " + end + ",\n" + " \"queries\": [\n" + " {\n" + " \"aggregator\": \"none\",\n" + " \"metric\": \"" + metric + "\"\n" + (tagKV == null ? "" : tagKV) + (hint == null ? "" : (", \"hint\": " + JSON.toJSONString(hint))) + " }\n" + " ]\n" + "}"; return HttpUtils.post(conn.address() + QUERY, conn.username(), conn.password(), body); } private static String queryRange4MultiFields(TSDBConnection conn, String metric, List fields, Map tags, Long start, Long end, Map hint) throws Exception { // fields StringBuilder fieldBuilder = new StringBuilder(); fieldBuilder.append("\"fields\":["); for (int i = 0; i < fields.size(); i++) { fieldBuilder.append("{\"field\": \"").append(fields.get(i)).append("\",\"aggregator\": \"none\"}"); if (i != fields.size() - 1) { fieldBuilder.append(","); } } fieldBuilder.append("]"); // tagkv String tagKV = getFilterByTags(tags); String body = "{\n" + " \"start\": " + start + ",\n" + " \"end\": " + end + ",\n" + " \"queries\": [\n" + " {\n" + " \"aggregator\": \"none\",\n" + " \"metric\": \"" + metric + "\",\n" + fieldBuilder.toString() + (tagKV == null ? "" : tagKV) + (hint == null ? 
"" : (", \"hint\": " + JSON.toJSONString(hint))) + " }\n" + " ]\n" + "}"; return HttpUtils.post(conn.address() + QUERY_MULTI_FIELD, conn.username(), conn.password(), body); } private static String getFilterByTags(Map tags) { if (tags != null && !tags.isEmpty()) { // tagKV = ",\"tags:\":" + JSON.toJSONString(tags); StringBuilder tagBuilder = new StringBuilder(); tagBuilder.append(",\"filters\":["); int count = 1; final int size = tags.size(); for (Map.Entry entry : tags.entrySet()) { final String tagK = entry.getKey(); final String tagV = entry.getValue(); tagBuilder.append("{\"type\":\"literal_or\",\"tagk\":\"").append(tagK) .append("\",\"filter\":\"").append(tagV).append("\",\"groupBy\":false}"); if (count != size) { tagBuilder.append(","); } count++; } tagBuilder.append("]"); return tagBuilder.toString(); } return null; } private static List getDps4TSDB(String metric, String dps) { final List jsonArray = JSON.parseArray(dps, QueryResult.class); if (jsonArray == null || jsonArray.isEmpty()) { return null; } List dpsArr = new LinkedList<>(); for (QueryResult queryResult : jsonArray) { final Map tags = queryResult.getTags(); final Map points = queryResult.getDps(); for (Map.Entry entry : points.entrySet()) { final String ts = entry.getKey(); final Object value = entry.getValue(); DataPoint4TSDB dp = new DataPoint4TSDB(); dp.setMetric(metric); dp.setTags(tags); dp.setTimestamp(Long.parseLong(ts)); dp.setValue(value); dpsArr.add(dp.toString()); } } return dpsArr; } private static List getDps4TSDB(String metric, List fields, String dps) { final List jsonArray = JSON.parseArray(dps, MultiFieldQueryResult.class); if (jsonArray == null || jsonArray.isEmpty()) { return null; } List dpsArr = new LinkedList<>(); for (MultiFieldQueryResult queryResult : jsonArray) { final Map tags = queryResult.getTags(); final List<List<Object>> values = queryResult.getValues(); for (List value : values) { final String ts = value.get(0).toString(); Map fieldsAndValues = new HashMap<>(); for (int i = 0; i < fields.size(); i++) {
fieldsAndValues.put(fields.get(i), value.get(i + 1)); } final DataPoint4MultiFieldsTSDB dp = new DataPoint4MultiFieldsTSDB(); dp.setMetric(metric); dp.setTimestamp(Long.parseLong(ts)); dp.setTags(tags); dp.setFields(fieldsAndValues); dpsArr.add(dp.toString()); } } return dpsArr; } private static List getDps4RDB(String metric, String dps) { final List jsonArray = JSON.parseArray(dps, QueryResult.class); if (jsonArray == null || jsonArray.isEmpty()) { return null; } List dpsArr = new LinkedList<>(); for (QueryResult queryResult : jsonArray) { final Map tags = queryResult.getTags(); final Map points = queryResult.getDps(); for (Map.Entry entry : points.entrySet()) { final String ts = entry.getKey(); final Object value = entry.getValue(); final DataPoint4TSDB dp = new DataPoint4TSDB(); dp.setMetric(metric); dp.setTags(tags); dp.setTimestamp(Long.parseLong(ts)); dp.setValue(value); dpsArr.add(dp); } } return dpsArr; } private static List getDps4RDB(String metric, List fields, String dps) { final List jsonArray = JSON.parseArray(dps, MultiFieldQueryResult.class); if (jsonArray == null || jsonArray.isEmpty()) { return null; } List dpsArr = new LinkedList<>(); for (MultiFieldQueryResult queryResult : jsonArray) { final Map tags = queryResult.getTags(); final List<List<Object>> values = queryResult.getValues(); for (List value : values) { final String ts = value.get(0).toString(); Map tagsTmp = new HashMap<>(tags); for (int i = 0; i < fields.size(); i++) { tagsTmp.put(fields.get(i), value.get(i + 1)); } final DataPoint4TSDB dp = new DataPoint4TSDB(); dp.setMetric(metric); dp.setTimestamp(Long.parseLong(ts)); dp.setTags(tagsTmp); dpsArr.add(dp); } } return dpsArr; } private static void sendTSDBDps(RecordSender sender, List dps) { for (String dp : dps) { StringColumn tsdbColumn = new StringColumn(dp); Record record = sender.createRecord(); record.addColumn(tsdbColumn); sender.sendToWriter(record); } } } ================================================ FILE:
tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/util/HttpUtils.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.util; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.http.client.fluent.Content; import org.apache.http.client.fluent.Request; import org.apache.http.entity.ContentType; import java.nio.charset.StandardCharsets; import java.util.Base64; import java.util.Map; import java.util.concurrent.TimeUnit; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:HttpUtils * * @author Benedict Jin * @since 2019-10-21 */ public final class HttpUtils { public final static int CONNECT_TIMEOUT_DEFAULT_IN_MILL = (int) TimeUnit.SECONDS.toMillis(60); public final static int SOCKET_TIMEOUT_DEFAULT_IN_MILL = (int) TimeUnit.SECONDS.toMillis(60); private static final String CREDENTIALS_FORMAT = "%s:%s"; private static final String BASIC_AUTHENTICATION_FORMAT = "Basic %s"; private HttpUtils() { } public static String get(String url, String username, String password) throws Exception { final Request request = Request.Get(url) .connectTimeout(CONNECT_TIMEOUT_DEFAULT_IN_MILL) .socketTimeout(SOCKET_TIMEOUT_DEFAULT_IN_MILL); addAuth(request, username, password); Content content = request .execute() .returnContent(); if (content == null) { return null; } return content.asString(StandardCharsets.UTF_8); } public static String post(String url, String username, String password, Map params) throws Exception { return post(url, username, password, JSON.toJSONString(params), CONNECT_TIMEOUT_DEFAULT_IN_MILL, SOCKET_TIMEOUT_DEFAULT_IN_MILL); } public static String post(String url, String username, String password, String params) throws Exception { return post(url, username, password, params, CONNECT_TIMEOUT_DEFAULT_IN_MILL, SOCKET_TIMEOUT_DEFAULT_IN_MILL); } public static String post(String url, String username, String password, String params, int 
connectTimeoutInMill, int socketTimeoutInMill) throws Exception { Request request = Request.Post(url) .connectTimeout(connectTimeoutInMill) .socketTimeout(socketTimeoutInMill); addAuth(request, username, password); Content content = request .addHeader("Content-Type", "application/json") .bodyString(params, ContentType.APPLICATION_JSON) .execute() .returnContent(); if (content == null) { return null; } return content.asString(StandardCharsets.UTF_8); } private static void addAuth(Request request, String username, String password) { String authorization = generateHttpAuthorization(username, password); if (authorization != null) { request.setHeader("Authorization", authorization); } } private static String generateHttpAuthorization(String username, String password) { if (StringUtils.isBlank(username) || StringUtils.isBlank(password)) { return null; } String credentials = String.format(CREDENTIALS_FORMAT, username, password); credentials = Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8)); return String.format(BASIC_AUTHENTICATION_FORMAT, credentials); } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/util/TSDBUtils.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.util; /** * Copyright @ 2019 alibaba.com * All right reserved.
* Function:TSDB Utils * * @author Benedict Jin * @since 2019-10-21 */ public final class TSDBUtils { private TSDBUtils() { } public static String version(String address, String username, String password) { String url = String.format("%s/api/version", address); String rsp; try { rsp = HttpUtils.get(url, username, password); } catch (Exception e) { throw new RuntimeException(e); } return rsp; } public static String config(String address, String username, String password) { String url = String.format("%s/api/config", address); String rsp; try { rsp = HttpUtils.get(url, username, password); } catch (Exception e) { throw new RuntimeException(e); } return rsp; } } ================================================ FILE: tsdbreader/src/main/java/com/alibaba/datax/plugin/reader/tsdbreader/util/TimeUtils.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.util; import java.util.concurrent.TimeUnit; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:TimeUtils * * @author Benedict Jin * @since 2019-10-21 */ public final class TimeUtils { private TimeUtils() { } private static final long SECOND_MASK = 0xFFFFFFFF00000000L; private static final long HOUR_IN_MILL = TimeUnit.HOURS.toMillis(1); /** * Whether the timestamp is in seconds rather than milliseconds, judged by whether its upper 32 bits are all zero. * * @param ts timestamp */ public static boolean isSecond(long ts) { return (ts & SECOND_MASK) == 0; } /** * Truncate the timestamp to the start of its hour.
* * @param ms time in milliseconds */ public static long getTimeInHour(long ms) { return ms - ms % HOUR_IN_MILL; } } ================================================ FILE: tsdbreader/src/main/resources/plugin.json ================================================ { "name": "tsdbreader", "class": "com.alibaba.datax.plugin.reader.tsdbreader.TSDBReader", "description": { "useScene": "Ingest data points from TSDB", "mechanism": "Query the matching data points through the /api/query endpoint", "warn": "The minutes and seconds of the specified start and end times are ignored and the times are truncated to the hour; for example, [3:35, 4:55) on 2019-4-18 becomes [3:00, 4:00)" }, "developer": "alibaba" } ================================================ FILE: tsdbreader/src/main/resources/plugin_job_template.json ================================================ { "name": "tsdbreader", "parameter": { "sinkDbType": "RDB", "endpoint": "http://localhost:8242", "column": [ "__metric__", "__ts__", "app", "cluster", "group", "ip", "zone", "__value__" ], "metric": [ "m" ], "tag": { "m": { "app": "a1", "cluster": "c1" } }, "splitIntervalMs": 60000, "beginDateTime": "2019-01-01 00:00:00", "endDateTime": "2019-01-01 01:00:00" } } ================================================ FILE: tsdbreader/src/test/java/com/alibaba/datax/plugin/reader/tsdbreader/conn/TSDBConnectionTest.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.conn; import org.junit.Assert; import org.junit.Ignore; import org.junit.Test; /** * Copyright @ 2019 alibaba.com * All right reserved.
* Function:TSDB Connection4TSDB Test * * @author Benedict Jin * @since 2019-10-21 */ @Ignore public class TSDBConnectionTest { private static final String TSDB_ADDRESS = "http://localhost:8242"; @Test public void testVersion() { String version = new TSDBConnection(TSDB_ADDRESS,null,null).version(); Assert.assertNotNull(version); } @Test public void testIsSupported() { Assert.assertTrue(new TSDBConnection(TSDB_ADDRESS,null,null).isSupported()); } } ================================================ FILE: tsdbreader/src/test/java/com/alibaba/datax/plugin/reader/tsdbreader/util/Const.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.util; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:Const * * @author Benedict Jin * @since 2019-10-21 */ final class Const { private Const() { } static final String TSDB_ADDRESS = "http://localhost:8242"; } ================================================ FILE: tsdbreader/src/test/java/com/alibaba/datax/plugin/reader/tsdbreader/util/TimeUtilsTest.java ================================================ package com.alibaba.datax.plugin.reader.tsdbreader.util; import org.junit.Assert; import org.junit.Test; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; /** * Copyright @ 2019 alibaba.com * All right reserved. 
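* Function:TimeUtils Test * * @author Benedict Jin * @since 2019-10-21 */ public class TimeUtilsTest { @Test public void testIsSecond() { Assert.assertFalse(TimeUtils.isSecond(System.currentTimeMillis())); Assert.assertTrue(TimeUtils.isSecond(System.currentTimeMillis() / 1000)); } @Test public void testGetTimeInHour() throws ParseException { SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); Date date = sdf.parse("2019-04-18 15:32:33"); long timeInHour = TimeUtils.getTimeInHour(date.getTime()); Assert.assertEquals("2019-04-18 15:00:00", sdf.format(timeInHour)); } } ================================================ FILE: tsdbwriter/doc/tsdbhttpwriter.md ================================================
# TSDBWriter Plugin Documentation

___

## 1 Quick Introduction

The TSDBWriter plugin writes data points into Lindorm TSDB, part of Alibaba's cloud-native multi-model database Lindorm (hereafter "TSDB").

Time Series Database (TSDB) is a high-performance, low-cost, stable and reliable online time-series database service. It provides efficient reads and writes, high-compression storage, and interpolation and aggregation of time-series data. It is widely used in industry scenarios such as IoT device monitoring systems, enterprise energy management systems (EMS), production safety monitoring systems, and power monitoring systems.

TSDB supports writing tens of millions of data points per second, high-compression low-cost storage, pre-downsampling, interpolation, multi-dimensional aggregation, and visualization of query results. It addresses the high storage costs and poor write and query performance caused by very large numbers of device collection points and high collection frequencies. For more about TSDB, see the [Alibaba Cloud Lindorm TSDB documentation](https://help.aliyun.com/document_detail/174600.html).

Note: Alibaba's in-house HiTSDB has been upgraded to Lindorm TSDB, part of the cloud-native multi-model database Lindorm. Lindorm TSDB is compatible with most of the HiTSDB HTTP API and also provides native SQL. TSDBWriter writes through the HTTP API; to use native SQL on the written data, you must create the table in Lindorm TSDB in advance. For details, see [Comparison with the legacy TSDB](https://help.aliyun.com/document_detail/387477.html).

## 2 How It Works

The plugin connects to the TSDB instance through the TSDB client hitsdb-client and writes data points through the HTTP API. For details on the write interface, see the TSDB [SDK reference](https://help.aliyun.com/document_detail/61587.html).

## 3 Features

### 3.1 Sample Configurations

* Configure TSDB Writer (relational source, multi-field write):

```json
{
  "name": "tsdbwriter",
  "parameter": {
    "endpoint": "http://localhost:8242",
    "sourceDbType": "RDB",
    "batchSize": 256,
    "columnType": [
      "tag",
      "tag",
      "field_string",
      "field_double",
      "timestamp",
      "field_bool"
    ],
    "column": [
      "tag1",
      "tag2",
      "field1",
      "field2",
      "timestamp",
      "field3"
    ],
    "multiField": "true",
    "table": "testmetric",
    "username": "xxx",
    "password": "xxx",
    "ignoreWriteError": "false",
    "database": "default"
  }
}
```

* Configure a job that extracts data from a database supporting the OpenTSDB protocol and synchronizes it into TSDB:

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "opentsdbreader",
          "parameter": {
            "endpoint": "http://localhost:4242",
            "column": [
              "m"
            ],
            "startTime": "2019-01-01 00:00:00",
            "endTime": "2019-01-01 03:00:00"
          }
        },
        "writer": {
          "name": "tsdbwriter",
          "parameter": {
            "endpoint": "http://localhost:8242"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

* Write to TSDB with the OpenTSDB (single-value) protocol (not recommended):

```json
{
  "name": "tsdbwriter",
  "parameter": {
    "endpoint": "http://localhost:8242",
    "sourceDbType": "RDB",
    "batchSize": 256,
    "columnType": [
      "tag",
      "tag",
      "field_string",
      "field_double",
      "timestamp",
      "field_bool"
    ],
    "column": [
      "tag1",
      "tag2",
      "field_metric_1",
      "field_metric_2",
      "timestamp",
      "field_metric_3"
    ],
    "username": "tsdb",
    "password": "enxU^",
    "ignoreWriteError": "false"
  }
}
```

In this mode, each TSDB table name (metric) is taken from the column name of the corresponding field. With the configuration above, each relational row is written to three metrics: field_metric_1, field_metric_2 and field_metric_3.

### 3.2 Parameters

* **name**
  * Description: name of this plugin
  * Required: yes
  * Default: tsdbwriter

* **parameter**
  * **endpoint**
    * Description: HTTP address of TSDB
    * Required: yes
    * Format: http://IP:Port
    * Default: none

  * **sourceDbType**
    * Description: type of the source database
    * Required: no
    * Format: string [RDB or TSDB]
    * Default: TSDB

  * **multiField**
    * Description: whether to write through the multi-value (multiple fields) HTTP API. Current TSDB versions use multi-value writes, so this must be set to true.
    * Required: yes (if omitted, the plugin falls back to false)
    * Format: bool
    * Default: false (single-value)
    * Note: to access data written through the HTTP API with Lindorm TSDB native SQL, you must create the table in TSDB in advance; otherwise the data can only be [queried](https://help.aliyun.com/document_detail/107576.html) through the HiTSDB HTTP API.

  * **column**
    * Description: column names of the table in the relational database
    * Required: yes when sourceDbType is RDB
    * Format: list of strings
    * Default: none
    * Note: the column order must match the order of the column setting in the Reader plugin.

  * **columnType**
    * Description: the TSDB type that each relational column is mapped to. Supported types:
      * timestamp: the column is the timestamp
      * tag: the column is a tag
      * field_string: the field value is a string
      * field_double: the field value is numeric
      * field_bool: the field value is a boolean
    * Required: yes when sourceDbType is RDB
    * Format: list of strings, each one of the types above
    * Default: none
    * Note: the order must match the order in column. Exactly one timestamp, at least one tag, and at least one field_* type must be configured; otherwise the task fails during initialization.

  * **table**
    * Description: TSDB table name (metric)
    * Required: yes when sourceDbType is RDB and multiField is true
    * Format: string
    * Default: none
    * Note: the TSDB table to import into. If multiField is false, leave it unset; the metric names are then taken from column.

  * **batchSize**
    * Description: number of records per batch
    * Required: no
    * Format: int, must be greater than 0
    * Default: 100

  * **ignoreWriteError**
    * Description: if true, write errors are ignored and writing continues; otherwise, the write task is terminated if a write still fails after repeated retries
    * Required: no
    * Format: bool
    * Default: false

  * **username**
    * Description: database username
    * Required: no
    * Format: string
    * Default: none
    * Note: required if authentication is enabled on TSDB

  * **password**
    * Description: database password
    * Required: no
    * Format: string
    * Default: none
    * Note: required if authentication is enabled on TSDB

  * **database**
    * Description: target database
    * Required: no
    * Format: string
    * Default: default
    * Note: the database must be created in TSDB in advance

  * **maxRetryTime**
    * Description: number of retries after a failure
    * Required: no
    * Format: int, must be greater than 1
    * Default: 3

### 3.3 Type Conversion

| DataX internal type | TSDB data type |
| ------------------- | ------------------------------------------------------------------------------ |
| String | Serialized TSDB data point string, including timestamp, metric, tags and value |

## 4 Performance Report

### 4.1 Environment

#### 4.1.1 Data Characteristics

The data is described in terms of metric, time series, value and collection interval:

##### metric

A single fixed metric, `m`.

##### tagkv

All combinations of the first four tag key-value pairs form `10 * 20 * 100 * 100 = 2000000` time series. The last tag, ip, numbers these 2,000,000 series incrementally, starting from 1.

| **tag_k** | **tag_v** |
| --------- | ------------- |
| zone | z1~z10 |
| cluster | c1~c20 |
| group | g1~g100 |
| app | a1~a100 |
| ip | ip1~ip2000000 |

##### value

Values are random numbers in the range [1, 100].

##### interval

The collection interval is 10 seconds and ingestion lasts 3 hours, for a total of `3 * 60 * 60 / 10 * 2000000 = 2,160,000,000` data points.

#### 4.1.2 Machine Specs

* TSDB Writer machine: 64C256G
* HBase machines: 8C16G * 5

#### 4.1.3 DataX JVM Options

"-Xms4096m -Xmx4096m"

### 4.2 Test Results

| Channels | DataX speed (Rec/s) | DataX throughput (MB/s) |
| -------- | ------------------- | ----------------------- |
| 1 | 129753 | 15.45 |
| 2 | 284953 | 33.70 |
| 3 | 385868 | 45.71 |

## 5 Constraints and Limitations

### 5.1 Supported Versions

All versions of Lindorm TSDB and HiTSDB 2.4.x and later are currently supported. Compatibility with other versions is not guaranteed.

## 6 FAQ

================================================ FILE: tsdbwriter/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT tsdbwriter tsdbwriter jar UTF-8 3.3.2 4.5 2.4 4.13.1 com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j commons-math3 org.apache.commons org.slf4j slf4j-api ch.qos.logback logback-classic org.apache.commons commons-lang3 ${commons-lang3.version} org.apache.httpcomponents httpclient ${httpclient.version}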
commons-io commons-io ${commons-io.version} org.apache.httpcomponents fluent-hc ${httpclient.version} com.alibaba.fastjson2 fastjson2 com.aliyun hitsdb-client 0.3.7 junit junit ${junit4.version} test maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: tsdbwriter/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/writer/tsdbwriter target/ tsdbwriter-0.0.1-SNAPSHOT.jar plugin/writer/tsdbwriter false plugin/writer/tsdbwriter/libs runtime ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/Connection4TSDB.java ================================================ package com.alibaba.datax.plugin.writer.conn; import com.alibaba.datax.common.plugin.RecordSender; import java.util.List; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:Connection for TSDB-like databases * * @author Benedict Jin * @since 2019-03-29 */ public interface Connection4TSDB { /** * Get the address of the database. * * @return address, in the form http://IP:Port */ String address(); /** * Get the configured database name. * * @return database */ String database(); /** * Get the username of the database. * * @return username */ String username(); /** * Get the password of the database. * * @return password */ String password(); /** * Get the version of the database. * * @return version */ String version(); /** * Get the database configuration. * * @return configs */ String config(); /** * Get the list of supported version prefixes. * * @return version prefix list */ String[] getSupportVersionPrefix(); /** * Send data points by metric, start time and end time.
* * @param metric metric * @param start startTime * @param end endTime * @param recordSender sender */ void sendDPs(String metric, Long start, Long end, RecordSender recordSender) throws Exception; /** * Put a data point. * * @param dp data point * @return whether the data point is written successfully */ boolean put(DataPoint4TSDB dp); /** * Put data points. * * @param dps data points * @return whether the data points are written successfully */ boolean put(List dps); /** * Put single-field data points. * * @param dps data points * @return whether the data points are written successfully */ boolean put(String dps); /** * Put multi-field data points. * * @param dps data points * @return whether the data points are written successfully */ boolean mput(String dps); /** * Whether the current version is supported. * * @return true if supported, false otherwise */ boolean isSupported(); } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/DataPoint4TSDB.java ================================================ package com.alibaba.datax.plugin.writer.conn; import com.alibaba.fastjson2.JSON; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved.
* Function:DataPoint for TSDB * * @author Benedict Jin * @since 2019-04-10 */ public class DataPoint4TSDB { private long timestamp; private String metric; private Map tags; private Object value; public DataPoint4TSDB() { } public DataPoint4TSDB(long timestamp, String metric, Map tags, Object value) { this.timestamp = timestamp; this.metric = metric; this.tags = tags; this.value = value; } public long getTimestamp() { return timestamp; } public void setTimestamp(long timestamp) { this.timestamp = timestamp; } public String getMetric() { return metric; } public void setMetric(String metric) { this.metric = metric; } public Map getTags() { return tags; } public void setTags(Map tags) { this.tags = tags; } public Object getValue() { return value; } public void setValue(Object value) { this.value = value; } @Override public String toString() { return JSON.toJSONString(this); } } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/conn/TSDBConnection.java ================================================ package com.alibaba.datax.plugin.writer.conn; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.plugin.writer.util.TSDBUtils; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import java.util.List; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Connection * * @author Benedict Jin * @since 2019-03-29 */ public class TSDBConnection implements Connection4TSDB { private String address; private String username; private String password; private String database; public TSDBConnection(String address, String database, String username, String password) { if (StringUtils.isBlank(address)) { throw new RuntimeException("TSDBConnection init failed because address is blank!"); } this.address = address; this.database = database; this.username = username; this.password = password; } @Override public String address() { return address; } @Override public String username() { return username; } @Override public String database() { return database; } @Override public String password() { return password; } @Override public String version() { return TSDBUtils.version(address, username, password); } @Override public String config() { return TSDBUtils.config(address, username, password); } @Override public String[] getSupportVersionPrefix() { return new String[]{"2.4.1", "2.4.2"}; } @Override public void sendDPs(String metric, Long start, Long end, RecordSender recordSender) { throw new RuntimeException("Not support yet!"); } @Override public boolean put(DataPoint4TSDB dp) { return TSDBUtils.put(address, database, username, password, dp); } @Override public boolean put(List dps) { return TSDBUtils.put(address, database, username, password, dps); } @Override public boolean put(String dps) { return TSDBUtils.put(address, database, username, password, dps); } @Override public boolean mput(String dps) { return TSDBUtils.mput(address, database, username, password, dps); } @Override public boolean isSupported() { String versionJson = version(); if (StringUtils.isBlank(versionJson)) { throw new RuntimeException("Cannot get the version!"); } String version = JSON.parseObject(versionJson).getString("version"); if (StringUtils.isBlank(version)) { return false; } for (String prefix : getSupportVersionPrefix()) { if 
(version.startsWith(prefix)) { return true; } } return false; } } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/Constant.java ================================================ package com.alibaba.datax.plugin.writer.tsdbwriter; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:Constant * * @author Benedict Jin * @since 2019-04-18 */ public final class Constant { static final int DEFAULT_BATCH_SIZE = 100; static final int DEFAULT_TRY_SIZE = 3; static final boolean DEFAULT_IGNORE_WRITE_ERROR = false; } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/Key.java ================================================ package com.alibaba.datax.plugin.writer.tsdbwriter; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:Key * * @author Benedict Jin * @since 2019-04-18 */ public class Key { static final String SOURCE_DB_TYPE = "sourceDbType"; static final String MULTI_FIELD = "multiField"; // common static final String ENDPOINT = "endpoint"; static final String USERNAME = "username"; static final String PASSWORD = "password"; static final String IGNORE_WRITE_ERROR = "ignoreWriteError"; static final String DATABASE = "database"; // for tsdb static final String BATCH_SIZE = "batchSize"; static final String MAX_RETRY_TIME = "maxRetryTime"; // for rdb static final String COLUMN = "column"; static final String COLUMN_TYPE = "columnType"; static final String TABLE = "table"; } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/SourceDBType.java ================================================ package com.alibaba.datax.plugin.writer.tsdbwriter; public enum SourceDBType { TSDB, RDB } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBConverter.java
================================================ package com.alibaba.datax.plugin.writer.tsdbwriter; import com.alibaba.datax.common.element.Column; import com.alibaba.datax.common.element.Record; import com.alibaba.fastjson2.JSON; import com.aliyun.hitsdb.client.value.request.MultiFieldPoint; import com.aliyun.hitsdb.client.value.request.Point; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; class TSDBConverter { private static final Logger LOG = LoggerFactory.getLogger(TSDBConverter.class); private List columnName; private List columnType; TSDBConverter(List columnName, List columnType) { this.columnName = columnName; this.columnType = columnType; LOG.info("columnName: {}, columnType: {}", JSON.toJSONString(columnName), JSON.toJSONString(columnType)); } List transRecord2Point(List records) { List dps = new ArrayList(); for (Record record : records) { List metricBuilders = new ArrayList(); Map tags = new HashMap(); Long time = 0L; for (int i = 0; i < columnType.size(); i++) { String type = columnType.get(i); String name = columnName.get(i); Column column = record.getColumn(i); if (TSDBModel.TSDB_TAG.equals(type)) { tags.put(name, column.asString()); } else if (TSDBModel.TSDB_FIELD_DOUBLE.equals(type)) { metricBuilders.add(new Point.MetricBuilder(name).value(column.asDouble())); } else if (TSDBModel.TSDB_FIELD_STRING.equals(type)) { metricBuilders.add(new Point.MetricBuilder(name).value(column.asString())); } else if (TSDBModel.TSDB_FIELD_BOOL.equals(type)) { metricBuilders.add(new Point.MetricBuilder(name).value(column.asBoolean())); } else if (TSDBModel.TSDB_TIMESTAMP.equals(type)) { time = column.asLong(); } else if (TSDBModel.TSDB_METRIC_NUM.equals(type)) { // compatible with previous usage of TSDB_METRIC_NUM metricBuilders.add(new Point.MetricBuilder(name).value(column.asDouble())); } else if (TSDBModel.TSDB_METRIC_STRING.equals(type)) { // compatible 
with previous usage of TSDB_METRIC_STRING metricBuilders.add(new Point.MetricBuilder(name).value(column.asString())); } } for (Point.MetricBuilder metricBuilder : metricBuilders) { dps.add(metricBuilder.tag(tags).timestamp(time).build(false)); } } return dps; } List transRecord2MultiFieldPoint(List records, String tableName) { List dps = new ArrayList(); for (Record record : records) { MultiFieldPoint.MetricBuilder builder = MultiFieldPoint.metric(tableName); for (int i = 0; i < columnType.size(); i++) { String type = columnType.get(i); String name = columnName.get(i); Column column = record.getColumn(i); if (TSDBModel.TSDB_TAG.equals(type)) { builder.tag(name, column.asString()); } else if (TSDBModel.TSDB_FIELD_DOUBLE.equals(type)) { builder.field(name, column.asDouble()); } else if (TSDBModel.TSDB_FIELD_STRING.equals(type)) { builder.field(name, column.asString()); } else if (TSDBModel.TSDB_FIELD_BOOL.equals(type)) { builder.field(name, column.asBoolean()); } else if (TSDBModel.TSDB_TIMESTAMP.equals(type)) { builder.timestamp(column.asLong()); } else if (TSDBModel.TSDB_METRIC_NUM.equals(type)) { // compatible with previous usage of TSDB_METRIC_NUM builder.field(name, column.asDouble()); } else if (TSDBModel.TSDB_METRIC_STRING.equals(type)) { // compatible with previous usage of TSDB_METRIC_STRING builder.field(name, column.asString()); } } MultiFieldPoint point = builder.build(false); dps.add(point); } return dps; } } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBModel.java ================================================ package com.alibaba.datax.plugin.writer.tsdbwriter; class TSDBModel { static final String TSDB_METRIC_NUM = "metric_num"; static final String TSDB_METRIC_STRING = "metric_string"; static final String TSDB_TAG = "tag"; static final String TSDB_TIMESTAMP = "timestamp"; static final String TSDB_FIELD_DOUBLE = "field_double"; static final String TSDB_FIELD_STRING = 
"field_string"; static final String TSDB_FIELD_BOOL = "field_bool"; } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBWriter.java ================================================ package com.alibaba.datax.plugin.writer.tsdbwriter; import com.alibaba.datax.common.element.Record; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordReceiver; import com.alibaba.datax.common.spi.Writer; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.common.util.ConfigurationUtil; import com.alibaba.datax.common.util.RetryUtil; import com.alibaba.datax.plugin.writer.conn.TSDBConnection; import com.aliyun.hitsdb.client.TSDB; import com.aliyun.hitsdb.client.TSDBClientFactory; import com.aliyun.hitsdb.client.TSDBConfig; import com.aliyun.hitsdb.client.value.request.MultiFieldPoint; import com.aliyun.hitsdb.client.value.request.Point; import com.aliyun.hitsdb.client.value.response.batch.IgnoreErrorsResult; import com.aliyun.hitsdb.client.value.response.batch.MultiFieldIgnoreErrorsResult; import com.aliyun.hitsdb.client.value.response.batch.SummaryResult; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; import java.util.*; import java.util.concurrent.Callable; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Http Writer * * @author Benedict Jin * @since 2019-04-18 */ @SuppressWarnings("unused") public class TSDBWriter extends Writer { private static SourceDBType DB_TYPE; private static TSDB tsdb = null; public static class Job extends Writer.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originalConfig; @Override public void init() { originalConfig = super.getPluginJobConf(); // check source db type String sourceDbType = originalConfig.getString(Key.SOURCE_DB_TYPE); if (StringUtils.isBlank(sourceDbType)) { sourceDbType = SourceDBType.TSDB.name(); originalConfig.set(Key.SOURCE_DB_TYPE, sourceDbType); LOG.info("The parameter [" + Key.SOURCE_DB_TYPE + "] will be default value: " + SourceDBType.TSDB); } try { DB_TYPE = SourceDBType.valueOf(sourceDbType); } catch (Exception e) { throw DataXException.asDataXException(TSDBWriterErrorCode.REQUIRED_VALUE, "The parameter [" + Key.SOURCE_DB_TYPE + "] is invalid, which should be one of [" + Arrays.toString(SourceDBType.values()) + "]."); } // for tsdb if (DB_TYPE == SourceDBType.TSDB) { String address = originalConfig.getString(Key.ENDPOINT); if (StringUtils.isBlank(address)) { throw DataXException.asDataXException(TSDBWriterErrorCode.REQUIRED_VALUE, "The parameter [" + Key.ENDPOINT + "] is not set."); } String username = originalConfig.getString(Key.USERNAME, null); if (StringUtils.isBlank(username)) { LOG.warn("The parameter [" + Key.USERNAME + "] is blank."); } String password = originalConfig.getString(Key.PASSWORD, null); if (StringUtils.isBlank(password)) { LOG.warn("The parameter [" + Key.PASSWORD + "] is blank."); } Integer batchSize = originalConfig.getInt(Key.BATCH_SIZE); if (batchSize == null || batchSize < 1) { originalConfig.set(Key.BATCH_SIZE, Constant.DEFAULT_BATCH_SIZE); LOG.info("The parameter [" + Key.BATCH_SIZE + "] will be default value: " + Constant.DEFAULT_BATCH_SIZE); } Integer retrySize = originalConfig.getInt(Key.MAX_RETRY_TIME); if 
(retrySize == null || retrySize < 0) { originalConfig.set(Key.MAX_RETRY_TIME, Constant.DEFAULT_TRY_SIZE); LOG.info("The parameter [" + Key.MAX_RETRY_TIME + "] will be default value: " + Constant.DEFAULT_TRY_SIZE); } Boolean ignoreWriteError = originalConfig.getBool(Key.IGNORE_WRITE_ERROR); if (ignoreWriteError == null) { originalConfig.set(Key.IGNORE_WRITE_ERROR, Constant.DEFAULT_IGNORE_WRITE_ERROR); LOG.info("The parameter [" + Key.IGNORE_WRITE_ERROR + "] will be default value: " + Constant.DEFAULT_IGNORE_WRITE_ERROR); } } else if (DB_TYPE == SourceDBType.RDB) { // for rdb originalConfig.getNecessaryValue(Key.ENDPOINT, TSDBWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.COLUMN_TYPE, TSDBWriterErrorCode.REQUIRED_VALUE); originalConfig.getNecessaryValue(Key.COLUMN, TSDBWriterErrorCode.REQUIRED_VALUE); String endpoint = originalConfig.getString(Key.ENDPOINT); String[] split = endpoint.split(":"); if (split.length != 3) { throw DataXException.asDataXException(TSDBWriterErrorCode.REQUIRED_VALUE, "The parameter [" + Key.ENDPOINT + "] is invalid, which should be [http://IP:Port]."); } String ip = split[1].substring(2); int port = Integer.parseInt(split[2]); String username = originalConfig.getString(Key.USERNAME, null); if (StringUtils.isBlank(username)) { LOG.warn("The parameter [" + Key.USERNAME + "] is blank."); } String password = originalConfig.getString(Key.PASSWORD, null); if (StringUtils.isBlank(password)) { LOG.warn("The parameter [" + Key.PASSWORD + "] is blank."); } if (!StringUtils.isBlank(password) && !StringUtils.isBlank(username)) { tsdb = TSDBClientFactory.connect(TSDBConfig.address(ip, port).basicAuth(username, password).config()); } else { tsdb = TSDBClientFactory.connect(TSDBConfig.address(ip, port).config()); } String database = originalConfig.getString(Key.DATABASE, null); if (StringUtils.isBlank(database)) { LOG.info("The parameter [" + Key.DATABASE + "] is blank."); } else { LOG.warn("The parameter [" + Key.DATABASE + "] : {} is ignored.", database);
// tsdb.useDatabase(database); } LOG.info("Tsdb config: {}", ConfigurationUtil.filterSensitive(originalConfig).toJSON()); } } @Override public void prepare() { } @Override public List split(int mandatoryNumber) { ArrayList configurations = new ArrayList(mandatoryNumber); for (int i = 0; i < mandatoryNumber; i++) { configurations.add(originalConfig.clone()); } return configurations; } @Override public void post() { } @Override public void destroy() { if (DB_TYPE == SourceDBType.RDB) { if (tsdb != null) { try { tsdb.close(); } catch (IOException ignored) { } } } } } public static class Task extends Writer.Task { private static final Logger LOG = LoggerFactory.getLogger(Task.class); private TSDBConnection conn; private boolean multiField; private int batchSize; private int retrySize; private boolean ignoreWriteError; private String tableName; private TSDBConverter tsdbConverter; @Override public void init() { Configuration writerSliceConfig = getPluginJobConf(); // single field | multi fields this.multiField = writerSliceConfig.getBool(Key.MULTI_FIELD, false); this.ignoreWriteError = writerSliceConfig.getBool(Key.IGNORE_WRITE_ERROR, Constant.DEFAULT_IGNORE_WRITE_ERROR); // for tsdb if (DB_TYPE == SourceDBType.TSDB) { String address = writerSliceConfig.getString(Key.ENDPOINT); String database = writerSliceConfig.getString(Key.DATABASE); String username = writerSliceConfig.getString(Key.USERNAME); String password = writerSliceConfig.getString(Key.PASSWORD); this.conn = new TSDBConnection(address, database, username, password); this.batchSize = writerSliceConfig.getInt(Key.BATCH_SIZE); this.retrySize = writerSliceConfig.getInt(Key.MAX_RETRY_TIME); } else if (DB_TYPE == SourceDBType.RDB) { // for rdb int timeSize = 0; int fieldSize = 0; int tagSize = 0; batchSize = writerSliceConfig.getInt(Key.BATCH_SIZE, 100); List columnName = writerSliceConfig.getList(Key.COLUMN, String.class); List columnType = writerSliceConfig.getList(Key.COLUMN_TYPE, String.class); Set typeSet = new 
HashSet(columnType);
if (columnName.size() != columnType.size()) { throw DataXException.asDataXException(TSDBWriterErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.COLUMN_TYPE + "] must have the same length as [" + Key.COLUMN + "]."); } for (String type : columnType) { if (TSDBModel.TSDB_TAG.equals(type)) { tagSize ++; } else if (TSDBModel.TSDB_FIELD_DOUBLE.equals(type) || TSDBModel.TSDB_FIELD_STRING.equals(type) || TSDBModel.TSDB_FIELD_BOOL.equals(type)) { fieldSize++; } else if (TSDBModel.TSDB_TIMESTAMP.equals(type)) { timeSize++; } } if (fieldSize == 0) { // compatible with previous usage of TSDB_METRIC_NUM and TSDB_METRIC_STRING if (!typeSet.contains(TSDBModel.TSDB_METRIC_NUM) && !typeSet.contains(TSDBModel.TSDB_METRIC_STRING)) { throw DataXException.asDataXException(TSDBWriterErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.COLUMN_TYPE + "] is invalid, it must contain at least one of " + TSDBModel.TSDB_FIELD_DOUBLE + ", " + TSDBModel.TSDB_FIELD_STRING + " or " + TSDBModel.TSDB_FIELD_BOOL + "."); } } if (tagSize == 0) { throw DataXException.asDataXException(TSDBWriterErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.COLUMN_TYPE + "] is invalid, it must contain at least one " + TSDBModel.TSDB_TAG + ". 
"); } if (timeSize != 1) { throw DataXException.asDataXException(TSDBWriterErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.COLUMN_TYPE + "] is invalid, it must contain exactly one " + TSDBModel.TSDB_TIMESTAMP + "."); } if (multiField) { // multi-field mode requires a target table tableName = writerSliceConfig.getString(Key.TABLE); if (StringUtils.isBlank(tableName)) { throw DataXException.asDataXException(TSDBWriterErrorCode.ILLEGAL_VALUE, "The parameter [" + Key.TABLE + "] must be set when using multi-field input."); } } tsdbConverter = new TSDBConverter(columnName, columnType); } } @Override public void prepare() { } @Override public void startWrite(RecordReceiver recordReceiver) { // for tsdb if (DB_TYPE == SourceDBType.TSDB) { try { Record lastRecord = null; Record record; int count = 0; StringBuilder dps = new StringBuilder(); while ((record = recordReceiver.getFromReader()) != null) { final int recordLength = record.getColumnNumber(); for (int i = 0; i < recordLength; i++) { dps.append(record.getColumn(i).asString()); dps.append(","); count++; if (count == batchSize) { count = 0; batchPut(record, "[" + dps.substring(0, dps.length() - 1) + "]"); dps = new StringBuilder(); } } lastRecord = record; } if (StringUtils.isNotBlank(dps.toString())) { batchPut(lastRecord, "[" + dps.substring(0, dps.length() - 1) + "]"); } } catch (Exception e) { throw DataXException.asDataXException(TSDBWriterErrorCode.RUNTIME_EXCEPTION, e); } } else if (DB_TYPE == SourceDBType.RDB) { // for rdb List writerBuffer = new ArrayList(this.batchSize); Record record; long total = 0; while ((record = recordReceiver.getFromReader()) != null) { writerBuffer.add(record); if (writerBuffer.size() >= this.batchSize) { total += doBatchInsert(writerBuffer); writerBuffer.clear(); } } if (!writerBuffer.isEmpty()) { total += doBatchInsert(writerBuffer); writerBuffer.clear(); } getTaskPluginCollector().collectMessage("write size", total + ""); LOG.info("Task finished, write size: {}", total); } } private void batchPut(final 
Record record, final String dps) { try { RetryUtil.executeWithRetry(new Callable() { @Override public Integer call() { final boolean success = multiField ? conn.mput(dps) : conn.put(dps); if (success) { return 0; } getTaskPluginCollector().collectDirtyRecord(record, "Put data points failed!"); throw DataXException.asDataXException(TSDBWriterErrorCode.RUNTIME_EXCEPTION, "Put data points failed!"); } }, retrySize, 60000L, true); } catch (Exception e) { if (ignoreWriteError) { LOG.warn("Ignore write exceptions and continue writing."); } else { throw DataXException.asDataXException(TSDBWriterErrorCode.RETRY_WRITER_EXCEPTION, e); } } } private long doBatchInsert(final List writerBuffer) { int size; if (ignoreWriteError) { if (multiField) { List points = tsdbConverter.transRecord2MultiFieldPoint(writerBuffer, tableName); size = points.size(); MultiFieldIgnoreErrorsResult ignoreErrorsResult = tsdb.multiFieldPutSync(points, MultiFieldIgnoreErrorsResult.class); if (ignoreErrorsResult == null) { LOG.error("Unexpected inner error for insert"); } else if (ignoreErrorsResult.getFailed() > 0) { LOG.error("write TSDB failed num:" + ignoreErrorsResult.getFailed()); } } else { List points = tsdbConverter.transRecord2Point(writerBuffer); size = points.size(); IgnoreErrorsResult ignoreErrorsResult = tsdb.putSync(points, IgnoreErrorsResult.class); if (ignoreErrorsResult == null) { LOG.error("Unexpected inner error for insert"); } else if (ignoreErrorsResult.getFailed() > 0) { LOG.error("write TSDB failed num:" + ignoreErrorsResult.getFailed()); } } } else { SummaryResult summaryResult; if (multiField) { List points = tsdbConverter.transRecord2MultiFieldPoint(writerBuffer, tableName); size = points.size(); summaryResult = tsdb.multiFieldPutSync(points, SummaryResult.class); } else { List points = tsdbConverter.transRecord2Point(writerBuffer); size = points.size(); summaryResult = tsdb.putSync(points, SummaryResult.class); } if (summaryResult.getFailed() > 0) { LOG.error("write TSDB 
failed num:" + summaryResult.getFailed()); throw DataXException.asDataXException(TSDBWriterErrorCode.RUNTIME_EXCEPTION, "Write TSDB failed, failed num: " + summaryResult.getFailed()); } } return size; } @Override public void post() { } @Override public void destroy() { } } } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/tsdbwriter/TSDBWriterErrorCode.java ================================================ package com.alibaba.datax.plugin.writer.tsdbwriter; import com.alibaba.datax.common.spi.ErrorCode; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:TSDB Http Writer Error Code * * @author Benedict Jin * @since 2019-04-18 */ public enum TSDBWriterErrorCode implements ErrorCode { REQUIRED_VALUE("TSDBWriter-00", "Missing the necessary value"), ILLEGAL_VALUE("TSDBWriter-01", "Illegal value"), RUNTIME_EXCEPTION("TSDBWriter-02", "Runtime exception"), RETRY_WRITER_EXCEPTION("TSDBWriter-03", "After repeated attempts, the write still fails"); private final String code; private final String description; TSDBWriterErrorCode(String code, String description) { this.code = code; this.description = description; } @Override public String getCode() { return this.code; } @Override public String getDescription() { return this.description; } @Override public String toString() { return String.format("Code:[%s], Description:[%s]. 
", this.code, this.description); } } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/HttpUtils.java ================================================ package com.alibaba.datax.plugin.writer.util; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.apache.http.client.fluent.Content; import org.apache.http.client.fluent.Request; import org.apache.http.entity.ContentType; import java.nio.charset.Charset; import java.nio.charset.StandardCharsets; import java.util.Base64; import java.util.Map; import java.util.concurrent.TimeUnit; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:HttpUtils * * @author Benedict Jin * @since 2019-03-29 */ public final class HttpUtils { public final static int CONNECT_TIMEOUT_DEFAULT_IN_MILL = (int) TimeUnit.SECONDS.toMillis(60); public final static int SOCKET_TIMEOUT_DEFAULT_IN_MILL = (int) TimeUnit.SECONDS.toMillis(60); private static final String CREDENTIALS_FORMAT = "%s:%s"; private static final String BASIC_AUTHENTICATION_FORMAT = "Basic %s"; private HttpUtils() { } public static String get(String url, String username, String password) throws Exception { final Request request = Request.Get(url) .connectTimeout(CONNECT_TIMEOUT_DEFAULT_IN_MILL) .socketTimeout(SOCKET_TIMEOUT_DEFAULT_IN_MILL); addAuth(request, username, password); Content content = request .execute() .returnContent(); if (content == null) { return null; } return content.asString(StandardCharsets.UTF_8); } public static String post(String url, String username, String password, Map params) throws Exception { return post(url, username, password, JSON.toJSONString(params), CONNECT_TIMEOUT_DEFAULT_IN_MILL, SOCKET_TIMEOUT_DEFAULT_IN_MILL); } public static String post(String url, String username, String password, String params) throws Exception { return post(url, username, password, params, CONNECT_TIMEOUT_DEFAULT_IN_MILL, SOCKET_TIMEOUT_DEFAULT_IN_MILL); 
} public static String post(String url, String username, String password, String params, int connectTimeoutInMill, int socketTimeoutInMill) throws Exception { Request request = Request.Post(url) .connectTimeout(connectTimeoutInMill) .socketTimeout(socketTimeoutInMill); addAuth(request, username, password); Content content = request .addHeader("Content-Type", "application/json") .bodyString(params, ContentType.APPLICATION_JSON) .execute() .returnContent(); if (content == null) { return null; } return content.asString(StandardCharsets.UTF_8); } private static void addAuth(Request request, String username, String password) { String authorization = generateHttpAuthorization(username, password); if (authorization != null) { request.setHeader("Authorization", authorization); } } private static String generateHttpAuthorization(String username, String password) { if (StringUtils.isBlank(username) || StringUtils.isBlank(password)) { return null; } String credentials = String.format(CREDENTIALS_FORMAT, username, password); // encode with an explicit charset instead of the platform default credentials = Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8)); return String.format(BASIC_AUTHENTICATION_FORMAT, credentials); } } ================================================ FILE: tsdbwriter/src/main/java/com/alibaba/datax/plugin/writer/util/TSDBUtils.java ================================================ package com.alibaba.datax.plugin.writer.util; import com.alibaba.datax.plugin.writer.conn.DataPoint4TSDB; import com.alibaba.fastjson2.JSON; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Utils * * @author Benedict Jin * @since 2019-03-29 */ public final class TSDBUtils { private static final Logger LOG = LoggerFactory.getLogger(TSDBUtils.class); private TSDBUtils() { } public static String version(String address, String username, String password) { String url = String.format("%s/api/version", address); String rsp; try { rsp = HttpUtils.get(url, username, password); } catch (Exception e) { throw new RuntimeException(e); } return rsp; } public static String config(String address, String username, String password) { String url = String.format("%s/api/config", address); String rsp; try { rsp = HttpUtils.get(url, username, password); } catch (Exception e) { throw new RuntimeException(e); } return rsp; } public static boolean put(String address, String database, String username, String password, List dps) { return put(address, database, username, password, JSON.toJSON(dps)); } public static boolean put(String address, String database, String username, String password, DataPoint4TSDB dp) { return put(address, database, username, password, JSON.toJSON(dp)); } private static boolean put(String address, String database, String username, String password, Object o) { return put(address, database, username, password, o.toString()); } public static boolean put(String address, String database, String username, String password, String s) { return put(address, database, username, password, s, false); } public static boolean mput(String address, String database, String username, String password, String s) { return put(address, database, username, password, s, true); } public static boolean put(String address, String database, String username, String password, String s, boolean multiField) { String url = address + (multiField ? "/api/mput" : "/api/put"); if (!StringUtils.isBlank(database)) { url = url.concat("?db=" + database); } String rsp; try { rsp = HttpUtils.post(url, username, password, s); // If successful, the returned content should be null. 
assert rsp == null; } catch (Exception e) { LOG.error("Address: {}, DataPoints: {}", url, s); throw new RuntimeException(e); } return true; } } ================================================ FILE: tsdbwriter/src/main/resources/plugin.json ================================================ { "name": "tsdbwriter", "class": "com.alibaba.datax.plugin.writer.tsdbwriter.TSDBWriter", "description": { "useScene": "往 TSDB 中摄入数据点", "mechanism": "调用 TSDB 的 /api/put 接口,实现数据点的写入", "warn": "" }, "developer": "alibaba" } ================================================ FILE: tsdbwriter/src/main/resources/plugin_job_template.json ================================================ { "name": "tsdbwriter", "parameter": { "endpoint": "http://localhost:8242" } } ================================================ FILE: tsdbwriter/src/test/java/com/alibaba/datax/plugin/writer/conn/TSDBConnectionTest.java ================================================ package com.alibaba.datax.plugin.writer.conn; import org.junit.Assert; import org.junit.Ignore; import org.junit.Test; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:TSDBConnection Test * * @author Benedict Jin * @since 2019-03-29 */ @Ignore public class TSDBConnectionTest { private static final String TSDB_ADDRESS = "http://localhost:8240"; @Test public void testVersion() { String version = new TSDBConnection(TSDB_ADDRESS,null,null,null).version(); Assert.assertNotNull(version); } @Test public void testIsSupported() { Assert.assertTrue(new TSDBConnection(TSDB_ADDRESS,null,null,null).isSupported()); } } ================================================ FILE: tsdbwriter/src/test/java/com/alibaba/datax/plugin/writer/util/Const.java ================================================ package com.alibaba.datax.plugin.writer.util; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:Const * * @author Benedict Jin * @since 2019-03-29 */ final class Const { private Const() { } static final String OPENTSDB_ADDRESS = "http://localhost:8242"; static final String TSDB_ADDRESS = "http://localhost:8240"; } ================================================ FILE: tsdbwriter/src/test/java/com/alibaba/datax/plugin/writer/util/HttpUtilsTest.java ================================================ package com.alibaba.datax.plugin.writer.util; import org.junit.Assert; import org.junit.Ignore; import org.junit.Test; import java.util.HashMap; import java.util.Map; /** * Copyright @ 2019 alibaba.com * All right reserved. * Function:HttpUtils Test * * @author Benedict Jin * @since 2019-03-29 */ @Ignore public class HttpUtilsTest { @Test public void testSimpleCase() throws Exception { String url = "https://httpbin.org/post"; Map params = new HashMap(); params.put("foo", "bar"); String rsp = HttpUtils.post(url, null,null,params); System.out.println(rsp); Assert.assertNotNull(rsp); } @Test public void testGet() throws Exception { String url = String.format("%s/api/version", Const.OPENTSDB_ADDRESS); String rsp = HttpUtils.get(url,null,null); System.out.println(rsp); Assert.assertNotNull(rsp); } } ================================================ FILE: tsdbwriter/src/test/java/com/alibaba/datax/plugin/writer/util/TSDBTest.java ================================================ package com.alibaba.datax.plugin.writer.util; import org.junit.Assert; import org.junit.Ignore; import org.junit.Test; /** * Copyright @ 2019 alibaba.com * All right reserved. 
* Function:TSDB Test * * @author Benedict Jin * @since 2019-04-11 */ @Ignore public class TSDBTest { @Test public void testVersion() { String version = TSDBUtils.version(Const.TSDB_ADDRESS,null,null); Assert.assertNotNull(version); System.out.println(version); version = TSDBUtils.version(Const.OPENTSDB_ADDRESS,null,null); Assert.assertNotNull(version); System.out.println(version); } } ================================================ FILE: txtfilereader/doc/txtfilereader.md ================================================

# DataX TxtFileReader Guide

------------

## 1 Quick Introduction

TxtFileReader reads data stored on the local file system. Under the hood, it reads the local files, converts their contents into the DataX transport protocol, and passes them to the Writer.

**The content of the local files is a logical two-dimensional table, for example CSV-formatted text.**

## 2 Features and Limitations

TxtFileReader reads data from local files and converts it into the DataX protocol. Local files are unstructured storage. From DataX's point of view, TxtFileReader is implemented much like OSSReader and has a lot in common with it. TxtFileReader currently supports the following:

1. Reading TXT files only; the schema of each TXT file must be a two-dimensional table.
2. CSV-like files with a custom delimiter.
3. Reading multiple data types (represented as String), column pruning, and constant columns.
4. Recursive reading and file-name filtering.
5. Compressed text; the supported formats are zip, gzip, and bzip2.
6. Concurrent reading of multiple files.

Currently not supported:

1. Multi-threaded reading of a single file, which would require an algorithm for splitting a single file internally. Support is being considered for a later phase.
2. Multi-threaded reading of a single compressed file, which is not technically feasible.

## 3 Feature Description

### 3.1 Sample Configuration

```json
{
    "setting": {},
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "txtfilereader",
                    "parameter": {
                        "path": ["/home/haiwei.luo/case00/data"],
                        "encoding": "UTF-8",
                        "column": [
                            {"index": 0, "type": "long"},
                            {"index": 1, "type": "boolean"},
                            {"index": 2, "type": "double"},
                            {"index": 3, "type": "string"},
                            {"index": 4, "type": "date", "format": "yyyy.MM.dd"}
                        ],
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/home/haiwei.luo/case00/result",
                        "fileName": "luohw",
                        "writeMode": "truncate",
                        "dateFormat": "yyyy-MM-dd"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameters

* **path**

    * Description: the path(s) on the local file system. Multiple paths can be specified.
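    When a single local file is specified, TxtFileReader can currently extract data with only one thread. Multi-threaded reading of a single uncompressed file is being considered for a later phase.

    When multiple local files are specified, TxtFileReader extracts data with multiple threads. The degree of concurrency is set by the number of channels.

    When a wildcard is specified, TxtFileReader tries to enumerate all matching files. For example, `/*` reads every file under the `/` directory, and `/bazhen/*` reads every file under the `bazhen` directory, including files in subdirectories. **TxtFileReader currently supports only `*` as a file wildcard.**

    **Note in particular that DataX treats all text files synchronized in one job as a single data table. You must make sure that all files fit the same schema, that they are CSV-like, and that DataX has permission to read them.**

    **Note also that DataX reports an error if no matching files are found under the specified path.**

    * Required: yes

    * Default: none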
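To illustrate how a wildcard `path` selects files, here is a minimal, hypothetical Java sketch. `WildcardPathDemo` and its methods are not part of DataX. It follows the idea in `TxtFileReader.Job.prepare()`: scan the directory that precedes the first `*`, and keep every file whose absolute path matches a regex in which each `*` becomes `.*`. The plugin substitutes into the raw path string. This sketch instead quotes the literal parts with `Pattern.quote`, so a `.` in the path matches only a dot.

```java
import java.util.regex.Pattern;

public class WildcardPathDemo {

    // Directory to scan: everything up to and including the last '/' before the
    // first '*'. A path without '*' is returned unchanged.
    static String parentDirectory(String path) {
        int star = path.indexOf('*');
        if (star < 0) {
            return path;
        }
        int lastSep = path.substring(0, star).lastIndexOf('/');
        return path.substring(0, lastSep + 1);
    }

    // Turns a '*' wildcard path into a regex. Literal segments are quoted, and each
    // '*' becomes ".*", which also matches '/' (so nested files match as well).
    static Pattern toPattern(String path) {
        StringBuilder regex = new StringBuilder();
        String[] parts = path.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        String path = "/bazhen/*.csv";
        Pattern p = toPattern(path);
        System.out.println(parentDirectory(path));                    // /bazhen/
        System.out.println(p.matcher("/bazhen/a.csv").matches());     // true
        System.out.println(p.matcher("/bazhen/sub/b.csv").matches()); // true
        System.out.println(p.matcher("/bazhen/a_csv").matches());     // false
    }
}
```

Because `.*` also matches `/`, files in subdirectories are selected too. This agrees with the recursive reading listed under Features.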
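* **column**

    * Description: the list of fields to read. `type` specifies the type of the source data. `index` specifies which column of the text the field comes from (0-based). `value` makes the field a constant: it is not read from the source file but generated from the given value.

    By default, all data can be read as String with the following configuration:

    ```json
    "column": ["*"]
    ```

    Column fields can also be specified explicitly:

    ```json
    {
        "type": "long",
        "index": 0    // read a long field from the first column of the local text file
    },
    {
        "type": "string",
        "value": "alibaba"  // TxtFileReader generates the constant string "alibaba" for this field
    }
    ```

    When you specify column information, `type` is required, and exactly one of `index` or `value` must be set.

    * Required: yes

    * Default: all columns are read as string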
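The `index`/`value` rules above can be modeled roughly as follows. This is an illustrative sketch, not the plugin's code: `ColumnResolveDemo` and `ColumnSpec` are hypothetical names. The sketch only picks the raw string for each column and leaves type conversion aside. The constructor enforces the same constraints that `TxtFileReader` validates: `type` is required, exactly one of `index` or `value` is set, and `index` is non-negative. Throwing on an out-of-range index is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnResolveDemo {

    // Hypothetical stand-in for one entry of the "column" array.
    static final class ColumnSpec {
        final String type;
        final Integer index; // 0-based position in the split line
        final String value;  // constant used instead of reading the file

        ColumnSpec(String type, Integer index, String value) {
            if (type == null) {
                throw new IllegalArgumentException("type is required");
            }
            // Exactly one of index/value must be set.
            if ((index == null) == (value == null)) {
                throw new IllegalArgumentException("set exactly one of index or value");
            }
            if (index != null && index < 0) {
                throw new IllegalArgumentException("index must be >= 0");
            }
            this.type = type;
            this.index = index;
            this.value = value;
        }
    }

    // Picks the raw string for each configured column from one split line.
    static List<String> resolve(String[] fields, List<ColumnSpec> specs) {
        List<String> out = new ArrayList<>();
        for (ColumnSpec spec : specs) {
            if (spec.value != null) {
                out.add(spec.value);
            } else if (spec.index < fields.length) {
                out.add(fields[spec.index]);
            } else {
                throw new IllegalArgumentException("index " + spec.index + " out of range");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] fields = "19901219,true,3.1415".split(",", -1);
        List<ColumnSpec> specs = Arrays.asList(
                new ColumnSpec("long", 0, null),
                new ColumnSpec("string", null, "alibaba"));
        System.out.println(resolve(fields, specs)); // [19901219, alibaba]
    }
}
```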
* **fieldDelimiter**

    * Description: the field delimiter used when reading. It must be a single character.

    * Required: yes

    * Default: `,`
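* **compress**

    * Description: the compression type of the text files. Leaving it unset means no compression. Supported types are zip, gzip, and bzip2.

    * Required: no

    * Default: no compression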
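The following is a rough, stdlib-only Java illustration of what `compress` means for reading. It wraps the input stream in `java.util.zip.GZIPInputStream` when `compress` is `gzip`, then reads lines. The class `GzipReadDemo` is hypothetical. The actual decompression happens inside DataX's unstructured-storage utilities, which are not shown here. zip and bzip2 are deliberately left out, since the JDK has no bzip2 codec.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipReadDemo {

    // Chooses a decoder from the "compress" value (trimmed, lower-cased).
    // This sketch only handles uncompressed input (null) and gzip.
    static InputStream decode(InputStream raw, String compress) throws IOException {
        if (compress == null) {
            return raw;
        }
        switch (compress.trim().toLowerCase()) {
            case "gzip":
                return new GZIPInputStream(raw);
            default:
                throw new IllegalArgumentException("unsupported in this sketch: " + compress);
        }
    }

    // Reads all lines as UTF-8 and closes the stream.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzip-compressed "file" in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("1,a\n2,b\n".getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = decode(new ByteArrayInputStream(buf.toByteArray()), "gzip");
        System.out.println(readLines(in)); // [1,a, 2,b]
    }
}
```

A gzip stream is a single sequential unit that has to be decoded from the start. That is why the limitations above rule out multi-threaded reading of one compressed file.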
* **encoding**

    * Description: the encoding used to read the files.

    * Required: no

    * Default: utf-8
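* **skipHeader**

    * Description: CSV-like files may start with a header row of column titles, which then needs to be skipped. The header is not skipped by default.

    * Required: no

    * Default: false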
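* **nullFormat**

    * Description: a text file has no standard string for null (a null pointer), so DataX provides nullFormat to define which string represents null.

    For example, if `nullFormat: "\N"` is configured and a source field is `\N`, DataX treats that field as null.

    * Required: no

    * Default: `\N`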
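Here is a minimal sketch of how a single-character `fieldDelimiter` and `nullFormat` work together when a line is split. Assumptions: `NullFormatDemo` is a hypothetical class, and it does a plain split without any CSV quoting. The plugin itself reads CSV-like files through CsvReader (see `csvReaderConfig`). A field that equals the `nullFormat` marker becomes `null`, and trailing empty fields are kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NullFormatDemo {

    // Returns null when the raw field equals the nullFormat marker; otherwise keeps it.
    static String applyNullFormat(String raw, String nullFormat) {
        return (nullFormat != null && nullFormat.equals(raw)) ? null : raw;
    }

    // Splits one line on a single-character delimiter and applies nullFormat per field.
    static List<String> parseLine(String line, char delimiter, String nullFormat) {
        List<String> out = new ArrayList<>();
        // Pattern.quote stops delimiters such as '|' from being read as regex syntax;
        // limit -1 keeps trailing empty fields.
        for (String field : line.split(Pattern.quote(String.valueOf(delimiter)), -1)) {
            out.add(applyNullFormat(field, nullFormat));
        }
        return out;
    }

    public static void main(String[] args) {
        // In Java source "\\N" is the two characters '\' and 'N'.
        System.out.println(parseLine("1,\\N,abc,", ',', "\\N")); // [1, null, abc, ]
    }
}
```

In a strict JSON job file, the backslash must be escaped as well: `"nullFormat": "\\N"`.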
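* **csvReaderConfig**

    * Description: options for reading CSV files, given as a Map. CSV files are read with CsvReader, which has many options; any option left unset uses its default value.

    * Required: no

    * Default: none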
    A common configuration:

    ```json
    "csvReaderConfig": {
        "safetySwitch": false,
        "skipEmptyRecords": false,
        "useTextQualifier": false
    }
    ```

    All options and their default values are listed below. When you fill in the csvReaderConfig map, **use exactly these field names**:

    ```
    boolean caseSensitive = true;
    char textQualifier = 34;         // '"'
    boolean trimWhitespace = true;
    boolean useTextQualifier = true; // whether to use the CSV text qualifier (escape) character
    char delimiter = 44;             // field delimiter, ','
    char recordDelimiter = 0;
    char comment = 35;               // '#'
    boolean useComments = false;
    int escapeMode = 1;
    boolean safetySwitch = true;     // whether a single column is limited to 100000 characters
    boolean skipEmptyRecords = true; // whether to skip empty lines
    boolean captureRawRecord = true;
    ```

### 3.3 Type Conversion

Local files carry no data types of their own; the types are defined by DataX TxtFileReader:

| DataX internal type | Local file data type |
| -------- | ----- |
| Long | Long |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Date | Date |

Where:

* Local file Long: an integer written as a string in the text, for example "19901219".
* Local file Double: a Double written as a string in the text, for example "3.1415".
* Local file Boolean: a Boolean written as a string in the text, for example "true" or "false". Case-insensitive.
* Local file Date: a Date written as a string in the text, for example "2014-12-31". A `format` can be specified for Date.

## 4 Performance Report

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

Omitted.

================================================ FILE: txtfilereader/pom.xml ================================================ 4.0.0 com.alibaba.datax datax-all 0.0.1-SNAPSHOT txtfilereader txtfilereader TxtFileReader提供了本地读取TEXT功能,并可以根据用户配置的类型进行类型转换,建议开发、测试环境使用。 jar com.alibaba.datax datax-common ${datax-project-version} slf4j-log4j12 org.slf4j com.alibaba.datax plugin-unstructured-storage-util ${datax-project-version} org.slf4j slf4j-api ch.qos.logback logback-classic com.google.guava guava 16.0.1 maven-compiler-plugin ${jdk-version} ${jdk-version} ${project-sourceEncoding} maven-assembly-plugin src/main/assembly/package.xml datax dwzip package single ================================================ FILE: txtfilereader/src/main/assembly/package.xml ================================================ dir false src/main/resources plugin.json plugin_job_template.json plugin/reader/txtfilereader target/ txtfilereader-0.0.1-SNAPSHOT.jar plugin/reader/txtfilereader false
plugin/reader/txtfilereader/libs runtime ================================================ FILE: txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/Constant.java ================================================ package com.alibaba.datax.plugin.reader.txtfilereader; /** * Created by haiwei.luo on 14-9-20. */ public class Constant { public static final String SOURCE_FILES = "sourceFiles"; } ================================================ FILE: txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/Key.java ================================================ package com.alibaba.datax.plugin.reader.txtfilereader; /** * Created by haiwei.luo on 14-9-20. */ public class Key { public static final String PATH = "path"; } ================================================ FILE: txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/TxtFileReader.java ================================================ package com.alibaba.datax.plugin.reader.txtfilereader; import com.alibaba.datax.common.exception.DataXException; import com.alibaba.datax.common.plugin.RecordSender; import com.alibaba.datax.common.spi.Reader; import com.alibaba.datax.common.util.Configuration; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderErrorCode; import com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil; import com.google.common.collect.Sets; import org.apache.commons.io.Charsets; import org.apache.commons.io.IOUtils; import org.apache.commons.lang3.BooleanUtils; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.InputStream; import java.nio.charset.UnsupportedCharsetException; import java.util.ArrayList; import java.util.Arrays; import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; import 
java.util.Set; import java.util.regex.Pattern; /** * Created by haiwei.luo on 14-9-20. */ public class TxtFileReader extends Reader { public static class Job extends Reader.Job { private static final Logger LOG = LoggerFactory.getLogger(Job.class); private Configuration originConfig = null; private List path = null; private List sourceFiles; private Map pattern; private Map isRegexPath; @Override public void init() { this.originConfig = this.getPluginJobConf(); this.pattern = new HashMap(); this.isRegexPath = new HashMap(); this.validateParameter(); } private void validateParameter() { // Compatible with the old version, path is a string before String pathInString = this.originConfig.getNecessaryValue(Key.PATH, TxtFileReaderErrorCode.REQUIRED_VALUE); if (StringUtils.isBlank(pathInString)) { throw DataXException.asDataXException( TxtFileReaderErrorCode.REQUIRED_VALUE, "您需要指定待读取的源目录或文件"); } if (!pathInString.startsWith("[") && !pathInString.endsWith("]")) { path = new ArrayList(); path.add(pathInString); } else { path = this.originConfig.getList(Key.PATH, String.class); if (null == path || path.size() == 0) { throw DataXException.asDataXException( TxtFileReaderErrorCode.REQUIRED_VALUE, "您需要指定待读取的源目录或文件"); } } String encoding = this.originConfig .getString( com.alibaba.datax.plugin.unstructuredstorage.reader.Key.ENCODING, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_ENCODING); if (StringUtils.isBlank(encoding)) { this.originConfig .set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.ENCODING, com.alibaba.datax.plugin.unstructuredstorage.reader.Constant.DEFAULT_ENCODING); } else { try { encoding = encoding.trim(); this.originConfig .set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.ENCODING, encoding); Charsets.toCharset(encoding); } catch (UnsupportedCharsetException uce) { throw DataXException.asDataXException( TxtFileReaderErrorCode.ILLEGAL_VALUE, String.format("不支持您配置的编码格式 : [%s]", encoding), uce); } catch (Exception e) { 
throw DataXException.asDataXException( TxtFileReaderErrorCode.CONFIG_INVALID_EXCEPTION, String.format("编码配置异常, 请联系我们: %s", e.getMessage()), e); } } // column: 1. index type 2.value type 3.when type is Date, may have // format List columns = this.originConfig .getListConfiguration(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN); // handle ["*"] if (null != columns && 1 == columns.size()) { String columnsInStr = columns.get(0).toString(); if ("\"*\"".equals(columnsInStr) || "'*'".equals(columnsInStr)) { this.originConfig .set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COLUMN, null); columns = null; } } if (null != columns && columns.size() != 0) { for (Configuration eachColumnConf : columns) { eachColumnConf .getNecessaryValue( com.alibaba.datax.plugin.unstructuredstorage.reader.Key.TYPE, TxtFileReaderErrorCode.REQUIRED_VALUE); Integer columnIndex = eachColumnConf .getInt(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.INDEX); String columnValue = eachColumnConf .getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.VALUE); if (null == columnIndex && null == columnValue) { throw DataXException.asDataXException( TxtFileReaderErrorCode.NO_INDEX_VALUE, "由于您配置了type, 则至少需要配置 index 或 value"); } if (null != columnIndex && null != columnValue) { throw DataXException.asDataXException( TxtFileReaderErrorCode.MIXED_INDEX_VALUE, "您混合配置了index, value, 每一列同时仅能选择其中一种"); } if (null != columnIndex && columnIndex < 0) { throw DataXException.asDataXException( TxtFileReaderErrorCode.ILLEGAL_VALUE, String .format("index需要大于等于0, 您配置的index为[%s]", columnIndex)); } } } // only support compress types String compress = this.originConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COMPRESS); if (StringUtils.isBlank(compress)) { this.originConfig .set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COMPRESS, null); } else { Set supportedCompress = Sets .newHashSet("gzip", "bzip2", "zip"); compress = 
compress.toLowerCase().trim(); if (!supportedCompress.contains(compress)) { throw DataXException .asDataXException( TxtFileReaderErrorCode.ILLEGAL_VALUE, String.format( "仅支持 gzip, bzip2, zip 文件压缩格式 , 不支持您配置的文件压缩格式: [%s]", compress)); } this.originConfig .set(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.COMPRESS, compress); } String delimiterInStr = this.originConfig .getString(com.alibaba.datax.plugin.unstructuredstorage.reader.Key.FIELD_DELIMITER); // warn: if set, the delimiter must be exactly one character if (null != delimiterInStr && 1 != delimiterInStr.length()) { throw DataXException.asDataXException( UnstructuredStorageReaderErrorCode.ILLEGAL_VALUE, String.format("仅仅支持单字符切分, 您配置的切分为 : [%s]", delimiterInStr)); } UnstructuredStorageReaderUtil.validateCsvReaderConfig(this.originConfig); } @Override public void prepare() { LOG.debug("prepare() begin..."); // warn:make sure this regex string // warn:no need trim for (String eachPath : this.path) { String regexString = eachPath.replace("*", ".*").replace("?", ".?"); Pattern patt = Pattern.compile(regexString); this.pattern.put(eachPath, patt); } // build the file list once, after the patterns for all paths are registered this.sourceFiles = this.buildSourceTargets(); LOG.info(String.format("您即将读取的文件数为: [%s]", this.sourceFiles.size())); } @Override public void post() { } @Override public void destroy() { } // warn: an empty source directory is reported as an error; to sync an empty directory on purpose, specify an empty file explicitly @Override public List split(int adviceNumber) { LOG.debug("split() begin..."); List readerSplitConfigs = new ArrayList(); // warn: each slice reads exactly one file // int splitNumber = adviceNumber; int splitNumber = this.sourceFiles.size(); if (0 == splitNumber) { throw DataXException.asDataXException( TxtFileReaderErrorCode.EMPTY_DIR_EXCEPTION, String .format("未能找到待读取的文件,请确认您的配置项path: %s", this.originConfig.getString(Key.PATH))); } List> splitedSourceFiles = this.splitSourceFiles( this.sourceFiles, splitNumber); for (List files : splitedSourceFiles) { Configuration splitedConfig = this.originConfig.clone(); splitedConfig.set(Constant.SOURCE_FILES, files); 
readerSplitConfigs.add(splitedConfig); } LOG.debug("split() ok and end..."); return readerSplitConfigs; } // build the list of files to read; each path must be an absolute path private List buildSourceTargets() { // for each path Set toBeReadFiles = new HashSet(); for (String eachPath : this.path) { int endMark; for (endMark = 0; endMark < eachPath.length(); endMark++) { if ('*' != eachPath.charAt(endMark) && '?' != eachPath.charAt(endMark)) { continue; } else { this.isRegexPath.put(eachPath, true); break; } } String parentDirectory; if (BooleanUtils.isTrue(this.isRegexPath.get(eachPath))) { int lastDirSeparator = eachPath.substring(0, endMark) .lastIndexOf(IOUtils.DIR_SEPARATOR); parentDirectory = eachPath.substring(0, lastDirSeparator + 1); } else { this.isRegexPath.put(eachPath, false); parentDirectory = eachPath; } this.buildSourceTargetsEathPath(eachPath, parentDirectory, toBeReadFiles); } return Arrays.asList(toBeReadFiles.toArray(new String[0])); } private void buildSourceTargetsEathPath(String regexPath, String parentDirectory, Set toBeReadFiles) { // check that the directory exists so that the error is reported explicitly try { File dir = new File(parentDirectory); boolean isExists = dir.exists(); if (!isExists) { String message = String.format("您设定的目录不存在 : [%s]", parentDirectory); LOG.error(message); throw DataXException.asDataXException( TxtFileReaderErrorCode.FILE_NOT_EXISTS, message); } } catch (SecurityException se) { String message = String.format("您没有权限查看目录 : [%s]", parentDirectory); LOG.error(message); throw DataXException.asDataXException( TxtFileReaderErrorCode.SECURITY_NOT_ENOUGH, message); } directoryRover(regexPath, parentDirectory, toBeReadFiles); } private void directoryRover(String regexPath, String parentDirectory, Set toBeReadFiles) { File directory = new File(parentDirectory); // is a normal file if (!directory.isDirectory()) { if (this.isTargetFile(regexPath, directory.getAbsolutePath())) { toBeReadFiles.add(parentDirectory); LOG.info(String.format( "add file [%s] as a candidate to be read.", parentDirectory)); 
} } else { // is a directory try { // warn: for a directory without read permission, listFiles returns null instead of throwing SecurityException File[] files = directory.listFiles(); if (null != files) { for (File subFileNames : files) { directoryRover(regexPath, subFileNames.getAbsolutePath(), toBeReadFiles); } } else { // warn: when permission is missing, throw DataXException directly String message = String.format("您没有权限查看目录 : [%s]", directory); LOG.error(message); throw DataXException.asDataXException( TxtFileReaderErrorCode.SECURITY_NOT_ENOUGH, message); } } catch (SecurityException e) { String message = String.format("您没有权限查看目录 : [%s]", directory); LOG.error(message); throw DataXException.asDataXException( TxtFileReaderErrorCode.SECURITY_NOT_ENOUGH, message, e); } } } // regex filtering private boolean isTargetFile(String regexPath, String absoluteFilePath) { if (this.isRegexPath.get(regexPath)) { return this.pattern.get(regexPath).matcher(absoluteFilePath) .matches(); } else { return true; } } private List> splitSourceFiles(final List sourceList, int adviceNumber) { List> splitedList = new ArrayList>(); int averageLength = sourceList.size() / adviceNumber; averageLength = averageLength == 0 ? 
1 : averageLength; for (int begin = 0, end = 0; begin < sourceList.size(); begin = end) { end = begin + averageLength; if (end > sourceList.size()) { end = sourceList.size(); } splitedList.add(sourceList.subList(begin, end)); } return splitedList; } } public static class Task extends Reader.Task { private static Logger LOG = LoggerFactory.getLogger(Task.class); private Configuration readerSliceConfig; private List sourceFiles; @Override public void init() { this.readerSliceConfig = this.getPluginJobConf(); this.sourceFiles = this.readerSliceConfig.getList( Constant.SOURCE_FILES, String.class); } @Override public void prepare() { } @Override public void post() { } @Override public void destroy() { } @Override public void startRead(RecordSender recordSender) { LOG.debug("start read source files..."); for (String fileName : this.sourceFiles) { LOG.info(String.format("reading file : [%s]", fileName)); InputStream inputStream; try { inputStream = new FileInputStream(fileName); UnstructuredStorageReaderUtil.readFromStream(inputStream, fileName, this.readerSliceConfig, recordSender, this.getTaskPluginCollector()); recordSender.flush(); } catch (FileNotFoundException e) { // warn: socket files cannot be read and can break the transfer of all files; users must make sure none are present String message = String .format("找不到待读取的文件 : [%s]", fileName); LOG.error(message); throw DataXException.asDataXException( TxtFileReaderErrorCode.OPEN_FILE_ERROR, message, e); } } LOG.debug("end read source files..."); } } } ================================================ FILE: txtfilereader/src/main/java/com/alibaba/datax/plugin/reader/txtfilereader/TxtFileReaderErrorCode.java ================================================ package com.alibaba.datax.plugin.reader.txtfilereader; import com.alibaba.datax.common.spi.ErrorCode; /** * Created by haiwei.luo on 14-9-20. 
 */
public enum TxtFileReaderErrorCode implements ErrorCode {
    REQUIRED_VALUE("TxtFileReader-00", "A required parameter value is missing."),
    ILLEGAL_VALUE("TxtFileReader-01", "The parameter value is invalid."),
    MIXED_INDEX_VALUE("TxtFileReader-02", "Your column configuration contains both index and value."),
    NO_INDEX_VALUE("TxtFileReader-03", "You configured columns explicitly but did not set the corresponding index or value."),
    FILE_NOT_EXISTS("TxtFileReader-04", "The configured directory or file path does not exist."),
    OPEN_FILE_WITH_CHARSET_ERROR("TxtFileReader-05", "The configured encoding does not match the actual file encoding."),
    OPEN_FILE_ERROR("TxtFileReader-06", "An error occurred while opening the configured file; check whether the source directory contains hidden files, pipe files, or other special files."),
    READ_FILE_IO_ERROR("TxtFileReader-07", "An I/O error occurred while reading the configured file."),
    SECURITY_NOT_ENOUGH("TxtFileReader-08", "You lack the permission required for this file operation."),
    CONFIG_INVALID_EXCEPTION("TxtFileReader-09", "Your parameter configuration is invalid."),
    RUNTIME_EXCEPTION("TxtFileReader-10", "A runtime exception occurred; please contact us."),
    EMPTY_DIR_EXCEPTION("TxtFileReader-11", "The directory you are trying to read is empty.");

    private final String code;
    private final String description;

    private TxtFileReaderErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code,
                this.description);
    }
}
================================================
FILE: txtfilereader/src/main/resources/plugin.json
================================================
{
    "name": "txtfilereader",
    "class": "com.alibaba.datax.plugin.reader.txtfilereader.TxtFileReader",
    "description": "useScene: test. mechanism: use datax framework to transport data from txt file. warn: The more you know about the data, the less problems you encounter.",
    "developer": "alibaba"
}
================================================
FILE: txtfilereader/src/main/resources/plugin_job_template.json
================================================
{
    "name": "txtfilereader",
    "parameter": {
        "path": [],
        "encoding": "",
        "column": [],
        "fieldDelimiter": ""
    }
}
================================================
FILE: txtfilewriter/doc/txtfilewriter.md
================================================
# DataX TxtFileWriter Documentation

------------

## 1 Quick Introduction

TxtFileWriter writes one or more table files in a CSV-like format to the local file system. Its users are mainly DataX developers and testers.

**The content written to a local file is a logical two-dimensional table, for example CSV-formatted text.**

## 2 Features and Limitations

TxtFileWriter converts data from the DataX protocol into local TXT files. Because local files are unstructured storage, TxtFileWriter follows these conventions:

1. Only TXT files can be written, and the schema of the TXT content must be a two-dimensional table.
2. CSV-like files with a custom delimiter are supported.
3. Text compression is supported; the current compression formats are gzip and bzip2.
4. Multi-threaded writing is supported; each thread writes a different sub-file.
5. File rolling (switching to a new file once the current one exceeds a given size or line count) is not yet supported.

What we cannot do:

1. A single file cannot be written concurrently.

## 3 Functional Description

### 3.1 Sample Configuration

```json
{
    "setting": {},
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "txtfilereader",
                    "parameter": {
                        "path": ["/home/haiwei.luo/case00/data"],
                        "encoding": "UTF-8",
                        "column": [
                            { "index": 0, "type": "long" },
                            { "index": 1, "type": "boolean" },
                            { "index": 2, "type": "double" },
                            { "index": 3, "type": "string" },
                            { "index": 4, "type": "date", "format": "yyyy.MM.dd" }
                        ],
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/home/haiwei.luo/case00/result",
                        "fileName": "luohw",
                        "writeMode": "truncate",
                        "dateFormat": "yyyy-MM-dd"
                    }
                }
            }
        ]
    }
}
```

### 3.2 Parameter Description

* **path**

    * Description: Path in the local file system. TxtFileWriter writes multiple files under this directory.
    * Required: yes
    * Default: none
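
* **fileName**

    * Description: Name of the file TxtFileWriter writes. A random suffix is appended to this name to form the actual file name written by each thread (the writer uses the form `fileName__<random UUID with '-' replaced by '_'>`).
    * Required: yes
    * Default: none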
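
* **writeMode**

    * Description: How TxtFileWriter handles existing data before writing:
        * truncate: before writing, delete all files in the directory whose names start with fileName.
        * append: do nothing before writing; DataX TxtFileWriter writes directly using fileName and guarantees that file names do not collide.
        * nonConflict: if the directory contains any file whose name starts with fileName, report an error.
    * Required: yes
    * Default: none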

* **fieldDelimiter**

    * Description: Field delimiter used when writing.
    * Required: no
    * Default: `,`
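
* **compress**

    * Description: Text compression type. Leaving it unset means no compression. Supported compression types are zip, lzo, lzop, tgz, and bzip2.
    * Required: no
    * Default: no compression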

* **encoding**

    * Description: Encoding of the written file.
    * Required: no
    * Default: utf-8
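
* **nullFormat**

    * Description: Text files have no standard string for null, so DataX provides nullFormat to define which string represents null. For example, if you configure `nullFormat="\N"`, a null field is written to the file as `\N`.
    * Required: no
    * Default: `\N`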

* **dateFormat**

    * Description: Format used when serializing date-type data into the file, for example `"dateFormat": "yyyy-MM-dd"`.
    * Required: no
    * Default: none
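
* **fileFormat**

    * Description: Output file format, either csv (http://zh.wikipedia.org/wiki/%E9%80%97%E5%8F%B7%E5%88%86%E9%9A%94%E5%80%BC) or text. csv is strict CSV: if a value contains the column delimiter, it is escaped according to CSV escaping rules, with the double quote `"` as the escape character. text simply separates the values with the column delimiter and does not escape values that contain the delimiter.
    * Required: no
    * Default: text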

* **header**

    * Description: Header row written to the txt file, for example `['id', 'name', 'age']`.
    * Required: no
    * Default: none
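
Together, fileFormat, fieldDelimiter, nullFormat and dateFormat decide how one record becomes one line of text. The sketch below is a simplified, hypothetical model of that behavior: the class `RowFormatSketch` and its methods are illustrative names, not DataX APIs, and its CSV quoting follows the common RFC 4180 convention, which may differ in detail from the CSV library DataX actually uses. It also assumes an empty string when nullFormat is unset.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical helper, NOT part of DataX: a simplified model of how
// fileFormat, fieldDelimiter, nullFormat and dateFormat shape one output line.
public class RowFormatSketch {

    public static String formatRow(List<Object> values, char delimiter, String fileFormat,
                                   String nullFormat, String dateFormat) {
        List<String> fields = new ArrayList<>();
        for (Object value : values) {
            String field;
            if (value == null) {
                // Null columns are written as the nullFormat string (assumed "" if unset)
                field = (nullFormat == null) ? "" : nullFormat;
            } else if (value instanceof Date && dateFormat != null) {
                // Dates are serialized with the configured dateFormat pattern
                field = new SimpleDateFormat(dateFormat).format((Date) value);
            } else {
                field = String.valueOf(value);
            }
            if ("csv".equals(fileFormat)) {
                field = csvEscape(field, delimiter);
            }
            // In "text" mode the value is used as-is, even if it contains the delimiter
            fields.add(field);
        }
        return String.join(String.valueOf(delimiter), fields);
    }

    // Quotes a field that contains the delimiter, a double quote or a line break,
    // doubling any embedded double quotes (RFC 4180 style).
    public static String csvEscape(String field, char delimiter) {
        boolean needsQuoting = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList(1L, "hello,world", null);
        System.out.println(formatRow(row, ',', "csv", "\\N", "yyyy-MM-dd"));
        System.out.println(formatRow(row, ',', "text", "\\N", "yyyy-MM-dd"));
    }
}
```

With the sample row `(1, "hello,world", null)`, csv mode produces `1,"hello,world",\N`, while text mode produces `1,hello,world,\N`, where the embedded comma can no longer be told apart from a field delimiter.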
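
### 3.3 Type Conversion

Local files do not carry data types themselves; the types below are defined by DataX TxtFileWriter:

| DataX internal type | Local file data type |
| -------- | ----- |
| Long | Long |
| Double | Double |
| String | String |
| Boolean | Boolean |
| Date | Date |

Where:

* A local-file Long is the string representation of an integer in the file text, for example "19901219".
* A local-file Double is the string representation of a Double, for example "3.1415".
* A local-file Boolean is the string representation of a Boolean, for example "true" or "false" (case-insensitive).
* A local-file Date is the string representation of a Date, for example "2014-12-31"; a format can be specified for Date.

## 4 Performance Report

## 5 Constraints and Limitations

Omitted.

## 6 FAQ

Omitted.
================================================
FILE: txtfilewriter/pom.xml
================================================
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-all</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>
    <artifactId>txtfilewriter</artifactId>
    <name>txtfilewriter</name>
    <description>TxtFileWriter provides local TEXT file writing; recommended for development and test environments.</description>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>plugin-unstructured-storage-util</artifactId>
            <version>${datax-project-version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>16.0.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
================================================
FILE: txtfilewriter/src/main/assembly/package.xml
================================================
<assembly>
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/writer/txtfilewriter</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>txtfilewriter-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/writer/txtfilewriter</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/writer/txtfilewriter/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>
================================================
FILE: txtfilewriter/src/main/java/com/alibaba/datax/plugin/writer/txtfilewriter/Key.java
================================================
package com.alibaba.datax.plugin.writer.txtfilewriter;

/**
 * Created by haiwei.luo on 14-9-17.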
 */
public class Key {
    // must have
    public static final String PATH = "path";
}
================================================
FILE: txtfilewriter/src/main/java/com/alibaba/datax/plugin/writer/txtfilewriter/TxtFileWriter.java
================================================
package com.alibaba.datax.plugin.writer.txtfilewriter;

import com.alibaba.datax.common.exception.DataXException;
import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;
import com.alibaba.datax.plugin.unstructuredstorage.writer.UnstructuredStorageWriterUtil;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.PrefixFileFilter;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.FileOutputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/**
 * Created by haiwei.luo on 14-9-17.
 */
public class TxtFileWriter extends Writer {
    public static class Job extends Writer.Job {
        private static final Logger LOG = LoggerFactory.getLogger(Job.class);

        private Configuration writerSliceConfig = null;

        @Override
        public void init() {
            this.writerSliceConfig = this.getPluginJobConf();
            this.validateParameter();
            String dateFormatOld = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FORMAT);
            String dateFormatNew = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.DATE_FORMAT);
            if (null == dateFormatNew) {
                this.writerSliceConfig
                        .set(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.DATE_FORMAT,
                                dateFormatOld);
            }
            if (null != dateFormatOld) {
                LOG.warn("You are using format to configure date formatting, which is deprecated; please use dateFormat instead. If both are present, dateFormat takes precedence.");
            }
            UnstructuredStorageWriterUtil
                    .validateParameter(this.writerSliceConfig);
        }

        private void validateParameter() {
            this.writerSliceConfig
                    .getNecessaryValue(
                            com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME,
                            TxtFileWriterErrorCode.REQUIRED_VALUE);

            String path = this.writerSliceConfig.getNecessaryValue(Key.PATH,
                    TxtFileWriterErrorCode.REQUIRED_VALUE);

            try {
                // warn: the user must configure a directory here
                File dir = new File(path);
                if (dir.isFile()) {
                    throw DataXException
                            .asDataXException(
                                    TxtFileWriterErrorCode.ILLEGAL_VALUE,
                                    String.format(
                                            "The configured path [%s] is not a valid directory; check for a file with the same name or an invalid directory name.",
                                            path));
                }
                if (!dir.exists()) {
                    boolean createdOk = dir.mkdirs();
                    if (!createdOk) {
                        throw DataXException
                                .asDataXException(
                                        TxtFileWriterErrorCode.CONFIG_INVALID_EXCEPTION,
                                        String.format("Failed to create the specified path : [%s].",
                                                path));
                    }
                }
            } catch (SecurityException se) {
                throw DataXException.asDataXException(
                        TxtFileWriterErrorCode.SECURITY_NOT_ENOUGH,
                        String.format("You do not have permission to create path : [%s] ",
                                path), se);
            }
        }

        @Override
        public void prepare() {
            String path = this.writerSliceConfig.getString(Key.PATH);
            String fileName = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME);
            String writeMode = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.WRITE_MODE);
            // truncate option handler
            if ("truncate".equals(writeMode)) {
                LOG.info(String.format(
                        "writeMode is truncate; deleting entries under [%s] whose names start with [%s]",
                        path, fileName));
                File dir = new File(path);
                // warn: check whether the directory exists; it cannot be cleaned if it does not
                try {
                    if (dir.exists()) {
                        // warn: do not use FileUtils.deleteQuietly(dir);
                        FilenameFilter filter = new PrefixFileFilter(fileName);
                        File[] filesWithFileNamePrefix = dir
                                .listFiles(filter);
                        for (File eachFile : filesWithFileNamePrefix) {
                            LOG.info(String.format("delete file [%s].",
                                    eachFile.getName()));
                            FileUtils.forceDelete(eachFile);
                        }
                        // FileUtils.cleanDirectory(dir);
                    }
                } catch (NullPointerException npe) {
                    throw DataXException
                            .asDataXException(
                                    TxtFileWriterErrorCode.Write_FILE_ERROR,
                                    String.format("Null pointer exception while cleaning the configured directory : [%s]",
                                            path), npe);
                } catch (IllegalArgumentException iae) {
                    throw DataXException.asDataXException(
                            TxtFileWriterErrorCode.SECURITY_NOT_ENOUGH,
                            String.format("Invalid directory argument : [%s]", path));
                } catch (SecurityException se) {
                    throw DataXException.asDataXException(
                            TxtFileWriterErrorCode.SECURITY_NOT_ENOUGH,
                            String.format("You do not have permission to list directory : [%s]", path));
                } catch (IOException e) {
                    throw DataXException.asDataXException(
                            TxtFileWriterErrorCode.Write_FILE_ERROR,
                            String.format("Unable to clean directory : [%s]", path), e);
                }
            } else if ("append".equals(writeMode)) {
                LOG.info(String
                        .format("writeMode is append; no cleanup before writing. Under directory [%s], files with name prefix [%s] will be written",
                                path, fileName));
            } else if ("nonConflict".equals(writeMode)) {
                LOG.info(String.format(
                        "writeMode is nonConflict; checking the contents of [%s]", path));
                // warn: check two times about exists, mkdirs
                File dir = new File(path);
                try {
                    if (dir.exists()) {
                        if (dir.isFile()) {
                            throw DataXException
                                    .asDataXException(
                                            TxtFileWriterErrorCode.ILLEGAL_VALUE,
                                            String.format(
                                                    "The configured path [%s] is not a valid directory; check for a file with the same name or an invalid directory name.",
                                                    path));
                        }
                        // fileName is not null
                        FilenameFilter filter = new PrefixFileFilter(fileName);
                        File[] filesWithFileNamePrefix = dir
                                .listFiles(filter);
                        if (filesWithFileNamePrefix.length > 0) {
                            List<String> allFiles = new ArrayList<String>();
                            for (File eachFile : filesWithFileNamePrefix) {
                                allFiles.add(eachFile.getName());
                            }
                            LOG.error(String.format("Conflicting files: [%s]",
                                    StringUtils.join(allFiles, ",")));
                            throw DataXException
                                    .asDataXException(
                                            TxtFileWriterErrorCode.ILLEGAL_VALUE,
                                            String.format(
                                                    "The configured path [%s] already contains files or directories whose names start with the configured fileName.",
                                                    path));
                        }
                    } else {
                        boolean createdOk = dir.mkdirs();
                        if (!createdOk) {
                            throw DataXException
                                    .asDataXException(
                                            TxtFileWriterErrorCode.CONFIG_INVALID_EXCEPTION,
                                            String.format(
                                                    "Failed to create the specified path : [%s].",
                                                    path));
                        }
                    }
                } catch (SecurityException se) {
                    throw DataXException.asDataXException(
                            TxtFileWriterErrorCode.SECURITY_NOT_ENOUGH,
                            String.format("You do not have permission to list directory : [%s]", path));
                }
            } else {
                throw DataXException
                        .asDataXException(
                                TxtFileWriterErrorCode.ILLEGAL_VALUE,
                                String.format(
                                        "Only truncate, append and nonConflict are supported; unsupported writeMode : [%s]",
                                        writeMode));
            }
        }

        @Override
        public void post() {
        }

        @Override
        public void destroy() {
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            LOG.info("begin do split...");
            List<Configuration> writerSplitConfigs = new ArrayList<Configuration>();
            String filePrefix = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME);

            Set<String> allFiles = new HashSet<String>();
            String path = null;
            try {
                path = this.writerSliceConfig.getString(Key.PATH);
                File dir = new File(path);
                // File.list() returns null on an I/O error, so guard before wrapping it
                String[] existingFiles = dir.list();
                if (null != existingFiles) {
                    allFiles.addAll(Arrays.asList(existingFiles));
                }
            } catch (SecurityException se) {
                throw DataXException.asDataXException(
                        TxtFileWriterErrorCode.SECURITY_NOT_ENOUGH,
                        String.format("You do not have permission to list directory : [%s]", path));
            }

            String fileSuffix;
            for (int i = 0; i < mandatoryNumber; i++) {
                // handle same file name
                Configuration splitedTaskConfig = this.writerSliceConfig
                        .clone();

                String fullFileName = null;
                fileSuffix = UUID.randomUUID().toString().replace('-', '_');
                fullFileName = String.format("%s__%s", filePrefix, fileSuffix);
                while (allFiles.contains(fullFileName)) {
                    fileSuffix = UUID.randomUUID().toString().replace('-', '_');
                    fullFileName = String.format("%s__%s", filePrefix,
                            fileSuffix);
                }
                allFiles.add(fullFileName);

                splitedTaskConfig
                        .set(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME,
                                fullFileName);

                LOG.info(String.format("splited write file name:[%s]",
                        fullFileName));

                writerSplitConfigs.add(splitedTaskConfig);
            }
            LOG.info("end do split.");
            return writerSplitConfigs;
        }
    }

    public static class Task extends Writer.Task {
        private static final Logger LOG = LoggerFactory.getLogger(Task.class);

        private Configuration writerSliceConfig;

        private String path;

        private String fileName;

        @Override
        public void init() {
            this.writerSliceConfig = this.getPluginJobConf();
            this.path = this.writerSliceConfig.getString(Key.PATH);
            this.fileName = this.writerSliceConfig
                    .getString(com.alibaba.datax.plugin.unstructuredstorage.writer.Key.FILE_NAME);
        }

        @Override
        public void prepare() {
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            LOG.info("begin do write...");
            String fileFullPath = this.buildFilePath();
            LOG.info(String.format("write to file : [%s]", fileFullPath));

            OutputStream outputStream = null;
            try {
                File newFile = new File(fileFullPath);
                newFile.createNewFile();
                outputStream = new FileOutputStream(newFile);
                UnstructuredStorageWriterUtil.writeToStream(lineReceiver,
                        outputStream, this.writerSliceConfig, this.fileName,
                        this.getTaskPluginCollector());
            } catch (SecurityException se) {
                throw DataXException.asDataXException(
                        TxtFileWriterErrorCode.SECURITY_NOT_ENOUGH,
                        String.format("You do not have permission to create file : [%s]",
                                this.fileName));
            } catch (IOException ioe) {
                throw DataXException.asDataXException(
                        TxtFileWriterErrorCode.Write_FILE_IO_ERROR,
                        String.format("Unable to create the file to write : [%s]",
                                this.fileName), ioe);
            } finally {
                IOUtils.closeQuietly(outputStream);
            }
            LOG.info("end do write");
        }

        private String buildFilePath() {
            boolean isEndWithSeparator = false;
            switch (IOUtils.DIR_SEPARATOR) {
            case IOUtils.DIR_SEPARATOR_UNIX:
                isEndWithSeparator = this.path.endsWith(String
                        .valueOf(IOUtils.DIR_SEPARATOR));
                break;
            case IOUtils.DIR_SEPARATOR_WINDOWS:
                isEndWithSeparator = this.path.endsWith(String
                        .valueOf(IOUtils.DIR_SEPARATOR_WINDOWS));
                break;
            default:
                break;
            }
            if (!isEndWithSeparator) {
                this.path = this.path + IOUtils.DIR_SEPARATOR;
            }
            return String.format("%s%s", this.path, this.fileName);
        }

        @Override
        public void post() {
        }

        @Override
        public void destroy() {
        }
    }
}
================================================
FILE: txtfilewriter/src/main/java/com/alibaba/datax/plugin/writer/txtfilewriter/TxtFileWriterErrorCode.java
================================================
package com.alibaba.datax.plugin.writer.txtfilewriter;

import com.alibaba.datax.common.spi.ErrorCode;

/**
 * Created by haiwei.luo on 14-9-17.
 */
public enum TxtFileWriterErrorCode implements ErrorCode {
    CONFIG_INVALID_EXCEPTION("TxtFileWriter-00", "Your parameter configuration is invalid."),
    REQUIRED_VALUE("TxtFileWriter-01", "A required parameter value is missing."),
    ILLEGAL_VALUE("TxtFileWriter-02", "The parameter value is invalid."),
    Write_FILE_ERROR("TxtFileWriter-03", "An error occurred while writing the configured target file."),
    Write_FILE_IO_ERROR("TxtFileWriter-04", "An I/O error occurred while writing the configured file."),
    SECURITY_NOT_ENOUGH("TxtFileWriter-05", "You lack the permission required for this file write operation.");

    private final String code;
    private final String description;

    private TxtFileWriterErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public String getCode() {
        return this.code;
    }

    @Override
    public String getDescription() {
        return this.description;
    }

    @Override
    public String toString() {
        return String.format("Code:[%s], Description:[%s].", this.code,
                this.description);
    }
}
================================================
FILE: txtfilewriter/src/main/resources/plugin.json
================================================
{
    "name": "txtfilewriter",
    "class": "com.alibaba.datax.plugin.writer.txtfilewriter.TxtFileWriter",
    "description": "useScene: test. mechanism: use datax framework to transport data to txt file. warn: The more you know about the data, the less problems you encounter.",
    "developer": "alibaba"
}
================================================
FILE: txtfilewriter/src/main/resources/plugin_job_template.json
================================================
{
    "name": "txtfilewriter",
    "parameter": {
        "path": "",
        "fileName": "",
        "writeMode": "",
        "fieldDelimiter": "",
        "dateFormat": ""
    }
}
================================================
FILE: userGuid.md
================================================
# DataX

DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, ODPS, and more.

# Features

As a data synchronization framework, DataX abstracts synchronization between different data sources into Reader plugins, which read data from the source, and Writer plugins, which write data to the target. In principle, the DataX framework can synchronize data for any type of data source. The DataX plugin system also forms an ecosystem: every newly integrated data source immediately becomes interoperable with all existing ones.

# System Requirements

- Linux
- [JDK (1.8 or later, 1.8 recommended)](http://www.oracle.com/technetwork/cn/java/javase/downloads/index.html)
- [Python (2 or 3)](https://www.python.org/downloads/)
- [Apache Maven 3.x](https://maven.apache.org/download.cgi) (Compile DataX)

# Quick Start

* Deploying the tool

    * Method 1: download the DataX package directly: [DataX download](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz)

        After downloading, extract it to a local directory and enter the bin directory to run a synchronization job:

        ``` shell
        $ cd {YOUR_DATAX_HOME}/bin
        $ python datax.py {YOUR_JOB.json}
        ```

        Self-check script: `python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json`

    * Method 2: download the DataX source code and build it yourself: [DataX source](https://github.com/alibaba/DataX)

        (1) Download the DataX source code:

        ``` shell
        $ git clone git@github.com:alibaba/DataX.git
        ```

        (2) Package it with Maven:

        ``` shell
        $ cd {DataX_source_code_home}
        $ mvn -U clean package assembly:assembly -Dmaven.test.skip=true
        ```

        On success, the log shows:

        ```
        [INFO] BUILD SUCCESS
        [INFO] -----------------------------------------------------------------
        [INFO] Total time: 08:12 min
        [INFO] Finished at: 2015-12-13T16:26:48+08:00
        [INFO] Final Memory: 133M/960M
        [INFO] -----------------------------------------------------------------
        ```

        After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/ with the following structure:

        ``` shell
        $ cd {DataX_source_code_home}
        $ ls ./target/datax/datax/
        bin  conf  job  lib  log  log_perf  plugin  script  tmp
        ```

* Configuration example: read data from stream and print it to the console

    * Step 1: create the job configuration file (JSON format)

        You can view a configuration template with `python datax.py -r {YOUR_READER} -w {YOUR_WRITER}`:

        ``` shell
        $ cd {YOUR_DATAX_HOME}/bin
        $ python datax.py -r streamreader -w streamwriter
        DataX (UNKNOWN_DATAX_VERSION), From Alibaba !
        Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.
        Please refer to the streamreader document:
            https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md

        Please refer to the streamwriter document:
            https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md

        Please save the following configuration as a json file and use
            python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
        to run the job.

        {
            "job": {
                "content": [
                    {
                        "reader": {
                            "name": "streamreader",
                            "parameter": {
                                "column": [],
                                "sliceRecordCount": ""
                            }
                        },
                        "writer": {
                            "name": "streamwriter",
                            "parameter": {
                                "encoding": "",
                                "print": true
                            }
                        }
                    }
                ],
                "setting": {
                    "speed": {
                        "channel": ""
                    }
                }
            }
        }
        ```

        Based on the template, configure the JSON as follows:

        ``` json
        #stream2stream.json
        {
            "job": {
                "content": [
                    {
                        "reader": {
                            "name": "streamreader",
                            "parameter": {
                                "sliceRecordCount": 10,
                                "column": [
                                    {
                                        "type": "long",
                                        "value": "10"
                                    },
                                    {
                                        "type": "string",
                                        "value": "hello,你好,世界-DataX"
                                    }
                                ]
                            }
                        },
                        "writer": {
                            "name": "streamwriter",
                            "parameter": {
                                "encoding": "UTF-8",
                                "print": true
                            }
                        }
                    }
                ],
                "setting": {
                    "speed": {
                        "channel": 5
                    }
                }
            }
        }
        ```

    * Step 2: start DataX

        ``` shell
        $ cd {YOUR_DATAX_DIR_BIN}
        $ python datax.py ./stream2stream.json
        ```

        When the synchronization finishes, the log shows:

        ``` shell
        ...
        2015-12-17 11:20:25.263 [job-0] INFO  JobContainer -
        任务启动时刻                    : 2015-12-17 11:20:15
        任务结束时刻                    : 2015-12-17 11:20:25
        任务总计耗时                    :                 10s
        任务平均流量                    :              205B/s
        记录写入速度                    :              5rec/s
        读出记录总数                    :                  50
        读写失败总数                    :                   0
        ```

        The summary fields are, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures.

# Contact us

Google Groups: [DataX-user](https://github.com/alibaba/DataX)
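
The record counts in the quick-start log above can be cross-checked by hand. Assuming streamreader emits `sliceRecordCount` records in each task and the job runs one task per channel, 10 records × 5 channels gives the 50 records reported, and 50 records over the 10 s run gives 5 rec/s. The hypothetical class below (not part of DataX) simply encodes that arithmetic; the byte throughput (205B/s) depends on DataX's internal size accounting and is not derived here.

```java
// Hypothetical back-of-the-envelope check, NOT part of DataX.
// Assumes each of the `channel` tasks emits `sliceRecordCount` records.
public class Stream2StreamCheck {

    // Total records = records per task x number of tasks (one per channel)
    public static long expectedRecords(long sliceRecordCount, int channel) {
        return sliceRecordCount * channel;
    }

    // Average records per second over the whole run (integer division)
    public static long recordsPerSecond(long totalRecords, long elapsedSeconds) {
        return totalRecords / elapsedSeconds;
    }

    public static void main(String[] args) {
        long total = expectedRecords(10, 5);            // sliceRecordCount = 10, channel = 5
        System.out.println("total records: " + total);
        System.out.println("records/s: " + recordsPerSecond(total, 10));
    }
}
```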