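We recommend using the tool versions listed below; problems caused by version mismatches are tedious to track down.
Gradle >= 2.4 (download link here); 2.14, the latest release at the time of writing, also works.
Play = 2.2.4 (download link here). Play handles version compatibility poorly and every upgrade ships with its own migration guide, so stick to this exact version.
Java = 1.8
MySQL >= 5.6; in practice, 5.7 causes problems.
ElasticSearch: the latest release is fine (download link here).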
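The version constraints above can be written down as a small check. This is only a sketch: it assumes you collect the installed version strings yourself (for example from `gradle --version` or `java -version`), the names `parse_version`, `REQUIREMENTS` and `check_versions` are made up for illustration, and the MySQL rule reads "5.7 causes problems" as "accept 5.6.x only".

```python
import re

def parse_version(text):
    """Turn '5.6.30' or '1.8.0_91' into a tuple of ints, e.g. (5, 6, 30)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))

# Constraints from the list above; each tool maps to an acceptance predicate.
# The MySQL predicate is an assumption: 5.6.x is accepted, 5.7 and later are not.
REQUIREMENTS = {
    "gradle": lambda v: v >= (2, 4),
    "play": lambda v: v[:3] == (2, 2, 4),
    "java": lambda v: v[:2] == (1, 8),
    "mysql": lambda v: (5, 6) <= v[:2] < (5, 7),
}

def check_versions(installed):
    """Return the sorted names of tools that are missing or violate a constraint."""
    problems = []
    for tool, accept in REQUIREMENTS.items():
        version = installed.get(tool)
        if version is None or not accept(parse_version(version)):
            problems.append(tool)
    return sorted(problems)
```

An empty result means every listed tool satisfies its constraint; ElasticSearch is left out because the post only asks for the latest release.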
The Jar files required in the metadata-etl/extralibs directory can be downloaded from the Maven repository.
Maven repository: http://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.12
#1. mysql-connector-java-5.1.*.jar
e.g. mysql-connector-java-5.1.12.jar
#2. jython-standalone-2.7.0.jar
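Before building, it can help to confirm that both jars are actually in place. A minimal sketch, assuming you pass in the file names found in `metadata-etl/extralibs`; `REQUIRED_JARS` and `missing_jars` are hypothetical names, and the first pattern accepts either a hyphen or a dot before the version number.

```python
import fnmatch

# Patterns for the two jars listed above.
REQUIRED_JARS = [
    "mysql-connector-java[-.]5.1.*.jar",
    "jython-standalone-2.7.0.jar",
]

def missing_jars(file_names):
    """Return the patterns from REQUIRED_JARS that match none of the file names."""
    return [
        pattern
        for pattern in REQUIRED_JARS
        if not fnmatch.filter(file_names, pattern)
    ]

# Typical use (not executed here):
#   import os
#   print(missing_jars(os.listdir("metadata-etl/extralibs")))
```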
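Installation tutorials for Gradle, Play and ElasticSearch:
After installing Play 2.2.4, edit the configuration in `$PLAY_HOME/framework/build` to avoid a `StackOverflowError`: raise the thread stack size from `-Xss1M` to `-Xss2M`. That is, change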
"$JAVA" ${DEBUG_PARAM} -Xms512M -Xmx1536M -Xss1M -XX:ReservedCodeCacheSize=192m -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ${JAVA_OPTS} -Dfile.encoding=UTF-8 -Dplay.version="${PLAY_VERSION}" -Dplay.home=`dirname $0` -Dsbt.boot.properties=`dirname $0`/sbt/sbt.boot.properties -Dsbt.scala.version=${SBT_SCALA_VERSION} ${PLAY_OPTS} -jar `dirname $0`/sbt/sbt-launch.jar "$@"
to
"$JAVA" ${DEBUG_PARAM} -Xms512M -Xmx1536M -Xss2M -XX:ReservedCodeCacheSize=192m -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ${JAVA_OPTS} -Dfile.encoding=UTF-8 -Dplay.version="${PLAY_VERSION}" -Dplay.home=`dirname $0` -Dsbt.boot.properties=`dirname $0`/sbt/sbt.boot.properties -Dsbt.scala.version=${SBT_SCALA_VERSION} ${PLAY_OPTS} -jar `dirname $0`/sbt/sbt-launch.jar "$@"
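The only difference between the two command lines is the `-Xss` value. If you prefer to script the edit, the following sketch rewrites that flag in the text of the build script; `raise_stack_size` is a hypothetical helper, and reading and writing the file is left to the caller.

```python
import re

def raise_stack_size(build_script, new_size="2M"):
    """Replace the value of every -Xss JVM flag in the given script text."""
    return re.sub(r"-Xss\d+[kKmMgG]?", "-Xss" + new_size, build_script)

# Typical use (not executed here; the path is the one named in the post):
#   path = os.path.join(os.environ["PLAY_HOME"], "framework", "build")
#   with open(path) as handle:
#       updated = raise_stack_size(handle.read())
#   with open(path, "w") as handle:
#       handle.write(updated)
```

Keep a backup of the original file before overwriting it.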
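ElasticSearch: see the [official tutorial](https://www.elastic.co/guide/en/elasticsearch/guide/current/running-elasticsearch.html)
Use the code on the master branch.
The project builds successfully even if you skip the pre-build preparation below, but it will then fail at runtime.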
#1. Create the wherehows database
CREATE DATABASE wherehows
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
#2. Create the wherehows users
CREATE USER 'wherehows'@'localhost' IDENTIFIED BY 'wherehows';
CREATE USER 'wherehows'@'%' IDENTIFIED BY 'wherehows';
GRANT ALL ON wherehows.* TO 'wherehows'@'localhost';
GRANT ALL ON wherehows.* TO 'wherehows'@'%';
CREATE USER 'wherehows_ro'@'localhost' IDENTIFIED BY 'readmetadata';
CREATE USER 'wherehows_ro'@'%' IDENTIFIED BY 'readmetadata';
GRANT SELECT ON wherehows.* TO 'wherehows_ro'@'localhost';
GRANT SELECT ON wherehows.* TO 'wherehows_ro'@'%';
#3. Run the table-creation scripts
cd WhereHows/data-model/DDL
mysql -h localhost -u wherehows -p wherehows < create_all_tables_wrapper.sql
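Each account above needs the same pair of statements for both `localhost` and `%`, which is easy to get wrong by hand. As a sketch, the statements can be generated from one description per account; `account_statements` is a hypothetical helper, and plain string formatting like this is only acceptable for trusted, hard-coded values.

```python
def account_statements(user, password, privileges,
                       hosts=("localhost", "%"), database="wherehows"):
    """Build CREATE USER and GRANT statements for one account on every host."""
    statements = []
    for host in hosts:
        statements.append(
            "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';" % (user, host, password)
        )
    for host in hosts:
        statements.append(
            "GRANT %s ON %s.* TO '%s'@'%s';" % (privileges, database, user, host)
        )
    return statements

# Typical use: print the statements and pipe them into the mysql client.
#   for line in account_statements("wherehows", "wherehows", "ALL"):
#       print(line)
```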
#1. Create the dataset, comment and field mappings
curl -XPUT '$YOUR_INDEX_URL:9200/wherehows' -d '
{
"mappings": {
"dataset": {},
"comment": {
"_parent": {
"type": "dataset"
}
},
"field": {
"_parent": {
"type": "dataset"
}
}
}
}
'
#2. Create the nested-object mapping for flow_jobs
curl -XPUT '$YOUR_INDEX_URL:9200/wherehows/flow_jobs/_mapping' -d '
{
"flow_jobs": {
"properties": {
"jobs": {
"type": "nested",
"properties": {
"job_name": { "type": "string" },
"job_path": { "type": "string" },
"job_type": { "type": "string" },
"pre_jobs": { "type": "string" },
"post_jobs": { "type": "string" },
"is_current": { "type": "string" },
"is_first": { "type": "string" },
"is_last": { "type": "string" },
"job_type_id": { "type": "short" },
"app_id": { "type": "short" },
"flow_id": { "type": "long" },
"job_id": { "type": "long" }
}
}
}
}
}
'
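The two request bodies above follow simple patterns: child types that point at a parent type, and one nested object whose fields differ only in type. The sketch below builds the same JSON from Python dictionaries, which makes those patterns explicit. All function and constant names are illustrative; nothing is sent to ElasticSearch.

```python
import json

def parent_child_index_body(parent="dataset", children=("comment", "field")):
    """Index body declaring `parent` plus child types whose _parent points at it."""
    mappings = {parent: {}}
    for child in children:
        mappings[child] = {"_parent": {"type": parent}}
    return {"mappings": mappings}

# Fields of the nested `jobs` object, grouped by ElasticSearch type.
FLOW_JOB_FIELDS = {
    "string": ["job_name", "job_path", "job_type", "pre_jobs", "post_jobs",
               "is_current", "is_first", "is_last"],
    "short": ["job_type_id", "app_id"],
    "long": ["flow_id", "job_id"],
}

def nested_jobs_mapping(fields_by_type):
    """Mapping for flow_jobs with `jobs` declared as a nested object."""
    properties = {}
    for es_type, names in fields_by_type.items():
        for name in names:
            properties[name] = {"type": es_type}
    return {
        "flow_jobs": {
            "properties": {"jobs": {"type": "nested", "properties": properties}}
        }
    }

def to_request_body(mapping):
    """Serialize a mapping to the JSON text passed to curl with -d."""
    return json.dumps(mapping, indent=2)
```

Calling `to_request_body(parent_child_index_body())` yields JSON equivalent to the first request body, and `to_request_body(nested_jobs_mapping(FLOW_JOB_FIELDS))` to the second.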
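#3. Build the ElasticSearch index
This build runs as an `ETL Job` and is triggered automatically when data is pulled from MySQL. You can also build the index manually by running `metadata-etl/src/main/resources/jython/ElasticSearchIndex.py`.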
WhereHows is a Gradle multi-project build, and the shared dependencies are declared in the parent project, so edit the WhereHows/build.gradle file:
change "akka" : "com.typesafe.akka:akka-actor_2.10:2.3.15",
to "akka" : "com.typesafe.akka:akka-actor_2.10:2.2.5",
See here for details.
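The akka change is a one-line edit, but it can also be scripted. A minimal sketch, assuming the dependency is declared as a `"key" : "group:artifact:version"` entry as quoted above; `pin_dependency` is a hypothetical helper that works on the text of build.gradle.

```python
import re

def pin_dependency(gradle_text, key, new_version):
    """Rewrite the version part of a `"key" : "group:artifact:version"` entry."""
    pattern = re.compile(
        r'("%s"\s*:\s*"[^":]+:[^":]+:)[^"]+(")' % re.escape(key)
    )
    # A function replacement avoids ambiguity between group references
    # and the digits of the new version.
    return pattern.sub(lambda m: m.group(1) + new_version + m.group(2), gradle_text)
```

For example, `pin_dependency(text, "akka", "2.2.5")` changes only the akka entry and leaves every other line untouched.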
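The metadata-ETL workflow in WhereHows is as follows:
The backend-service project periodically reads the wherehows.wh_etl_job table and picks out the ETL jobs that are due to run.
Java calls a Jython script to run Extract, which writes CSV files to disk.
Java calls a Jython script to run Transform, which writes CSV and JSON files to disk.
Java calls a Jython script to run Load, which parses the files generated above and writes the result into MySQL.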
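The file-based hand-off between the three stages can be illustrated with a toy pipeline. This is not WhereHows code: the record layout, file names and function names are invented for illustration, it is plain Python 3 rather than Jython, and a Python list stands in for the MySQL table.

```python
import csv
import json
import os
import tempfile

def extract(rows, exec_dir):
    """Extract: write raw rows to a CSV file on disk and return its path."""
    path = os.path.join(exec_dir, "extract.csv")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path

def transform(extract_path, exec_dir):
    """Transform: read the CSV, normalize names, write a JSON file."""
    with open(extract_path, newline="") as handle:
        records = [
            {"name": name.strip().lower(), "owner": owner}
            for name, owner in csv.reader(handle)
        ]
    path = os.path.join(exec_dir, "transform.json")
    with open(path, "w") as handle:
        json.dump(records, handle)
    return path

def load(transform_path, table):
    """Load: parse the generated file and append its records to `table`."""
    with open(transform_path) as handle:
        table.extend(json.load(handle))
    return len(table)

def run_etl(rows, table, exec_dir=None):
    """Run the three stages in order, passing files between them."""
    exec_dir = exec_dir or tempfile.mkdtemp()
    return load(transform(extract(rows, exec_dir), exec_dir), table)
```

Each stage only communicates through files on disk, which is why the working directories described next have to exist before any job runs.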
cd $HOME/Documents
#1. Holds the generated CSV and JSON files
mkdir -p wherehows_tmp/exec
mkdir -p wherehows_tmp/app_folder
#2. Holds some UI-related files
mkdir -p wherehows_tmp/resource
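The same directory layout can be created from Python. A small sketch with hypothetical names (`SUBDIRS`, `work_dirs`, `create_work_dirs`); it uses `resource` for the UI directory because that is the name the configuration paths later in this post point at.

```python
import os

# Sub-directories of wherehows_tmp used by this walkthrough.
SUBDIRS = ("exec", "app_folder", "resource")

def work_dirs(base):
    """Return the full paths of the working directories under `base`."""
    return [os.path.join(base, "wherehows_tmp", name) for name in SUBDIRS]

def create_work_dirs(base):
    """Create the working directories (like `mkdir -p`) and return their paths."""
    paths = work_dirs(base)
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths

# Typical use (not executed here):
#   create_work_dirs(os.path.join(os.path.expanduser("~"), "Documents"))
```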
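The main task is to start two projects, backend-service and web.
The UI and the backend service are independent of each other and can be started separately.
backend-service is a Play application; start it as follows:
#1. Edit the settings in conf/database.conf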
db.wherehows.driver = "com.mysql.jdbc.Driver"
db.wherehows.url = "jdbc:mysql://localhost/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull"
db.wherehows.user = "wherehows"
db.wherehows.password = "wherehows"
db.wherehows.host = "localhost"
#2. Start in dev mode
cd backend-service
$PLAY_HOME/play "run PORT_NUM"
#3. Start in prod mode
cd backend-service
gradle dist
## Start
./target/universal/stage/bin/backend-service -Dhttp.port=PORT_NUM
Open http://localhost:PORT_NUM in a browser; if the page shows Test, the service has started successfully.
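The browser check can be automated. The sketch below assumes, as described above, that the root page contains the text `Test`; `backend_is_up` is a hypothetical helper, and the `fetch` parameter exists only so the logic can be exercised without a running server.

```python
from urllib.request import urlopen

def backend_is_up(port, fetch=None, marker="Test"):
    """Return True when the page at http://localhost:<port> contains `marker`."""
    if fetch is None:
        # Default fetcher: a plain HTTP GET with a short timeout.
        fetch = lambda url: urlopen(url, timeout=5).read().decode("utf-8", "replace")
    try:
        return marker in fetch("http://localhost:%d" % port)
    except OSError:
        # Connection refused, timeouts and HTTP errors all derive from OSError.
        return False
```

Call `backend_is_up(PORT_NUM)` with the port you started the service on.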
web is a Play application; start it as follows:
#1. Edit the following settings in web/conf/application.conf
search.engine = "default"
elasticsearch.dataset.url = "$YOUR_DATASET_INDEX_URL"
elasticsearch.flow.url = "$YOUR_FLOW_INDEX_URL"
datasets.tree.name = "$YOUR_HOME/Documents/wherehows_tmp/resource/dataset.json"
flows.tree.name = "$YOUR_HOME/Documents/wherehows_tmp/resource/flow.json"
database.opensource.username = "wherehows"
database.opensource.password = "wherehows"
database.opensource.url = "jdbc:mysql://localhost/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull"
#2. Start in dev mode
cd web
$PLAY_HOME/play "run PORT_NUM"
#3. Build the distribution
cd web
gradle dist
The zip package is generated under `target/universal`.
Open http://localhost:PORT_NUM in a browser to see the UI.
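A common mistake at this step is to leave one of the `$YOUR_...` placeholders in application.conf. The following sketch lists the keys that still contain one. It only understands simple `key = "value"` lines, not the full configuration syntax, and `unfilled_placeholders` is a hypothetical name.

```python
import re

# Matches lines of the form: some.key = "value"
SETTING = re.compile(r'^\s*([\w.]+)\s*=\s*"(.*)"\s*$')

def unfilled_placeholders(conf_text):
    """Return the keys whose quoted value still contains a $YOUR_ placeholder."""
    keys = []
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if match and "$YOUR_" in match.group(2):
            keys.append(match.group(1))
    return keys

# Typical use (not executed here):
#   with open("web/conf/application.conf") as handle:
#       print(unfilled_placeholders(handle.read()))
```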
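Make sure the backend-service application is running, for example at http://localhost:9000 (tutorial here).
wherehows.wh_property
NOTE: this step only needs to be done once; simply run the script below
after adjusting the paths.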
INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.app_folder','$YOUR_HOME/Documents/wherehows_tmp/app_folder','N',NULL);
INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.driver','com.mysql.jdbc.Driver','N',NULL);
INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.jdbc.url','jdbc:mysql://localhost/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull','N',NULL);
INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.password','wherehows','N',NULL);
INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.db.username','wherehows','N',NULL);
INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.ui.tree.dataset.file','$YOUR_HOME/Documents/wherehows_tmp/resource/dataset.json','N',NULL);
INSERT INTO `wh_property` (`property_name`,`property_value`,`is_encrypted`,`group_name`) VALUES ('wherehows.ui.tree.flow.file','$YOUR_HOME/Documents/wherehows_tmp/resource/flow.json','N',NULL);
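Since only the home directory differs between machines, the statements above can be generated instead of edited by hand. A minimal sketch: `property_inserts` is a hypothetical helper, and string formatting is used only because every value here is a trusted constant.

```python
TEMPLATE = (
    "INSERT INTO `wh_property` "
    "(`property_name`,`property_value`,`is_encrypted`,`group_name`) "
    "VALUES ('%s','%s','N',NULL);"
)

JDBC_URL = "jdbc:mysql://localhost/wherehows?charset=utf8&zeroDateTimeBehavior=convertToNull"

def property_inserts(home):
    """Return the seven wh_property INSERT statements for the given home directory."""
    base = home.rstrip("/") + "/Documents/wherehows_tmp"
    properties = [
        ("wherehows.app_folder", base + "/app_folder"),
        ("wherehows.db.driver", "com.mysql.jdbc.Driver"),
        ("wherehows.db.jdbc.url", JDBC_URL),
        ("wherehows.db.password", "wherehows"),
        ("wherehows.db.username", "wherehows"),
        ("wherehows.ui.tree.dataset.file", base + "/resource/dataset.json"),
        ("wherehows.ui.tree.flow.file", base + "/resource/flow.json"),
    ]
    return [TEMPLATE % (name, value) for name, value in properties]
```

Printing the result of `property_inserts("/home/me")` gives a script that can be piped into the mysql client.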
The documentation is here. The fields marked Required=N must be supplied as well; the documentation is wrong on this point.
URL: http://localhost:9000/cfg/db
Method: POST
Body(JSON):
{
"db_id": 10001,
"db_code": "HIVE_DEMO",
"db_type_id": 0,
"description": "HIVE_DEMO_desc",
"cluster_size": 0,
"associated_data_centers": 1,
"replication_role": "MASTER",
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-HIVE"
}
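The request above can be sent with any HTTP client. As a sketch using only the standard library, the function below builds the POST request for the payload shown; `build_db_request` is a hypothetical name, the `Content-Type` header is an assumption about what the service expects, and nothing is sent until `urlopen` is called.

```python
import json
from urllib.request import Request, urlopen

def build_db_request(payload, base_url="http://localhost:9000"):
    """Build a POST request to /cfg/db carrying `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        base_url + "/cfg/db",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )

# Typical use (requires a running backend-service, so not executed here):
#   payload = {"db_id": 10001, "db_code": "HIVE_DEMO", "db_type_id": 0,
#              "description": "HIVE_DEMO_desc", "cluster_size": 0,
#              "associated_data_centers": 1, "replication_role": "MASTER",
#              "uri": "Teradata://sample-td",
#              "short_connection_string": "SAMPLE-HIVE"}
#   print(urlopen(build_db_request(payload)).read())
```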
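The documentation is here. The allowed values of the wh_etl_job_name field are defined in metadata-etl/src/main/java/metadata/etl/models/EtlJobName.java; the documentation does not list all of them.
Note: the request shown below repeats the database registration request above and carries no wh_etl_job_name field, so take the job-specific URL and body from the documentation.
URL: http://localhost:9000/cfg/db
Method: POST
Body(JSON):
{
"db_id": 10001,
"db_code": "HIVE_DEMO",
"db_type_id": 0,
"description": "HIVE_DEMO_desc",
"cluster_size": 0,
"associated_data_centers": 1,
"replication_role": "MASTER",
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-HIVE"
}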
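Once the job has been added successfully, backend-service will execute it on its next scheduling run.
E-T-L process; fixing the shortcomings of the current UI
Reposted from: https://blog.csdn.net/houzhizhen/article/details/66972166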