
Udacity-Data-Engineering-Projects

License: View license
Development language: C/C++
Category: Database related
Software type: Open source
Region: Unknown
Submitted by: 颜瀚漠
Operating system: Cross-platform
Open source organization: Unknown
Target audience: Unknown

Software Overview

Data Engineering Projects

Project 1: Data Modeling with Postgres

In this project, we apply data modeling with Postgres and build an ETL pipeline using Python. A startup wants to analyze the data they have been collecting on songs and user activity on their new music streaming app. Currently, they collect data in JSON format, and the analytics team is particularly interested in understanding which songs users are listening to.

Link: Data_Modeling_with_Postgres
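A minimal sketch, in Python, of the kind of load step this description implies: reading JSON-lines log files and inserting song-play events into Postgres with psycopg2. The `songplays` table, its columns, and the connection details are illustrative assumptions, not the project's actual schema.

```python
import glob
import json

import psycopg2

# Hypothetical fact-table insert; the real pipeline resolves song_id/artist_id
# via lookups against the songs and artists dimension tables.
SONGPLAY_INSERT = """
    INSERT INTO songplays (start_time, user_id, level, song_id, artist_id,
                           session_id, location, user_agent)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
"""

def process_log_file(cur, filepath):
    """Load the song-play events from one JSON-lines log file."""
    with open(filepath) as f:
        for line in f:
            event = json.loads(line)
            if event.get("page") != "NextSong":
                continue  # only 'NextSong' events are actual song plays
            cur.execute(SONGPLAY_INSERT, (
                event["ts"], event["userId"], event["level"],
                None, None,  # song_id / artist_id lookup omitted in this sketch
                event["sessionId"], event["location"], event["userAgent"],
            ))

def main():
    # Assumed local connection string; adjust to your environment.
    conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
    cur = conn.cursor()
    for filepath in glob.glob("data/log_data/**/*.json", recursive=True):
        process_log_file(cur, filepath)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    main()
```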

Project 2: Data Modeling with Cassandra

In this project, we apply data modeling with Cassandra and build an ETL pipeline using Python. We build the data model around the queries we want answered. For our use case, we want answers to the following:

  • Get details of a song that was heard in the music app history during a particular session.
  • Get the songs played by a user during a particular session on the music app.
  • Get all users from the music app history who listened to a particular song.

Link: Data_Modeling_with_Apache_Cassandra
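A rough sketch of the query-first modeling this project calls for, using the DataStax cassandra-driver and the first question above as an example: the table is partitioned by session_id and clustered by item_in_session, so the lookup touches a single partition. The keyspace, table, and column names are assumptions.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Assumed keyspace for local experimentation.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

# Model the table around the query: partition key = session_id,
# clustering column = item_in_session.
session.execute("""
    CREATE TABLE IF NOT EXISTS songs_by_session (
        session_id int,
        item_in_session int,
        artist text,
        song_title text,
        song_length float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

# "What song was heard during session 338 at item 4 in that session?"
rows = session.execute(
    "SELECT artist, song_title, song_length FROM songs_by_session "
    "WHERE session_id = %s AND item_in_session = %s",
    (338, 4),
)
for row in rows:
    print(row.artist, row.song_title, row.song_length)
```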

Project 3: Data Warehouse

In this project, we apply the data warehouse architectures we learned and build a data warehouse on the AWS cloud. We build an ETL pipeline to extract and transform data stored in JSON format in S3 buckets and move it into a warehouse hosted on Amazon Redshift.

Use Redshift IaC script - Redshift_IaC_README

Link - Data_Warehouse
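A hedged sketch of the staging-and-load pattern this description implies: a Redshift COPY pulls JSON from S3 into a staging table, then an INSERT ... SELECT populates the warehouse fact table. Bucket paths, table names, the IAM role ARN, and connection details are placeholders.

```python
import psycopg2

# COPY raw JSON logs from S3 into a staging table (placeholder bucket and role).
STAGING_COPY = """
    COPY staging_events
    FROM 's3://my-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    REGION 'us-west-2'
    FORMAT AS JSON 's3://my-bucket/log_json_path.json';
"""

# Transform and load from staging into an assumed star-schema fact table.
INSERT_SONGPLAYS = """
    INSERT INTO songplays (start_time, user_id, level, song_id, artist_id, session_id)
    SELECT e.ts, e.user_id, e.level, s.song_id, s.artist_id, e.session_id
    FROM staging_events e
    JOIN staging_songs s
      ON e.song = s.title AND e.artist = s.artist_name
    WHERE e.page = 'NextSong';
"""

def main():
    # Placeholder Redshift endpoint and credentials.
    conn = psycopg2.connect(
        "host=my-cluster.abc123.us-west-2.redshift.amazonaws.com "
        "dbname=dwh user=dwh_user password=change_me port=5439"
    )
    cur = conn.cursor()
    cur.execute(STAGING_COPY)      # extract: S3 JSON -> Redshift staging
    cur.execute(INSERT_SONGPLAYS)  # transform + load: staging -> fact table
    conn.commit()
    conn.close()

if __name__ == "__main__":
    main()
```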

Project 4: Data Lake

In this project, we build a data lake on the AWS cloud using Spark and an AWS EMR cluster. The data lake serves as a single source of truth for the analytics platform. We write Spark jobs to perform ELT operations that pick up data from the landing zone on S3, transform it, and store it in the S3 processed zone.

Link: Data_Lake
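A minimal PySpark sketch of the ELT flow described above: read raw JSON from an S3 landing zone, reshape it, and write partitioned Parquet to the processed zone. Bucket names and column choices are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data_lake_etl").getOrCreate()

# Extract: raw song data from the landing zone (placeholder bucket).
songs_raw = spark.read.json("s3a://my-landing-zone/song_data/*/*/*/*.json")

# Transform: keep only the columns the analytics tables need.
songs = songs_raw.select(
    "song_id", "title", "artist_id", "year", "duration"
).dropDuplicates(["song_id"])

# Load: write to the processed zone, partitioned for efficient reads.
songs.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://my-processed-zone/songs/")
```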

Project 5: Data Pipelines with Airflow

In this project, we orchestrate our data pipeline workflow using Apache Airflow, an open-source Apache project. We schedule our ETL jobs in Airflow, create project-related custom plugins and operators, and automate the pipeline execution.

Link: Airflow_Data_Pipelines
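A hedged sketch of an hourly Airflow DAG in the spirit of this project; the DAG id, task names, and callables are placeholders, and the repo's custom plugins and operators are stood in for by plain PythonOperator tasks.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "owner": "sparkify",
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "depends_on_past": False,
}

def stage_events():
    print("copy raw events from S3 into staging")

def load_fact_table():
    print("insert from staging into the songplays fact table")

with DAG(
    dag_id="sparkify_etl",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",  # run the ETL every hour
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="stage_events", python_callable=stage_events)
    load = PythonOperator(task_id="load_songplays", python_callable=load_fact_table)

    stage >> load  # load the fact table only after staging succeeds
```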

Project 6: API Data to Postgres

In this project, we build an ETL pipeline to fetch data from the Yelp API and insert it into a Postgres database. This project is a very basic example of fetching real-time data from a public API.

Link: API to Postgres
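A rough sketch of the fetch-and-insert pattern described above, using the Yelp Fusion business search endpoint with requests and psycopg2; the target `businesses` table and its columns are assumptions.

```python
import psycopg2
import requests

API_KEY = "YOUR_YELP_API_KEY"  # placeholder; Yelp Fusion uses a Bearer token
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"

def fetch_businesses(location="San Francisco", limit=20):
    """Fetch a page of businesses from the Yelp search API."""
    resp = requests.get(
        SEARCH_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"location": location, "limit": limit},
    )
    resp.raise_for_status()
    return resp.json().get("businesses", [])

def insert_businesses(rows):
    """Insert the fetched records into an assumed 'businesses' table."""
    conn = psycopg2.connect("host=127.0.0.1 dbname=yelp user=postgres password=postgres")
    cur = conn.cursor()
    for b in rows:
        cur.execute(
            "INSERT INTO businesses (id, name, rating, city) VALUES (%s, %s, %s, %s) "
            "ON CONFLICT (id) DO NOTHING",
            (b["id"], b["name"], b.get("rating"), b["location"]["city"]),
        )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    insert_businesses(fetch_businesses())
```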

CAPSTONE PROJECT

Udacity provides its own Capstone project, with a dataset that includes data on immigration to the United States and supplementary datasets covering airport codes, U.S. city demographics, and temperature data.

I worked on my own open-ended project.
Here is the link: goodreads_etl_pipeline
