It's a GitHub repo star predictor that tries to predict the number of stars of any GitHub repository having more than 100 stars. It predicts based on the owner/organization's status and activities (commits, forks, comments, branches, update rate, etc.) on the repository. Different types of models (gradient boosting, deep neural network, etc.) have been tested successfully on the dataset we fetched from the GitHub APIs.
We used the GitHub REST API and GraphQL API to collect data on repositories having more than 100 stars. The data is available in the dataset directory. We were able to collect the data faster by running multiple DigitalOcean servers, so we thank DigitalOcean for providing free credits for students to use their servers. For details on the dataset features, refer to the summary section below.
We also used Google Colab's GPU notebooks, so we thank Google for starting the Colab project and providing free GPUs.
Below is a brief description of the code files/folders in the repo.
getting_repos_v2.js
filepath: scripts/nodejs/getting_repos_v2.js
This script fetches the basic info of repos having more than 100 stars using the GitHub REST API.
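The actual script is Node.js; as a rough illustration only, here is a minimal Python sketch of the same idea, assuming a personal access token and the public search endpoint (the token variable and function name are hypothetical):

```python
# Rough Python sketch of the idea (the actual script is Node.js): page through
# the GitHub search API for repositories with more than 100 stars.
import requests

GITHUB_TOKEN = "<your token>"     # assumption: a personal access token is used
HEADERS = {"Authorization": f"token {GITHUB_TOKEN}"}

def fetch_repo_page(page, per_page=100):
    """Return one page of basic repo info for repos with more than 100 stars."""
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": "stars:>100", "sort": "stars", "per_page": per_page, "page": page},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

print(len(fetch_repo_page(1)), "repositories fetched from the first page")
```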
githubGraphQLApiCallsDO_V2.js
filepath: scripts/nodejs/githubGraphQLApiCallsDO_V2.js
This script fetches the complete info of the repositories that were fetched by the above script, using the GitHub GraphQL API. It follows the approach of fetching the data at a fixed rate defined in the env file (e.g. 730 ms per request).
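The script itself is Node.js; below is a rough Python sketch of the fixed-rate idea, where a new GraphQL request is dispatched every interval without waiting for the previous response. The query fields, token handling, and helper names are illustrative assumptions, not the script's actual code:

```python
# Rough Python sketch of the fixed-rate approach (the real script is Node.js).
# A new GraphQL request is dispatched every REQUEST_INTERVAL seconds, without
# waiting for the previous response to arrive.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

GITHUB_TOKEN = "<your token>"     # assumption: a personal access token
REQUEST_INTERVAL = 0.730          # e.g. 730 ms per request, as set in the env file

QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    stargazerCount
    forkCount
    diskUsage
  }
}
"""

def fetch_repo(owner, name):
    """POST one GraphQL query for a single repository and return the JSON body."""
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"owner": owner, "name": name}},
        headers={"Authorization": f"bearer {GITHUB_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def fetch_all_fixed_rate(repo_list):
    """Dispatch one request per REQUEST_INTERVAL, then collect all responses."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = []
        for owner, name in repo_list:
            futures.append(pool.submit(fetch_repo, owner, name))
            time.sleep(REQUEST_INTERVAL)      # fixed pacing between dispatches
        return [f.result() for f in futures]
```

Dispatching at a fixed rate keeps the request volume predictable, which helps stay within GitHub's API rate limits when collecting from several servers.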
githubGraphQLApiCallsDO_V3.js
filepath: scripts/nodejs/githubGraphQLApiCallsDO_V3.js
This script fetches the complete info of the repositories that were fetched by the above script, using the GitHub GraphQL API. It follows the approach of requesting data for the next repo only after receiving the response to the previously sent request.
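A sketch of the difference, reusing the hypothetical fetch_repo helper from the previous snippet: requests are strictly sequential, so pacing follows the API's actual response time rather than a fixed timer.

```python
# Sequential variant: the next request is sent only after the previous
# response has been received.
def fetch_all_sequential(repo_list):
    results = []
    for owner, name in repo_list:
        results.append(fetch_repo(owner, name))   # hypothetical helper from above
    return results
```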
json_to_csv.py
filepath: scripts/python/json_to_csv.py
This script converts the JSON data fetched from GitHub's GraphQL API by the above scripts into an equivalent CSV file.
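A minimal Python sketch of that kind of conversion, assuming the JSON file holds a list of repository objects (the file names and the flattening step are illustrative, not taken from the script):

```python
# Minimal sketch of a JSON-to-CSV conversion, assuming the input file holds a
# list of repository objects.
import json

import pandas as pd

def json_file_to_csv(json_path, csv_path):
    with open(json_path) as f:
        records = json.load(f)            # assumption: a list of repo dicts
    df = pd.json_normalize(records)       # flatten nested fields into flat columns
    df.to_csv(csv_path, index=False)

json_file_to_csv("repos_part1.json", "repos_part1.csv")   # hypothetical file names
```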
merge.py
filepath: scripts/python/merge.py
This script merges the data in multiple CSV files into a single CSV file.
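A rough sketch of such a merge with pandas, assuming the per-batch CSV files share the same columns; the dataset/ paths and the de-duplication on reponame are assumptions, not taken from merge.py:

```python
# Rough sketch: concatenate the per-batch CSV files into a single dataset file.
import glob

import pandas as pd

parts = [pd.read_csv(path) for path in sorted(glob.glob("dataset/*.csv"))]
merged = pd.concat(parts, ignore_index=True)
merged = merged.drop_duplicates(subset="reponame")   # assumed: drop repeated repos
merged.to_csv("dataset/merged.csv", index=False)
```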
VisualizePreprocess.ipynb
filepath: notebooks/VisualizePreprocess.ipynb
We have done the feature engineering task in this notebook. It visualises the data and accordingly creates new features, modifies existing features, and removes redundant features. For details on the features created, check the summary below.
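The notebook is the authoritative reference; the sketch below only illustrates the style of transformation that would produce features such as desWordCount, the license flags, and the per-language flags listed in the summary. Exact license strings, language names, and file paths are assumptions:

```python
# Hypothetical sketch of the kind of feature engineering done in
# VisualizePreprocess.ipynb; see the notebook for the actual steps.
import pandas as pd

df = pd.read_csv("dataset/merged.csv")    # assumed path

# New features from the free-text description, then drop the raw column.
df["desWordCount"] = df["description"].fillna("").str.split().str.len()
df["desCharCount"] = df["description"].fillna("").str.len()
df = df.drop(columns=["description"])

# Flag-style features for the most common licenses (exact strings assumed).
df["mit_license"] = (df["license"] == "MIT License").astype(int)
df["apache_license"] = (df["license"] == "Apache License 2.0").astype(int)
df["nan_license"] = df["license"].isna().astype(int)
df = df.drop(columns=["license"])

# Flags for the most common primary languages; everything else is grouped.
language_flags = {"JavaScript": "JavaScript", "Python": "Python", "Java": "Java",
                  "Objective-C": "Objective", "Ruby": "Ruby", "PHP": "PHP"}
for lang, col in language_flags.items():
    df[col] = (df["primaryLanguage"] == lang).astype(int)
df["other_language"] = (~df["primaryLanguage"].isin(language_flags)).astype(int)
df = df.drop(columns=["primaryLanguage"])
```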
training_models.ipynb
filepath: notebooks/training_models.ipynb
In this notebook, we trained different models with hyper-parameter tuning on our dataset and compared their results at the end. For details on the models trained, their prediction scores, etc., check the summary below.
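As a rough illustration of that workflow (not the notebook's actual models or grids), here is a scikit-learn sketch that tunes a gradient boosting regressor and reports train/test scores; the file name, grid values, and metric are assumptions:

```python
# Hypothetical sketch of the training / hyper-parameter tuning style used in
# training_models.ipynb; the notebook holds the actual models and grids.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

df = pd.read_csv("dataset/preprocessed.csv")   # assumed: fully numeric after pre-processing
X = df.drop(columns=["stars"])
y = df["stars"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"n_estimators": [100, 300], "max_depth": [3, 5], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingRegressor(random_state=42), param_grid,
                      cv=3, scoring="r2", n_jobs=-1)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("train R^2:", search.best_estimator_.score(X_train, y_train))
print("test  R^2:", search.best_estimator_.score(X_test, y_test))
```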
In this project we have tried to predict the number of stars of GitHub repositories that have more than 100 stars. For this we collected repository data from the GitHub REST API and GraphQL API. After generating the dataset, we visualized it and did some feature engineering, and finally we applied various models and evaluated their scores on the training and test data.
There are a total of 49 features before pre-processing. After pre-processing (adding new features, removing redundant features, and modifying existing features) the count changes to 54. All the features are listed below. Some features created during pre-processing may not be self-explanatory; please refer to the VisualizePreprocess.ipynb notebook for details.

Features before pre-processing:
column 1 | column 2 | column 3 |
---|---|---|
branches | commits | createdAt |
description | diskUsage | followers |
following | forkCount | gistComments |
gistStar | gists | hasWikiEnabled |
iClosedComments | iClosedParticipants | iOpenComments |
iOpenParticipants | isArchived | issuesClosed |
issuesOpen | license | location |
login | members | organizations |
prClosed | prClosedComments | prClosedCommits |
prMerged | prMergedComments | prMergedCommits |
prOpen | prOpenComments | prOpenCommits |
primaryLanguage | pushedAt | readmeCharCount |
readmeLinkCount | readmeSize | readmeWordCount |
releases | reponame | repositories |
siteAdmin | stars | subscribersCount |
tags | type | updatedAt |
websiteUrl |
Features after pre-processing:

column 1 | column 2 | column 3 |
---|---|---|
branches | commits | createdAt |
diskUsage | followers | following |
forkCount | gistComments | gistStar |
gists | hasWikiEnabled | iClosedComments |
iClosedParticipants | iOpenComments | iOpenParticipants |
issuesClosed | issuesOpen | members |
organizations | prClosed | prClosedComments |
prClosedCommits | prMerged | prMergedComments |
prMergedCommits | prOpen | prOpenComments |
prOpenCommits | pushedAt | readmeCharCount |
readmeLinkCount | readmeSize | readmeWordCount |
releases | repositories | subscribersCount |
tags | type | updatedAt |
websiteUrl | desWordCount | desCharCount |
mit_license | nan_license | apache_license |
other_license | remain_license | JavaScript |
Python | Java | Objective |
Ruby | PHP | other_language |