stock-market-scraper is a command line tool which downloads all historical stock data from Yahoo Finance in both csv and json formats. It is for educational and research purposes only.
Don't overuse this script. It puts load on Yahoo Finance servers.
Photo by Igor Kozak for 10Clouds on dribbble
Yahoo Finance tickers are saved in the Assets folder.
Queries are built from these tickers. This brings up the query pages where Yahoo Finance holds its historical stock data.
Currently supports Yahoo Finance only.
It currently just downloads stock data for all stocks worldwide from Yahoo Finance.
I am working on a command line argument version that will also let you download selected stocks with this snippet.
This script can run on multiple Operating Systems. Follow the instructions mentioned below, according to your OS.
Since most (if not all) Linux/Debian OS come with python pre-installed, you don't have to install python manually. Make sure you're using python >= 3.5 though.
We need pip to install any external dependencies. Open a terminal and type pip list. If it shows some data, pip is installed. If it shows an error such as pip not found, or something along this line, then you need to install pip. Just type this command in the terminal:
sudo apt-get install python3-pip
If you're on Fedora, CentOS/RHEL, openSUSE, or Arch Linux, then you simply need to follow THIS TUTORIAL to install pip.
If this still doesn't work, then you'll need to install pip manually. Doing so is an easy one-time job, and you can follow THIS TUTORIAL to do so.
Save the requirements.txt file in some directory/folder.
Then, from that directory, run: pip install -r requirements.txt
If you're on Windows, then follow these steps:
Save the requirements.txt file in some directory/folder.
Then, from that directory, run: pip install -r requirements.txt
Now, install Node.js as well and make sure it's in your path.
If everything completed without any errors, then you're good to go!
Mac OS X users will have to fetch their version of Python and pip.
After downloading and installing these, you need to add pip and Python to your path. Follow THIS LITTLE GUIDE to install both Python and pip successfully.
Supports python >= 3.5
Follow the instructions according to your OS :
After you've saved this script in a directory/folder, you need to open a command prompt, browse to that directory, and then execute the script. Let's do it step by step:
In that directory, hold down the SHIFT key and, while holding it, RIGHT CLICK and select Open Command Prompt Here from the options that show up.
Then type: python stock-market-scraper.py
After you've saved this script in a directory/folder, you need to open a terminal, browse to that directory, and then execute the script. Let's do it step by step:
Open a terminal. Ctrl + Alt + T is the shortcut to do so (if you didn't know).
Browse to that directory and type: python stock-market-scraper.py
The historical data will be saved in the same directory where you cloned this repository. Here is how:
--SomeDirectory (Where you cloned the repository)
  |--stock-market-scraper
  |  |--requirements.txt
  |  |--.gitignore
  |  |--_config.yml
  |  |--stock-market-scraper.py
  |  |--stock-market-scraper.ipynb
  |  |--readme.md
  |--historic_data
  |  |--json
  |  |  |--(>63000) files.json
  |  |--csv
  |  |  |--(>61000) files.csv
Yahoo has gone to a Reactjs front end which means if you analyze the request headers from the client to the backend you can get the actual JSON they use to populate the client side stores.
query1.finance.yahoo.com: HTTP/1.0
query2.finance.yahoo.com: HTTP/1.1 (see the difference between HTTP/1.0 & HTTP/1.1)
If you plan to use a proxy or persistent connections, use query2.finance.yahoo.com. But for the purposes of this post the host used for the example URLs is not meant to imply anything about the path it's being used with.
We will use HTTP/1.1
/v10/finance/quoteSummary/AAPL?modules= (full list of modules below; substitute your symbol for AAPL)
Inputs for the ?modules= query:
[ 'assetProfile',
'incomeStatementHistory',
'incomeStatementHistoryQuarterly',
'balanceSheetHistory',
'balanceSheetHistoryQuarterly',
'cashflowStatementHistory',
'cashflowStatementHistoryQuarterly',
'defaultKeyStatistics',
'financialData',
'calendarEvents',
'secFilings',
'recommendationTrend',
'upgradeDowngradeHistory',
'institutionOwnership',
'fundOwnership',
'majorDirectHolders',
'majorHoldersBreakdown',
'insiderTransactions',
'insiderHolders',
'netSharePurchaseActivity',
'earnings',
'earningsHistory',
'earningsTrend',
'industryTrend',
'indexTrend',
'sectorTrend' ]
https://query1.finance.yahoo.com/v10/finance/quoteSummary/AAPL?modules=assetProfile%2CearningsHistory
Querying for: assetProfile and earningsHistory
The %2C is the hex representation of , and needs to be inserted between each module you request.
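The encoding step above can be sketched in Python as follows. This is a minimal sketch, not the script's own code: the helper name quote_summary_url is hypothetical, and the host and path are simply copied from the example URL above. It only builds the URL and does not send a request.

```python
from urllib.parse import quote

# Host and path copied from the example URL above.
BASE = "https://query1.finance.yahoo.com/v10/finance/quoteSummary/"

def quote_summary_url(symbol, modules):
    """Hypothetical helper: join module names with commas and percent-encode them."""
    # safe="" makes quote() encode "," as %2C instead of leaving it as-is.
    return BASE + quote(symbol, safe="") + "?modules=" + quote(",".join(modules), safe="")

url = quote_summary_url("AAPL", ["assetProfile", "earningsHistory"])
print(url)
```

Passing more module names from the list above extends the query in the same way, with one %2C between each pair.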
/v7/finance/options/AAPL (current expiration)
/v7/finance/options/AAPL?date=1579219200 (January 17, 2020 expiration)
https://query2.finance.yahoo.com/v7/finance/options/AAPL (current expiration)
https://query2.finance.yahoo.com/v7/finance/options/AAPL?date=1579219200 (January 17, 2020 expiration)
Any valid future expiration represented as a UNIX timestamp can be used in the ?date= query. If you query for the current expiration, the JSON response will contain a list of all the valid expirations that can be used in the ?date= query. (Here is a post explaining how to convert human readable dates to a UNIX timestamp in Python.)
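As a small illustration of the ?date= value, the sketch below converts a calendar date to a UNIX timestamp. It assumes the expiration is expressed as midnight UTC, which matches the 1579219200 example above; the helper name expiration_timestamp is hypothetical.

```python
from datetime import datetime, timezone

def expiration_timestamp(year, month, day):
    """Hypothetical helper: UNIX timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

ts = expiration_timestamp(2020, 1, 17)
# Path and host copied from the example URLs above; no request is sent here.
options_url = "https://query2.finance.yahoo.com/v7/finance/options/AAPL?date=" + str(ts)
print(ts)
print(options_url)
```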
/v8/finance/chart/AAPL?symbol=AAPL&period1=0&period2=9999999999&interval=3mo
&interval=3mo: 3 months, going back until initial trading date.
&interval=1d: 1 day, going back until initial trading date.
&interval=5m: 5 minutes, going back 80(ish) days.
&interval=1m: 1 minute, going back 4-5 days.
How far back you can go with each interval is a little confusing and seems inconsistent. My assumption is that internally Yahoo is counting in trading days and my naive approach was not accounting for holidays. Although that's a guess and YMMV.
period1=: UNIX timestamp representation of the date you wish to start at. Values below the initial trading date will be rounded up to the initial trading date.
period2=: UNIX timestamp representation of the date you wish to end at. Values greater than the last trading date will be rounded down to the most recent timestamp available.
Note: If you query with a period1= (start date) that is too far in the past for the interval you've chosen, Yahoo will return prices in the 3mo interval regardless of what interval you requested.
&includePrePost=true: include pre and post market data.
&events=div%2Csplit: include dividends and splits.
https://query1.finance.yahoo.com/v8/finance/chart/AAPL?symbol=AAPL&period1=0&period2=9999999999&interval=1d&includePrePost=true&events=div%2Csplit
The above request will return all price data for ticker AAPL on a 1 day interval including pre and post market data as well as dividends and splits.
Note: the values used in the price example URL for period1= & period2= are to demonstrate the respective rounding behavior of each input.
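The query parameters described above can be assembled with the standard library, as in the sketch below. The helper name chart_url and its keyword arguments are hypothetical; the host, path, and parameter names are taken from the example URL above. Sending "false" when pre/post market data is not wanted is an assumption, since the text only shows the true case.

```python
from urllib.parse import urlencode

def chart_url(symbol, period1=0, period2=9999999999, interval="1d",
              include_pre_post=True, events=("div", "split")):
    """Hypothetical helper: build a v8 chart URL like the example above."""
    params = {
        "symbol": symbol,
        "period1": period1,
        "period2": period2,
        "interval": interval,
        # Assumption: "false" is accepted; only "true" appears in the text.
        "includePrePost": "true" if include_pre_post else "false",
        # urlencode() percent-encodes the comma as %2C.
        "events": ",".join(events),
    }
    return ("https://query1.finance.yahoo.com/v8/finance/chart/"
            + symbol + "?" + urlencode(params))

url = chart_url("AAPL")
print(url)
```

The defaults for period1 and period2 rely on the rounding behavior noted above to request the widest available span.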
The above article is taken from here.
Yahoo adjusts all historical prices to reflect a stock split. For example, ISRG was trading around $1000 prior to 2017/10/06. Then on 2017/10/06, it underwent a 3-for-1 stock split. In Yahoo's historical data, the prices on both sides of 2017/10/06 are on the post-split scale: prices from before the split are divided by 3.
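The split adjustment described above can be illustrated with a few made-up prices. This is only a sketch of the arithmetic, with a hypothetical helper name and invented numbers, not Yahoo's implementation and not real ISRG data.

```python
def split_adjust(prices, split_index, ratio):
    """Divide every price before split_index by ratio; leave later prices unchanged."""
    return [p / ratio if i < split_index else p for i, p in enumerate(prices)]

# Made-up prices; the 3-for-1 split takes effect at index 2.
raw = [1020.0, 1050.0, 350.0, 352.0]
adjusted = split_adjust(raw, split_index=2, ratio=3)
print(adjusted)
```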
For dividends, let's say stock ABC closed at 200 on December 18. Then on December 19, the stock increases in price by $2 but it pays out a $1 dividend. In Yahoo's historical prices for ABC, you will see that it closed at 200 on Dec 18 and 201 on Dec 19. Yahoo factors in the dividend in the "Adj Close" column for all the previous days. So the Close for Dec 18 would be 200, but the Adj Close would be 199.
For example, on 2017/09/15, SPY paid out a $1.235 dividend. Yahoo's historical prices say that SPY's closing price on 2017/09/14 was 250.09, but the Adj Close is 248.85, which is $1.24 lower. The Adjusted Close for the previous days was reduced by the dividend amount.
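The ABC example above can be reproduced with plain subtraction, as the text describes it. The helper name is hypothetical, and this is a sketch of the description only; the data provider's exact adjustment method may differ.

```python
def adjust_for_dividend(closes, ex_index, dividend):
    """Subtract the dividend from every close before ex_index."""
    return [c - dividend if i < ex_index else c for i, c in enumerate(closes)]

closes = [200.0, 201.0]   # ABC: Dec 18, Dec 19 (dividend paid on Dec 19)
adj_closes = adjust_for_dividend(closes, ex_index=1, dividend=1.0)
print(adj_closes)
```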
The above article is taken from here.
import urllib.request, json, time, os, difflib, itertools
import pandas as pd
from multiprocessing.dummy import Pool
from datetime import datetime
try:
    import httplib  # Python 2 module name
except ImportError:
    import http.client as httplib  # Python 3 module name

def check_internet():
    """Return True if a HEAD request to www.google.com succeeds within 5 seconds."""
    conn = httplib.HTTPConnection("www.google.com", timeout=5)
    try:
        conn.request("HEAD", "/")
        return True
    except Exception:
        return False
    finally:
        # Close the connection whether or not the request succeeded.
        conn.close()
Now see below, I have opened an arbitrary stock, Igarashi Motors. Can you see the ticker for the stock in the URL? It is IGARASHI.BO. I will show you how to get the ticker later.
First let us make a function that can pull json data from Yahoo about that stock, like below. (I will discuss the function parameters later.)
We will be using query2.
The function get_historic_price pulls the data for the given query_url.
The data is saved as .json and csv inside a folder named "historic_data".
def get_historic_price(query_url,json_path,csv_path):
    # The ticker sits between "symbol=" and "&period" in the query URL.
    stock_id=query_url.split("&period")[0].split("symbol=")[1]
    # Skip tickers whose csv file already exists and is not empty.
    if os.path.exists(csv_path+stock_id+'.csv') and os.stat(csv_path+stock_id+'.csv').st_size != 0:
        print("<<< Historical data of "+stock_id+" already exists")
        return
    # Wait until the internet connection is available.
    while not check_internet():
        print("Could not connect, trying again in 5 seconds...")
        time.sleep(5)
    try:
        with urllib.request.urlopen(query_url) as url:
            parsed = json.loads(url.read().decode())
    except Exception:
        print("||| Historical data of "+stock_id+" doesn't exist")
        return
    else:
        # Replace any existing json file with the fresh response.
        if os.path.exists(json_path+stock_id+'.json') and os.stat(json_path+stock_id+'.json').st_size != 0:
            os.remove(json_path+stock_id+'.json')
        with open(json_path+stock_id+'.json', 'w') as outfile:
            json.dump(parsed, outfile, indent=4)
        try:
            # Convert each unix timestamp to a dd-mm-yyyy date string.
            Date=[]
            for i in parsed['chart']['result'][0]['timestamp']:
                Date.append(datetime.utcfromtimestamp(int(i)).strftime('%d-%m-%Y'))
            Low=parsed['chart']['result'][0]['indicators']['quote'][0]['low']
            Open=parsed['chart']['result'][0]['indicators']['quote'][0]['open']
            Volume=parsed['chart']['result'][0]['indicators']['quote'][0]['volume']
            High=parsed['chart']['result'][0]['indicators']['quote'][0]['high']
            Close=parsed['chart']['result'][0]['indicators']['quote'][0]['close']
            Adjusted_Close=parsed['chart']['result'][0]['indicators']['adjclose'][0]['adjclose']
            df=pd.DataFrame(list(zip(Date,Low,Open,Volume,High,Close,Adjusted_Close)),columns =['Date','Low','Open','Volume','High','Close','Adjusted Close'])
            if os.path.exists(csv_path+stock_id+'.csv'):
                os.remove(csv_path+stock_id+'.csv')
            df.to_csv(csv_path+stock_id+'.csv', sep=',', index=None)
            print(">>> Historical data of "+stock_id+" saved")
        except Exception:
            print(">>> Historical data of "+stock_id+" could not be saved")
        return
These are the paths where the json and csv files will be saved. They are passed to the function get_historic_price().
json_path = os.getcwd()+os.sep+".."+os.sep+"historic_data"+os.sep+"json"+os.sep
csv_path = os.getcwd()+os.sep+".."+os.sep+"historic_data"+os.sep+"csv"+os.sep
Create these directories with os.makedirs if they do not exist yet.
if not os.path.isdir(json_path):
    os.makedirs(json_path)
if not os.path.isdir(csv_path):
    os.makedirs(csv_path)
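The same directory setup can be written a little more compactly. The sketch below is an alternative, not the script's own code: make_output_dirs is a hypothetical helper, it uses os.path.join and the exist_ok flag of os.makedirs, and it runs inside a temporary directory so it leaves nothing behind.

```python
import os
import tempfile

def make_output_dirs(base_dir):
    """Hypothetical helper: create historic_data/json and historic_data/csv under base_dir."""
    json_path = os.path.join(base_dir, "historic_data", "json") + os.sep
    csv_path = os.path.join(base_dir, "historic_data", "csv") + os.sep
    # exist_ok=True makes the call a no-op when the directory is already there.
    os.makedirs(json_path, exist_ok=True)
    os.makedirs(csv_path, exist_ok=True)
    return json_path, csv_path

with tempfile.TemporaryDirectory() as tmp:
    json_path, csv_path = make_output_dirs(tmp)
    created = os.path.isdir(json_path) and os.path.isdir(csv_path)
print(created)
```

The returned paths keep the trailing separator, so they can be concatenated with a file name the same way get_historic_price does.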
Now, as promised, I will show how to find historical data. See below, I have opened the historical data of Igarashi Motors. Here you can see the maximum time period from which we can pull data for the stock. The period is stored as a unix timestamp in the query.
Now let's make the query. First set
period1 = 0
period2 = 9999999999
interval = 1d
See the image below: its period1 is greater than 0 and its period2 is less than 9999999999. These values therefore cover the maximum span from which data can be pulled.
Yahoo Finance tickers are saved in the Assets folder.
How did I get this? Well, here is the direct link to download the Yahoo ticker list (last updated September 2017). It would be helpful for the author if you visit his website page, as his income is through advertisements, and it takes lots of hours to create this type of ticker list.
All right, moving on.
ticker_file_path = "Assets"+os.sep+"Yahoo Ticker Symbols - September 2017.xlsx"
temp_df = pd.read_excel(ticker_file_path)
print("Total stocks:",len(temp_df))
temp_df.head(10)
Total stocks: 106331
| Yahoo Stock Tickers | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 |
---|---|---|---|---|---|---|---|---|
0 | http://investexcel.net | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | Ticker | Name | Exchange | Category Name | Country | NaN | NaN | NaN |
3 | OEDV | Osage Exploration and Development, Inc. | PNK | NaN | USA | NaN | NaN | Samir Khan |
4 | AAPL | Apple Inc. | NMS | Electronic Equipment | USA | NaN | NaN | simulationconsultant@gmail.com |
5 | BAC | Bank of America Corporation | NYQ | Money Center Banks | USA | NaN | NaN | NaN |
6 | AMZN | Amazon.com, Inc. | NMS | Catalog & Mail Order Houses | USA | NaN | NaN | This ticker symbol list was downloaded from |
7 | T | AT&T Inc. | NYQ | Telecom Services - Domestic | USA | NaN | NaN | http://investexcel.net/all-yahoo-finance-stock... |
8 | GOOG | Alphabet Inc. | NMS | Internet Information Providers | USA | NaN | NaN | and was updated on 2nd September 2017 |
9 | MO | Altria Group, Inc. | NYQ | Cigarettes | USA | NaN | NaN | NaN |
temp_df = temp_df.drop(temp_df.columns[[5, 6, 7]], axis=1)
headers = temp_df.iloc[2]
df = pd.DataFrame(temp_df.values[3:], columns=headers)
print("Total stocks:",len(df))
df.head(10)
Total stocks: 106328
2 | Ticker | Name | Exchange | Category Name | Country |
---|---|---|---|---|---|
0 | OEDV | Osage Exploration and Development, Inc. | PNK | NaN | USA |
1 | AAPL | Apple Inc. | NMS | Electronic Equipment | USA |
2 | BAC | Bank of America Corporation | NYQ | Money Center Banks | USA |
3 | AMZN | Amazon.com, Inc. | NMS | Catalog & Mail Order Houses | USA |
4 | T | AT&T Inc. | NYQ | Telecom Services - Domestic | USA |
5 | GOOG | Alphabet Inc. | NMS | Internet Information Providers | USA |
6 | MO | Altria Group, Inc. | NYQ | Cigarettes | USA |
7 | DAL | Delta Air Lines, Inc. | NYQ | Major Airlines | USA |
8 | AA | Alcoa Corporation | NYQ | Aluminum | USA |
9 | AXP | American Express Company | NYQ | Credit Services | USA |
Queries are built from these tickers. This brings up the query pages where Yahoo Finance holds its historical stock data.
Example query is like this: https://query1.finance.yahoo.com/v8/finance/chart/ticker?symbol=ticker&period1=0&period2=9999999999&interval=1d&includePrePost=true&events=div%2Csplit
query_urls=[]
for ticker in df['Ticker']:
    query_urls.append("https://query1.finance.yahoo.com/v8/finance/chart/"+ticker+"?symbol="+ticker+"&period1=0&period2=9999999999&interval=1d&includePrePost=true&events=div%2Csplit")
with Pool(processes=10) as pool:
    pool.starmap(get_historic_price, zip(query_urls, itertools.repeat(json_path), itertools.repeat(csv_path)))
print("<|> Historical data of all stocks saved")
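To see how starmap distributes the arguments without touching the network, here is a small sketch. fake_download is a hypothetical stand-in for get_historic_price, and the example.invalid URLs and ticker names are made up; only the Pool, zip, and itertools.repeat pattern mirrors the code above.

```python
import itertools
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing.Pool API

def fake_download(query_url, json_path, csv_path):
    """Stand-in for get_historic_price: no network, just returns the target file names."""
    stock_id = query_url.split("symbol=")[1]
    return (json_path + stock_id + ".json", csv_path + stock_id + ".csv")

# Made-up URLs; each one is paired with the same two output paths.
urls = ["https://example.invalid/chart?symbol=" + t for t in ("AAA", "BBB", "CCC")]
with Pool(processes=2) as pool:
    results = pool.starmap(
        fake_download,
        zip(urls, itertools.repeat("json/"), itertools.repeat("csv/")),
    )
print(results)
```

Each tuple produced by zip becomes one call, and starmap returns the results in the same order as the input URLs. A smaller processes value is one way to put less load on the server.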
<<< Historical data of SBIN.NS already exists, Updating data...
<<< Historical data of IGARASHI.NS already exists, Updating data...
<<< Historical data of TATAMOTORS.NS already exists, Updating data...
<<< Historical data of TCS.NS already exists, Updating data...
>>> Historical data of TCS.NS saved
>>> Historical data of IGARASHI.NS saved
>>> Historical data of TATAMOTORS.NS saved
>>> Historical data of SBIN.NS saved
All downloads completed !
If you're planning to open an issue for the script, ask for a new feature, or do anything else that requires opening an Issue, then please keep these things in mind.
If you're going to report an issue, please follow this syntax :
Command You Gave : What was the command that you used to invoke the issue?
Expected Behaviour : After giving the above command, what did you expect to happen?
Actual Behaviour : What actually happened?
Error Log : Error Log is mandatory.
If you're here to make suggestions, please follow the basic syntax to post a request :
Subject : Something that briefly tells us about the feature.
Long Explanation : Describe in detail what you want and how you want it.