I recently launched https://PreMarketScanner.HerokuApp.com
Most day traders start their day by preparing a list of potential stocks to trade. They search for the highest gainers by percent change and/or by the volume traded.
However, some traders still have day jobs, making quick and easy access to software that scans pre-market data impractical. I created my do-it-yourself (DIY) pre-market scanner so I can do that research with a few mouse clicks.
I searched for free APIs that provide pre-market data, but had no luck: most are either free only for a trial period or require a paid subscription.
So I built my pre-market stock scanner using web scraping, a technique that extracts data directly from websites' pages.
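The idea can be shown in a few lines: parse a page's HTML and pull text out of the elements you care about. This is a minimal sketch using BeautifulSoup against an inline HTML snippet; the `mover-ticker` class name is made up for illustration, and a real scraper would fetch the page over HTTP first.

```python
from bs4 import BeautifulSoup

# A tiny HTML snippet standing in for a real pre-market movers page.
# 'mover-ticker' is a hypothetical class name; the real selector depends
# on the target site's markup.
SAMPLE_HTML = """
<table>
  <tr><td class="mover-ticker">ABCD</td><td class="mover-change">+12.5%</td></tr>
  <tr><td class="mover-ticker">WXYZ</td><td class="mover-change">+8.1%</td></tr>
</table>
"""

def extract_tickers(html):
    # Parse the HTML and collect the text of every ticker cell.
    soup = BeautifulSoup(html, 'html.parser')
    return [td.get_text(strip=True)
            for td in soup.find_all('td', {'class': 'mover-ticker'})]

print(extract_tickers(SAMPLE_HTML))  # ['ABCD', 'WXYZ']
```

The scanner below does exactly this, except the HTML comes from a live request to Benzinga and Yahoo Finance.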
PROs of web scraping:
- No need to pay for expensive APIs that support real-time pre-market data.
- Ability to control what data you want to include in the results.
CONs of web scraping:
- Highly dependent on the HTML structure of the scraped websites. If a source website changes its HTML, the scanner may return no results at all. This makes web scraping a fragile foundation for a web application.
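One way to soften this fragility is to validate up front that the expected element is present and fail with a clear message, instead of letting an obscure `AttributeError` surface deep inside the parsing code. A sketch, reusing the `movers-stocks-table-gainers` id that the scanner below expects from Benzinga's markup:

```python
from bs4 import BeautifulSoup

def find_gainers_table(html):
    # Locate the gainers container; raise a descriptive error if the
    # source site's HTML no longer matches our assumptions.
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('div', {'id': 'movers-stocks-table-gainers'})
    if table is None:
        raise RuntimeError(
            "Gainers table not found - the source site's HTML may have changed")
    return table
```

A check like this turns a silent breakage into an actionable error you can alert on.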
Sample result of scanner:
Source code (app.py) for the scanner:
```python
from flask import Flask, request
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup, Comment
import datetime
from pytz import timezone

app = Flask(__name__)

BENZINGA_URL = 'https://www.benzinga.com/money/premarket-movers/'
YAHOO_FINANCE_URL = 'https://finance.yahoo.com/quote/{}/key-statistics?p={}'
YAHOO_LABEL_URL = 'https://finance.yahoo.com/quote/{}'
# A browser-like User-Agent; many sites block the default Python one.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'}

scanParams = {}


@app.route('/')
def default():
    return "<a href='/index?gainsPercent=5&minLast=.01&maxLast=100&minVolume=50000&minDollarValue=100000&minFloatValue=500000&maxFloatValue=200000000'>Click here to redirect to default scan parameters.</a>"


@app.route('/index')
def index():
    html = "<html><head><title>Premarket scanner</title>"
    html += "<link rel='stylesheet' href='/static/sorta.css'>"
    html += "<script src='/static/sort-table.js'></script>"
    html += "</head><body>"
    errorMessages = []
    # Read the scan criteria from the query string.
    scanParams['gainsPercent'] = float(request.args.get('gainsPercent'))
    scanParams['minLast'] = float(request.args.get('minLast'))
    scanParams['maxLast'] = float(request.args.get('maxLast'))
    scanParams['minVolume'] = float(request.args.get('minVolume'))
    scanParams['minFloatValue'] = float(request.args.get('minFloatValue'))
    scanParams['maxFloatValue'] = float(request.args.get('maxFloatValue'))
    scanParams['minDollarValue'] = float(request.args.get('minDollarValue'))
    gainers = []
    # Stage 1: pull the gainers list from Benzinga and apply the
    # price/volume/% change filters. Stage 2: enrich with Yahoo Finance
    # data and apply the float filter.
    eligible_candidates = scrape_benzinga(gainers, errorMessages)
    passed_candidates = scrape_yahoo_finance(eligible_candidates, errorMessages)
    time_now = str(datetime.datetime.now(timezone('EST')))[:-7]
    html += time_now
    html += print_criteria()
    html += displayResults(passed_candidates, errorMessages)
    html += '<br/><br/><br/><br/>'
    html += '</body></html>'
    return html


def soup_maker(url):
    # Fetch a page with browser-like headers and return a parsed soup.
    req = Request(url=url, headers=HEADERS)
    page = urlopen(req).read()
    soup = BeautifulSoup(page, 'html.parser')
    return soup


def scrape_benzinga(gainers, errorMessages):
    soup = soup_maker(BENZINGA_URL)
    # Strip HTML comments so commented-out markup is not parsed as data.
    for element in soup(text=lambda text: isinstance(text, Comment)):
        element.extract()
    benzingaGainers = soup.find('div', {'id': 'movers-stocks-table-gainers'})
    gainers = []
    for td in benzingaGainers.find_all('tr')[1:]:
        ticker = td.find('td', {'class': 'premarket-stock-table__cell--stock'}).text[:-1]
        percentChangeStr = td.find('td', {'class': 'premarket-stock-table__cell--change'}).text[:-1]
        lastPriceStr = td.find('td', {'class': 'premarket-stock-table__cell--price'}).text[1:]
        volumeStr = td.find('td', {'class': 'premarket-stock-table__cell--volume'}).text
        # Clean up whitespace and symbols.
        ticker = ticker.replace(" ", "").replace('\n', '')
        percentChangeFloat = percentChangeStr.replace(" ", "").replace('\n', '').replace('%', '')
        lastPrice = lastPriceStr.replace(" ", "").replace('\n', '').replace('%', '').replace('$', '')
        volumeStr = volumeStr.replace(" ", "").replace('\n', '').replace('%', '').replace('$', '')
        if volumeStr == '–' or volumeStr == '':
            volumeStr = '10'
        volumeInt = 0
        if volumeStr != '' and volumeStr != '–':
            volumeInt = convertFloatStrToInt(volumeStr)
            volumeStr = volumeInt
        percentChange = float(percentChangeFloat)
        lastPrice = float(lastPrice)
        dollarValueInt = round(volumeInt * lastPrice, 0)
        dollarValueStr = ''
        news = ''
        if percentChange > scanParams['gainsPercent'] and lastPrice >= scanParams['minLast'] and \
                lastPrice <= scanParams['maxLast'] and volumeInt > scanParams['minVolume'] and \
                dollarValueInt >= scanParams['minDollarValue']:
            dollarValueStr = f'${convertIntToText(dollarValueInt)}'
            gainer = {'ticker': ticker, 'volumeInt': volumeInt, 'volumeStr': volumeStr,
                      'percentChange': percentChange, 'percentChangeStr': percentChangeStr,
                      'lastPrice': lastPrice, 'dollarValueInt': dollarValueInt,
                      'dollarValueStr': dollarValueStr, 'news': news}
            gainers.append(gainer)
        else:
            # Record why the ticker was filtered out.
            reason = ''
            if percentChange < scanParams['gainsPercent']:
                reason += f"% Change {percentChange} is less than {scanParams['gainsPercent']}. "
            if lastPrice < scanParams['minLast']:
                reason += f"Price {lastPrice} is less than {scanParams['minLast']}. "
            if lastPrice > scanParams['maxLast']:
                reason += f"Price {lastPrice} is greater than {scanParams['maxLast']}. "
            if volumeInt < scanParams['minVolume']:
                reason += f"Volume {volumeInt} is less than {scanParams['minVolume']}. "
            if dollarValueInt < scanParams['minDollarValue']:
                reason += f"$ Value {dollarValueInt} is less than {scanParams['minDollarValue']}"
            err = f"Ignored Ticker={ticker} Pre-market Price=${lastPrice} %Change={percentChangeStr} Vol={volumeStr} $Val={dollarValueInt} Reason={reason}"
            errorMessages.append(err)
    return gainers


def scrape_yahoo_finance(gainers, errorMessages):
    filteredGainers = []
    for gainer in gainers:
        try:
            # Company name and earnings date from the quote page.
            yahoo_fin = YAHOO_LABEL_URL.format(gainer['ticker'])
            soup = soup_maker(yahoo_fin)
            labelDiv = soup.find_all('title')[0]
            companyLabel = str(labelDiv.text).replace('Stock Price, News, Quote & History - Yahoo Finance', '')
            earningsDate = ''
            tds = soup.find('table', {'class': 'W(100%) M(0) Bdcl(c)'}) \
                .find_all('td', {'data-test': 'EARNINGS_DATE-value'})
            for earningsTd in tds:
                earningsDate = earningsTd.text
            # Float shares from the key-statistics page.
            yahoo_fin = YAHOO_FINANCE_URL.format(gainer['ticker'], gainer['ticker'])
            soup = soup_maker(yahoo_fin)
            trs = soup.find_all('tr')
            for tr in trs:
                td = tr.find('td')
                if td:
                    if 'Float' in td.text and 'Short' not in td.text:
                        tickerFloatValueStr = tr.find('td', {'class': 'Fw(500) Ta(end) Pstart(10px) Miw(60px)'}).text
            # Previous close and previous-day % change.
            divs = soup.find_all('fin-streamer', {'data-symbol': gainer['ticker']})
            for div in divs:
                if div['data-field'] == 'regularMarketChangePercent':
                    gainer['prevPercentChange'] = div.text.replace('(', '').replace(')', '').replace('%', '')
                if div['data-field'] == 'regularMarketPrice':
                    gainer['prevClose'] = div.text
            tickerFloatValue = convertFloatStrToInt(tickerFloatValueStr)
            if tickerFloatValue > scanParams['minFloatValue'] and tickerFloatValue < scanParams['maxFloatValue']:
                gainer['tickerFloatValueInt'] = convertFloatStrToInt(tickerFloatValueStr)
                gainer['tickerFloatValueStr'] = tickerFloatValueStr
                gainer['companyLabel'] = companyLabel
                gainer['earningsDate'] = earningsDate
                filteredGainers.append(gainer)
            else:
                reason = ''
                if tickerFloatValue < scanParams['minFloatValue']:
                    reason += f"Float {tickerFloatValue} is less than {scanParams['minFloatValue']}. "
                if tickerFloatValue > scanParams['maxFloatValue']:
                    reason += f"Float {tickerFloatValue} is greater than {scanParams['maxFloatValue']}. "
                err = f"IGNORED {gainer['ticker']} Float {tickerFloatValueStr} Pre-market Price=${gainer['lastPrice']} %Change={gainer['percentChangeStr']} Vol={gainer['volumeStr']} $Val={gainer['dollarValueInt']} Reason={reason}"
                errorMessages.append(err)
        except Exception as e:
            # Keep the ticker in the results even if the float lookup failed.
            print(f"|{gainer['ticker']}| no float {e}")
            gainer['tickerFloatValueStr'] = ''
            filteredGainers.append(gainer)
    return filteredGainers


def displayResults(filteredGainers, errorMessages):
    html = ''
    if len(filteredGainers) == 0:
        html = "<br/>No tickers match your criteria at the moment."
    else:
        ctr = 1
        html += "<br><b>SCAN RESULTS</b><br/><table border='1' class=\"js-sort-table\" id='premarket-gainers'>"
        html += "<thead><tr><th class=\"js-sort-number\"></th>" \
                "<th class=\"js-sort-number\"><b>Ticker</b></th>" \
                "<th class=\"js-sort-number\"><b>Company Name</b></th>" \
                "<th class=\"js-sort-number\"><b>Pre-market Price</b></th>" \
                "<th class=\"js-sort-number\"><b>Pre-market % Change</b></th>" \
                "<th class=\"js-sort-number\"><b>Pre-Market Volume</b></th>" \
                "<th class=\"js-sort-number\"><b>Pre-Market $ Value</b></th>" \
                "<th class=\"js-sort-number\"><b>Last Close</b></th>" \
                "<th class=\"js-sort-number\"><b>Last % Change</b></th>" \
                "<th class=\"js-sort-number\"><b>Float</b></th>" \
                "<th class=\"js-sort-number\"><b>Earnings Date</b></th>" \
                "<th class=\"js-sort-number\"><b>News</b></th></tr></thead><tbody>"
        for filteredGainer in filteredGainers:
            html += f"<tr><td>{ctr}</td>" \
                    f"<td id='ticker'><a href=https://finance.yahoo.com/quote/{filteredGainer['ticker']} target='_blank'>{filteredGainer['ticker']}</a></td>" \
                    f"<td>{filteredGainer['companyLabel']}</td>" \
                    f"<td>{filteredGainer['lastPrice']}</td>" \
                    f"<td>{filteredGainer['percentChange']}</td>" \
                    f"<td><span title='{filteredGainer['volumeStr']}'></span>{filteredGainer['volumeInt']}</td>" \
                    f"<td>{filteredGainer['dollarValueInt']}</td>" \
                    f"<td>{filteredGainer['prevClose']}</td>" \
                    f"<td>{filteredGainer['prevPercentChange']}</td>" \
                    f"<td>{filteredGainer['tickerFloatValueInt']}</td>" \
                    f"<td>{filteredGainer['earningsDate']}</td>" \
                    f"<td>{filteredGainer['news']}</td>" \
                    f"</tr>"
            ctr += 1
        html += '</tbody></table><br/>'
    errHtml = ''
    for err in errorMessages:
        errHtml += f"<br/>{err}"
    html += errHtml + '<br/><br/><br/><br/>'
    return html


def print_criteria():
    html = f"<br/><br/>This application uses <a href='https://www.benzinga.com/premarket/'>Benzinga Premarket Gainers</a> for the latest pre-market movers and Yahoo Finance for the stocks' float shares data.<br/><br/><b>CRITERIA</b><br><table border='1'>"
    html += f"<tr><td><b>Minimum Last Price</b></td><td>${scanParams['minLast']}</td></tr>"
    html += f"<tr><td><b>Max. Last Price</b></td><td>${scanParams['maxLast']}</td></tr>"
    html += f"<tr><td><b>Min. Float</b></td><td>{convertIntToText(scanParams['minFloatValue'])}</td></tr>"
    html += f"<tr><td><b>Max. Float</b></td><td>{convertIntToText(scanParams['maxFloatValue'])}</td></tr>"
    html += f"<tr><td><b>Min. % Change</b></td><td>{scanParams['gainsPercent']}%</td></tr>"
    html += f"<tr><td><b>Min. Volume</b></td><td>{convertIntToText(scanParams['minVolume'])}</td></tr>"
    html += f"<tr><td><b>Min. $ Value</b></td><td>{convertIntToText(scanParams['minDollarValue'])}</td></tr></table>"
    return html


def convertFloatStrToInt(flt):
    # Parse strings like '2.5M' or '750K' into plain integers.
    flt = flt.lower()
    if 'b' in flt:    # billions
        return int(float(flt[:-1]) * 1000000000)
    elif 'm' in flt:  # millions
        return int(float(flt[:-1]) * 1000000)
    elif 'k' in flt:  # thousands
        return int(float(flt[:-1]) * 1000)
    else:             # no suffix: parse the whole string
        return int(float(flt))


def convertIntToText(num):
    # Inverse of the above: 2500000 -> '2.5M'.
    if num >= 1000000000:
        return str(round(float(num / 1000000000), 2)) + 'B'
    elif num >= 1000000:
        return str(round(float(num / 1000000), 2)) + 'M'
    elif num >= 1000:
        return str(round(float(num / 1000), 2)) + 'K'
    else:
        return num
```
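The two suffix-conversion helpers at the bottom of app.py are easy to sanity-check in isolation. These are standalone, snake_case copies of the same logic (with plain, unsuffixed numbers parsed whole), so they can be run without the Flask app:

```python
def convert_float_str_to_int(flt):
    # Parse strings like '2.5M' or '750K' into plain integers.
    flt = flt.lower()
    if 'b' in flt:    # billions
        return int(float(flt[:-1]) * 1_000_000_000)
    elif 'm' in flt:  # millions
        return int(float(flt[:-1]) * 1_000_000)
    elif 'k' in flt:  # thousands
        return int(float(flt[:-1]) * 1_000)
    else:             # no suffix: parse the whole string
        return int(float(flt))

def convert_int_to_text(num):
    # Inverse direction: 2500000 -> '2.5M'.
    if num >= 1_000_000_000:
        return str(round(num / 1_000_000_000, 2)) + 'B'
    elif num >= 1_000_000:
        return str(round(num / 1_000_000, 2)) + 'M'
    elif num >= 1_000:
        return str(round(num / 1_000, 2)) + 'K'
    return num

print(convert_float_str_to_int('2.5M'))  # 2500000
print(convert_int_to_text(2500000))      # 2.5M
```

Checks like these catch parsing regressions (for example, a dash or empty string from the volume column) before they surface as a broken scan page.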
Areas of Improvement:
- You may scrape more than one website for the pre-market gainers list, so that if one website's HTML structure changes, the fallback websites can still return the list of gainers.
- Include a column for the Ex-Dividend Date. As with the Earnings Date, the fact that a company is issuing dividends may be a factor in the stock's pre-market volume.
- Adding a column showing each stock's daily chart can save a few clicks when evaluating whether a stock is a potential play for the day.
- If you can find a website with OTC pre-market movers, it would be a nice addition to the scanner.
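The first improvement can be sketched as a small dispatcher: try each source's scraper in order and use the first one that returns data. The scraper names here are placeholders; in the app above, the first entry would be `scrape_benzinga`, and the others would be scrapers you write for additional sites.

```python
def scrape_with_fallback(scrapers):
    # scrapers: list of (name, zero-argument callable) pairs, tried in order.
    # Returns the first non-empty gainers list, or [] if every source fails.
    for name, scraper in scrapers:
        try:
            gainers = scraper()
            if gainers:
                return gainers
        except Exception as e:
            # A broken selector or network error just moves us to the next source.
            print(f'{name} failed: {e}')
    return []
```

With this in place, a single site redesign degrades the scanner gracefully instead of taking it down entirely.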
Update: On February 14, the volume for the Premarket Gainers list stopped showing up. It could be a glitch on the site or an enhancement they are implementing (e.g., this data may no longer be accessible to the public). This is a classic example of why web scraping is not ideal as the backbone of an application. I had to fix the errors caused by Benzinga's recent change and scrape another website so the scanner would return results.