
I recently launched a pre-market stock scanner at https://PreMarketScanner.HerokuApp.com

Most day traders start their day by preparing a list of potential stocks to trade. They search for the highest gainers by percent change and/or by the volume traded.
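As a rough illustration of that first pass, the sketch below keeps movers that clear both a minimum percent change and a minimum volume, then ranks them by percent change, with volume breaking ties. The `Mover` type and `rank_movers` function are hypothetical. The default thresholds (5% and 50,000 shares) are borrowed from the scanner's default scan parameters, and the strict `>` comparisons mirror the scanner's own filter.

```python
from typing import List, NamedTuple


class Mover(NamedTuple):
    """A pre-market mover (hypothetical record type, not tied to any data feed)."""
    ticker: str
    percent_change: float
    volume: int


def rank_movers(movers: List[Mover], min_percent: float = 5.0, min_volume: int = 50_000) -> List[Mover]:
    """Keep movers above both thresholds; biggest % gain first, higher volume breaks ties."""
    eligible = [m for m in movers if m.percent_change > min_percent and m.volume > min_volume]
    return sorted(eligible, key=lambda m: (m.percent_change, m.volume), reverse=True)


sample = [
    Mover("AAA", 12.5, 80_000),
    Mover("BBB", 3.0, 900_000),   # gain too small
    Mover("CCC", 12.5, 150_000),
    Mover("DDD", 40.0, 10_000),   # volume too thin
]
print([m.ticker for m in rank_movers(sample)])  # ['CCC', 'AAA']
```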

However, some traders still have day jobs where quick and easy access to software that scans pre-market data is impossible. I created my Do-It-Yourself (DIY) pre-market scanner so I can do my research with a few mouse clicks.

I searched for free APIs that provide pre-market data but had no luck. Most of them are either free only for a trial period or require a paid subscription.

So I built my pre-market stock scanner using web scraping, a technique for extracting data directly from websites.
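Web scraping boils down to downloading a page and pulling values out of its HTML. The scanner itself uses BeautifulSoup; the sketch below shows the same idea with only the standard library's `html.parser`, reading a made-up inline HTML snippet instead of a live page and collecting the cell text of each table row. `TableRowParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)


SAMPLE_HTML = """
<table id="gainers">
  <tr><th>Stock</th><th>Change</th><th>Price</th></tr>
  <tr><td>ABC</td><td>12.5%</td><td>$3.10</td></tr>
  <tr><td>XYZ</td><td>8.0%</td><td>$0.45</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
print(parser.rows)
# [['Stock', 'Change', 'Price'], ['ABC', '12.5%', '$3.10'], ['XYZ', '8.0%', '$0.45']]
```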

PROs of web scraping:

  • No need to pay for expensive APIs that support real-time pre-market data.
  • Ability to control what data you want to include in the results.

CONs of web scraping:

  • Highly dependent on the HTML structure of the scraped websites. If a source website changes its HTML, the scanner may not return any results. This makes web scraping a fragile foundation for a web application.
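One way to soften that dependency is to look cells up by column header rather than by position, and to fail with a clear message when an expected header disappears instead of an `AttributeError` deep inside the parsing code. A minimal sketch, assuming the rows have already been extracted as lists of cell text; `EXPECTED_COLUMNS`, `LayoutChangedError` and `map_rows` are hypothetical names, and the column names are illustrative rather than Benzinga's actual headers.

```python
from typing import Dict, List, Sequence

# Illustrative column names; a real page would need its own list
EXPECTED_COLUMNS = ("Stock", "Change", "Price", "Volume")


class LayoutChangedError(RuntimeError):
    """Raised when a scraped table no longer has the columns the scanner relies on."""


def map_rows(rows: List[List[str]], expected: Sequence[str] = EXPECTED_COLUMNS) -> List[Dict[str, str]]:
    """Convert [header, row, row, ...] into dicts keyed by column name."""
    if not rows:
        raise LayoutChangedError("no table rows found; the page layout may have changed")
    header, *body = rows
    missing = [name for name in expected if name not in header]
    if missing:
        raise LayoutChangedError(f"expected columns not found: {missing}")
    # Look up each column's position once, so reordered columns still map correctly
    positions = {name: header.index(name) for name in expected}
    return [
        {name: (row[i] if i < len(row) else "") for name, i in positions.items()}
        for row in body
    ]


# "Price" and "Change" are deliberately swapped relative to EXPECTED_COLUMNS
rows = [["Stock", "Price", "Change", "Volume"], ["ABC", "$3.10", "12.5%", "1.2M"]]
print(map_rows(rows))
```

Because lookups go through header names, a site that merely reorders its columns keeps working, while one that renames or drops a column produces an explicit error you can show on the results page.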

Sample result of the scanner:

Source code (app.py) for the scanner:

from flask import Flask, request
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup, Comment
import datetime, time
from pytz import timezone

app = Flask(__name__)
BENZINGA_URL = 'https://www.benzinga.com/money/premarket-movers/'
YAHOO_FINANCE_URL = 'https://finance.yahoo.com/quote/{}/key-statistics?p={}'
YAHOO_LABEL_URL = 'https://finance.yahoo.com/quote/{}'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'}
scanParams = {}


@app.route('/')
def default():
    return "<a href='/index?gainsPercent=5&minLast=.01&maxLast=100&minVolume=50000&minDollarValue=100000&minFloatValue=500000&maxFloatValue=200000000'>Click here to redirect to default scan parameters.</a>"


@app.route('/index')
def index():
    html = "<html><head><title>Premarket scanner</title>"
    html += "<link rel='stylesheet' href='/static/sorta.css'>"
    html += "<script src='/static/sort-table.js'></script>"
    html += "</head><body>"
    errorMessages = []
    # Fall back to the default scan parameters when a query argument is missing or not a number
    scanParams['gainsPercent'] = request.args.get('gainsPercent', 5.0, type=float)
    scanParams['minLast'] = request.args.get('minLast', 0.01, type=float)
    scanParams['maxLast'] = request.args.get('maxLast', 100.0, type=float)
    scanParams['minVolume'] = request.args.get('minVolume', 50000.0, type=float)
    scanParams['minFloatValue'] = request.args.get('minFloatValue', 500000.0, type=float)
    scanParams['maxFloatValue'] = request.args.get('maxFloatValue', 200000000.0, type=float)
    scanParams['minDollarValue'] = request.args.get('minDollarValue', 100000.0, type=float)

    gainers = []
    eligible_candidates = scrape_benzinga(gainers, errorMessages)
    passed_candidates = scrape_yahoo_finance(eligible_candidates, errorMessages)

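    # 'America/New_York' follows daylight saving time; pytz's fixed 'EST' zone is always UTC-5
    time_now = datetime.datetime.now(timezone('America/New_York')).strftime('%Y-%m-%d %H:%M:%S %Z')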
    html += time_now
    html += print_criteria()
    html += displayResults(passed_candidates, errorMessages)

    html += '<br/><br/><br/><br/>'
    html += '</body></html>'
    return html


def soup_maker(url):
    req = Request(url=url, headers=HEADERS)
    # A timeout keeps one slow site from hanging the whole request
    page = urlopen(req, timeout=10).read()
    soup = BeautifulSoup(page, 'html.parser')
    return soup


def scrape_benzinga(gainers, errorMessages):
    soup = soup_maker(BENZINGA_URL)
    for element in soup(string=lambda text: isinstance(text, Comment)):
        element.extract()
    benzingaGainers = soup.find('div', {'id': 'movers-stocks-table-gainers'})
    gainers = []
    for td in benzingaGainers.find_all('tr')[1:]:
        ticker = td.find('td', {'class': 'premarket-stock-table__cell--stock'}).text[:-1]
        percentChangeStr = td.find('td', {'class': 'premarket-stock-table__cell--change'}).text[:-1]
        lastPriceStr = td.find('td', {'class': 'premarket-stock-table__cell--price'}).text[1:]
        volumeStr = td.find('td', {'class': 'premarket-stock-table__cell--volume'}).text

        ticker = ticker.replace(" ", "").replace('\n', '')
        percentChangeFloat = percentChangeStr.replace(" ", "").replace('\n', '').replace('%', '')
        lastPrice = lastPriceStr.replace(" ", "").replace('\n', '').replace('%', '').replace('$', '')
        volumeStr = volumeStr.replace(" ", "").replace('\n', '').replace('%', '').replace('$', '')
        # Benzinga shows a dash (or nothing) when volume is unavailable; treat that as zero volume
        volumeInt = 0
        if volumeStr not in ('–', ''):
            volumeInt = convertFloatStrToInt(volumeStr)
        volumeStr = volumeInt
        percentChange = float(percentChangeFloat)
        lastPrice = float(lastPrice)
        dollarValueInt = round(volumeInt * lastPrice, 0)

        dollarValueStr = ''
        news = ''
        if percentChange > scanParams['gainsPercent'] and lastPrice >= scanParams['minLast'] and \
                lastPrice <= scanParams['maxLast'] and volumeInt > scanParams['minVolume'] and \
                dollarValueInt >= scanParams['minDollarValue']:
            dollarValueStr = f'${convertIntToText(dollarValueInt)}'
            gainer = {'ticker': ticker, 'volumeInt': volumeInt, 'volumeStr': volumeStr, 'percentChange': percentChange,
                      'percentChangeStr': percentChangeStr,
                      'lastPrice': lastPrice, 'dollarValueInt': dollarValueInt, 'dollarValueStr': dollarValueStr,
                      'news': news}
            gainers.append(gainer)
        else:
            reason = ''
            if percentChange <= scanParams['gainsPercent']:
                reason += f"% Change {percentChange} is not greater than {scanParams['gainsPercent']}.  "
            if lastPrice < scanParams['minLast']:
                reason += f"Price {lastPrice} is less than {scanParams['minLast']}.  "
            if lastPrice > scanParams['maxLast']:
                reason += f"Price {lastPrice} is greater than {scanParams['maxLast']}.  "
            if volumeInt <= scanParams['minVolume']:
                reason += f"Volume {volumeInt} is not greater than {scanParams['minVolume']}.  "
            if dollarValueInt < scanParams['minDollarValue']:
                reason += f"$ Value {dollarValueInt} is less than {scanParams['minDollarValue']}"
            err = f"Ignored Ticker={ticker} &nbsp; &nbsp; &nbsp; Pre-market Price=${lastPrice} &nbsp; &nbsp; &nbsp; %Change={percentChangeStr} &nbsp; &nbsp; &nbsp; Vol={volumeStr} &nbsp; &nbsp; &nbsp; $Val={dollarValueInt} Reason={reason}"
            errorMessages.append(err)
    return gainers


def scrape_yahoo_finance(gainers, errorMessages):
    filteredGainers = []
    for gainer in gainers:
        try:
            yahoo_fin = YAHOO_LABEL_URL.format(gainer['ticker'])
            soup = soup_maker(yahoo_fin)
            labelDiv = soup.find_all('title')[0]
            companyLabel = str(labelDiv.text).replace('Stock Price, News, Quote & History - Yahoo Finance', '')
            earningsDate = ''
            tds = soup.find('table', {'class': 'W(100%) M(0) Bdcl(c)'}).find_all('td',
                                                                                 {'data-test': 'EARNINGS_DATE-value'})
            for earningsTd in tds:
                earningsDate = earningsTd.text
            yahoo_fin = YAHOO_FINANCE_URL.format(gainer['ticker'], gainer['ticker'])
            soup = soup_maker(yahoo_fin)
            trs = soup.find_all('tr')
            for tr in trs:
                td = tr.find('td')
                if td:
                    if 'Float' in td.text and 'Short' not in td.text:
                        tickerFloatValueStr = tr.find('td', {'class': 'Fw(500) Ta(end) Pstart(10px) Miw(60px)'}).text

            divs = soup.find_all('fin-streamer', {'data-symbol': gainer['ticker']})

            for div in divs:
                if div['data-field'] == 'regularMarketChangePercent':
                    gainer['prevPercentChange'] = div.text.replace('(', '').replace(')', '').replace('%', '')
                if div['data-field'] == 'regularMarketPrice':
                    gainer['prevClose'] = div.text

            tickerFloatValue = convertFloatStrToInt(tickerFloatValueStr)
            if tickerFloatValue > scanParams['minFloatValue'] and tickerFloatValue < scanParams['maxFloatValue']:
                gainer['tickerFloatValueInt'] = convertFloatStrToInt(tickerFloatValueStr)
                gainer['tickerFloatValueStr'] = tickerFloatValueStr
                gainer['companyLabel'] = companyLabel
                gainer['earningsDate'] = earningsDate
                filteredGainers.append(gainer)
            else:
                reason = ''
                if tickerFloatValue <= scanParams['minFloatValue']:
                    reason += f"Float {tickerFloatValue} is not greater than {scanParams['minFloatValue']}. "
                if tickerFloatValue >= scanParams['maxFloatValue']:
                    reason += f"Float {tickerFloatValue} is not less than {scanParams['maxFloatValue']}.  "
                err = f"IGNORED {gainer['ticker']} Float {tickerFloatValueStr} &nbsp; &nbsp; &nbsp; Pre-market Price=${gainer['lastPrice']} &nbsp; &nbsp; &nbsp; %Change={gainer['percentChangeStr']} &nbsp; &nbsp; &nbsp; Vol={gainer['volumeStr']} &nbsp; &nbsp; &nbsp; $Val={gainer['dollarValueInt']} Reason={reason}"
                errorMessages.append(err)
        except Exception as e:
            print(f"|{gainer['ticker']}| no float {e}")
            gainer['tickerFloatValueStr'] = ''
            filteredGainers.append(gainer)
    return filteredGainers


def displayResults(filteredGainers, errorMessages):
    html = ''
    if len(filteredGainers) == 0:
        html = "<br/>No tickers match your criteria at the moment."
    else:
        ctr = 1
        html += "<br><b>SCAN RESULTS</b><br/><table border='1' class=\"js-sort-table\" id='premarket-gainers'>"
        html += "<thead><tr><th class=\"js-sort-number\"></th><th class=\"js-sort-number\"><b>Ticker</b></th><th class =\"js-sort-number\"><b>Company Name</b></th>  <th class =\"js-sort-number\"><b>Pre-market Price</b></th>  <th class =\"js-sort-number\"><b>Pre-market % Change</b></th>  <th class =\"js-sort-number\"><b>Pre-Market Volume</b></th>   <th class =\"js-sort-number\"><b>Pre-Market $ Value</b></th> <th class =\"js-sort-number\"><b>Last Close</b></th> <th class =\"js-sort-number\"><b>Last % Change</b></th>  <th class =\"js-sort-number\"><b>Float</b></th> <th class =\"js-sort-number\"><b>Earnings Date</b></th><th class =\"js-sort-number\"><b>News</b></th>  </tr></thead><tbody>"
        for filteredGainer in filteredGainers:
            # Yahoo-derived fields are missing when a lookup failed, so read them with .get()
            html += f"<tr><td>{ctr}</td><td id='ticker'><a href=https://finance.yahoo.com/quote/{filteredGainer['ticker']} target='_blank'>{filteredGainer['ticker']}</a></td>  " \
                    f" <td>{filteredGainer.get('companyLabel', '')}</td>" \
                    f" <td>{filteredGainer['lastPrice']}</td>  <td>{filteredGainer['percentChange']}</td> <td><span title='{filteredGainer['volumeStr']}'></span>{filteredGainer['volumeInt']}</td>  <td>{filteredGainer['dollarValueInt']}</td>" \
                    f"<td>{filteredGainer.get('prevClose', '')}</td> <td>{filteredGainer.get('prevPercentChange', '')}</td> " \
                    f"<td>{filteredGainer.get('tickerFloatValueInt', '')}</td>" \
                    f"<td>{filteredGainer.get('earningsDate', '')}</td>" \
                    f"<td>{filteredGainer['news']}</td>" \
                    f"</tr>"
            ctr += 1
        html += '</tbody></table><br/>'
    errHtml = ''
    for err in errorMessages:
        errHtml += f"<br/>{err}"
    html += errHtml + '<br/><br/><br/><br/>'
    return html


def print_criteria():
    html = f"<br/><br/>This application uses <a href='https://www.benzinga.com/premarket/'>Benzinga Premarket Gainers</a> for the latest pre-market movers and Yahoo Finance for the stocks' float shares data.<br/><br/> <b>CRITERIA</b><br><table border='1'><tr><td><b>Minimum Last Price</b></td><td> ${scanParams['minLast']}</td></tr>"
    html += f"<tr><td><b>Max. Last Price</b></td><td> ${scanParams['maxLast']}</td></tr>"
    html += f"<tr><td><b>Min. Float</b></td><td> {convertIntToText(scanParams['minFloatValue'])}</td></tr>"
    html += f"<tr><td><b>Max. Float</b></td><td>{convertIntToText(scanParams['maxFloatValue'])}</td></tr>"
    html += f"<tr><td><b>Min. % Change</b></td><td> {scanParams['gainsPercent']}%</td></tr>"
    html += f"<tr><td><b>Min. Volume</b></td><td> {convertIntToText(scanParams['minVolume'])}</td></tr>"
    html += f"<tr><td><b>Min. $ Value</b></td><td> {convertIntToText(scanParams['minDollarValue'])}</td></tr></table>"
    return html


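def convertFloatStrToInt(flt):
    # Convert abbreviated strings such as '1.5M', '850K', '2.3B' or '12,345' to an int
    flt = flt.strip().lower().replace(',', '')
    if flt.endswith('b'):  # billion
        return int(float(flt[:-1]) * 1000000000)
    elif flt.endswith('m'):  # million
        return int(float(flt[:-1]) * 1000000)
    elif flt.endswith('k'):  # thousand
        return int(float(flt[:-1]) * 1000)
    else:  # plain number: no suffix to strip
        return int(float(flt))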


def convertIntToText(num):
    if num >= 1000000000:
        return str(round(float(num / 1000000000), 2)) + 'B'
    elif num >= 1000000:
        return str(round(float(num / 1000000), 2)) + 'M'
    elif num >= 1000:
        return str(round(float(num / 1000), 2)) + 'K'
    else:
        return str(num)
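Both source sites show abbreviated numbers such as 1.2M or 850K, and a dash when a value is missing. Keeping that parsing in small, pure functions makes edge cases (plain numbers, thousands separators, placeholders) easy to test without hitting a website. The sketch below is standalone; `parse_abbreviated` and `format_abbreviated` are hypothetical names, and the set of placeholder strings treated as "no data" is an assumption.

```python
from typing import Optional

# Multipliers for the suffixes the scraped pages use (assumed: K, M, B only)
_SUFFIXES = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}
# Strings treated as "no data" (assumption based on the dashes seen in scraped tables)
_PLACEHOLDERS = {"", "-", "–", "n/a"}


def parse_abbreviated(text: str) -> Optional[int]:
    """Parse '1.5M', '850K', '2.3B' or '12,345' into an int; None for placeholders."""
    cleaned = text.strip().lower().replace(",", "").replace("$", "")
    if cleaned in _PLACEHOLDERS:
        return None
    multiplier = _SUFFIXES.get(cleaned[-1], 1)
    if multiplier != 1:
        cleaned = cleaned[:-1]  # drop the suffix letter only when one is present
    # round() rather than int() so a tiny float error cannot truncate the value down by one
    return round(float(cleaned) * multiplier)


def format_abbreviated(num: float) -> str:
    """The reverse direction: 1500000 -> '1.5M', always returning a string."""
    for limit, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if num >= limit:
            return f"{round(num / limit, 2)}{suffix}"
    return str(num)


for raw in ("12,345", "850K", "1.5M", "2.3B", "–"):
    print(raw, parse_abbreviated(raw))
```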

Areas of Improvement:

  • You may scrape more than one website for the pre-market gainers list so that if one website’s HTML structure changes, fallback websites can still provide the list of gainers.
  • Include a column for the ex-dividend date. Like the earnings date, an upcoming dividend may affect a stock’s pre-market volume.
  • Adding a column with each stock’s daily chart can save you a few clicks when evaluating whether a stock is a potential play for the day.
  • If you can find a website that lists OTC pre-market movers, it would be a nice addition to the scanner.
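The first idea, falling back to other sources, can be sketched as a function that tries each scraper in order and returns the first non-empty result, collecting the errors along the way so you can see which site broke. This is a minimal sketch; `first_successful` and the three source functions are hypothetical stand-ins, not real scrapers.

```python
from typing import Callable, List, Sequence, Tuple

# A scraper takes no arguments and returns a list of row dicts
Scraper = Callable[[], List[dict]]


def first_successful(scrapers: Sequence[Scraper]) -> Tuple[str, List[dict]]:
    """Try each scraper in order and return (scraper name, rows) for the first non-empty result."""
    errors = []
    for scraper in scrapers:
        try:
            rows = scraper()
        except Exception as exc:  # layout changes usually surface as AttributeError or KeyError
            errors.append(f"{scraper.__name__}: {exc!r}")
            continue
        if rows:
            return scraper.__name__, rows
        errors.append(f"{scraper.__name__}: returned no rows")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Hypothetical stand-ins for real scrapers
def broken_source() -> List[dict]:
    raise AttributeError("'NoneType' object has no attribute 'find_all'")


def empty_source() -> List[dict]:
    return []


def backup_source() -> List[dict]:
    return [{"ticker": "ABC", "percentChange": 12.5}]


print(first_successful([broken_source, empty_source, backup_source]))
```

Treating an empty result as a failure matters here: a changed page often still loads fine but yields no rows.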

Update: On February 14, the volume data in Benzinga’s Premarket Gainers list stopped showing up. It could have been a glitch on the site or a change they were rolling out (e.g., this data may no longer be publicly accessible). This is a classic example of why web scraping is not ideal for any application. I had to fix the errors caused by Benzinga’s change and scrape another website so the scanner would return results again.
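A column that disappears like this can be detected automatically, rather than only noticing it when every ticker fails the volume filter. A minimal sketch, assuming the scraped rows are dicts keyed by column name: `blank_columns` (a hypothetical helper) reports columns that are empty or a dash in every row, which the scanner could display as a warning or use as a signal to try another source.

```python
from typing import Dict, Iterable, List

# Values treated as "no data" (assumption; the scanner already treats a dash as missing volume)
PLACEHOLDERS = {"", "-", "–"}


def blank_columns(rows: List[Dict[str, str]], columns: Iterable[str]) -> List[str]:
    """Return the columns that hold no real value in any row."""
    if not rows:
        return []
    return [
        col for col in columns
        if all(str(row.get(col, "")).strip() in PLACEHOLDERS for row in rows)
    ]


scraped = [
    {"ticker": "ABC", "change": "12.5%", "volume": "–"},
    {"ticker": "XYZ", "change": "8.0%", "volume": ""},
]
print(blank_columns(scraped, ["ticker", "change", "volume"]))  # ['volume']
```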