How to Find Insider Trading Data from SEC website (Using Python)

Using Investopedia’s definition of Insider Trading:

Insider trading involves trading in a public company’s stock by someone who has non-public, material information about that stock for any reason. Insider trading can be either illegal or legal depending on when the insider makes the trade.

The illegal kind does not interest me, since it carries penalties such as fines and up to 20 years in prison. An example is a CEO who shares information with a friend before the public learns about it. Say the cure that their pharmaceutical company is developing turns out to make the disease even worse. The friend then uses that information to sell shares and profit from the sale before the official news comes out and the stock price plummets. The CEO and the friend could both be prosecuted.

So when is insider trading legal? It can be legal as long as the transaction conforms to the rules set forth by the Securities and Exchange Commission (SEC). Those rules stem from a law passed in 1934, the Securities Exchange Act of 1934. When corporate insiders trade their own company's stock, they must report their trades to the SEC. The paperwork they file is publicly available on the SEC website through EDGAR, short for Electronic Data Gathering, Analysis, and Retrieval system.

Directors and major shareholders must publicly disclose when they buy or sell shares of their own company. Many investors and traders use this information to identify companies with investment potential. If insiders are using their own money to buy a large number of shares in their company, it is probably a good sign. They may have good reasons and information backing up their decision to buy.

For this blog post, I wrote a Python program that reads the insider trading data publicly available on the SEC website. Thanks to a post from Wrighters.io, whose tutorial did most of the heavy lifting, I was able to start coding quickly. I then refined the logic and displayed the results in an HTML table.

Here’s a sample output of the Python code

Areas of Improvement:

The ticker symbols “hard-coded” in this Python code are FAANG (FB, AAPL, AMZN, NFLX and GOOG). This could be improved by searching the filings of all 12,000+ publicly traded companies.
In this sample code, the filing dates are “hard-coded” from 2022-05-13 to 2022-06-13. Adding an option to filter by filing date would give more flexibility.
The output is saved as an HTML file in the local file system. It would be nice to expose it through an API or a web application.
The code only displays purchases and sales of stock. It could be improved by including options transactions.
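
As a rough idea of what the date-filter improvement could look like, here is a minimal sketch that uses only the standard library. It assumes the columnar layout the script reads from edgar_filings["filings"]["recent"] (parallel lists keyed by form, filingDate and accessionNumber). The function name filter_filings, its parameters, and the sample values are made up for illustration.

```python
from datetime import date


def filter_filings(recent, start, end, form="4"):
    """Return filings of one form type filed between start and end (inclusive).

    `recent` is a dict of parallel lists, shaped like the "recent" block of
    the SEC submissions JSON. Hypothetical helper, not part of the script.
    """
    rows = []
    for form_type, filing_date, accession in zip(
        recent["form"], recent["filingDate"], recent["accessionNumber"]
    ):
        filed = date.fromisoformat(filing_date)
        if form_type == form and start <= filed <= end:
            rows.append(
                {"form": form_type, "filingDate": filed, "accessionNumber": accession}
            )
    return rows


# Made-up sample data in the same shape as the SEC response.
sample_recent = {
    "form": ["4", "10-Q", "4", "4"],
    "filingDate": ["2022-06-10", "2022-06-01", "2022-05-20", "2022-04-30"],
    "accessionNumber": [
        "0000000000-22-000003",
        "0000000000-22-000002",
        "0000000000-22-000001",
        "0000000000-22-000000",
    ],
}

selected = filter_filings(sample_recent, date(2022, 5, 13), date(2022, 6, 13))
print([row["accessionNumber"] for row in selected])
```

With start and end passed in as arguments, the same function could be driven from command-line options or a web form instead of fixed dates.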

PART 1 – Setup Python project workspace

Step 1. Install PyCharm Community Edition from https://www.jetbrains.com/pycharm/download/
This tutorial uses pycharm-community-2021.1.2.exe, the latest stable release at the time of writing.

Step 2. Open PyCharm. Create a New Project with a virtual environment. Recent PyCharm versions set up a new virtual environment for the project by default.

Step 3. Create a new file named requirements.txt and enter the following library names:

pandas
requests>=2.25.1
yfinance
beautifulsoup4>=4.9.1
lxml

Step 4. Select “Terminal” on the bottom panel. This will launch a command line. Execute the command to install the libraries listed in requirements.txt

pip3 install -r requirements.txt

PART 2 – Writing the Python script

Write the following code in main.py

Full code

import pandas as pd
import re
import copy
import json
import requests
import yfinance as yf
from bs4 import BeautifulSoup
from lxml import etree
from datetime import datetime as dt
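
# The SEC asks automated clients to identify themselves in the User-Agent header;
# replace these placeholders with your own website and email address.
headers = {"User-Agent": "website.com email@email.com"}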
TICKERS_CIK_URL = "https://www.sec.gov/files/company_tickers.json"
FILING_XML_URL = "https://www.sec.gov/Archives/edgar/data/"
FILING_SUBMISSION_URL = "https://data.sec.gov/submissions/CIK"

def make_url(cik, row):
    accession_number = row["accessionNumber"].replace("-", "")
    return f"{FILING_XML_URL}{cik}/{accession_number}/{row['accessionNumber']}.txt"


def get_element_text(doc, path):
    ret_str = ""
    if len(doc.xpath(path)) > 0:
        ret_str = doc.xpath(path)[0].text
    return ret_str


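def get_transaction_type(acquired_or_disposed):
    # "D" means disposed (a sale) and "A" means acquired (a purchase).
    # Default to an empty string so an unexpected code does not raise UnboundLocalError.
    transaction_type = ""
    if acquired_or_disposed == "D":
        transaction_type = "Sale"
    elif acquired_or_disposed == "A":
        transaction_type = "Purchase"
    return transaction_type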


def get_transaction(elem2, total1, total2, ticker):
    transaction_date = ""
    transaction_code = ""
    valid_transaction = True
    acquired_or_disposed = ""
    transaction_form_type = ""
    # Totals are computed per transaction; the caller adds them to the running totals for the filing.
    total1 = 0
    total2 = 0
    for elem3 in elem2.getchildren():
        if elem3.tag == "transactiondate":
            for elem4 in elem3.getchildren():
                if elem4.tag == "value":
                    transaction_date = elem4.text
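        # In Form 4 XML the form type and transaction code sit inside <transactionCoding>;
        # the original tag name is still accepted in case a document uses it.
        if elem3.tag in ("transactioncoding", "transactioncode"):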
            for elem4 in elem3.getchildren():
                if elem4.tag == "transactioncode":
                    transaction_code = elem4.text
                if elem4.tag == "transactionformtype":
                    transaction_form_type = elem4.text
            if (transaction_code != "S" and transaction_code != "P") or (transaction_form_type != "4"):
                valid_transaction = False
                break
        if elem3.tag == "transactionamounts":
            shares = 0
            price = 0
            acquired_or_disposed = ""
            for elem4 in elem3.getchildren():
                if elem4.tag == "transactionacquireddisposedcode":
                    for elem5 in elem4.getchildren():
                        if elem5.tag == "value":
                            acquired_or_disposed = elem5.text
                elif elem4.tag == "transactionshares":
                    for elem5 in elem4.getchildren():
                        if elem5.tag == "value" and elem5.text is not None:
                            shares = float(elem5.text)
                elif elem4.tag == "transactionpricepershare":
                    for elem5 in elem4.getchildren():
                        if elem5.tag == "value" and elem5.text is not None:
                            price = float(elem5.text)
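                    if price == 0:
                        # No price reported: fall back to that day's closing price.
                        # yfinance treats `end` as exclusive, so ask for one extra day.
                        try:
                            start = pd.to_datetime(transaction_date)
                            history = yf.Ticker(ticker).history(start=start, end=start + pd.Timedelta(days=1))
                            price = history["Close"].iloc[0]
                        except Exception as e:
                            print(f"exception {e}")
                            price = 1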
            if elem2.tag == "nonderivativetransaction" and acquired_or_disposed == "D":
                stock_sale_amt = price * shares
                total2 += stock_sale_amt
            elif elem2.tag == "nonderivativetransaction" and acquired_or_disposed == "A":
                stock_bought_amt = price * shares
                total1 += stock_bought_amt
    transaction_detail = {"acquired_or_disposed": acquired_or_disposed,
                          "total_stocks_bought_dollar": total1, "total_stocks_sold_dollar": total2,
                          "valid_transaction": valid_transaction}
    return transaction_detail


def create_bought_transaction(transaction):
    t = copy.copy(transaction)
    t["acquired_or_disposed"] = "A"
    t["total_stocks_sold_dollar"] = 0.0
    return t


def create_sold_transaction(transaction):
    t = copy.copy(transaction)
    t["acquired_or_disposed"] = "D"
    t["total_stocks_bought_dollar"] = 0.0
    return t


def convert_int_to_text(num):
    if num >= 1000000000:
        return str(round(float(num / 1000000000), 2)) + "B"
    elif num >= 1000000:
        return str(round(float(num / 1000000), 2)) + "M"
    elif num >= 1000:
        return str(round(float(num / 1000), 2)) + "K"
    else:
        return num


def get_document(cik, row, ticker):
    transactions = []
    url = make_url(cik, row)
    print(url)

    res = requests.get(url, headers=headers)
    res.raise_for_status()
    soup = BeautifulSoup(res.content, "html.parser")
    # use a case insensitive search for the root node of the XML document
    docs = soup.find_all(re.compile("ownershipDocument", re.IGNORECASE))
    if len(docs) > 0:
        doc = etree.fromstring(str(docs[0]))
        if docs[0].name == "ownershipdocument":
            owner = get_element_text(doc, "/ownershipdocument/reportingowner/reportingownerid/rptownername")
            is_director = get_element_text(doc, "/ownershipdocument/reportingowner/reportingownerrelationship/isdirector")
            if is_director == "1" or is_director == "true":
                is_director = "Yes"
            elif is_director == "0" or is_director == "false":
                is_director = "No"
            is_officer = get_element_text(doc, "/ownershipdocument/reportingowner/reportingownerrelationship/isofficer")
            if is_officer == "1" or is_officer == "true":
                is_officer = "Yes"
            elif is_officer == "0" or is_officer == "false":
                is_officer = "No"
            title = get_element_text(doc, "/ownershipdocument/reportingowner/reportingownerrelationship/officertitle")
            if title == "" and is_director == "Yes":
                title = "Director"
            other_text = get_element_text(doc, "/ownershipdocument/reportingowner/reportingownerrelationship/othertext")
            if title == "" and other_text is not None:
                title = other_text
            if title is None:
                title = ""
            is_ten_percent_owner = get_element_text(doc, "/ownershipdocument/reportingowner/reportingownerrelationship/istenpercentowner")
            if is_ten_percent_owner == "1" or is_ten_percent_owner == "true":
                is_ten_percent_owner = "Yes"
            elif is_ten_percent_owner == "0" or is_ten_percent_owner == "false":
                is_ten_percent_owner = "No"
            issuer_trading_symbol = get_element_text(doc, "/ownershipdocument/issuer/issuertradingsymbol")
            security = get_element_text(doc, "//securitytitle/value")
            date = get_element_text(doc, "//transactiondate/value")
            total_stocks_bought_dollar = 0
            total_stocks_sold_dollar = 0
            transaction = {}
            valid_transaction = False
            for elem in doc.getchildren():
                transaction["symbol"] = ticker
                transaction["issuer_trading_symbol"] = issuer_trading_symbol
                transaction["owner"] = owner
                transaction["security"] = security
                transaction["date"] = date
                transaction["is_director"] = is_director
                transaction["is_officer"] = is_officer
                transaction["is_ten_percent_owner"] = is_ten_percent_owner
                transaction["title"] = title
                if elem.tag == "nonderivativetable":
                    for elem2 in elem.getchildren():
                        if elem2.tag == "nonderivativetransaction":
                            total1 = total_stocks_bought_dollar
                            total2 = total_stocks_sold_dollar
                            # Merge the result into the existing dict so the owner and issuer fields set above are kept.
                            transaction.update(get_transaction(elem2, total1, total2, issuer_trading_symbol))
                            if not transaction["valid_transaction"]:
                                valid_transaction = False
                            elif transaction["valid_transaction"]:
                                if transaction["acquired_or_disposed"] == "A":
                                    total_stocks_bought_dollar += transaction["total_stocks_bought_dollar"]
                                if transaction["acquired_or_disposed"] == "D":
                                    total_stocks_sold_dollar += transaction["total_stocks_sold_dollar"]
                                valid_transaction = True
                                transaction["total_stocks_bought_dollar"] = total_stocks_bought_dollar
                                transaction["total_stocks_sold_dollar"] = total_stocks_sold_dollar
                                accession_number = row["accessionNumber"].replace("-", "")
                                transaction["url"] = f"{cik}/{accession_number}/{row['accessionNumber']}.txt"
        else:
            raise ValueError(f"Don't know how to process {docs[0].name}")

        if valid_transaction is True:
            if transaction["total_stocks_bought_dollar"] != 0:
                bought_transaction = create_bought_transaction(transaction)
                transactions.append(bought_transaction)
            if transaction["total_stocks_sold_dollar"] != 0:
                sold_transaction = create_sold_transaction(transaction)
                transactions.append(sold_transaction)
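    # Return at function level so callers always get a list, even when no ownership document was found.
    return transactions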


def main():
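    # www.sec.gov may reject requests without a declared User-Agent, so send the headers here as well.
    r = requests.get(TICKERS_CIK_URL, headers=headers)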
    companies = json.loads(r.content)
    tickers = ["FB", "AMZN", "AAPL", "NFLX", "GOOG"]
    cik_lookup = dict([(val["ticker"], val["cik_str"]) for key, val in companies.items()])
    rows = []
    try:
        for ticker in tickers:
            print(ticker)
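            cik = cik_lookup.get(ticker)
            if cik is None:
                # Ticker missing from the SEC list (for example, FB later became META); skip it.
                print(f"No CIK found for {ticker}")
                continue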
            edgar_filings = requests.get(f"{FILING_SUBMISSION_URL}{cik:0>10}.json", headers=headers).json()
            recents = pd.DataFrame(edgar_filings["filings"]["recent"])
            recents["filingDate"] = pd.to_datetime(recents["filingDate"])
            insider_q1 = recents[(recents["form"] == "4") &
                                 (recents["filingDate"] >= "2022-05-13") &
                                 (recents["filingDate"] <= "2022-06-13")]
            for i in range(len(insider_q1)):
                transactions = get_document(cik, insider_q1.iloc[i], ticker)
                for trans in transactions:
                    rows.append(trans)
    except Exception as e:
        print(e)
        pass

    HTML_HEAD = "<html> <head> <title> </title> <link rel='stylesheet' href='/static/sorta.css'><script src='/static/sort-table.js'></script></head>"
    HTML_HEAD += "<body><table border='1' class =\'js-sort-table\'><thead><tr> <th class=\'js-sort-string\'><b>Ticker</b></th><th class=\'js-sort-string\'><b>Insider Trader</b></th> <th class=\'js-sort-string\'><b>Issuer </b></th>  " \
                 "<th class=\'js-sort-string\'><b>Director?</b></th> <th class=\'js-sort-string\'><b>Officer?</b></th>  <th class=\'js-sort-string\'><b>10% Owner</b></th> <th class=\'js-sort-string\'><b>Title</b></th>    <th class=\'js-sort-string\'><b>Transaction Date</b></th>   <th class=\'js-sort-number\'><b>Value ($)</b></th>   <th class=\'js-sort-string\'><b>Type</b></th>  <th class=\'js-sort-string\'><b>Filing Document</b></th>  </tr></thead>"
    html = HTML_HEAD
    row_ctr = 1
    for row in rows:
        row_url = f"<a href=\"{FILING_XML_URL}{row['url']}\" target=\"_blank\">Filing</a>"
        transaction_type = get_transaction_type(row['acquired_or_disposed'])
        total = 0
        if transaction_type == "Sale":
            total = round(row['total_stocks_sold_dollar'])
        elif transaction_type == "Purchase":
            total = round(row['total_stocks_bought_dollar'])
        formatted_total = "{:,}".format(total)
        html += f"<tr>    <td>{row['symbol']}</td>  <td>{row['owner']}</td>  <td>{row['issuer_trading_symbol']}</td> <td>{row['is_director']}</td> <td>{row['is_officer']}</td>   <td>{row['is_ten_percent_owner']}</td>   <td>{row['title']}</td><td>{row['date']}</td> <td align='right' title='${convert_int_to_text(total)}'>{formatted_total}</td> <td>{transaction_type}</td> <td>{row_url}</td></tr>"
        row_ctr += 1
    html += "</table><br><br></body></html>"
    now = dt.now()

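    # A forward slash avoids the invalid "\e" escape and also works on Windows.
    # The "output" folder must already exist.
    filename = "output/edgar" + now.strftime("%m%d_%H%M%S")
    # "w" starts a fresh file, and the with-block closes it once the HTML is written.
    with open(f"{filename}.html", "w", encoding="utf-8") as file:
        file.write(html)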


if __name__ == "__main__":
    main()
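
To make it easier to see what get_transaction pulls out of each filing, here is a small sketch that reads the date, shares, price and acquired/disposed code of a single transaction. It uses the standard library's ElementTree with path lookups instead of nested loops. The XML fragment is a made-up, trimmed-down example that follows the lowercase tag names the script checks for; real filings contain many more elements. The helper name summarize_transaction is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed-down fragment using the lowercase tag names the script looks for.
SAMPLE = """
<nonderivativetransaction>
  <transactiondate><value>2022-05-16</value></transactiondate>
  <transactionamounts>
    <transactionshares><value>1500</value></transactionshares>
    <transactionpricepershare><value>200.50</value></transactionpricepershare>
    <transactionacquireddisposedcode><value>D</value></transactionacquireddisposedcode>
  </transactionamounts>
</nonderivativetransaction>
"""


def summarize_transaction(elem):
    """Return date, type and dollar value for one transaction element."""

    def text(path, default=""):
        # findtext returns the default when the element is missing.
        value = elem.findtext(path, default=default)
        return value.strip() if value else default

    shares = float(text("transactionamounts/transactionshares/value", "0"))
    price = float(text("transactionamounts/transactionpricepershare/value", "0"))
    code = text("transactionamounts/transactionacquireddisposedcode/value")
    kind = {"A": "Purchase", "D": "Sale"}.get(code, "Unknown")
    return {"date": text("transactiondate/value"), "type": kind, "value": shares * price}


summary = summarize_transaction(ET.fromstring(SAMPLE.strip()))
print(summary)
```

The dollar value is simply shares multiplied by price, which is the same calculation the script uses for its bought and sold totals.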

PART 3 – Running the Python Code

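Step 1. The script writes its report into a folder named output, so create that folder in the project root if it does not exist yet. Then, from the PyCharm Terminal, run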

python main.py

Step 2. Open the file generated by the Python program in the output folder. The filename should look like this: edgar0621_212022.html
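
The digits in the filename come from the time the script ran. The short sketch below reproduces the example name with the same strftime pattern the script uses; the run time is a hypothetical value, and the year in particular is an assumption because it does not appear in the filename.

```python
from datetime import datetime

# Hypothetical run time: 21 June at 21:20:22 (the year is an assumption).
run_time = datetime(2022, 6, 21, 21, 20, 22)

# Same pattern as the script: month, day, underscore, then hour, minute, second.
filename = "edgar" + run_time.strftime("%m%d_%H%M%S") + ".html"
print(filename)  # edgar0621_212022.html
```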

The last column links to the filing in the SEC EDGAR archives, for example a Form 4 filing for FB.

Congratulations!

You may also explore the official API documentation from the SEC EDGAR website https://www.sec.gov/edgar/sec-api-documentation to further improve collecting insider trading information.
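
When exploring the API, it may help to isolate the two URL patterns this script relies on. The sketch below rebuilds them from the same base URLs used in main.py. The helper names are hypothetical, and the CIK and accession number are placeholders rather than a real company or filing.

```python
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK"
ARCHIVES_URL = "https://www.sec.gov/Archives/edgar/data/"


def submissions_url(cik):
    """URL of a company's filing index; the CIK is zero-padded to 10 digits."""
    return f"{SUBMISSIONS_URL}{int(cik):010d}.json"


def filing_url(cik, accession_number):
    """URL of the full-text filing; the folder is the accession number without dashes."""
    folder = accession_number.replace("-", "")
    return f"{ARCHIVES_URL}{int(cik)}/{folder}/{accession_number}.txt"


# 1234567 and the accession number below are placeholders for illustration only.
print(submissions_url(1234567))
print(filing_url(1234567, "0001234567-22-000042"))
```

Keeping the URL construction in small functions like these makes it easier to unit-test without sending any requests to the SEC.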

References:

An introduction to accessing financial data in EDGAR, using Python

Insider Trading – The Legal and Illegal
