How to scrape S&P 500 ticker symbols in 5 simple steps (Python)
Step 1: Finding the ticker symbols online (URL)
URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
Step 2: Importing the libraries we need
import bs4 as bs
import requests
import pandas as pd
Step 3: Scraping the Data
We are going to use the requests library to fetch the HTML content and convert it into a BeautifulSoup object.
# Get the page content using requests
resp = requests.get(URL)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = bs.BeautifulSoup(resp.content, 'html.parser')
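A request can fail or hang, so it may be worth wrapping the fetch in a small helper. The sketch below is one way to do it, not the only one: fetch_html, make_soup, the User-Agent string, and the 10-second timeout are all hypothetical choices made for illustration.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

# Hypothetical User-Agent; replace it with a string that identifies your script.
HEADERS = {"User-Agent": "sp500-ticker-tutorial/0.1 (example script)"}


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes, raising on HTTP errors."""
    resp = requests.get(url, headers=HEADERS, timeout=timeout)
    resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return resp.content


def make_soup(html):
    """Parse HTML (str or bytes) with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


# Usage (needs network access, so it is left commented out here):
# soup = make_soup(fetch_html(URL))
```

Splitting the download from the parsing also lets us try the parsing steps on a saved or hand-written HTML string without hitting the site every time.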
Step 4: Finding the Table and Parsing the tickers (FUN Part)
We are not interested in the whole page; we only want the ticker symbols. They are contained in an HTML <table> element whose class attribute is "wikitable sortable".
We are going to use BeautifulSoup to pull that table out of the page.
# Get the table that lists the stocks
table = soup.find('table', {'class': 'wikitable sortable'})
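Passing the class as one string asks BeautifulSoup for that exact class value, so the lookup may come back as None if the table ever carries an additional class. A CSS selector is a more forgiving alternative. The sketch below runs on a made-up HTML snippet; the extra class, the id, and the helper name find_ticker_table are assumptions for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page; the extra class and the id
# are assumptions for illustration only.
SAMPLE_HTML = """
<table class="wikitable sortable extra-class" id="constituents">
  <tr><th>Symbol</th></tr>
  <tr><td>MMM</td></tr>
</table>
"""


def find_ticker_table(soup):
    """Return the first table that has both the wikitable and sortable classes."""
    # The selector matches on each class separately, so extra classes are fine.
    return soup.select_one("table.wikitable.sortable")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_ticker_table(soup)
if table is None:
    raise ValueError("Could not find the ticker table on the page")
```

Checking for None right away gives a clear error message instead of an AttributeError later in the loop.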
We do not need the whole table either, just the ticker symbols.
Each ticker symbol sits inside a table data cell (<td>), which in turn sits inside a table row (<tr>). The ticker is the first cell of each row.
# List to hold all the tickers
tickers = []
# Walk through the table rows to find each ticker
for row in table.find_all('tr')[1:]:  # skip the header row
    ticker = row.find_all('td')[0].text
    # print(ticker)  # DEBUG_STATEMENT
    tickers.append(ticker)
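To see what the loop does without fetching anything, we can run the same idea on a tiny hand-written table. This is a minimal sketch: the HTML is made up, and extract_tickers is a hypothetical helper name. It also skips rows that have no <td> cells and strips whitespace while reading, which is a small variation on the loop above.

```python
from bs4 import BeautifulSoup

# Made-up table with the same shape as the one on the page.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="#">MMM</a>
  </td><td>3M</td></tr>
  <tr><td>AOS
  </td><td>A. O. Smith</td></tr>
</table>
"""


def extract_tickers(table):
    """Collect the text of the first <td> in every row that has one."""
    tickers = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # header rows only contain <th> cells
            continue
        # strip=True removes the surrounding whitespace and newlines
        tickers.append(cells[0].get_text(strip=True))
    return tickers


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tickers = extract_tickers(soup.find("table"))
print(tickers)  # ['MMM', 'AOS']
```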
Step 5: Writing the data to a CSV
If you are wondering why we imported pandas in Step 2, this is it: pandas will help us convert our list to a CSV file.
# Convert the list to a pandas data frame
df = pd.DataFrame(tickers)
# Remove the newline characters that came along with the cell text
df = df.replace('\n','', regex=True)
df.columns = ["Ticker"]
# Write the data frame to a CSV file
df.to_csv('s_&_p_500_tickers.csv')
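By default to_csv also writes the data frame's row index as an extra first column. If we only want the tickers in the file, passing index=False is one option. The sketch below uses three made-up sample values and writes to an in-memory buffer instead of a file so it is easy to inspect; swap the buffer for a file name to write to disk.

```python
import io

import pandas as pd

# Sample values standing in for the scraped list (note the trailing newlines).
tickers = ["MMM\n", "AOS\n", "ABT\n"]

# Clean each value and name the column in one step.
df = pd.DataFrame({"Ticker": [t.strip() for t in tickers]})

# Write to an in-memory buffer; index=False leaves out the row numbers.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the CSV back to confirm the values survive the round trip.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip["Ticker"].tolist())
```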