In this article, I am going to talk about how to crawl data from the Internet with R and then store it in a MySQL database.
When conducting quantitative analysis on financial markets, data is an imperative element. Besides using Bloomberg, DataStream or other tools to get data, we can also crawl data from various websites, especially when the data we need are spread across many pages, e.g., the daily closing prices of the stocks in the S&P 500 from 2001-01-01 to 2015-08-01. Stock prices can easily be found on Yahoo Finance, but downloading them stock by stock means opening more than 500 pages and copying the data over and over again. Done manually, this is very time-consuming, so a crawling technique can be far more efficient.
Take crawling the trading information of the stocks in the S&P 500 as an example. The first step is to collect the list of constituent symbols, as in the sketch below:
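One way to get the constituent list is to scrape it from the Wikipedia page listing the S&P 500 companies using the rvest package. This is a minimal sketch under that assumption; the URL, the table position on the page and the `Symbol` column name may change over time.

```r
# Minimal sketch: collect S&P 500 constituent symbols with rvest.
# The Wikipedia URL and table layout are assumptions and may change.
library(rvest)

url  <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
page <- read_html(url)

# Parse the first HTML table on the page, which lists the constituents
constituents <- html_table(html_nodes(page, "table")[[1]])

# The column holding the tickers is assumed to be named "Symbol"
symbols <- constituents$Symbol
head(symbols)
```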
Since we have already got all the symbols of the stocks listed in the S&P 500 index, the next step is to find the data we need on the Internet, parse the pages, and crawl all the information we need. Take crawling stock price data from Yahoo Finance as an example:
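As a sketch of this step, the quantmod package can fetch historical prices from Yahoo Finance directly, which avoids hand-building the download URLs. The helper function and the date range below (matching the example period mentioned earlier) are illustrative assumptions, not a fixed recipe.

```r
# Minimal sketch: download daily closing prices from Yahoo Finance with quantmod.
library(quantmod)

get_prices <- function(symbol) {
  # getSymbols() returns an xts object of OHLC data for one ticker
  prices <- getSymbols(symbol, src = "yahoo",
                       from = "2001-01-01", to = "2015-08-01",
                       auto.assign = FALSE)
  # Keep the date and the closing price, and record which symbol it belongs to
  data.frame(symbol = symbol,
             date   = index(prices),
             close  = as.numeric(Cl(prices)))
}

# Crawl a few symbols here; in practice one would loop over all 500+ tickers
sp500_prices <- do.call(rbind, lapply(head(symbols, 5), get_prices))
```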
After crawling all the data required, one can connect R to a database. RMySQL is a powerful package that connects R to MySQL and lets you manipulate the data directly from R. The following are some basic commands for using MySQL in R.
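A minimal sketch of the RMySQL workflow is shown below: open a connection, write the crawled data frame into a table, query it back, and close the connection. The host, user, password and database name are placeholders you would replace with your own.

```r
# Minimal sketch: store the crawled prices in MySQL with RMySQL.
library(RMySQL)

# Connection details are placeholders for your own MySQL setup
con <- dbConnect(MySQL(),
                 host     = "localhost",
                 user     = "your_user",
                 password = "your_password",
                 dbname   = "stock_db")

# Write the crawled data frame into a table (overwriting any existing one)
dbWriteTable(con, "sp500_prices", sp500_prices,
             overwrite = TRUE, row.names = FALSE)

# Run SQL directly from R and pull the result back as a data frame
res <- dbGetQuery(con, "SELECT symbol, date, close FROM sp500_prices LIMIT 10")

# List the tables in the database and close the connection when done
dbListTables(con)
dbDisconnect(con)
```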