For statistical inference, Gibbs sampling is commonly used, especially in Bayesian inference. It is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations when directly sampling from a multivariate probability distribution is difficult. Since it is a randomized algorithm, the result may differ from run to run.
Why do we need Gibbs sampling? Sometimes the joint distribution is hard to work with and very difficult to sample from directly, while the conditional distribution of each random variable is known and easy to sample from. In that case one can repeatedly sample each variable from its conditional distribution. The resulting sequence of samples forms a Markov chain whose stationary distribution is exactly the joint distribution we are seeking, so after enough iterations (often thousands) the samples approximate draws from that joint distribution.
Gibbs sampling generates a Markov chain, so each sample is correlated with nearby samples. Therefore, if independent random samples are desired, this method must be used with care, for example by keeping only every \(k\)th sample (thinning). Moreover, samples generated during the initial burn-in period may not accurately represent the target distribution and are usually discarded.
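A minimal sketch of these two precautions, assuming NumPy: discard a burn-in prefix, then thin by keeping every \(k\)th draw. The helper names and the AR(1) chain used as stand-in "Gibbs output" are hypothetical, and the burn-in length and thinning interval are illustrative values, not recommendations.

```python
import numpy as np

def discard_burn_in_and_thin(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th remaining draw."""
    return np.asarray(chain)[burn_in::thin]

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D chain."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# Hypothetical stand-in for Gibbs output: an AR(1) chain that is strongly
# autocorrelated (phi = 0.9) and starts far from its stationary mean 0.
rng = np.random.default_rng(0)
phi, n = 0.9, 20000
chain = np.empty(n)
chain[0] = 50.0  # deliberately poor starting value, so early draws are unrepresentative
for t in range(1, n):
    # Noise variance 1 - phi**2 makes the stationary variance equal to 1.
    chain[t] = phi * chain[t - 1] + rng.normal(scale=np.sqrt(1 - phi**2))

kept = discard_burn_in_and_thin(chain, burn_in=500, thin=20)
# Near 0.9 for the raw (post burn-in) chain, much smaller after thinning.
print(lag1_autocorr(chain[500:]), lag1_autocorr(kept))
```

Thinning lowers the correlation between the retained draws at the cost of throwing samples away; it does not make the chain converge any faster.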
Suppose we want \(K\) samples from a joint distribution \(p(x_1, \dots, x_n)\), and denote the \(i\)th sample by \(\mathbf{X}^{(i)} = (x_1^{(i)}, \dots, x_n^{(i)})\). The procedure of Gibbs sampling is:
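(1) Begin with some initial value \(\mathbf{X}^{(0)}\).
(2) Sample each component \(x_{j}^{(i+1)}\), \(j = 1, \dots, n\), from the distribution of that variable conditioned on the current values of all other variables, i.e. \(x_j^{(i+1)} \sim p(x_j \mid x_1^{(i+1)}, \dots, x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, \dots, x_n^{(i)})\).
(3) Repeat step (2) \(K\) times.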
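The three steps can be sketched for the simplest case of two discrete variables. This is a minimal illustration, not the author's program: the joint table `JOINT` and the function names are hypothetical, chosen so that each conditional is obtained just by normalizing one row or column of the table. Each iteration draws \(x\) from \(p(x \mid y)\) using the current \(y\), then \(y\) from \(p(y \mid x)\) using the new \(x\), which is step (2) with \(n = 2\).

```python
import random

# Hypothetical joint distribution p(x, y) over two binary variables.
JOINT = [[0.30, 0.10],   # p(x=0, y=0), p(x=0, y=1)
         [0.15, 0.45]]   # p(x=1, y=0), p(x=1, y=1)

def sample_x_given_y(y, rng):
    # p(x=1 | y) = p(1, y) / (p(0, y) + p(1, y)): normalize column y.
    p1 = JOINT[1][y] / (JOINT[0][y] + JOINT[1][y])
    return 1 if rng.random() < p1 else 0

def sample_y_given_x(x, rng):
    # p(y=1 | x) = p(x, 1) / (p(x, 0) + p(x, 1)): normalize row x.
    p1 = JOINT[x][1] / (JOINT[x][0] + JOINT[x][1])
    return 1 if rng.random() < p1 else 0

def gibbs_discrete(K, x0=0, y0=0, seed=0):
    """Systematic-scan Gibbs sampler: starting from (x0, y0), update x then y, K times."""
    rng = random.Random(seed)
    x, y = x0, y0                     # step (1): initial value X^(0)
    samples = []
    for _ in range(K):                # step (3): repeat K times
        x = sample_x_given_y(y, rng)  # step (2): x^(i+1) ~ p(x | y^(i))
        y = sample_y_given_x(x, rng)  #           y^(i+1) ~ p(y | x^(i+1))
        samples.append((x, y))
    return samples

samples = gibbs_discrete(100000)
freq = [[0, 0], [0, 0]]
for x, y in samples:
    freq[x][y] += 1
freq = [[c / len(samples) for c in row] for row in freq]
print(freq)  # empirical frequencies, close to JOINT
```

Because every cell of the table is positive, the chain can reach every state, so the empirical frequencies approach `JOINT` as `K` grows even though the sampler only ever uses the conditionals.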
The following is an example Python program for Gibbs sampling.
Considering the bivariate normal case, we define a new function gibbs to perform Gibbs sampling:
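The author's gibbs function is not reproduced in this copy of the post. As a minimal sketch of the same idea, assuming NumPy, the version below uses the standard normal full conditionals of a bivariate normal with means \(\mu_x, \mu_y\), standard deviations \(\sigma_x, \sigma_y\) and correlation \(\rho\). The parameter names `mu`, `sigma`, `x0` and `seed` are hypothetical and not necessarily those of the original program, and this sketch does not use the bvn class.

```python
import numpy as np

def gibbs(K, rho, mu=(0.0, 0.0), sigma=(1.0, 1.0), x0=(0.0, 0.0), seed=None):
    """Draw K Gibbs samples from a bivariate normal with correlation rho.

    Full conditionals used:
      x | y ~ N(mu_x + rho * s_x / s_y * (y - mu_y), s_x**2 * (1 - rho**2))
      y | x ~ N(mu_y + rho * s_y / s_x * (x - mu_x), s_y**2 * (1 - rho**2))
    """
    rng = np.random.default_rng(seed)
    mu_x, mu_y = mu
    s_x, s_y = sigma
    x, y = x0
    samples = np.empty((K, 2))
    for i in range(K):
        # Update x given the current y, then y given the new x.
        x = rng.normal(mu_x + rho * s_x / s_y * (y - mu_y), s_x * np.sqrt(1 - rho**2))
        y = rng.normal(mu_y + rho * s_y / s_x * (x - mu_x), s_y * np.sqrt(1 - rho**2))
        samples[i] = x, y
    return samples

samples = gibbs(400, rho=0.5, seed=1)
print(samples.shape)  # (400, 2)
```

Drawing \(x\) with the freshly updated \(y\) (and vice versa) inside each iteration is what makes this Gibbs sampling rather than independent draws from the two marginals.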
(Note: before running the Gibbs sampling program, one must create a class bvn that contains the BivariateNormal methods and the plot_bvn_rho and plot_bvn methods. This class can be found on my GitHub.)
We set the correlation between the two random variables to \(\rho = 0.5\), fix the marginal means and variances, and generate 400 samples.
We then plot the theoretical bivariate normal distribution together with the samples generated by Gibbs sampling.
For comparison, we set the correlation between the two random variables to 0.97 and see what happens.
The sample-path figures show that when the correlation between x and y is small, the sample paths look much more random. With \(\rho = 0.97\), the sample paths are highly correlated: each new sample stays close to the previous one.
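A short calculation explains why. Assume, for simplicity, zero means and unit variances (an assumption; the marginal parameters used in the post are not shown). The full conditionals are then

\[
y^{(t)} \mid x^{(t)} \sim \mathcal{N}\big(\rho\, x^{(t)},\, 1-\rho^{2}\big), \qquad
x^{(t+1)} \mid y^{(t)} \sim \mathcal{N}\big(\rho\, y^{(t)},\, 1-\rho^{2}\big).
\]

Writing \(y^{(t)} = \rho\, x^{(t)} + \eta_t\) and \(x^{(t+1)} = \rho\, y^{(t)} + \varepsilon_t\) with independent \(\eta_t, \varepsilon_t \sim \mathcal{N}(0, 1-\rho^{2})\),

\[
x^{(t+1)} = \rho^{2}\, x^{(t)} + \rho\,\eta_t + \varepsilon_t, \qquad
\operatorname{Var}(\rho\,\eta_t + \varepsilon_t) = (1+\rho^{2})(1-\rho^{2}) = 1-\rho^{4},
\]

so the \(x\)-chain is an AR(1) process with coefficient \(\rho^{2}\) and stationary variance 1, and

\[
\operatorname{Corr}\big(x^{(t+k)}, x^{(t)}\big) = \rho^{2k}.
\]

For \(\rho = 0.5\) the lag-1 autocorrelation is \(0.25\) and drops below \(0.1\) after \(k = 2\) steps; for \(\rho = 0.97\) it is \(0.9409\), and \(k = 38\) steps are needed before it falls below \(0.1\).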
In this article, I am going to talk about how to crawl data from the Internet with R and then store the data in a MySQL database.
When conducting quantitative analysis of financial markets, data is an essential element. Besides using Bloomberg, DataStream, or other tools to get data, we can also crawl data from various websites, especially when the data we need are spread across many pages, e.g. the daily closing prices of the stocks in the S&P 500 from 2001-01-01 to 2015-08-01. Stock prices can easily be found on Yahoo Finance. However, downloading the data stock by stock means opening more than 500 pages and copying the data repeatedly, which is very time-consuming if done manually. Crawling automates this work and is therefore far more efficient.
Take crawling the trading information of the stocks in the S&P 500 as an example. The first step is to collect the symbols of all stocks listed in the index.
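The R code for this step is not shown here. As a sketch of the same idea in Python using only the standard library, assuming the symbols sit in the first `<td>` cell of each row of an HTML table (header rows using `<th>` are skipped), a small `HTMLParser` subclass can collect them. The class name and the HTML fragment are made up for illustration; the real page layout may differ.

```python
from html.parser import HTMLParser

class SymbolTableParser(HTMLParser):
    """Collect the text of the first <td> in each table row (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.symbols = []
        self._in_row = False
        self._cell_index = -1
        self._capture = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row = True
            self._cell_index = -1
        elif tag == "td" and self._in_row:
            self._cell_index += 1
            if self._cell_index == 0:   # only the first data cell holds the symbol
                self._capture = True
                self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._capture:
            self._capture = False
            text = "".join(self._buf).strip()
            if text:
                self.symbols.append(text)
        elif tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        if self._capture:               # includes text nested inside <a> links
            self._buf.append(data)

# Made-up HTML fragment standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="#">MMM</a></td><td>3M</td></tr>
  <tr><td><a href="#">ABT</a></td><td>Abbott Laboratories</td></tr>
</table>
"""
parser = SymbolTableParser()
parser.feed(SAMPLE_HTML)
print(parser.symbols)  # ['MMM', 'ABT']
```

In practice the page text would first be downloaded, for example with `urllib.request.urlopen(url).read().decode()`, and then passed to `feed`.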
Once we have the symbols of all stocks listed in the S&P 500 index, the next step is to find the data we need on the Internet, parse the web pages, and crawl all the required information. Take crawling stock price data from Yahoo Finance as an example.
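The crawl itself is essentially a loop over the symbols. The sketch below shows that structure in Python under stated assumptions: each download is CSV text with a header row containing `Date` and `Close` columns, and `fetch` is a hypothetical callable you supply (for example one built on `urllib.request`). No Yahoo Finance URL is assumed here, since its download endpoints have changed over the years. Failures are recorded rather than aborting the whole run, and a delay between requests keeps the crawler polite.

```python
import csv
import io
import time

def parse_price_csv(text):
    """Parse CSV text with a header row containing Date and Close columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Date"], float(row["Close"])) for row in reader]

def crawl_prices(symbols, fetch, delay=1.0):
    """Fetch and parse prices for each symbol; record failures instead of stopping."""
    prices, failed = {}, []
    for symbol in symbols:
        try:
            prices[symbol] = parse_price_csv(fetch(symbol))
        except Exception as exc:  # network errors, missing columns, bad numbers
            failed.append((symbol, repr(exc)))
        time.sleep(delay)  # pause between requests
    return prices, failed

def fake_fetch(symbol):
    # Hypothetical stand-in for a real HTTP request; returns made-up prices.
    if symbol == "BAD":
        raise IOError("simulated network error")
    return "Date,Open,High,Low,Close,Volume\n2015-07-31,10,11,9,10.5,1000\n"

prices, failed = crawl_prices(["MMM", "BAD"], fake_fetch, delay=0)
print(prices["MMM"])             # [('2015-07-31', 10.5)]
print([s for s, _ in failed])    # ['BAD']
```

Keeping `fetch` separate from the parsing and looping makes it easy to test the crawler offline and to swap in a different data source later.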
After crawling all the required data, one can connect R to a database. RMySQL is a powerful package that connects R to a MySQL database and lets you manipulate the data directly in R. The following are some basic instructions for using MySQL in R.
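The R instructions are not reproduced here. Since the rest of this post uses Python, the sketch below shows the same store-and-query pattern through Python's DB-API, with one plainly stated swap: it uses the standard-library `sqlite3` module in place of MySQL so that it runs without a database server. With MySQLdb the calls (`cursor`, `execute`, `executemany`, `commit`, `fetchall`) are the same, but the connection is made with something like `MySQLdb.connect(host=..., user=..., passwd=..., db=...)` and the placeholders are `%s` instead of `?`. The table layout and values are made up.

```python
import sqlite3

# sqlite3 stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE prices (symbol TEXT, date TEXT, close REAL, "
    "PRIMARY KEY (symbol, date))"
)
rows = [("MMM", "2015-07-30", 10.0), ("MMM", "2015-07-31", 10.5)]  # made-up values
# sqlite3 uses ? placeholders; MySQLdb uses %s instead.
cur.executemany("INSERT INTO prices (symbol, date, close) VALUES (?, ?, ?)", rows)
conn.commit()

cur.execute("SELECT date, close FROM prices WHERE symbol = ? ORDER BY date", ("MMM",))
result = cur.fetchall()
print(result)  # [('2015-07-30', 10.0), ('2015-07-31', 10.5)]
conn.close()
```

Passing values as query parameters instead of formatting them into the SQL string avoids quoting bugs and SQL injection.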
Installation environment: OS X Yosemite, Python 2.7.9
The MySQLdb module, a Python interface to MySQL, is provided by the package MySQL-python. Therefore, to install MySQLdb with pip, simply execute the following command in Terminal: pip install MySQL-python
Solving the error "Reason: image not found"
After installing MySQL-python successfully, let's import the package in Python to connect to the MySQL database. However, an import error containing "Reason: image not found" may occur.
Don't worry, the solution is easy. Just open Terminal and execute the following command:
Then restart Python and import the package again; the problem should now be solved. Yeah! :)
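To check whether the import works without poking around in an interactive session, a small helper like the one below can be run (a sketch; the function name `check_mysqldb` is hypothetical). A failure to load the MySQL client library, including the "image not found" case, surfaces as an `ImportError`, whose message the helper returns. Note that MySQL-python only supports Python 2; on Python 3 the maintained fork mysqlclient provides the same `MySQLdb` module.

```python
def check_mysqldb():
    """Return (True, version) if MySQLdb imports, else (False, error message)."""
    try:
        import MySQLdb
    except ImportError as exc:
        # A missing or unloadable client library is reported here,
        # e.g. a message containing "Reason: image not found" on OS X.
        return False, str(exc)
    return True, getattr(MySQLdb, "__version__", "unknown")

ok, info = check_mysqldb()
print("MySQLdb OK, version %s" % info if ok else "MySQLdb import failed: %s" % info)
```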