Removing Records by Duplicate Values
Removing records from a data table based on duplicate values in one or more columns is a common and important data cleaning technique. Below is an example of how...continue reading.
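As a minimal sketch of the idea in this teaser (not the linked post's own code, which is not shown here), the snippet below keeps the first record for each combination of values in the chosen columns. The function name `drop_duplicate_records` and the sample table are hypothetical, introduced only for illustration.

```python
def drop_duplicate_records(records, key_columns):
    """Keep the first record seen for each combination of key-column values."""
    seen = set()
    unique = []
    for row in records:
        # Build a hashable key from the columns that define a "duplicate".
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample table (illustrative data only).
records = [
    {"id": 1, "name": "Ann", "city": "Austin"},
    {"id": 2, "name": "Bob", "city": "Boston"},
    {"id": 3, "name": "Ann", "city": "Austin"},
    {"id": 4, "name": "Ann", "city": "Denver"},
]

# Deduplicate on one column, then on a pair of columns.
by_name = drop_duplicate_records(records, ["name"])
by_name_city = drop_duplicate_records(records, ["name", "city"])
```

Deduplicating on `name` alone drops both later "Ann" rows, while deduplicating on `name` and `city` keeps the Denver row. In pandas, `DataFrame.drop_duplicates(subset=[...])` offers the same keep-first behavior by default.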
In the practice of risk modeling, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration showing how to develop a...continue reading.
In [1]: import pandas as pd In [2]: import statsmodels.api as sm In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt') In [4]: Y = data.LEV_LT3 In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ',...continue reading.
SQLite is a lightweight, zero-configuration database. Being fast, reliable, and simple, SQLite is a good choice for storing and querying large data, e.g. terabytes, and is well supported by...continue reading.
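To illustrate what "zero-configuration" means in practice, here is a small sketch using Python's standard-library `sqlite3` module, which needs no server or setup. The table name `trades`, its columns, and the rows are hypothetical and are not taken from the linked post.

```python
import sqlite3

# An in-memory database: no server, no config file, nothing to install.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ticker TEXT, qty INTEGER, price REAL)")

# Hypothetical rows, for illustration only.
rows = [
    ("FITB", 100, 15.2),
    ("FITB", 50, 15.4),
    ("XYZ", 10, 1.0),
]
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)
conn.commit()

# Aggregate with plain SQL and fetch the result as a list of tuples.
totals = conn.execute(
    "SELECT ticker, SUM(qty) FROM trades GROUP BY ticker ORDER BY ticker"
).fetchall()
conn.close()
```

Passing a file path instead of `":memory:"` would persist the same table to a single file on disk.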
In finance and investing, the term portfolio refers to the collection of assets one owns. Compared to holding just a single asset at a time, a portfolio has a number...continue reading.
In this post I would like to illustrate the R package “ape” for phylogenetic trees, with the purpose of assembling trees. The function read.tree creates a tree from a text...continue reading.
library(chron) library(zoo) # STOCK TICKER OF Fifth Third Bancorp stock <- 'FITB' # DEFINE STARTING DATE start.date <- 1 start.month <- 1 start.year <- 2012 # DEFINE ENDING DATE end.date...continue reading.
################################################# ## FIT A MULTIVARIATE ADAPTIVE REGRESSION ## ## SPLINES MODEL (MARS) USING MDA PACKAGE ## ## DEVELOPED BY HASTIE AND TIBSHIRANI ## ##############################################…continue reading.
Machine Learning and Kernels
A common application of machine learning (ML) is the learning and classification of a set of raw data features by an ML algorithm or technique. In...continue reading.
The implied option volatility reflects the price premium an option commands. A trader’s profit and loss (P&L) from hedging option positions is driven to a large extent by the actual...continue reading.
Cohort analysis is super important if you want to know whether your service is in fact a leaky bucket despite nice growth in absolute numbers. There’s a good write-up...continue reading.
How do you easily get beautiful calendar heatmaps of time series in ggplot2? E.g.: [figure from MarginTale] I was impressed by the lattice-based implementation from Paul Bleicher of Humedica, which you can find...continue reading.
THIS BLOG DOES NOT CONSTITUTE INVESTMENT ADVICE. ACTING ON IT WILL MOST LIKELY BE DETRIMENTAL TO YOUR FINANCIAL HEALTH. After following some R-related quant finance blogs like Timely Portfolio, Systematic Investor or Quantitative tho…continue reading.
Guest post by Daniel Adler. Below is a real-time audio-visual multimedia demonstration – or in short ‘an intro’ – written in 100% pure R. It requires no compilation and runs...continue reading.