Classification assigns an observation to one category within a set of categories. Two common types of classification are binary classification
and multiclass classification.
With binary classification, one assigns the elements of a set to one of two groups based on a rule.
Some of the methods used for binary classification are: decision trees, Bayesian
networks, support vector machines, and the probit model.
One widely used variant of the support vector machine (SVM) is the linear SVM, which separates data into two classes with a linear decision boundary. The following modules can be imported to build a linear SVM with scikit-learn:
from sklearn import svm
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
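Using the imports above, a minimal linear SVM sketch might look like the following (the synthetic dataset and the train/test split are illustrative assumptions, not from the text):

```python
from sklearn import svm
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a small synthetic binary classification dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit an SVM with a linear kernel and evaluate on held-out data
clf = svm.SVC(kernel="linear")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
```

The `metrics` module scores the predictions; `kernel="linear"` is what makes this a linear SVM rather than the default RBF kernel.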
With clustering, one groups observations/objects into clusters so that observations in one cluster are more similar to each other than to those in other groups/clusters. Below is a list of some of the algorithms used for clustering:
- Fuzzy clustering
- Expectation Maximization
- BIRCH
- DBSCAN
- K-Means
Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) can cluster large datasets by first building a compact summary of the data that retains as much information as possible. In order to use the BIRCH algorithm, one can import the following modules:
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch
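With those modules, a minimal BIRCH sketch might look like this (the blob data and the choice of three clusters are illustrative assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch

# Generate synthetic data with three cluster centres
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit BIRCH; n_clusters sets the number of final clusters
model = Birch(n_clusters=3)
labels = model.fit_predict(X)

# Plot the points coloured by their assigned cluster label
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title("BIRCH clustering")
```

BIRCH builds a tree of subcluster summaries first, so the final clustering step only has to work on the summary rather than every point.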
In order to use the DBSCAN algorithm, one can import the following module:
from sklearn.cluster import DBSCAN
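A short DBSCAN sketch could then look like the following (the blob data and the `eps`/`min_samples` values are illustrative assumptions):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Dense, well-separated synthetic blobs
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# eps is the neighbourhood radius; min_samples is the density threshold
db = DBSCAN(eps=0.5, min_samples=5)
labels = db.fit_predict(X)

# Points labelled -1 are treated as noise, so exclude them from the count
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
print(n_clusters)
```

Unlike BIRCH or K-Means, DBSCAN does not take the number of clusters as input; it discovers dense regions and marks sparse points as noise.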
Regression can be used to estimate the relationship between a dependent variable and one or more independent variables. There are different types of regression: linear regression, logistic regression, and stepwise regression, among others. Below is the module to import for logistic regression:
from sklearn.linear_model import LogisticRegression
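A minimal logistic regression sketch might look like this (the synthetic dataset is an illustrative assumption):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Synthetic two-class dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns per-class membership probabilities
probs = clf.predict_proba(X[:5])
```

Although its output is a probability, logistic regression is typically used as a binary classifier by thresholding that probability.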
Below is the module to import for linear regression:
from sklearn.linear_model import LinearRegression
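And a minimal linear regression sketch (the noise-free toy data is an illustrative assumption, chosen so the fitted coefficients are easy to check):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One-feature dataset following y = 2x + 1 exactly
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1

reg = LinearRegression()
reg.fit(X, y)

# With noise-free data the fit recovers the slope and intercept
print(reg.coef_[0], reg.intercept_)
```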
One should note that no method for calculating statistical power and sample size is provided here; as a fallback, one can apply the rule of thumb given by Good and Hardin.
Feature engineering consists of extracting features from raw data. It is used to improve predictive models and appears frequently in code competitions.
Below is a list of some feature engineering techniques:
- Imputation
- Categorical encoding
- Binning
- Scaling
- Log transform
- Feature selection
- Feature grouping
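Two of these techniques, imputation and scaling, can be sketched with scikit-learn as follows (the toy matrix and the choice of mean imputation with standardization are illustrative assumptions):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with one missing entry
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

# Imputation: fill missing entries with the column mean
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Scaling: standardize each column to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X_imputed)
```

Imputation is usually applied before scaling, since the scaler cannot handle missing values.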
Reinforcement learning is a part of machine learning that studies how agents take actions in an environment to maximize a notion of cumulative reward. This field is also studied in other disciplines such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, and statistics.
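The cumulative-reward idea can be sketched with tabular Q-learning on a toy environment (the corridor, reward, and hyperparameters below are illustrative assumptions, not from the text):

```python
import random

# Hypothetical 1-D corridor: states 0..4, actions left (0) and right (1),
# and a reward of 1 for reaching the goal at state 4.
N_STATES = 5
ACTIONS = [0, 1]
alpha, gamma = 0.5, 0.9  # learning rate and discount factor
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Move left or right within the corridor; reward 1 at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(200):                  # episodes
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS)    # uniformly random behavior policy
        s2, r = step(s, a)
        # Q-learning is off-policy: update toward the reward plus the
        # discounted value of the best action in the next state
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, the learned Q-values prefer moving right toward the goal, which is the agent maximizing its expected cumulative reward.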