  • Data Mining Process
    Project/Business Understanding:
    Identify potential benefits, risks and efforts of a successful project.
    Data Understanding: Sufficient
    relevant data
    Visual assessment of basic
    relationships and properties
    Data quality (missing values)
    Abnormal cases (outliers)
    Data Preparation: Selection,
    correction and modification of data
    Modeling: Extract knowledge out of
    data in the form of a model
    Predictive – Explanatory
    Evaluation
    Deployment
    Data Mining Cycle
    Data Understanding
    Main Goal
    Gain general insights about the data that will potentially be
    helpful for the further steps in the data analysis process
    Not driven exclusively by goals and methods of later steps
    Approach data from neutral viewpoint
    Never trust data before carrying out simple plausibility
    checks
    At the end of Data Understanding we know much better whether the
    assumptions made during the Project Understanding phase concerning
    representativeness, informativeness and data quality are justified
    Visualisation: Overview of basic characteristics of data and
    check plausibility
    Simple statistics
    Outliers, missing values, data quality
    Data Visualisation
    Bar chart: Frequency distribution for categorical attribute
    Histogram: Frequency distribution for numerical attribute
    Divide values into bins and show a bar plot of the number of
    objects in each bin
    Height of each bar indicates the number of objects in bin
    Shape of histogram depends on number of bins
    Boxplots
    Very compact method to visualise distribution of one
    attribute
    Many boxplots can fit in single plot: Useful for comparing
    distributions
    Scatterplots
    Relationship between two attributes (linear/ non-linear)
    Axes represent two considered attributes
    Each instance in the dataset is represented by a point
    Correlation between attributes
    Outliers
    With class label info: Separability of classes
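    The plots above can be produced with a few lines of Python; a minimal sketch
    using pandas and matplotlib on a made-up DataFrame (column names are
    illustrative only):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset: one categorical, two numerical attributes, one class label
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "red"],
    "age":    [23, 45, 31, 52, 38, 27],
    "income": [21000, 56000, 33000, 61000, 40000, 25000],
    "class":  [0, 1, 0, 1, 1, 0],
})

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Bar chart: frequency distribution of a categorical attribute
df["colour"].value_counts().plot.bar(ax=axes[0, 0], title="Bar chart")

# Histogram: frequency distribution of a numerical attribute (shape depends on bins)
df["age"].plot.hist(bins=5, ax=axes[0, 1], title="Histogram")

# Boxplots: compact view of a distribution, here one box per class for comparison
df.boxplot(column="income", by="class", ax=axes[1, 0])

# Scatterplot: relationship between two attributes, coloured by class label
axes[1, 1].scatter(df["age"], df["income"], c=df["class"])
axes[1, 1].set_title("Scatterplot")

plt.tight_layout()
plt.show()
```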
    Correlation Analysis
    Scatterplots can give us an idea about correlations
    between pairs of variables
    Pearson’s correlation coefficient: Measure of linear
    association between 2 numerical attributes. Always
    between -1 and 1
    Even if a monotone functional dependency exists between two attributes,
    Pearson's correlation coefficient can be far from -1 and 1 when the
    function is non-linear
    Rank correlation coefficients overcome this by relying on
    the ordering of the values of the attributes: Spearman’s rho
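    A small illustrative computation of this difference on a monotone but
    non-linear relationship (data are synthetic):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(1, 10, 100)
y = np.exp(x)              # monotone but strongly non-linear dependence

r, _ = pearsonr(x, y)      # noticeably below 1: only measures linear association
rho, _ = spearmanr(x, y)   # exactly 1: the ranks are perfectly ordered

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```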
    Outliers
    Outlier
    A value or a data object that is far away or very different from
    most or all of the other data
    Intuitive but imprecise definition
    It might be worthwhile to exclude outliers from analysis
    Some methods are more robust to outliers than others
    Categorical Attribute: value that occurs with very low
    frequency
    Numerical Attribute: Detection much more difficult.
    Boxplot, Scatterplot
    For multidimensional data much more complicated
    approaches need to be used
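    A minimal sketch of the two simple outlier checks just mentioned, using
    pandas; the data and column names are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "city":   ["London"] * 50 + ["Paris"] * 48 + ["Oslo", "Lima"],
    "income": list(range(20000, 70000, 500))[:98] + [1_000_000, -5],
})

# Categorical attribute: values that occur with very low frequency
freq = df["city"].value_counts(normalize=True)
rare_categories = freq[freq < 0.05].index.tolist()

# Numerical attribute: IQR rule, the same rule that defines boxplot whiskers
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

print(rare_categories)
print(outliers)
```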
    Missing Values
    Missing values: One of the most important problems in real
    applications
    There is no single best way of handling missing values
    Missing Completely at Random: No special
    circumstances or special values of the variable in question
    lead to higher or lower chances for values to be missing
    Missing at Random: Probability of a missing value
    depends on some other variable(s) Y but conditionally
    on Y it is independent of the value of X
    Nonignorable missing: Occurrence of missing values
    directly depends on the true value of the attribute
    Distinguish Between Types of Missing Values
    Distinction between MCAR and MAR: In case of MAR
    other attributes can be used to predict whether value is
    missing
    Turn considered attribute into binary variable: 1 if value
    exists, 0 if it is missing
    Build a classifier to predict binary variable using as inputs
    other variables
    Determine error rate
    MCAR: Error rate is approx. equal to proportion of missing
    values
    MAR: Error rate is significantly lower (it could also be
    non-ignorable missing)
    In general not possible to distinguish non-ignorable
    missing from the other two cases using only available data
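    A sketch of this check in Python; the file name and column name are
    hypothetical, and a decision tree is used here simply as an example
    classifier:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("applicants.csv")     # hypothetical dataset
target_col = "income"                  # attribute with missing values

y = df[target_col].isna().astype(int)  # binary variable: 1 if missing, 0 otherwise
X = df.drop(columns=[target_col]).select_dtypes("number").fillna(0)

acc = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=5).mean()
error_rate = 1 - acc
proportion_missing = y.mean()

# MCAR: error rate roughly equals the proportion of missing values
# MAR (or non-ignorable): error rate clearly lower, i.e. the other attributes
# carry information about where values are missing
print(f"error rate = {error_rate:.3f}, proportion missing = {proportion_missing:.3f}")
```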
    Treating Missing Values
    Explicit value: Replace the missing entry with a new attribute value
    MISSING (nominal attributes)
    If the fact that the value is missing carries information
    about the value itself (non-ignorable missing) introduction
    of new value can help because it can express an intention
    not captured by other attributes
    Better approach: Introduce a new binary variable indicating that the value
    was missing in the original dataset, and then substitute the missing value
    If neither the other attributes nor the imputed value help, but the fact
    that the value was missing is important, the binary variable captures this
    If no such missing value pattern is present the imputed
    value can be used without introducing MISSING value
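    A minimal sketch of this treatment in pandas (the column name is
    illustrative):

```python
import pandas as pd

df = pd.DataFrame({"income": [30000, None, 45000, None, 52000]})

# Binary variable recording that the value was missing in the original data
df["income_missing"] = df["income"].isna().astype(int)

# Substitute the missing value (here simply with the median; a MISSING
# category would be used instead for a nominal attribute)
df["income"] = df["income"].fillna(df["income"].median())

print(df)
```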
    Relevance of Attribute
    More realistic problem
    Information available: X = H: P(G) high, X = L: P(G) low
    General Decision Rule
    Given the risk forecast of an applicant, X ∈ {H, L}, compute the odds
    o(G | X = x) = P(G | X = x) / P(B | X = x)
    and classify the applicant as good (g) if the odds exceed a chosen threshold
    Sensitivity and Specificity
    Sensitivity (also called Recall) = TP / (TP + FN): Minimise
    misclassification of Class 1 records
    Specificity = TN / (TN + FP): Minimise misclassification of Class 0 records
    ROC Curve
    Critical points on ROC curve
    (FPR, TPR)
    (0,0): All records classified 0
    (1,1): All records classified 1
    (0,1): Ideal model
    Random Classifier: Diagonal Line
    Below diagonal line: Prediction is opposite of true class
    Good classifier: As close as possible to upper left corner
    Area Under ROC (AUC): Summarises ROC curve into a
    single number
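    A sketch of computing the ROC curve and AUC with scikit-learn; the labels
    and scores below are made up:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.55]   # predicted P(class = 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="random classifier")  # diagonal line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```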
    Cost-Sensitive Learning
    Cost of Misclassification
    C(i,j): Cost of misclassifying a pattern from class i to class j
    Cost matrix: C(i, j) arranged with the actual class i on the rows and the
    predicted class j on the columns
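    As an illustration, the total misclassification cost of a classifier can be
    obtained by weighting the confusion matrix with the cost matrix; the cost
    values below are made up:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

C = np.array([[0, 1],    # actual 0: cost 0 if predicted 0, cost 1 if predicted 1
              [5, 0]])   # actual 1: cost 5 if predicted 0 (missing class 1 is expensive)

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])   # counts indexed [actual, predicted]
total_cost = (cm * C).sum()
print(total_cost)
```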
    Logistic Regression: Interpretation of Coefficients
    Increasing variable x_j by 1:
    Increases log(o(1 | x_i)) by β_j
    Increases o(1 | x_i) by a factor of exp(β_j)
    If x_j is binary then x_j = 1 increases o(1 | x_i) by a factor of exp(β_j)
    Synopsis Logistic Regression
    Linear predictor:
    Accommodates quantitative and qualitative variables
    (dummy)
    Enables transformations and combinations (interactions) while retaining
    interpretability. Logistic regression extends this idea to binomial data
    Explanatory model:
    Contribution of individual variables
    Model comparison – Model selection
    Confidence interval (not covered)
    Assumes a linear relationship between attribute values and the log odds
    of success
    Non-linearities can be overcome using discretisation
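    A minimal scikit-learn sketch tying the synopsis together: fit a logistic
    regression and read each coefficient as a multiplicative change exp(β_j) in
    the odds (data and column names are made up):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

X = pd.DataFrame({
    "age":       [22, 35, 47, 51, 29, 62, 44, 38],
    "owns_home": [0, 1, 1, 1, 0, 1, 0, 1],          # binary (dummy) variable
})
y = np.array([0, 0, 1, 1, 0, 1, 1, 0])

model = LogisticRegression().fit(X, y)

# exp(beta_j): multiplicative change in the odds o(1 | x) when x_j grows by 1
odds_ratios = np.exp(model.coef_[0])
for name, beta, ratio in zip(X.columns, model.coef_[0], odds_ratios):
    print(f"{name}: beta = {beta:.3f}, odds multiplied by {ratio:.3f}")
```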
    Decision Trees
    Decision Tree Approach
    Ask series of questions about attributes to determine class
    Build decision tree from top to bottom (from root to leaves)
    Greedy selection of a test attribute
    Compute an evaluation measure for all attributes
    Select the attribute with the best evaluation
    Greedy Strategy
    Grows a decision tree by making a series of locally optimal
    decisions about how to partition the data
    Divide and conquer / recursive descent
    Divide examples according to the values of the test attribute
    Apply the procedure recursively to the subsets (Hunt’s
    algorithm)
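    A minimal scikit-learn sketch of greedy, top-down tree induction on
    synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# At each node the attribute and split with the best impurity-based evaluation
# are chosen greedily; min_samples_leaf limits data fragmentation at the leaves
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5, random_state=0)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))
print("test accuracy:", tree.score(X_test, y_test))
```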
    Characteristics of Decision Tree Induction
    Non-parametric: No assumptions about the type of
    probability distributions satisfied by the data
    Finding optimal decision tree is computationally infeasible:
    Greedy heuristic approaches
    Decision tree induction algorithms construct trees quickly
    even for very large train sets
    Easy to interpret: Especially for small trees
    Robust to presence of noise: Especially when methods to
    avoid overfitting are employed
    Redundant attributes do not adversely affect accuracy
    If dataset contains many irrelevant attributes then some
    could be accidentally chosen by tree-growing algorithm.
    Feature selection
    Characteristics of Decision Tree Induction
    Data fragmentation: Number of records at leaf nodes can
    become too small to make statistically significant decision
    – Impose threshold on minimum number of records per
    node
    Subtree can be replicated many times within a decision
    tree making the model more complex and harder to
    interpret
    Robust performance w.r.t. choice of impurity measure
    Treatment of missing values
    Small changes in the train set can yield an entirely different tree,
    although predictive performance remains robust
    Performance can be adversely affected by too many interval-scaled
    variables (use discretisation)
    Artificial Neural Networks (ANN)
    ANNs inspired by attempts to model biological neural
    systems
    Brain consists of a large number of interconnected simple
    processing units (neurons)
    Learning in human brain takes place by changing the
    strength of the synaptic connection between neurons
    through repeated stimulation by the same impulse
    Perceptron
    Perceptron: Simple Model of a Neuron
    Each input node is connected via a weighted link to the
    summing junction
    Weights emulate strength of synaptic connection between
    neurons
    Training adapts weights to reduce error
    Can solve linearly separable problems
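    A minimal sketch of a single perceptron with the classic weight-update
    rule, trained on the (linearly separable) AND function:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                   # logical AND: linearly separable

w = np.zeros(X.shape[1])                     # weights of the input links
b = 0.0                                      # bias term
lr = 0.1                                     # learning rate

for _ in range(20):                          # repeated presentation of the data
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)    # summing junction + step activation
        error = target - pred
        w += lr * error * xi                 # adapt weights to reduce the error
        b += lr * error

print(w, b)                                  # weights that separate the AND classes
```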
    Artificial Neural Networks
    A number of simple processing units (nodes) organised in layers
    Output layer: Returns prediction
    Input layer: Receives inputs
    Hidden layers: Layers between
    input and output layers
    Example topology: 5 × 3 × 1 (5 input nodes, 3 hidden nodes, 1 output node)
    Multilayer Perceptrons: Only
    Feed-forward connections
    More complicated decision boundaries can be
    approximated using more nodes and more layers
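    A sketch of a 5 × 3 × 1 style feed-forward multilayer perceptron using
    scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# hidden_layer_sizes controls the topology (here one hidden layer of 3 nodes);
# scaling the inputs is part of the preprocessing ANNs are sensitive to
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0),
)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```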
    Design Issues in ANNs
    Systems that combine automatic feature extraction with
    classification process
    By increasing the number of hidden nodes and hidden layers, ANNs can
    become very flexible classifiers
    This flexibility can easily result in overfitting
    Selecting appropriate topology is Critical
    No general rule for how to choose the number of hidden
    layers and the size of the hidden layers
    Small neural networks might not be flexible enough to fit the data; large
    neural networks tend to overfit
    Cannot handle missing values
    Black box models: Explaining what an ANN has learned is
    not straightforward
    Very sensitive to chosen feature vector: Variable selection
    and preprocessing necessary
    Ensemble Methods
    Central Idea
    Improve accuracy by combining predictions of multiple
    classifiers
    Conditions for performance improvement
    1 Base classifiers (close to) independent
    2 Base classifiers better than random guessing
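    An illustrative computation of why these two conditions matter: with 25
    independent base classifiers, each with error rate 0.35, the majority vote
    errs only when at least 13 of them err (the numbers are chosen purely for
    illustration):

```python
from scipy.stats import binom

n, eps = 25, 0.35
ensemble_error = 1 - binom.cdf(12, n, eps)   # P(at least 13 of the 25 are wrong)
print(f"base error = {eps}, ensemble error = {ensemble_error:.3f}")   # roughly 0.06
```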
    Constructing Ensemble Classifiers: Bagging
    Bagging – Bootstrap Aggregating
    Create many training sets
    through Bootstrapping
    (resampling with replacement)
    Build classifier for each train set
    Use majority vote to predict
    Reduces variance of base classifiers
    Unstable classifier: Sensitive to minor perturbations in
    train data
    Bagging reduces generalisation error of unstable classifiers
    (Decision trees, Neural networks, k–nearest neighbours)
    Can be detrimental for stable/robust classifiers because each bootstrap
    sample effectively reduces the size of the train set
    Does not focus on particular instances of training data
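    A minimal scikit-learn sketch comparing a single (unstable) decision tree
    with its bagged version on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

single = DecisionTreeClassifier(random_state=0)          # unstable base classifier
# BaggingClassifier: bootstrap samples + majority vote (default base: decision tree)
bagged = BaggingClassifier(n_estimators=50, random_state=0)

print("single tree :", cross_val_score(single, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```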
    Boosting
    Example weights determine the sampling distribution
    Initially all weights are equal 1/N
    At each round i = 1,2,...
    Draw bootstrap sample D_i based on weights
    Base classifier built on D_i and used to classify all examples
    from original dataset D
    Increase weights of misclassified examples
    Misclassified examples more likely to be chosen in
    subsequent rounds
    Attention focused on difficult to classify examples
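    A simplified sketch of this weighting loop (not a full AdaBoost
    implementation; the doubling factor and number of rounds are arbitrary
    choices for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
N = len(y)
weights = np.full(N, 1.0 / N)               # initially all weights equal 1/N
rng = np.random.default_rng(0)

for i in range(5):                          # boosting rounds
    idx = rng.choice(N, size=N, replace=True, p=weights)   # sample D_i based on weights
    stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
    misclassified = stump.predict(X) != y   # classify all examples of the original D

    weights[misclassified] *= 2.0           # increase weights of misclassified examples
    weights /= weights.sum()                # renormalise to a sampling distribution
    print(f"round {i}: error on D = {misclassified.mean():.3f}")
```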