# MATLAB Tutorial

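Welcome to the MATLAB tutorial version of Data Science Rosetta Stone. Before beginning this tutorial, please check that you have MATLAB installed. Some of the functions used below (grpstats, tabulate, and corr) also require the Statistics and Machine Learning Toolbox.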

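Note: In MATLAB, comments are written as shown below. The %{ and %} markers of a block comment must each appear on a line of their own.

```
% This is a single-line comment.
%{
This is a block
comment.
%}
```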

Now let’s get started!

# 1 Reading in Data and Basic Statistical Functions

## 1.1 Read in the data.

### a) Read the data in as a .csv file.

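First specify the format of the variables to be read in: %C = categorical, %d = 32-bit integer (int32), %f = double-precision floating-point number.

```
% Name and Sex are categorical, Age is an integer, Height and Weight are doubles.
formatSpec = '%C%C%d%f%f';
student = readtable('class.csv', 'Delimiter', ',', 'Format', formatSpec);
```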
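To see what the format specifiers do, the following minimal sketch writes a two-row table to a temporary CSV file and reads it back with the same formatSpec. The two sample rows are taken from the output shown in section 1.4; the temporary file name and the variable names `T`, `fname`, and `sample` are assumptions for illustration only.

```matlab
% Build a small sample table (two rows from the class data set).
T = table({'Alfred'; 'Alice'}, {'M'; 'F'}, [14; 13], [69; 56.5], [112.5; 84], ...
    'VariableNames', {'Name', 'Sex', 'Age', 'Height', 'Weight'});

% Write it to a temporary CSV file (hypothetical file name).
fname = [tempname '.csv'];
writetable(T, fname);

% Read it back using the same format specification as above.
formatSpec = '%C%C%d%f%f';
sample = readtable(fname, 'Delimiter', ',', 'Format', formatSpec);

% Inspect the resulting variable types.
disp(class(sample.Name));    % type produced by %C
disp(class(sample.Age));     % type produced by %d
disp(class(sample.Height));  % type produced by %f

% Clean up the temporary file.
delete(fname);
```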

### b) Read the data in as a .xls file.

readtable reads Excel spreadsheets in both the .xls and .xlsx formats. This example uses .xlsx, the newer of the two.

```
student_xlsx = readtable('class.xlsx');
```

### c) Read the data in as a .json file.

```
student_json = jsondecode(fileread('class.json'));
```
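Note that jsondecode returns a struct (or struct array) rather than a table. Assuming class.json holds an array of objects with identical fields, which is an assumption about the file's layout, struct2table can convert the result. The sketch below uses an inline JSON string in place of the file.

```matlab
% Inline JSON standing in for the contents of class.json (assumed layout).
txt = '[{"Name":"Alfred","Sex":"M","Age":14},{"Name":"Alice","Sex":"F","Age":13}]';

% Objects with identical fields decode to a struct array.
s = jsondecode(txt);

% Convert the struct array to a table with one row per object.
sample_json = struct2table(s);
disp(sample_json);
```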

## 1.2 Find the dimensions of the data set.

```
disp(size(student));
```
```
    19     5
```
size() | disp()
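The row and column counts can also be retrieved separately. A minimal sketch, using a small stand-in table built from the first rows of the class data (the variable names `T`, `nrows`, and `ncols` are illustrative):

```matlab
% Stand-in table with three rows and three variables.
T = table({'Alfred'; 'Alice'; 'Barbara'}, [14; 13; 13], [69; 56.5; 65.3], ...
    'VariableNames', {'Name', 'Age', 'Height'});

% Capture both dimensions in one call.
[nrows, ncols] = size(T);

% height() and width() return the same counts for a table.
fprintf('%d rows, %d columns\n', height(T), width(T));
```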

## 1.3 Find basic information about the data set.

```
summary(student);
```
```
Variables:

Name: 19×1 categorical

Values:

Alfred     1
Alice      1
Barbara    1
Carol      1
Henry      1
James      1
Jane       1
Janet      1
Jeffrey    1
John       1
Joyce      1
Judy       1
Louise     1
Mary       1
Philip     1
Robert     1
Ronald     1
Thomas     1
William    1

Sex: 19×1 categorical

Values:

F     9
M    10

Age: 19×1 int32

Values:

Min       11
Median    13
Max       16

Height: 19×1 double

Values:

Min       51.3
Median    62.8
Max         72

Weight: 19×1 double

Values:

Min       50.5
Median    99.5
Max        150
```
summary()

## 1.4 Look at the first 5 (last 5) observations.

In the row position, "1:5" selects the first 5 observations; in the column position, the ":" operator selects all columns (variables).

```
disp(student(1:5,:));
```
```
     Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______

    Alfred     M      14       69      112.5
    Alice      F      13     56.5         84
    Barbara    F      13     65.3         98
    Carol      F      14     62.8      102.5
    Henry      M      14     63.5      102.5
```
disp()
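The last 5 observations can be selected in the same way with the end keyword, which stands for the index of the final row. A minimal sketch on a made-up seven-row table (the table `T` and its variables are purely illustrative):

```matlab
% Illustrative table with seven rows.
T = table((1:7)', (1:7)' * 10, 'VariableNames', {'Id', 'Value'});

% end-4:end covers the last five row indices; ":" keeps all columns.
lastFive = T(end-4:end, :);
disp(lastFive);
```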

## 1.5 Calculate means of numeric variables.

```
age = table2array(student(:,3));
disp(mean(age));
```
```
   13.3158
```
```
height = table2array(student(:,4));
disp(mean(height));
```
```
   62.3368
```
```
weight = table2array(student(:,5));
disp(mean(weight));
```
```
  100.0263
```
table2array() | mean() | disp()
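Dot notation offers a shorter way to extract a column, and varfun can apply mean to several columns at once. The following is a sketch that uses the five rows displayed in section 1.4 as stand-in data, so its results differ from the full-data means above.

```matlab
% Stand-in data: the first five rows of the class data set.
T = table([14; 13; 13; 14; 14], [69; 56.5; 65.3; 62.8; 63.5], ...
    [112.5; 84; 98; 102.5; 102.5], ...
    'VariableNames', {'Age', 'Height', 'Weight'});

% Dot notation returns the column as a plain array.
meanHeight = mean(T.Height);
disp(meanHeight);

% varfun applies a function to each selected variable and returns a table
% whose variables are named mean_Height and mean_Weight.
means = varfun(@mean, T, 'InputVariables', {'Height', 'Weight'});
disp(means);
```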

## 1.6 Compute summary statistics of the data set.

```
numeric_vars = student(:,{'Age', 'Height', 'Weight'});
statarray = grpstats(numeric_vars, [], {'min', 'median', 'mean', 'max'});
disp(statarray);
```
```
           GroupCount    min_Age    median_Age    mean_Age    max_Age    min_Height    median_Height    mean_Height    max_Height    min_Weight    median_Weight    mean_Weight    max_Weight
           __________    _______    __________    ________    _______    __________    _____________    ___________    __________    __________    _____________    ___________    __________

    All    19            11         13            13.316      16         51.3          62.8             62.337         72            50.5          99.5             100.03         150
```
grpstats() | disp()

## 1.7 Descriptive statistics functions applied to variables of the data set.

```
weight = table2array(student(:,5));
disp(std(weight));
```
```
   22.7739
```
```
disp(sum(weight));
```
```
   1.9005e+03
```
```
disp(length(weight));
```
```
    19
```
```
disp(max(weight));
```
```
   150
```
```
disp(min(weight));
```
```
   50.5000
```
```
disp(median(weight));
```
```
   99.5000
```
table2array() | std() | sum() | length() | max() | min() | median()
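Note that std normalizes by N-1 by default (the sample standard deviation); passing 1 as the second argument normalizes by N instead. A small sketch with made-up values, chosen so that the arithmetic is easy to check by hand:

```matlab
x = [2 4 4 4 5 5 7 9];   % illustrative values; the mean is 5

sampleStd = std(x);      % divides the sum of squared deviations by N-1
popStd    = std(x, 1);   % divides the sum of squared deviations by N

disp(sampleStd);
disp(popStd);
```

Here the squared deviations sum to 32, so the second call gives sqrt(32/8) = 2 and the first gives sqrt(32/7).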

## 1.8 Produce a one-way table to describe the frequency of a variable.

### a) Produce a one-way table of a discrete variable.

```
tabulate(age);
```
```
  Value    Count   Percent
      1        0      0.00%
      2        0      0.00%
      3        0      0.00%
      4        0      0.00%
      5        0      0.00%
      6        0      0.00%
      7        0      0.00%
      8        0      0.00%
      9        0      0.00%
     10        0      0.00%
     11        2     10.53%
     12        5     26.32%
     13        3     15.79%
     14        4     21.05%
     15        4     21.05%
     16        1      5.26%
```
tabulate()
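Because age holds positive integers, tabulate lists every integer from 1 up to the maximum, which explains the zero-count rows above. One way to tabulate only the values that actually occur is to combine unique and accumarray from base MATLAB. This is a sketch on stand-in data (the ages from the five rows in section 1.4), not the tutorial's original method.

```matlab
% Stand-in data: ages of the first five students.
age_s = [14; 13; 13; 14; 14];

% vals holds the distinct ages; idx maps each observation to its position in vals.
[vals, ~, idx] = unique(age_s);

% Add 1 to the bin of each observation to obtain the counts.
counts = accumarray(idx, 1);
percent = 100 * counts / numel(age_s);

freq = table(vals, counts, percent, ...
    'VariableNames', {'Value', 'Count', 'Percent'});
disp(freq);
```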

### b) Produce a one-way table of a categorical variable.

```
sex = table2array(student(:,{'Sex'}));
tabulate(sex);
```
```
  Value    Count   Percent
      F        9     47.37%
      M       10     52.63%
```
table2array() | tabulate()

## 1.9 Produce a two-way table to describe the frequency of two categorical or discrete variables.

```
crosstable = varfun(@(x) length(x), student, 'GroupingVariables', {'Age' 'Sex'}, 'InputVariables', {});
disp(crosstable);
```
```
    Age    Sex    GroupCount
    ___    ___    __________

    11     F      1
    11     M      1
    12     F      2
    12     M      3
    13     F      2
    13     M      1
    14     F      2
    14     M      2
    15     F      2
    15     M      2
    16     M      1
```
varfun()
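The result above is in long form, with one row per Age and Sex combination. For a conventional two-way layout, with ages as rows and sexes as columns, one option is findgroups together with accumarray. This is a sketch on stand-in data (the five rows from section 1.4), and the variable names are illustrative.

```matlab
% Stand-in data: age and sex of the first five students.
age_s = [14; 13; 13; 14; 14];
sex_s = categorical({'M'; 'F'; 'F'; 'F'; 'M'});

% Map each value to a group number; the second output lists the groups.
[ageIdx, ageVals] = findgroups(age_s);
[sexIdx, sexVals] = findgroups(sex_s);

% Count the observations that fall into each (age, sex) cell.
% Rows follow ageVals and columns follow sexVals.
counts = accumarray([ageIdx, sexIdx], 1);

disp(ageVals');
disp(sexVals');
disp(counts);
```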

## 1.10 Select a subset of the data that meets a certain criterion.

```
% Build a logical index of the students who are female, then use it to select
% those rows from the student table.
females = student(student.Sex == 'F',:);
disp(females(1:5,:));
```
```
     Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______

    Alice      F      13     56.5         84
    Barbara    F      13     65.3         98
    Carol      F      14     62.8      102.5
    Jane       F      12     59.8       84.5
    Janet      F      15     62.5      112.5
```
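Several criteria can be combined with the element-wise logical operators & (and) and | (or). A minimal sketch on stand-in data taken from the five rows in section 1.4; the age threshold of 14 is an arbitrary choice for illustration.

```matlab
% Stand-in data: the first five students.
Name = categorical({'Alfred'; 'Alice'; 'Barbara'; 'Carol'; 'Henry'});
Sex = categorical({'M'; 'F'; 'F'; 'F'; 'M'});
Age = [14; 13; 13; 14; 14];
T = table(Name, Sex, Age);

% Logical index: true only where both conditions hold.
idx = T.Sex == 'F' & T.Age >= 14;
older_females = T(idx, :);
disp(older_females);
```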

## 1.11 Determine the correlation between two continuous variables.

The first argument of the cat function is the dimension along which to concatenate. A value of 2 concatenates horizontally, placing height and weight side by side as the two columns of one matrix.

```
height_weight = cat(2,table2array(student(:,4)),table2array(student(:,5)));
disp(corr(height_weight));
```
```
    1.0000    0.8778
    0.8778    1.0000
```
table2array() | cat() | corr()
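corr belongs to the Statistics and Machine Learning Toolbox. Where that toolbox is unavailable, the base function corrcoef returns the same kind of correlation matrix. The sketch below runs on the five rows from section 1.4 only, so its value will differ from the full-data result above.

```matlab
% Stand-in data: height and weight of the first five students.
height_s = [69; 56.5; 65.3; 62.8; 63.5];
weight_s = [112.5; 84; 98; 102.5; 102.5];

% corrcoef returns a 2-by-2 matrix; the off-diagonal entries hold the
% Pearson correlation between the two inputs.
R = corrcoef(height_s, weight_s);
disp(R);

% Extract the single correlation coefficient.
r = R(1, 2);
```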