GLM Flex

From MR Tools
Jump to: navigation, search

GLM_Flex is now officially depreciated. The GLM_Flex2 scripts are no longer available in the download packages. Please use the GLM_Flex_Fast family of tools.

NOTE: This information now refers to CreateDesign2, GLM_Flex2, and GLM_Flex_Contrast2. These scripts have been updated from the original version to properly return type III results in all cases.

NOTE: GLM_Flex is being phased out in favor of GLM_Flex_Fast2. I strongly recommend that you switch to using GLM_Flex_Fast2 as GLM_Flex2 will be phased out in the near future, and will no longer receive any support.

If you are new to the GLM Flex tools I recommend that you first acquaint yourself with the GLM_Flex_Fast2 toolkit. In general GLM_Flex_Fast2 is faster and easier to use, with 3 caveats:

1. GLM_Flex_Fast is not presently configured to use spm_reml variance/covariance correction.
If that doesn't matter to you, then go take a look at the GLM_Flex_Fast2 page.


What is GLM Flex?

GLM Flex is a set of second level neuroimaging analysis tools that I wrote (in MATLAB) as an alternative to the stock SPM8 utilities.

GLM Flex is written for MATLAB (2010B or newer), and requires and SPM8 installation.

GLM Flex was designed to be a general purpose second level analysis tool with the following features:

1. GLM Flex uses partitioned error terms.
2. GLM Flex can analyze voxels with missing data.
3. GLM Flex can be used to automatically remove outliers. 
4. GLM Flex can be used to run analyses with more factors than SPM8 will allow.
5. GLM Flex can be used to run second level models on FreeSurfer surface volumes (requires a freesurfer install).

General Note: Repeated Measures ANCOVA is not possible at this time.

Keep in mind that this is beta version software. I’m sure there are still quite a few bugs that I haven’t found yet, so feel free to let me know if things are acting up.

The creation of these scripts was supported in part by the NIH funded Harvard Aging Brain Study (P01AG036694), NIH R01-AG027435, and the Mass. General Hospital Corp.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.


GLM Flex Overview

A Quick Peak Inside GLM Flex

1. GLM Flex uses partitioned error terms.

The way this works is through the addition of additional interaction terms into the GLM model. For instance, a simple mixed design ANOVA has two error terms. The between subjects error term is computed from the total variance of the Subject Factor (minus any between effects), and the within factor error term (as well as the interaction term) is computed from the Subject by Within-Factor interaction minus all other effects and error terms. Note that the Subject by Within Factor interaction is an identity matrix and will capture 100% of the variance in the data. After subtracting out the other bits we are left with the model residual which is what SPM uses. In practice I’ve set things up so that the “Final” interaction term that would result in an identity matrix is left out of the model, and the full model residual is used as is.

2. GLM Flex can analyze voxels with missing data.

The scripts are setup to look for voxels with matching subjects, that is if we are doing a one sample t-test we start with the voxels that aren’t missing any data. Then we move on to Voxels that are missing only Subject1, then only Subject2 etc ..., then it’s voxels only missing Subject1 and Subject2, then Subject1 and Subject3 etc ... In practice we don’t look for every possible combination, rather we sort the data into sets with the same N, and then break those sets into the constituent sets of Subjects. In this way we can iteratively walk through the different sets in a fairly efficient manner without having to analyze each voxel independently.

3. GLM Flex can be used to automatically remove outliers.

To remove outliers we use Cook’s D (put in info on the threshold). Only a single data point is removed from a voxel at one time. If a data point is removed that voxel gets kicked into a set with a smaller N size (one that hasn’t been analyzed yet), and then Cook’s D is estimated and assess when the voxel is reanalyzed. Again this iterative process combined with the methodology for dealing with unequal N allows us to do this in an efficient iterative manner.

4. GLM Flex can be used to run analyses with more factors than SPM8 will allow.

Nothing too fancy about this, the design create tool (CreateDesign.m), allows for up to 6 way interactions, and hopefully no one will try pushing it beyond that.

5. GLM Flex can be used to run second level models on FreeSurfer surface volumes (requires a freesurfer install).

Again nothing fancy here. I simply leverage the MRIread.m and MRIwrite.m to get FS surface volumes into matlab, and then the data can be analyzed in the same manner as typical 3D Volumes.

6.Variance and Independence Corrections.

I borrowed the methodology from this directly from spm and I use the spm_reml script to estimate the variance and covariance hyper-parameters. The tricky bit was pooling variance across voxels when dealing with a sub-model where there are only a few voxels. Luckily all that needed to be done was to take the first pass covariance matrix from voxels without missing data, and simply pool the data from the sub-model with the sub-indexed portions of the original covariance matrix. This way there are always enough voxels entering into the covariance matrix to get reasonably reliable estimates. One note here, there are very subtle rounding error level difference between what you will get with my scripts and what you will get with SPM, so be warned that it is not 100% identical in the ideal case of comparing voxels with no missing data), but it should be 99.9% the same. A second note: The variance-covariance correction is computed on the pooled model not the full partitioned model (this is the equivalent model that would be used in spm, and lacks the Subject by Factor interaction terms). In general then partitioned model will be more robust against violations of equality of variance and sphereicity, but I haven’t tested this out in cases where there are many repeated factors with more than 3 levels apiece. In general, using these options shouldn’t hurt anything, but my impression thus far is that it won’t help that much either. If you find big differences between corrected and uncorrected models, please drop me a line, I would be interested in knowing in when this correction has a noticeable impact.

7. CreateDesign.m and MakeContrastMatrix.m

There is some fun stuff in these two scripts that I should explain, particularly about how the design creation/parsing works, and how the automatic Contrast Vector/Matrix generation works, but I will leave that for another day.

How to use GLM Flex

GLM Flex does not include a GUI so you will need to set up the analyses in your own matlab scripts.

Step 1 - Create a Model:

To create a GLM you being by creating the following data structure in MATLAB

IN.N_subs = [];
IN.Between = [];
IN.BetweenLabs = {};
IN.Within = [];
IN.WithinLabs = {};
IN.Covar = {};
IN.CovarLabs = {};
IN.CovarInt = {};
IN.Interactions = [];
IN.FactorLabs = {};
IN.EqualVar = [];
IN.Independent = [];

Let’s say we want to do a 3 level one way anova.

First we input the number of Subjects in each condition:

IN.N_subs = [12 10 8];

Next we enter the number of levels per Between Subjects IV. In this case we only have a single IV with three levels so we input:

IN.Between = [3];

Now we enter labels for the condition (note each IV is entered in it’s own curly brackets { }).

IN.BetweenLabs = { {‘Cond1’ ‘Cond2’ ‘Cond3’} }; %Note that there is a label for each condition

We don’t have any Within Factors so we can skip those, and we don’t have any covariates so we can skip those as well. Additionally, with only a single factor in the model there are no interactions to worry about so the next thing that needs to be entered are the Factor Labels:

IN.FactorLabs = {‘Factor1’};

Finally, we need to decide about using variance and covariance corrections. For each factor we enter our assumptions, and since there is only one factor there will be only one entry.

IN.EqualVar = [1];       %This means we are assuming equal variance across Conditions
IN.Independent = [1]; %This means we are assuming that the measurements are independent from one another.

So our configured data structure for this analysis looks like this:

IN.N_subs = [12 10 8];
IN.Between = [3];
IN.BetweenLabs = { {‘Cond1’ ‘Cond2’ ‘Cond3’} }; 
IN.FactorLabs = {‘Factor1’};
IN.EqualVar = [1];     
IN.Independent = [1]; 

We now pass this into the function CreateDesign2.m

F = CreateDesign2(IN);

This will create the GLM design matrix, variance, and error components and a bunch of other stuff that we will need later on. Note that at this point we have created the model, but we have not yet analyzed the data.

Step 2 - Evaluate the Model:

Now we can configure the actual data analysis (all defaults are shown):

I.OutputDir = pwd;             % ‘Path/to/where/the/analysis/files/will/be/written;
I.F = [];                               % This is the design we created in the first step;
I.Scans = {};                        % A cell array (vector) of input filenames in the appropriate condition order.
I.Mask = [];                        % An optional specification for a mask file - only voxels within the mask will be analyzed.
I.RemoveOutliers = 0;        % A boolean flag specifying if outliers should be removed. Default is 0 (no).
I.Thresh = 0;                     % An optional specification for the Cook's D outlier detection threshold.
I.DoOnlyAll = 0;                 % An optional flag to only analyze voxels where all data is present.
I.NoMiss = 0;                     % An optional flag that will set any voxel with any missing data to NaN.
I.minN = 2;                        % The minimum number of observations per condition required to analyze that voxel.
I.minRAT = 0;                    % An optional input to specify the minimal ratio of observations required to analyze a voxel.
I.writeIni = 0;                    % Write out the original input data as a 4D Nifti file.
I.writeFin = 0;                   % Write out the final data (i.e. data with dropped outliers and transformations performed) as a 4D Nifti file.
I.writeoo = 1;                    % Write out the oo.mat file, this file is necessary for re-running contrasts, but takes up a lot of file space.
I.writeI = 1;                       % Write out the I.mat file.  This will track the design, model, and output options.  This should almost always be left as =1
I.estSmooth = 1;               % Estimate spatial smoothness using spm_est_smoothness.m  - necessary for FWE correction - requires writing out residuals.
I.KeepResiduals = 0;         % If smoothness is estimated do you want to keep the residuals (4D Nifiti file) or should they be deleted?
I.Transform.FisherZ = 0;    % Perform a Fisher Z transformation prior to analysis
I.Transform.AdjustR2 = 0; % This will convert R-squared values to correlations, and then apply a Fisher Z transformation.
I.Transform.ZScore = 0;     %  Normalize the data voxel-wise prior to analysis (handy for getting con images that encode correlation values).
I.Transform.crtlFor = 0;     % Option to enter a set of covariates that will be controlled for prior to data analysis.
I.Transform.log = 0;          % Option to log-transform data prior to analysis.
I.Transform.MapNorm = 0; % Option to z-score each input map (scan-wise) prior to analysis

If you do not care about changing default values then you can skip specification of those parameters, so a simple specification of our model can be specified as:

 I.OutputDir = pwd;
 I.F = F;
 I.minN = 5;
 I.Scans = {

Step 3 - Specify Contrasts:

Contrasts can be added later and analyzed with GLM_Flex_Contrasts2.m (requires that oo.mat be saved), however processing speed is much better (fewer disk I/O cycles) if the contrasts are specified in advance.

The basic contrast specification is as follows:

I.Cons(1).name = ‘ContrastName’;   % optional specification of the contrast name
I.Cons(1).Groups = {1 2 3};              % specification of the columns of the design matrix that specify the grouping structure for the contrast
I.Cons(1).Levs = [3];                        % specification of the contrast structure (levels per IV in the contrast).
I.Cons(1).ET = 1;                             % specification of which error term to use when creating the statistical parametric maps.

In this contrast specification, we specify the groupings of data that belong to particular conditions; this is done with the Cons.Groups field. The default method is to specify the data groups in a cell array, so if there are three groups I specify which columns in the design matrix code those groups. Also note that {1 2 3} is not the same as {1:3}. The point of using cell arrays in this case is that it allows for the grouping of multiple conditions. For example lets say we want to compare the mean of conditions 1 and 2 to condition 3. We can do this by specifying {[1 2] 3} or {1:2 3}. This tells the scripts that there are two groups of data, and we are combining conditions 1 and 2 into a single group.

As an alternative, you can also manually specify contrasts. This would done as:

I.Cons(1).name = 'ContrastName'; I.Cons(1).c = [1 -1 0; 0 1 -1]'; I.Cons(1).ET = 1;

Let's also specify a post hoc between conditions 1 and 3

I.Cons(2).name = ‘Contrast2’;
I.Cons(2).Groups = {1 3};
I.Cons(2).Levs = [2];
I.Cons(2).ET = 1; 

Note that the Groups variable now points to columns 1 and 3 which code condition 1 and condition 3 respectively. The levels for this contrast are now 2 instead of 3, and we will use the same error terms to create the statistical maps.

An alternate form for specification for these structures that can make your scripts a little cleaner is:

i = 1; 
I.Cons(i) = struct('name','Contrast1','Groups',{num2cell([1 2 3])},'Levs',{[3]},  'ET',{1}); i = i+1;
I.Cons(i) = struct('name','Contrast2','Groups',{num2cell([1 3])},   'Levs',{[2]},  'ET',{1}); i = i+1;

This will produce the same structures as we just specified above.

Now that out contrasts are specified we can go ahead and run the analysis:

I = GLM_Flex2(I);

The analysis will now run. Keep in mind that for these scripts to work, the entire data set much be read in at once, so depending on the size of the dataset you will need a substantial amount of RAM (we recommend at least 8 Gigs), and the larger the data set the longer the analysis will take to run. Also note that the data was entered so that all the condition 1 images went in first, followed by condition 2, and then condition 3. It is crucial to make sure that the scans were entered in the correct order. The order needs to match the specification of the design matrix which can be found in I.F.XX or F.XX.

Here are some example of how to set and run a variety of different models:

One Sample T-Test

Two Sample T-Test

Paired Sample T-Test

One Way Between

One Way Within

2x2 Between ANOVA

2x2 Factor Within

Mixed Design ANOVA (Example data included!)

Simple Regression

Complicated Regression


Personal tools