
Pairwise co-occurrence

Goals

  • Learn to think about possible ensemble firing patterns and ways to characterize them
  • Implement, in detail, an analysis of pairwise co-occurrence during putative “replay” events
  • Apply a permutation test (shuffle) to determine levels of chance (independent) co-occurrence

Introduction

As you have seen in the module on decoding, hippocampal place cells tend to be active in specific locations within an environment. As a rat moves, it passes through a number of place fields:

Consider this rat moving from right to left on a linear track. As it passes through the place field of each cell (indicated by differently colored circles), the firing order will be blue-cyan-green-yellow-orange-red. This firing order corresponds to the order in which the different locations in the environment were experienced.

In rats, memories of past experiences may be re-expressed in association with sharp wave-ripples (SWRs) in the local field potential: the spiking order of place cells during rest is correlated with the field order experienced in the environment (Wilson & McNaughton 1994; Foster & Wilson 2006; Karlsson et al. 2009). This non-local, ordered spiking activity has been called “replay”:

Shown schematically are three putative SWRs, associated with a “forward” replay (cells active in the same order as experienced), an indeterminate replay (no order detectable; this may be a replay of some different experience for which we do not know the correct place cell order), and a “reverse” replay.

Suppose we want to quantify how much a rat is recalling a certain trajectory (“experience”). We could identify how many times certain trajectories are replayed, like the left trajectory or the right trajectory along arms of a T-maze. There are a few ways of doing this, including co-activation analysis, sequence analysis, and decoding. This module will cover co-activation analysis.

Overview of sharp wave-ripple associated spiking activity analyses:

  • Co-activation analysis (covered here): are place fields active together during sharp wave-ripple events?
  • Sequence analysis: is spiking order (in time) correlated with field order (on the track)?
  • Decoding: given what we know about field locations (tuning curves), does a given combination of spikes represent a place in the environment?

Co-activation: concept and overall workflow

Co-activation or co-occurrence analysis allows us to examine the content of “replay” without direct reference to the field order of place cells in the environment (Cheng and Frank, 2008). By grouping place cells into categories such as “left-arm place cells” and “right-arm place cells” we can ask if left cells are significantly more active together than right cells, which would suggest that the rat may have been recalling the left arm more frequently than the right arm.

The logic of looking at pairwise co-occurrence is illustrated in the figure above. If a random selection of cells is active in each SWR, then their co-occurrence would not differ from what we expect based on the activity of each cell individually. This is the case for the “red” cells D-E-F in the diagram. If, in contrast, spiking patterns during SWRs form sequences, then cells with nearby place fields will tend to be active together, as is the case for the “blue” cells A-B-C. This concept of co-occurrence will be made more explicit for real data in the steps below.
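
To make this logic concrete, here is a minimal toy simulation. All numbers and variable names are made up for illustration and are not part of the codebase. Two "red" cells are each active in a random 30% of events, independently of each other; two "blue" cells are driven by a shared source, so they are always active together. For the independent pair, the observed joint probability is close to the product of the single-cell activation probabilities; for the coupled pair it is much higher.

```matlab
% Toy simulation: activity (1 = active, 0 = silent) of cells across candidate events
nEvents = 10000;     % number of simulated candidate events (arbitrary)
pActive = 0.3;       % probability that a cell is active in an event (arbitrary)

% "Red" cells: each active independently of the other
indep = rand(2,nEvents) < pActive;

% "Blue" cells: a shared source drives both cells, so they are active together
shared = rand(1,nEvents) < pActive;
coupled = [shared; shared];

% Observed joint probability vs. prediction from single-cell activity alone
joint_indep    = mean(indep(1,:) & indep(2,:));
expect_indep   = mean(indep(1,:)) * mean(indep(2,:));
joint_coupled  = mean(coupled(1,:) & coupled(2,:));
expect_coupled = mean(coupled(1,:)) * mean(coupled(2,:));

fprintf('Independent: observed %.3f, expected %.3f\n',joint_indep,expect_indep);
fprintf('Coupled:     observed %.3f, expected %.3f\n',joint_coupled,expect_coupled);
```

The product of single-cell probabilities is the "chance" level that the shuffle procedure in Step 7 estimates empirically.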

The overall workflow for co-occurrence analysis looks like this:

  1. Generate candidate events: these are intervals that may contain replays, generally associated with sharp wave-ripple events (SWRs) in the LFP
  2. Estimate place fields: get locations in the environment where cells increased their firing rates during behavior
  3. Categorize place cells: we want cells that were active on the left or right arm only.
  4. Make a Q-matrix: Bin spiking activity during SWRs.
  5. Get cell participation during SWRs: are right or left cells more active during SWRs?
  6. Get joint probability for cell pairs: how often are right cells active together? How often are left cells active together?
  7. Get z-scored coactivity for cell pairs: are the cell pairs co-active at greater than chance levels? (If the cells are firing randomly, we still expect them to co-occur to some degree)

This module will first take you through how existing code accomplishes these steps. Then, the final section implements a basic version of the same analysis from scratch, so that you can get to know the guts of the analysis and build on it yourself.

Step-by-step

Setup and data loading

First, make sure you do a git pull as usual, and get the data. We'll be using session R064-2015-04-22. You'll also need to include the tasks\Alyssa-Tmaze and tasks\Replay_Analysis folders in your path.

Then, we load the data:

% first, cd to where your data lives
LoadExpKeys
cfg = []; cfg.fc = ExpKeys.goodSWR(1);
CSC = LoadCSC(cfg);
 
cfg = []; cfg.load_questionable_cells = 1; % load all cells we've got
S = LoadSpikes(cfg);
 
cfg = []; cfg.convFact = ExpKeys.convFact; % conversion factor from camera pixels to centimeters (see PosCon())
pos = LoadPos(cfg);

Step 1: Generating candidate events

Detecting putative sharp wave-ripple events based on thresholding power in a certain frequency band was introduced in Module 6. You can use your own code for detecting sharp wave-ripples (SWRs), or the code below (just make sure your variable containing detected intervals is called evt):

cfg = [];
cfg.type = 'fdesign'; % doit4me
cfg.f = [140 250]; % frequencies present in ripples
CSCf = FilterLFP(cfg,CSC);
 
% Obtain envelope
envelope = abs(CSCf.data);
 
% Convolve with a gaussian kernel (improves detection)
kernel = gausskernel(60,20); % note, units are in samples; for paper Methods, need to specify Gaussian SD in ms
envelope = conv(envelope,kernel,'same');
 
% Convert to TSD
SWR = tsd(CSC.tvec,envelope);
 
% Produce intervals by thresholding
cfg = []; 
cfg.method = 'zscore'; cfg.threshold = 3; % cut at 3 SDs above the mean
cfg.minlen = 0.02; % exclude intervals shorter than 20 ms
cfg.merge_thr = 0; % do not merge nearby events
evt = TSDtoIV(cfg,SWR); % evt is the set of candidate events
 
clearvars -except S pos evt ExpKeys

Note: make sure that the filter above performed as expected. In particular, if the FieldTrip version of filtfilt() takes precedence over MATLAB's own version in your path, this step could fail!
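
TSDtoIV() with cfg.method = 'zscore' is what turns the smoothed envelope into candidate intervals. Its exact implementation lives in the codebase and is not reproduced here; the self-contained sketch below only illustrates the general idea on a synthetic signal: z-score the envelope, find contiguous runs of samples above threshold, and drop runs shorter than a minimum length. The sampling rate, burst times and variable names are assumptions made up for the example.

```matlab
% Synthetic "envelope": low-level noise plus two bursts of elevated power
Fs = 2000;                              % assumed sampling rate (Hz)
tvec = (0:1/Fs:2-1/Fs)';                % 2 s of data, column vector
env = 0.1*abs(randn(size(tvec)));       % background
env(tvec >= 0.50 & tvec < 0.56) = 2;    % 60 ms burst: should be kept
env(tvec >= 1.20 & tvec < 1.21) = 2;    % 10 ms burst: too short, should be dropped

% z-score and threshold (as with cfg.threshold = 3)
z = (env - mean(env)) ./ std(env);
above = z > 3;

% Find the first and last sample of each contiguous above-threshold run
d = diff([0; above; 0]);
iStart = find(d == 1);
iEnd   = find(d == -1) - 1;
tstart = tvec(iStart);
tend   = tvec(iEnd);

% Enforce a minimum duration (as with cfg.minlen = 0.02)
minlen = 0.02;
keep = (tend - tstart) >= minlen;
tstart = tstart(keep); tend = tend(keep);
disp([tstart tend])
```

Only the 60 ms burst survives; the 10 ms burst crosses the threshold but is discarded by the minimum-length criterion.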

Normally a lot more goes on during candidate detection, such as the elimination of events that occur when the rat is moving too quickly, but we will keep it simple for now.

☛ What other steps can you think of that may improve the detection of candidate SWR events? Scan the Methods sections of a few replay papers to see how they do it.

Step 2: Estimating place fields

The basics of estimating tuning curves were covered in an earlier module. Place cells can fire locally (in-field, when the rat is inside the place field) and non-locally (out-of-field, e.g. when the rat is at rest but recalling a previously experienced trajectory). Replay is an example of non-local firing, and it occurs during periods of quiescence. Since non-local firing can affect estimates of place field location, we exclude times when the rat was moving too slowly:

linspeed = getLinSpd([],pos); % linear speed
 
% Threshold speed
cfg = []; cfg.method = 'raw'; cfg.operation = '>'; cfg.threshold = 3.5; % speed limit in cm/sec
iv_fast = TSDtoIV(cfg,linspeed); % only keep intervals with speed above thresh
 
% Restrict data so it includes fast intervals only
pos_fast = restrict(pos,iv_fast);
S_fast = restrict(S,iv_fast);

The next step requires us to know more about the layout of the track used for this data set, so here it is:

  • Rats performed 15-20 trials in one session, such as the one you have loaded.
  • Rats experienced one of two possible trajectories during each trial: start-to-left, and start-to-right. The end of the left arm had a food reward, and the end of the right arm had a water reward.
  • The red line indicates a left trajectory.
  • All trials shared the central arm. The intersection of the central arm and the left & right arms is the choice point.
  • Red and blue circles depict hypothetical locations of some place fields.

Our goal is to group the place cells into left and right categories, so we need to look at how cells were spiking when the rat went left or right. However, it’s not quite this simple because the rat often chose one arm far more often than the other, so we need to make sure we’re estimating place fields for the two groups with the same amount of position data. Let’s cheat by using an existing function called GetMatchedTrials():

LoadMetadata; % contains information about trial times
 
% Get equal numbers of trials
[trials.L,trials.R] = GetMatchedTrials([],metadata,ExpKeys);
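
GetMatchedTrials() is a codebase function and its internals are not described here. One simple way to achieve the same goal, shown as a hypothetical sketch (not necessarily how GetMatchedTrials() does it), is to randomly subsample the trials of the more frequently chosen arm so that both arms contribute the same number of trials. The trial times and variable names below are made up.

```matlab
% Hypothetical trial start/end times (s) for each arm; the rat chose right more often
L_tstart = [10 70 150];               L_tend = L_tstart + 20;
R_tstart = [30 100 120 200 260 300];  R_tend = R_tstart + 20;

nMatch = min(numel(L_tstart),numel(R_tstart));   % number of trials to keep per arm

% Randomly choose nMatch trials from each arm (keeps all trials of the rarer arm)
idxL = sort(randperm(numel(L_tstart),nMatch));
idxR = sort(randperm(numel(R_tstart),nMatch));

matched_L = [L_tstart(idxL)' L_tend(idxL)'];   % one row per kept trial: [tstart tend]
matched_R = [R_tstart(idxR)' R_tend(idxR)'];
disp(matched_L); disp(matched_R);
```

Because the subsampling is random, repeated runs keep different right trials; an analysis built on it may be worth repeating with several random draws.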

Now we need to restrict the data to times when the rat was on the track (i.e. during trial intervals). At this point we separate the data into two different sets corresponding to arm choices or trajectories: left and right.

% Restrict position and spiking data to trial intervals (left and right groups)
pos_trial.L = restrict(pos_fast,trials.L);
pos_trial.R = restrict(pos_fast,trials.R);
S_trial.L = restrict(S_fast,trials.L);
S_trial.R = restrict(S_fast,trials.R);

Right now the position data exist in two dimensions (the horizontal plane), but it’s simpler to think of the data in one dimension, running from the bottom of the T-maze out to the end of either the left or the right arm. To do this, we linearize the position data onto the left and right trajectories. Coordinates defining the linear trajectories already exist in metadata, but we need to resample them so that consecutive points are separated by a fixed bin size in centimeters:

% Resample coords to have same number of pos units in each bin
cfg = [];
cfg.binsize = 3;
cfg.run_dist = ExpKeys.pathlength;
coord.L = StandardizeCoord(cfg,metadata.coord.coordL_cm);
coord.R = StandardizeCoord(cfg,metadata.coord.coordR_cm);
 
% Note that this step and several previous ones can also be done in a
% loop like the one below; it is a little harder to read, but it helps
% guard against L/R copy-paste typos:
 
% Define arm categories
arms = {'L','R'}; % L is for the left arm and R is for the right arm
for iArm = 1:length(arms)
    coord.(arms{iArm}) = StandardizeCoord(cfg,metadata.coord.(['coord',arms{iArm},'_cm']));
    % this reads as:
    % coord.L = StandardizeCoord(cfg,metadata.coord.coordL_cm); when iArm = 1
    % coord.R = StandardizeCoord(cfg,metadata.coord.coordR_cm); when iArm = 2
end
 
% Linearize position data onto L and R coordinates
cfg = [];
cfg.Coord = coord.L;
linpos.L = LinearizePos(cfg,pos_trial.L);
cfg.Coord = coord.R;
linpos.R = LinearizePos(cfg,pos_trial.R);
 
% Here is another example of the hard-to-read loop for the linearization step:
for iArm = 1:length(arms)
    cfg = []; cfg.Coord = coord.(arms{iArm});
    linpos.(arms{iArm}) = LinearizePos(cfg,pos_trial.(arms{iArm}));
end
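
LinearizePos() maps each 2-D position sample onto the standardized coordinate path. As an illustration of the basic idea only (the codebase function may work differently), the sketch below assigns each position sample to the nearest point along a toy L-shaped path and converts that point's index into distance along the path. The path, the positions and all variable names are made up; they are deliberately different from the tutorial's coord and linpos so that running the sketch does not overwrite them.

```matlab
% Toy path: evenly spaced points (3 cm apart) up a central stem, then along a left arm
binsize = 3;                                      % cm between path points
stem = [zeros(1,11); 0:binsize:30];               % x = 0, y = 0..30
arm  = [-(binsize:binsize:30); 30*ones(1,10)];    % y = 30, x = -3..-30
path_xy = [stem arm];                             % 2 x nPoints, ordered along the path

% Made-up 2-D position samples near the path (columns are [x; y])
xy = [0.5 -0.4 -1.0 -14.2 -29.0;
      1.0 15.2 29.0  31.1  29.5];

% Squared distance from every sample (rows) to every path point (columns)
d2 = bsxfun(@minus, xy(1,:)', path_xy(1,:)).^2 + ...
     bsxfun(@minus, xy(2,:)', path_xy(2,:)).^2;
[~,nearest] = min(d2,[],2);           % index of the closest path point per sample

linpos_toy = (nearest - 1) * binsize; % distance along the path (cm) from its start
disp(linpos_toy')
```

Evenly spaced path points are what make the index-to-distance conversion valid, which is one reason to standardize the coordinates first.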

Finally, we can use our restricted spike trains and restricted linearized position data to get place field estimates along the trajectories:

% Get place fields ("PF") for L and R trials.
cfg = [];
cfg.binSize = 1;
PF.L = MakeTC(cfg,S_trial.L,linpos.L);
PF.R = MakeTC(cfg,S_trial.R,linpos.R);
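
MakeTC() handles all cells at once and also detects fields. The core computation for a single cell, sketched here under simple assumptions (uniformly sampled positions at an assumed 30 Hz, 1 cm bins as with cfg.binSize = 1, made-up data; this is not MakeTC()'s actual code), is an occupancy-normalized firing rate: the number of spikes emitted in each position bin divided by the time spent in that bin.

```matlab
% Made-up linearized position samples (cm) at an assumed 30 Hz tracking rate
dt = 1/30;                                   % time per position sample (s)
pos_lin = [0 0.5 1.2 1.8 2.4 2.9 3.5 3.9];   % one trajectory, 8 samples
spk_lin = [1.2 1.8 1.9 2.4 2.5];             % linearized position at each spike time

edges = 0:1:4;                               % 1 cm bins: [0,1) [1,2) [2,3) [3,4)
nBins = numel(edges) - 1;
occupancy = zeros(1,nBins);                  % time spent in each bin (s)
spk_count = zeros(1,nBins);                  % spikes emitted in each bin
for iB = 1:nBins
    occupancy(iB) = sum(pos_lin >= edges(iB) & pos_lin < edges(iB+1)) * dt;
    spk_count(iB) = sum(spk_lin >= edges(iB) & spk_lin < edges(iB+1));
end

tc = spk_count ./ occupancy;   % firing rate (Hz); 0/0 gives NaN in unvisited bins
disp(tc)
```

Dividing by occupancy matters: without it, bins where the rat lingered would look like place fields simply because more time was spent there.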

☛ Plot the tuning curves for the left and right arms. imagesc() is a good way of doing this.

☛ Why do we say place fields are “estimated” instead of, say, obtained or computed?

Step 3: Categorizing place cells

So far, we have place fields (PF) for all cells that were active during left or right traversals of the track, including cells that were active on the central arm and cells that were active on both arms. We need to keep cells that were responsive to a single arm of the track.

To exclude central arm cells, we can select cells with fields that are located beyond the choice point:

% Linearize choice point so that it exists in the same units as the
% linearized position data:
chp_tsd = tsd(0,metadata.coord.chp_cm,{'x','y'}); % make choice point usable by codebase functions
cfg = []; cfg.Coord = coord.L;
chp.L = LinearizePos(cfg,chp_tsd); % the exact chp for L or R differs depending on the coord
cfg.Coord = coord.R;
chp.R = LinearizePos(cfg,chp_tsd);
 
% Get indices for cells with fields after the choice point
fields.L = PF.L.field_template_idx(PF.L.field_loc > chp.L.data(1));
fields.R = PF.R.field_template_idx(PF.R.field_loc > chp.R.data(1));

The fields struct contains indices of the cells that were not active on the central arm. However, we still need to account for some cells having fields on multiple arms:

% Remove cells that are active on both arms
shared = intersect(fields.L,fields.R); % cell indices that appear in both groups
fields.L(ismember(fields.L,shared)) = []; % ismember() removes *every* copy, including
fields.R(ismember(fields.R,shared)) = []; % cells with two fields on the same arm

As a last step, we need to make sure that each cell appears only once in the list (if a cell has multiple fields on the same arm, we don’t want to keep multiple copies of that cell in our analysis):

% Some cells have double place fields on the arms, so they are present
% twice in the list of cells. Remove them.
fields.L = unique(fields.L);
fields.R = unique(fields.R);

(Aside: The function unique() returns sorted output, so the cells will no longer be ordered according to field location as they are when output from MakeTC(). This is not an issue for co-occurrence analysis, but it is for sequence analysis.)
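
When a cell has two fields on one arm and a field on the other, the order of the "remove shared cells" and "remove duplicates" steps matters, because intersect() returns only one index per shared value. The made-up cell indices below show the effect. setdiff(), a built-in MATLAB function that removes every copy of a shared value and returns unique, sorted output, is one way to do both steps at once.

```matlab
% Made-up field lists: cell 5 has two fields on the left arm and one on the right
fieldsL = [2 5 5 9];
fieldsR = [5 7];

% intersect() reports a single index per shared value...
[~,iL,iR] = intersect(fieldsL,fieldsR);
naiveL = fieldsL; naiveL(iL) = [];
naiveR = fieldsR; naiveR(iR) = [];
naiveL = unique(naiveL);   % one copy of cell 5 survives as a "left-only" cell

% ...whereas setdiff() removes all copies and de-duplicates in one step
onlyL = setdiff(fieldsL,fieldsR);
onlyR = setdiff(fieldsR,fieldsL);
disp(naiveL); disp(onlyL); disp(onlyR);
```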

Now that we have the indices describing which cells have fields on the left or right arm only, we can select them from our original spike train:

% Select place cells using the indices in the fields struct
% Note that we're indexing into the *original* non-restricted collection of
% spiketrains
S_arm.L = SelectTS([],S,fields.L);
S_arm.R = SelectTS([],S,fields.R);

As you can see from the command line output of SelectTS(), not many cells of the original 178 are left after this operation!

Step 4: Make a Q-matrix

A Q-matrix is organized such that each row represents a neural unit and each column represents a candidate event (time bin) in which units are potentially active. Thus, each element of the Q-matrix is a count of the number of spikes emitted by the corresponding neural unit in the corresponding time bin. Below, we'll use an existing function, but it is not much more than a wrapped version of the built-in function histc():

% There is an existing function that makes a Q-matrix from spiketrains and
% candidate intervals
cfg = [];
cfg.win = 0.1; % In seconds, the binsize for containing spikes. i.e. the interval boundaries are redrawn to all have length 100 ms. If empty [], the exact candidate boundaries are used
Q.L = MakeQfromS2(cfg,S_arm.L,evt);
Q.R = MakeQfromS2(cfg,S_arm.R,evt);

Q.L.data contains the number of times each left-arm-only place cell spiked during each candidate (putative sharp wave-ripple) event; Q.R.data contains the same for the right-arm-only cells.

Consider column 7 (i.e. event 7) for the “left” cells (Q.L.data(:,7)): cell 1 spiked more than once, and cells 11 and 12 each spiked once. These left-arm cells co-occurred during the same candidate interval. This matrix is the basis for all subsequent analysis steps, much in the same way that a slightly different Q-matrix is the basis for the decoding analysis covered earlier.
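
To show exactly what ends up in each element, here is a minimal self-contained sketch (not the MakeQfromS2() implementation) that counts spikes per cell in fixed 100 ms windows, mirroring cfg.win = 0.1. Centering each window on the midpoint of its candidate event is an assumption for illustration; the spike and event times are made up, and the result is stored in Q_toy so that the tutorial's Q is left untouched.

```matlab
% Made-up spike times (s) for three cells, one vector per cell
spk = {[1.02 1.03 5.00], [1.04 3.01], 8.00};
% Made-up candidate events (s)
evt_tstart = [1.00 3.00 6.00];
evt_tend   = [1.06 3.04 6.02];

win = 0.1;                                % window length (s), as cfg.win
centers = (evt_tstart + evt_tend) / 2;    % assumed: windows centred on each event
nCellsToy = numel(spk); nEvt = numel(centers);

Q_toy = zeros(nCellsToy,nEvt);            % rows = cells, columns = candidate events
for iC = 1:nCellsToy
    for iE = 1:nEvt
        lo = centers(iE) - win/2; hi = centers(iE) + win/2;
        Q_toy(iC,iE) = sum(spk{iC} >= lo & spk{iC} < hi);   % spike count in window
    end
end
disp(Q_toy)
```

Reading down column 1, cells 1 and 2 are both active in the first event, which is exactly the kind of co-occurrence the following steps quantify.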

Steps 5-7: Get co-activation probabilities

There is an existing function in the codebase that does steps 5-7. We'll run it, plot the results, and then look at what's going on internally.

cfg = [];
cfg.nShuffle = 1000;
cfg.useMask = 1;
cfg.outputFormat = 'vectorU';
CC.L = CoOccurQ2(cfg,Q.L);
CC.R = CoOccurQ2(cfg,Q.R);

Step 5 results: Cell participation during candidate events (p0)

Overall, are left cells more active during candidates than right cells? If so, this means they are more likely to co-occur by chance:

% Make a bar plot
p0_data = [nanmean(CC.L.p0) nanmean(CC.R.p0)];
colors = flipud(linspecer(2)); % get some colors for plotting
location = [1 3]; % where on the x-axis to place the bars
 
figure;
for iBar = 1:length(p0_data)
    bar(location(iBar),p0_data(iBar),'FaceColor',colors(iBar,:),'EdgeColor','none');
    hold on
end
 
set(gca,'XLim',[location(1)-1 location(2)+1],'XTick',location,'XTickLabel',{'L' 'R'}); xlabel('Field location')
ylabel({'Proportion of';'SWRs active'})
title('Activation probability (p0)')

Step 6 results: Joint probability for cell pairs in the same group (p3)

How often do pairs of cells fire together during candidate events (the coactivity, or co-occurrence, of cell pairs)?

% Make a bar plot
p3_data = [nanmean(CC.L.p3) nanmean(CC.R.p3)];
colors = flipud(linspecer(2)); % get some colors for plotting
location = [1 3]; % where on the x-axis to place the bars
 
figure;
for iBar = 1:length(p3_data)
    bar(location(iBar),p3_data(iBar),'FaceColor',colors(iBar,:),'EdgeColor','none');
    hold on
end
 
set(gca,'XLim',[location(1)-1 location(2)+1],'XTick',location,'XTickLabel',{'L' 'R'}); xlabel('Field location')
ylabel({'Cell pair'; 'joint probability'})
title('Observed coactivity (p3)')

Step 7 results: Z-scored coactivity (p4)

Are cells co-occurring more often than expected by chance?

% Make a bar plot
p4_data = [nanmean(CC.L.p4) nanmean(CC.R.p4)];
colors = flipud(linspecer(2)); % get some colors for plotting
location = [1 3]; % where on the x-axis to place the bars
 
figure;
for iBar = 1:length(p4_data)
    bar(location(iBar),p4_data(iBar),'FaceColor',colors(iBar,:),'EdgeColor','none');
    hold on
end
 
set(gca,'XLim',[location(1)-1 location(2)+1],'XTick',location,'XTickLabel',{'L' 'R'}); xlabel('Field location')
ylabel({'SWR coactivation'; 'Z-score'})
title('Coactivation above chance levels (p4)')

A neat way of plotting all of these in one figure is:

%% Plot everything in one figure using a loop
 
p_list = {'p0','p3','p4'};
titles = {'Activation probability (p0)','Observed coactivity (p3)','Coactivation above chance levels (p4)'};
ylabels = {{'Proportion of';'SWRs active'},{'Cell pair'; 'joint probability'},{'SWR coactivation'; 'Z-score'}};
arms = {'L','R'};
colors = flipud(linspecer(2));
location = [1 2.5];
 
figure;
for iP = length(p_list):-1:1
 
    p_data(iP,1:2) = [nanmean(CC.L.(p_list{iP})) nanmean(CC.R.(p_list{iP}))];
 
    h(iP) = subplot(1,3,iP);
    for iBar = 1:length(arms)
 
        bar(location(iBar),p_data(iP,iBar),'FaceColor',colors(iBar,:),'EdgeColor','none')
        hold on
 
        title(titles{iP})
        ylabel(ylabels{iP})
        xlabel('Field location')
    end
end
 
set(h,'XLim',[location(1)-1 location(2)+1],'XTick',location,'XTickLabel',{'L' 'R'});
set(h,'PlotBoxAspectRatio',[1 1 1]); maximize

This gives something like:

It looks like this rat's right cell pairs are more co-active than its left cell pairs, relative to shuffled co-occurrence, suggesting that right trajectories may have been replayed more often than left trajectories.

Note: in interpreting results like the above, it is important to consider what asymmetries may be present in the neural and behavioral data that could account for any differences. We used the function GetMatchedTrials() earlier to create a fair comparison between left and right from a behavioral sampling perspective, but that doesn't change the fact that behaviorally, the animal experienced the right trajectory more often than the left:

fprintf('Number of left trials: %d\n',length(metadata.taskvars.trial_iv_L.tstart))
fprintf('Number of right trials: %d\n',length(metadata.taskvars.trial_iv_R.tstart))

Implementing a basic co-occurrence analysis from scratch

In this section, we will implement a simple version of coactivation analysis, where we don't care how many times a cell spiked during a given candidate event, only whether it spiked or not. Since our Q matrix contains spike counts, we convert the Q matrix to binary with 1's indicating that a cell was active in a candidate bin and 0's indicating that it wasn't:

% Pull some Q data out of the collector struct. Let's work with just the
% left cells for these examples:
Q_binary = Q.L.data;
 
% Set all counts > 1 to 1, so that 1 = active and 0 = silent
Q_binary(Q_binary > 1) = 1;

Calculating p0

This is a one-liner:

% Get expected co-occurrence under independence assumption
p0 = nanmean(Q_binary,2); % fraction of bins each cell participates in individually
 
% this is the same as nansum(Q_binary,2)./length(evt.tstart)
% Fraction of bins active is equal to the sum of candidate bins the cell was
% active in divided by the total number of candidates
 
% Plot results
figure; bar(1, nanmean(p0),'facecolor',colors(1,:)); set(gca,'xlim',[0 2],'ylim',get(gca,'ylim')*1.3)

Calculating p3

% Get observed co-occurrences
nCells = size(Q_binary,1);
p3 = nan(nCells);
for iI = 1:nCells
    for iJ = 1:nCells
 
        p3(iI,iJ) = nanmean(Q_binary(iI,:).*Q_binary(iJ,:));
 
    end
end
p3(logical(eye(size(p3)))) = NaN; % probability of cell co-occurring with itself is not defined

p3 is a symmetrical nCells x nCells matrix. The co-occurrence of cells 1 and 2, for example, can be found at (row 1, column 2) and at (row 2, column 1). Note the diagonal of NaNs: it doesn’t make sense to ask about a cell’s co-occurrence with itself, so we exclude these values from the set.

Since p3 is symmetrical, it has multiple copies of the same measure. We can keep one copy by extracting either the upper or lower triangular parts of the matrix:

keep = triu(ones(nCells),1); % get indices of values we want to keep (the 1 means we're discarding the diagonal)
p3_upper = p3(find(keep));
% now p3 is a vector and contains only one copy of each co-occurrence value
 
% Plot results
figure; bar(1, nanmean(p3_upper),'facecolor',colors(1,:)); set(gca,'xlim',[0 2],'ylim',get(gca,'ylim')*1.3)
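
Because Q_binary contains only 0s and 1s, the double loop above is equivalent to a single matrix product: element (i,j) of Q_binary*Q_binary' counts the events in which both cells i and j were active. The sketch below checks this on a small made-up binary matrix. It assumes the Q-matrix contains no NaNs, in which case nanmean() in the loop reduces to mean().

```matlab
% Small made-up binary Q-matrix: 3 cells x 5 events
Qb = [1 0 1 1 0;
      1 0 1 0 0;
      0 1 0 0 1];
[nC,nB] = size(Qb);

% Loop version (same logic as above, with mean in place of nanmean)
p3_loop = nan(nC);
for iI = 1:nC
    for iJ = 1:nC
        p3_loop(iI,iJ) = mean(Qb(iI,:).*Qb(iJ,:));
    end
end

% Vectorized version: co-active event counts divided by the number of events
p3_vec = (Qb*Qb') / nB;

% Blank out the diagonal in both, as in the main analysis
p3_loop(logical(eye(nC))) = NaN;
p3_vec(logical(eye(nC)))  = NaN;
disp(p3_vec)
```

The vectorized form becomes attractive inside the shuffle loop below, where the pairwise computation is repeated once per shuffle.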

Calculating p4

It's possible that co-occurrence values in p3 are expected by chance alone: if one set of cells (left or right) is more chatty overall and active in a higher proportion of candidate events, we would expect those cells to co-occur more often. To check whether the co-occurrences are higher than chance, we can randomly shuffle the values within each row, compare the shuffled co-occurrences with the observed co-occurrences, and compute a z-score. This tells us whether the observed co-occurrences (p3) occur at a higher rate than expected by chance. Accurate estimates of chance co-occurrence call for a large number of shuffles (10,000 or more); the 1,000 shuffles used here reduce computation time and give a close enough approximation for these purposes:

nShuffles = 1000;
 
shuf_p4 = nan(nShuffles,nCells,nCells);
nBins = size(Q_binary,2);
 
for iShuf = 1:nShuffles
 
    % permute Q-matrix
    Q_shuffle = Q_binary;
    for iCell = 1:nCells
        Q_shuffle(iCell,:) = Q_shuffle(iCell,randperm(nBins)); % this step using randperm mixes the contents of each row horizontally
    end
 
    % now compute shuffled co-occurrences
    for iI = 1:nCells
        for iJ = 1:nCells
 
            shuf_p4(iShuf,iI,iJ) = nanmean(Q_shuffle(iI,:).*Q_shuffle(iJ,:)); % shuf_p4 is the shuffled co-occurrences and is a 3-dimensional array
 
        end
    end % of loop over cell pairs
 
end % of loop over shuffles
 
% z-score coactivity
p4 = nan(nCells);
 
for iI = 1:nCells
    for iJ = 1:nCells
 
        p4(iI,iJ) = (p3(iI,iJ)-nanmean(sq(shuf_p4(:,iI,iJ))))./nanstd(sq(shuf_p4(:,iI,iJ)));
 
    end
end % of loop over cell pairs
 
% p4 is a symmetrical matrix like p3, so let's keep half of it:
p4_upper = p4(find(keep));
figure; bar(1, nanmean(p4_upper),'facecolor',colors(1,:)); set(gca,'xlim',[0 2],'ylim',get(gca,'ylim')*1.3)

This shuffling procedure is an example of resampling: we create a number of different data sets based on some rearrangement of the original data. In this case, the specific resampling is a shuffle or permutation, which breaks any relationship between neurons (because we shuffle each neuron independently) and therefore functions as a control for the amount of co-occurrence we expect by chance (i.e. if the neurons were independently active). In general, a major advantage of resampling methods is that they preserve aspects of the underlying distribution – in this case, of spike counts – and make no particular assumptions about its shape, whereas many parametric statistical tests require data to be e.g. normally distributed.
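
For this particular shuffle the null distribution can also be written down exactly, which makes a useful sanity check. Shuffling each row keeps the number of events each cell is active in (k_i) fixed and places those events at random among the n events. The number of events in which cells i and j are both active then follows a hypergeometric distribution with mean k_i*k_j/n and variance k_i*k_j*(n-k_i)*(n-k_j)/(n^2*(n-1)), so the shuffle mean of p3 is simply p0_i*p0_j. The sketch below (made-up data, hypothetical variable names) compares the shuffle estimate with these closed-form values for a single pair of cells; it checks the logic and is not a replacement for CoOccurQ2().

```matlab
% Made-up activity of two cells across n = 40 candidate events
n = 40;
cellA = zeros(1,n); cellA(1:10) = 1;          % active in 10 events
cellB = zeros(1,n); cellB([1:8 20:23]) = 1;   % active in 12 events, 8 shared with A
kA = sum(cellA); kB = sum(cellB);

p3_obs = mean(cellA .* cellB);                % observed joint probability

% Shuffle null, as in the loop above but for a single pair
nShufToy = 5000;
shuf = nan(nShufToy,1);
for iS = 1:nShufToy
    shuf(iS) = mean(cellA(randperm(n)) .* cellB(randperm(n)));
end
z_shuffle = (p3_obs - mean(shuf)) / std(shuf);

% Closed-form (hypergeometric) mean and SD of p3 under the same null
mu = kA*kB / n^2;                                   % = p0_A * p0_B
sd = sqrt(kA*kB*(n-kA)*(n-kB) / (n^2*(n-1))) / n;   % SD of overlap count, divided by n
z_exact = (p3_obs - mu) / sd;

fprintf('shuffle: mean %.4f, z = %.2f\n', mean(shuf), z_shuffle);
fprintf('exact:   mean %.4f, z = %.2f\n', mu, z_exact);
```

With binary rows the two z-scores agree closely; the shuffle remains the more general tool, since it also applies when the statistic is not a simple overlap count.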

Challenges

★ Why don't we just look at single-cell activation? Under what conditions would the results from that be the same, or different, from pairwise co-occurrence?

★ Comment on the choice of bins used in constructing the Q-matrix. Is it reasonable to assume that SWR events are always 100 ms in length? Modify the code to use the actual length of SWR events. Is the resampling statistic still doing the right thing in this case?

★ Implement co-occurrence analysis on your own data.

Credits

This module was developed by Alyssa Carey.

analysis/course-w16/week16.txt · Last modified: 2016/03/01 09:50 by eirvine