The rodent whisker system has become the leading experimental paradigm for the study of active sensing. Thanks to more sophisticated behavioral paradigms, progressively better neurophysiological methods, and improved video hardware and software, there is now the prospect of defining the precise connection between the sensory apparatus and brain activity in awake, exploring animals. Achieving this ambitious goal requires quantitative, objective characterization of head and whisker kinematics. This study presents the methodology and potential uses of a new automated motion analysis routine. The program provides full quantification of head orientation and translation, as well as the angle, frequency, amplitude, and bilateral symmetry of whisking. The system operates without any need for manual tracing by the user. Quantitative comparison to whisker detection by expert humans indicates that the program's correct detection rate exceeds 95% even on animals with all whiskers intact. Particular attention has been paid to obtaining reliable performance under nonoptimal lighting or video conditions and at frame rates as low as 100 frames/s. Variation of the zoom across time is compensated for without user intervention. The program adapts automatically to the head size and shape of different rodent species. The outcome of our testing indicates that the program can be a valuable tool in quantifying rodent sensorimotor behavior.
- whisker tracking
- snout tracking
- head tracking
Active sensing is the control of the position and movement of the receptor apparatus so as to optimize the collection of information from the environment (Bajcsy 1988; Mitchinson et al. 2007; Szwed et al. 2003). It is increasingly recognized that sensor guidance and control are key challenges in adaptive systems, whether natural or artificial; thus active sensing is widely studied both by biologists and by engineers seeking to build intelligent machines (Solomon and Hartmann 2006). Active control of the sensory apparatus is most evident in the modality of touch. For humans, movement of the hand and fingers is fundamental to our tactile sense (Lederman and Klatzky 1993; Robles-De-La-Torre and Hayward 2001). In the animal kingdom, the rodent whisker system has become the leading experimental paradigm for the study of active sensing. Rats sweep their whiskers (also called facial vibrissae) forwards and backwards to recognize the positions of floors, walls, and objects, particularly in the dark surroundings that are their natural habitat. Once they encounter an object, they collect additional information about its features, such as its size, shape, and surface texture, by palpating it through a “whisking” motion (Grant et al. 2009; Kleinfeld et al. 2006). The head and whiskers need to be considered together because their coordination is a key feature of active sensing (Towal and Hartmann 2006).
Research on vibrissal tactile sensing in rodents is reaching a critical phase. Data are beginning to emerge that raise the prospect of drawing precise relationships between vibrissal movements in awake, behaving animals and accompanying brain activity (Arabzadeh et al. 2005, 2006; Diamond et al. 2008a,b; Jadhav et al. 2009; von Heimendahl et al. 2007; Wolfe et al. 2008). Achieving this ambitious goal will require quantitative, objective characterization of head and whisker kinematics.
Three approaches have been developed to monitor vibrissal motion in awake mice or rats: 1) electromyogram taken from the facial muscles (Hill et al. 2008); 2) monitoring whisker intersection with a sheet of light (Harvey et al. 2001; Jadhav et al. 2009; Wolfe et al. 2008); and 3) high speed videography (Knutsen et al. 2005, 2008; Ritt et al. 2008; von Heimendahl et al. 2007). Each method has advantages and disadvantages. Electromyogram provides a wealth of information about the motor commands transmitted to the muscles; however, it does not define whisker position itself; furthermore, the method is invasive. The light sheet provides very high temporal and spatial resolution but reveals only the point of intersection, in one dimension, between the whisker and the light plane. The rat must be positioned in such a way as to place its whiskers in the plane of light. Neither of the above methods monitors head motion.
High speed videography is noninvasive, provides good temporal and spatial resolution, and allows head and whisker movement to be observed within a relatively large field of view. Its main drawback is the complexity of image processing. High-speed films constitute huge amounts of data. Considering that in a single animal, hundreds of trials per day can be recorded over a period of several weeks, with each trial containing at least 1,000 images, it is evident that manual tracing of whisker and head position is not feasible. This problem has led to the development of several approaches for automated data extraction. The methods developed by Knutsen et al. (2005) and by Voigts et al. (2008) track whiskers on freely moving rodents but require most of the whiskers to be trimmed to avoid errors due to intersections and occlusions. Limiting the number of whiskers to a single row or fewer has also been reported in a recent unsupervised tracking method (O'Connor et al. 2010). This latter method runs on video of head-fixed mice. In another technique involving body-restrained rats, tracking was accomplished after a marker was glued to a single vibrissa (Venkatraman et al. 2009). Whisker clipping, fixation of head and body, or whisker tagging is time-consuming and inevitably reduces the resemblance of the observed behavior to the animals' natural sensorimotor strategy.
Here we present vibrissae and snout analyzer (ViSA), a method for tracking the head and for extracting whisking parameters (angle, amplitude, phase, frequency, and symmetry) that does not require whisker manipulation and does not unacceptably limit the animal's movement. ViSA is designed to automatically analyze large quantities of high-speed video on multiple-processor machines. It is able to identify autonomously the position of the head and to track its contour across the whole duration of the film. The method is not intended for the frame-by-frame tracking of the kinematics of single whiskers; instead, it extracts parameters from the full complement of whiskers. Our testing verifies that it operates on compressed video and on films of moderate quality. Because the method requires little user intervention and operates on a wide array of video formats and a broad set of experimental conditions, ViSA could prove valuable for behavioral neuroscientists interested in the function of the vibrissal sensorimotor system and its modulation by behavioral tasks, pathology, or pharmacological agents.
ViSA consists of two modules. The first module tracks head contour, position, and orientation over time. This allows the program to place the whiskers in a head-centered coordinate system. The second module extracts whisker shafts from segments located at a user-defined distance from the head and then provides a parameterization of whisking over time.
Experimental Setup and Videography
Data were collected for studies concerning the neuronal bases of tactile sensation (von Heimendahl et al. 2007; Itskov et al. 2011; Zuo et al., in press). Subjects were adult male Wistar rats. All experiments were conducted in accordance with the National Institutes of Health, International, and Institutional Standards for the Care and Use of Animals in Research and were supervised by a consulting veterinarian.
Films were recorded under infrared light reflected in a circular mirror positioned under the rat; the head of the rodent and its whiskers are dark against a bright background. A typical frame is shown in Fig. 1A (with contour processing already applied), where the bright area corresponds to the mirror and the grooved plate at the lower edge is the texture that the rat was trained to palpate. In all trials, the rat entered the field of view from the upper edge, approached the texture, and withdrew after making a few contacts, turning right or left for a water reward.
The data were recorded with two different cameras, a MotionPro 2000 (Redlake) with resolution set to 512 × 512 and an Optronis CamRecord 450 (Optronis), with resolution set to 512 × 512, 256 × 512, or 300 × 640. The spatial resolution varied between 0.12 and 0.25 mm/pixel. The films, recorded at 512 or 1,000 frames per second (fps), were compressed to various degrees, and the method was tested on compressed films. In films with higher degrees of compression, whisker tips were blurred while shafts near the head maintained enough contrast to be correctly discriminated.
Head Detection and Tracking
Contour detection is designed to detect generically every contour in the image; identification of the head outline and tracking of its motion are implemented in successive modules. To separate head and whiskers silhouette from the bright background, the algorithm converts each frame to a binary image by applying a threshold found to be reliable for all tested films. It then filters out whiskers and noisy points with a morphological closing (Gonzales and Woods 2005) and extracts the pixels of the binary image boundaries. The resulting edges, combined with the original image, are shown in Fig. 1A.
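The preprocessing pipeline above (thresholding to a binary image, morphological closing, boundary extraction) can be sketched in a few lines. The following Python sketch uses plain nested lists in place of image arrays; the threshold value and the 3×3 structuring element are illustrative assumptions, not ViSA's actual settings.

```python
# Illustrative sketch of binarization, morphological closing, and boundary
# extraction; nested Python lists stand in for image arrays.

def binarize(img, thresh=128):
    """Dark head/whisker pixels become foreground (1) on a bright background."""
    return [[1 if px < thresh else 0 for px in row] for row in img]

def _window(img, y, x, op):
    """Apply op (min or max) over the clamped 3x3 neighborhood of (y, x)."""
    h, w = len(img), len(img[0])
    return op(img[ny][nx]
              for ny in range(max(0, y - 1), min(h, y + 2))
              for nx in range(max(0, x - 1), min(w, x + 2)))

def dilate(img):
    return [[_window(img, y, x, max) for x in range(len(img[0]))]
            for y in range(len(img))]

def erode(img):
    return [[_window(img, y, x, min) for x in range(len(img[0]))]
            for y in range(len(img))]

def closing(img):
    """Morphological closing (dilation followed by erosion) fills small gaps."""
    return erode(dilate(img))

def boundary(img):
    """Foreground pixels with at least one 4-connected background neighbor."""
    h, w = len(img), len(img[0])
    return [(y, x) for y in range(h) for x in range(w)
            if img[y][x] and any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))]
```

In production code a library routine (e.g., an OpenCV morphology call) would replace the hand-rolled window scan, but the logic is the same.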
Head detection: fitting a template to the contour.
Tracking of head dynamics requires estimation of the head position, orientation, and real contour data from all available edges. The method is based on motion history and on a robust algorithm for tracking rigid shapes. Ultimately, whisker position can be expressed in a head-centered coordinate system oriented constantly with the head axis of symmetry.
Estimation of the head center and orientation cannot rely only on detected contours. Our early approaches based on calculating the center of gravity and the symmetry axis solely using the head contour proved to be unstable. This is because over the course of a trial, rats extend or shorten the tip of their snout, which results in a consistent drift of the center of gravity uncorrelated with head dynamics. In some films, small head tilts and jaw motion significantly alter the estimation of contour symmetry. Finally, the cables of the implanted electrodes sometimes come into view and occlude part of the head silhouette, ruining the symmetry.
To overcome these problems, we tested two approaches: 1) linear and nonlinear filtering of the head contour, and 2) fitting a “head template,” as defined below. Although nonlinear filters (Petrovic et al. 2004) prevented the contour shrinking that is characteristic of linear filters (Mokhtarian and Mackworth 1992), neither eliminated the undesirable contour asymmetries causing the drift of the center of gravity. Drifting was successfully eliminated only using the second approach, whereby a rigid head template (Fig. 1B, red outline) was fitted on the detected contours (Fig. 1B, blue outlines). In this approach, the head position and orientation are estimated from the rigid template and not from the output of the edge detector. The fitting process minimizes the perpendicular distances between the template and the detected contour (Fig. 1C, cyan colored line segments), so that local asymmetries or partial occlusions disappear because the larger number of symmetric points outweighs them. Once the head template is fitted (Fig. 1D, green contour), the local contour replaces the rigid template to have a pixel-by-pixel fit of the head outline. In what follows, details of the template fitting and tracking method are given.
The head template serves as the rodent head model. It has been tested on rats (Figs. 1 and 2) and mice (see Supplemental Fig. S1; Supplemental Material for this article is available online at the J Neurophysiol website). It easily adapts to different head sizes and rodent morphologies because the tracking method is able to scale, rotate, and translate the template until a proper match with the detected contour is found. The template was created by approximating the head contour of a rat with a cubic spline in a frame where it was symmetric and its axis was perfectly vertical.
The method was implemented using the shape space tracking algorithm (Blake and Isard 1998). The principle of the algorithm is to restrict all possible deformations of a spline contour, in our case the head template, to a lower-dimensional space, called the shape space. This restriction allows the template to undergo only certain types of deformations and consequently increases the stability of the tracking. In general, a spline contour has 2M degrees of freedom, where M is the number of control points. Since the program must compensate for only vertical and horizontal translation, for rotation, and, occasionally, for scaling, four degrees of freedom are sufficient to track the planar motion of the head. We thus implemented the shape space of Euclidean similarities (Blake and Isard 1998), restricting the number of possible deformations to a four-dimensional vector. In this space, the first two components of the shape vector represent the vertical and horizontal translation of the template, i.e., they specify the position of the head template in the frame. By differentiating these components, we obtain the vertical and horizontal velocity of the head. The rotation angle and the size of the head are extracted from a trigonometric function that relates these two parameters to the third and the fourth components of the shape vector. The estimation of this four-dimensional state vector was implemented using a Kalman filter (Blake and Isard 1998). This algorithm uses the perpendicular distances between the template and the detected edges to estimate the shape space vector. By considering these distances as a one-dimensional signal, the algorithm translates, scales, and rotates the template until the distances reach a minimum. The minimum corresponds to an optimal fit between the template and the detected edges (see Fig. 1). Details of the algorithm are reported in Supplemental Material.
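The recovery of rotation and scale from the last two shape vector components can be illustrated as follows. This sketch assumes one common parameterization of the Euclidean-similarity shape space, (tx, ty, a, b) with a = s·cos θ and b = s·sin θ; Blake and Isard's formulation differs in bookkeeping details.

```python
import math

def decode_shape_vector(v):
    """v = (tx, ty, a, b): translation plus a = s*cos(theta), b = s*sin(theta).
    Returns translation, rotation angle, and scale (the trigonometric relation
    mentioned in the text, under the parameterization assumed here)."""
    tx, ty, a, b = v
    scale = math.hypot(a, b)
    theta = math.atan2(b, a)
    return tx, ty, theta, scale

def apply_similarity(v, points):
    """Translate, rotate, and scale template points by the shape vector."""
    tx, ty, a, b = v
    return [(tx + a * x - b * y, ty + b * x + a * y) for x, y in points]
```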
Although the Kalman filter implementation proved reliable, we carried out two modifications to tailor it to head tracking. The first modification allows disabling of the scaling of the head contour during the tracking process. The possibility of scaling the head contour is implicit in the affine shape space formulation. It might be a desirable feature in a setup with motorized zoom or with experiments where the distance between the camera and the rodent varies significantly, as shown in the results, but in the experiments currently under analysis the zoom was fixed. We preferred not to define a new shape space that disables a priori the scaling, so we opted for a scaling correction step after the shape vector estimation that allowed only changes in translation and rotation.
The second modification is a procedure for the identification of the head contour from neighboring edges. The detection step generically outlines every contour in the binary image, so the head outline must be selected among the detected edges to initiate the fitting process. The standard approach (Blake and Isard 1998) identifies the head contour points as those edge points located within a predefined distance from the template. However, when the animal's snout closely approaches an object of interest (a textured plate in our experiments), the true head contour and the object edge both fall inside the search interval, so the algorithm marks both as candidate edges. As a consequence, the shape space estimation algorithm minimizes the distances between the two edges by erroneously translating the head template to an intermediate position between the object and the real head contour. To exclude this tracking error, we developed a different method for defining head contour points. For every contour point inside a predefined distance from the head, we calculate its polar coordinates relative to the center of the template. We sort every point according to its angular coordinate into a set of intervals with a 10-degree span. For each of these intervals, we retain as head contour points only those points close to the minimum radial distance in the interval. This procedure decreases the number of “spurious” edges, leading to a more robust implementation.
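The angular-binning selection of head contour points might look like the following sketch; the 10° span comes from the text, while the radial tolerance `tol` is an illustrative assumption.

```python
import math

def head_contour_points(points, center, bin_deg=10, tol=3.0):
    """Keep, per angular bin, only edge points near the minimum radial
    distance from the template center; farther points (e.g. the edge of an
    object behind the snout) are discarded. tol is an assumed pixel slack."""
    cx, cy = center
    bins = {}
    for x, y in points:
        r = math.hypot(x - cx, y - cy)
        ang = math.degrees(math.atan2(y - cy, x - cx)) % 360.0
        bins.setdefault(int(ang // bin_deg), []).append((r, (x, y)))
    kept = []
    for entries in bins.values():
        rmin = min(r for r, _ in entries)
        kept.extend(p for r, p in entries if r <= rmin + tol)
    return kept
```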
Kalman filtering in the time domain.
Estimation of head dynamics was additionally improved by applying a Kalman filter in the time domain that predicts the new template position and updates the estimate according to the shape vector. In general, the spatial Kalman filter used in the fitting process is able to account for moderate displacements between the detected edges in the new frame and the head contour in the previous frame, but tests on low frame rates pointed to the necessity of using motion history to predict the template position in the new frame. Among potential tracking algorithms, we opted for the Kalman filter (Kalman and Bucy 1961) because there is only a single target to track. We thus embedded the four-dimensional shape space vector (horizontal and vertical template translation, rotation, and scaling) into a six-dimensional state vector of the temporal Kalman filter by adding vertical and horizontal translation velocities. It proved unnecessary to add two dimensions for acceleration or one for head rotation velocity. Therefore our head tracking uses a Kalman filter in time to predict template position in the new frame and a Kalman filter in space to correct the estimate. Details of the implementation are given in Supplemental Material.
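The predict/correct loop of the temporal filter can be illustrated on a single translation component. This is a generic 1-D constant-velocity Kalman filter, not ViSA's six-dimensional implementation; the noise variances q and r are illustrative assumptions.

```python
# Minimal 1-D constant-velocity Kalman filter: state = (position, velocity),
# position measurements only (H = [1, 0]). Covariance P is kept as four scalars.

def kalman_cv(measurements, dt=1.0, q=1e-3, r=1.0):
    x, v = measurements[0], 0.0                  # initial state
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0     # covariance P
    estimates = []
    for z in measurements:
        # predict: constant-velocity motion model, covariance grows by q
        x = x + v * dt
        p00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        p01 = p01 + dt * p11
        p10 = p10 + dt * p11
        p11 = p11 + q
        # correct with the measured position z
        s = p00 + r
        k0, k1 = p00 / s, p10 / s                # Kalman gain
        innov = z - x
        x, v = x + k0 * innov, v + k1 * innov
        n00, n01 = (1 - k0) * p00, (1 - k0) * p01
        n10, n11 = p10 - k1 * p00, p11 - k1 * p01
        p00, p01, p10, p11 = n00, n01, n10, n11
        estimates.append(x)
    return estimates
```

For constant-velocity head translation, the filter learns the velocity within a few frames and the prediction lag shrinks accordingly, which is exactly what low-frame-rate tracking needs.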
Head tracking initialization.
Tracking initialization is the process of setting head position and orientation for the first time in a given video. Initialization is done either manually or automatically. In the manual modality, in each film the user selects the initial frame and clicks on the snout tip and on the head center to let the initialization routine calculate the appropriate scale, orientation, and position of the template. In the automatic modality, the user sets the scale of the template once for the full set of videos recorded under the same conditions. When there is the possibility that motion in the scene may be generated by objects other than the animal (e.g., by a discriminandum moved into position), the user must also select a “region of interest,” covering the full field of view except the area of the moving artifact. In other words, the user must mask out those areas in which artifacts could erroneously trigger head detection.
The automatic initialization routine scans every twentieth frame of the film until it is able to recognize the head contour among the edges of moving objects. These are detected by subtracting the background image extracted from the whole sequence and by masking the resulting image with the “region of interest” (details of the background subtraction are in the segment fitting section). The resulting frame is then converted to a binary image. Whenever the number of black pixels inside this image exceeds a user-selected threshold (defined as roiFill in Supplemental Table S2), the program assumes that a moving object has entered the scene and uses the center of mass of the candidate object to fit the head template. It repeats the procedure with eight different orientations spaced by 45°. By increasing the total number of iterations of the spatial Kalman filter, shape estimation is able to recover orientation changes of ±30° with respect to the starting value. Among the eight starting orientations, the routine selects the one that best fits the edges of the detected shape. This fitting indicator, named head contour goodness of fit (HCGF), stops the scanning routine and starts the data analysis when its value exceeds a user-set threshold (denoted as minHCGF in Supplemental Table S2). HCGF is defined as the percentage of perpendicular distance vectors between the template and the detected contour edge (Fig. 1C) that are shorter than a chosen distance threshold (referred to as the distanceT parameter in Supplemental Table S2). We found that 50 distance vectors, equally distributed along the head template, adequately quantify the match between the template and contour.
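The HCGF indicator itself reduces to a one-line computation; the sketch below assumes the perpendicular template-to-edge distances have already been measured.

```python
def hcgf(distances, distance_t=8.0):
    """Head contour goodness of fit: the fraction of perpendicular
    template-to-edge distance vectors shorter than distanceT. ViSA samples
    50 vectors along the template and accepts a fit when HCGF > minHCGF."""
    return sum(1 for d in distances if d < distance_t) / len(distances)
```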
To evaluate the automatic initialization procedure, we ran it on a data set of 218 compressed films with different resolutions and compression ratios. Inspection of the images selected by automatic initialization revealed that with a distance threshold (distanceT) of 8 pixels and the threshold for HCGF (minHCGF) set to 0.80 (i.e., >80% of the vectors between head template and detected edge were shorter than 8 pixels), 95% of the films were correctly initialized. In our data set there were no errors on “simple scenes,” defined as those in which the rat's head entered the field of view with no occlusion. Examples of correct automatic initialization on simple scenes are given in Fig. 2, A and B. Even in videos with distracting objects, the program usually distinguished the actual head contour from the attached headstage (Fig. 2C). Errors occurred only on “complex scenes,” defined as those in which head orientation caused the headstage and cable to be prominent. In complex scenes, errors could arise as false positives (the object identified as the head was in fact some other object resembling it) or as misses (the animal's head entered the field of view but was not detected). Almost all errors were false positives, and they were caused by extensions from the animal head: the headstage or the recording cable (Fig. 2D). Note that such errors can occur in animals with implanted devices and would not occur in unfettered animals.
From this, we conclude that ViSA can successfully initialize large batches of video in automatic mode with low error rates. To allow verification of automatic initialization, the program includes a routine in which the initialization image of each video is dumped to a folder so that all initialization frames can be quickly scanned. If error rates are unacceptable, manual initialization can be used.
Validation of head tracking performance.
Once the initial frame is set, the film is processed forward and backward to extract whisking activity and head dynamics across its entire duration or until the rodent exits from the field of view (detected as a drop in HCGF rate). Since a mismatch between the head template and the real animal head is a critical error in unsupervised high-volume video processing, the HCGF is recorded for every frame to assess the reliability of the tracking data (Fig. 3A). Figure 3, B and C, shows the performance of the head tracking module on a film in which the zoom was changed continuously, compared with the same film with constant scale.
Whisker Segment Detection and Parameter Extraction
The extraction of whisker angles requires detection of the local shaft orientation. ViSA detects whisker segments located at a user-defined distance from the head. Due to the limited curvature at these distances, we modeled them as rigid segments.
The algorithm for the detection of whisker segments entails four steps. 1) Whisker segments located within a predefined distance from the snout are highlighted. 2) An intensity peak detection algorithm marks points on the centers of the shafts. 3) Points are grouped into lines where the intensity peaks are sorted into segments. 4) Spurious segments are deleted using position and orientation criteria.
Once whisker segments are detected, whisking parameters can be easily extracted. At that point, the final output of ViSA is computed: the whisking dynamics of the full set of vibrissae on both sides of the snout, coordinated with head dynamics. The following sections provide an overview of the techniques for segment detection and parameter extraction.
Background subtraction and masking for shaft highlighting.
Whiskers are highlighted in the original image (Fig. 4A) by using background subtraction and mask multiplication (Fig. 4B). Typically, the animal is present during most of the film, so the background image cannot be created from any single frame. Since the exploring rodent does not always cover the same portion of the background, and our lighting and camera are set up to film its silhouette, the best estimate for the background is its brightest value over the course of the video. The background image is thus formed by the maximum intensity value of every pixel.
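The per-pixel-maximum background model and its subtraction can be sketched directly; as before, nested Python lists stand in for image arrays.

```python
def background_model(frames):
    """Per-pixel maximum over all frames. Because the backlit background is
    the brightest value each pixel ever shows, the maximum recovers it even
    though the animal occludes part of the scene in every single frame."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[max(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]

def subtract_background(frame, bg):
    """Background minus frame: dark whisker/snout pixels become bright peaks."""
    return [[bg[y][x] - frame[y][x] for x in range(len(frame[0]))]
            for y in range(len(frame))]
```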
The purpose of the background subtraction is to highlight the whiskers and the snout pixels. The resulting image is multiplied by a binary mask that marks all pixels within a predefined distance range from the head contour. The distance range is user defined and sets the minimum and maximum distances between the head contour and the mask pixels (minD and maxD in Supplemental Table S3). Besides marking the area where whiskers will be approximated as segments, the mask also reduces the computational load of later steps. In the last preprocessing step, the resolution of the image is increased with a standard bicubic interpolation followed by a Gaussian smoothing to reduce the noise and the CCD sampling quantization errors.
Intensity peak detection.
After the preprocessing step highlights the whiskers (Fig. 4, A and B), an intensity peak detection method is used to extract their coordinates. In our films, whisker thickness varies from 2 pixels, for a single thin vibrissa, to 10 pixels, for several overlapping vibrissae. Since greater thickness introduces ambiguity in whisker orientation, we extract only the coordinates of the shaft centers. Shaft centers are detected as intensity peaks by scanning the image intensity on paths perpendicular to the expected shaft direction, that is, on paths tangential to the snout. Using tangential lines speeds up the method because it allows us to convert the image into head-centered polar coordinates. Intensity detection becomes a simple local maxima detection performed on one-dimensional arrays where image pixels have the same radial coordinate and are sorted with increasing angular values. The intensity peak selection process is regulated by two parameters: the minimum intensity threshold, used for reducing the incidence of noise, and the minimum angular distance, set to avoid two neighboring peaks being detected across the same shaft (peakT and thetaT, respectively, in Supplemental Table S3). An example of intensity peak detection with dots marking the peak position is given in Fig. 4C.
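A minimal version of the peak picker on one tangential scan line might read as follows; `peak_t` and `theta_t` correspond to the peakT and thetaT parameters, with illustrative values, and the scan line is assumed to be already sorted by angular coordinate.

```python
def intensity_peaks(profile, peak_t=50, theta_t=3):
    """Local maxima along one tangential (constant-radius) scan line.
    peak_t: minimum intensity for accepting a peak; theta_t: minimum index
    separation so that two peaks are not detected across the same shaft."""
    peaks = []
    for i in range(1, len(profile) - 1):
        if profile[i] < peak_t:
            continue
        if profile[i] >= profile[i - 1] and profile[i] > profile[i + 1]:
            if not peaks or i - peaks[-1] >= theta_t:
                peaks.append(i)
    return peaks
```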
Clustering intensity peaks into segments.
Although occasionally a true whisker segment can be reconstructed simply by joining neighboring intensity peaks, the presence of partially overlapping whiskers complicates the task and calls for a method for identifying collinear peaks and grouping them into segments. We give an overview of segment detection followed by a detailed protocol.
We formulated segment detection as a clustering problem, with the particularity that the cluster center is not a point, but a line segment joining two peaks termed the start and end nodes. Start and end node classification depends upon their distance to the snout contour (referred to as the dRatio parameter in Supplemental Table S3); intensity peaks located close to the snout side of the mask are marked as start nodes and intensity peaks close to the external border are marked as end nodes. Figure 4C shows start nodes (black dots) and end nodes (blue dots), while central nodes (peaks located at an intermediate distance from the snout) are marked with red dots. The method generates all possible segments between every start node and its surrounding end nodes (possible segments emanating from one start node are shown by blue lines in Fig. 4C) and rates every segment according to the number of overlapped peaks. The algorithm selects the maximum number of nonoverlapping segments, such that the number of nodes in each segment exceeds a cluster-mass threshold (Fig. 4D, green and orange thick lines). The threshold avoids the selection of overly short segments or segments created by noise peaks.
The detailed technique is the following. The cluster generation process connects every start node to all end nodes within a predefined distance (denoted as cRatio in Supplemental Table S3). An example of candidate clusters generated from one start node is indicated by blue lines in Fig. 4C. Each cluster is then rated according to the number of nodes within a predefined distance from it; these nodes are also labeled as candidates for this cluster. The distance between a generic intensity peak and the cluster is defined as the perpendicular distance between the segment and the node. The distance threshold (defined as nodeT in Supplemental Table S3) defines a coverage band around the segment; intuitively, peaks with distance below nodeT are those that can be “covered” by the segment. An example of selected segments is given in Fig. 4D, where green and orange segments are plotted with an increased width to provide the idea of “coverage” based on perpendicular distance between the segment and the intensity peaks.
From the pool of all possible combinations between start and end nodes, with many of them generating overlapping segments, it is necessary to cluster the maximum number of nodes into nonoverlapping segments. Each time a cluster is selected, all unassigned nodes within the predefined distance are assigned to the cluster. The cluster selection process sorts segments according to their initial score. The first cluster is selected by default since all its nodes are unassigned, but for each of the remaining clusters, the algorithm computes their score counting unassigned peaks. If the score exceeds the mass threshold (denoted as clusterT in Supplemental Table S3), the segment is selected and all unassigned nodes counted in the score are assigned to the newly selected cluster. When several segments of different length overlap, the procedure always selects the larger cluster and discards the shorter segments because their new score is zero. Thus the advantage of first selecting the longer segments is that they more precisely define the shaft orientation. The procedure is repeated until there are no clusters with score exceeding the mass threshold, which means that there are not enough collinear intensity peaks to form a segment. The procedure can be reiterated by recalculating the new segment position and orientation using a linear least squares fit of the nodes assigned to each cluster, but we found that additional iterations did not change the mean angle.
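The greedy selection loop can be sketched as follows. This is a simplified rendering of the procedure above: candidate segments are assumed to be given as point pairs, and `node_t` and `cluster_t` stand in for the nodeT and clusterT parameters, with illustrative values.

```python
import math

def perp_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    return abs(dx * (ay - py) - dy * (ax - px)) / math.hypot(dx, dy)

def select_segments(nodes, candidates, node_t=2.0, cluster_t=3):
    """Greedy nonoverlapping selection: a segment's score is the number of
    still-unassigned intensity peaks within node_t of it; segments are kept
    while the best score reaches the cluster-mass threshold cluster_t."""
    unassigned = set(range(len(nodes)))
    covered = lambda seg: {i for i in unassigned
                           if perp_distance(nodes[i], *seg) <= node_t}
    selected = []
    while True:
        best = max(candidates, key=lambda seg: len(covered(seg)), default=None)
        if best is None or len(covered(best)) < cluster_t:
            break
        unassigned -= covered(best)   # overlapping shorter segments drop to 0
        selected.append(best)
    return selected
```

Because a longer segment covers every peak a shorter collinear one does, the longer segment always wins the first round and starves its overlapping competitors, matching the behavior described in the text.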
Filtering spurious segments.
When the user runs ViSA with the distance-from-the-head parameter set to zero, it detects fur in addition to the whiskers. To exclude the fur, the algorithm finds each segment's intersection point with the head contour and checks the acceptable segment orientation range for that contour point. Taking into account the typical range of angles associated with whisking motion, an acceptable orientation range can be defined for every point of the contour, removing spurious segments whose position and orientation are incompatible with the whisker pad. Segment detection is carried out separately for the left and right whiskers.
Extraction of whisking parameters.
For every detected segment, the whisker angle is defined as the angle between the segment and the head axis of symmetry. Angles are expressed in the head-centered coordinate system (Towal and Hartmann 2006), where the south head orientation is 180°, north is 0, and west and east are both 90°. Left and right vibrissal fields (Fig. 5) are divided according to their intersection point with the contour. Intersection points are expressed in a head-centered polar coordinate system referenced to the head orientation angle. Their angular coordinate describes their order along the head contour; thus the most rostral and the most caudal whisker segments are those with the minimum and maximum intersection point angular value for their respective side.
The mean whisker angle on one side of the snout is defined as the average angle of all segments on that side. The time series of whisking angle is smoothed with a low-order polynomial fit in a sliding window (Savitzky and Golay 1964), which removes mean-angle fluctuations generated by occlusions and intersections without introducing phase delay. The smoothed mean angle is partitioned into half-periods; the signal in each half-period is normalized by subtracting the lower signal envelope and dividing by the difference between the upper and lower envelopes. The upper envelope is obtained by fitting the peaks with a spline curve, the lower by fitting the valleys. The normalized angle, scaled between −1 and 1, is used to compute the phase and the frequency. Phase is estimated using two symmetric arcsine functions to obtain a connected, increasing phase value across both half-periods. Frequency is the reciprocal of the period between points on successive waves with the same phase. Symmetry is defined as the phase difference between the left and right mean angles.
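The smoothing, envelope normalization, and phase extraction can be sketched numerically as follows. This is an illustrative Python version only: SciPy's Savitzky-Golay filter stands in for the paper's polynomial smoother, linear interpolation of extrema replaces the spline envelopes, and a synthetic 8-Hz sinusoid replaces a real mean-angle trace.

```python
import numpy as np
from scipy.signal import savgol_filter, argrelextrema

fps = 1000.0
t = np.arange(0.0, 0.5, 1.0 / fps)                 # 0.5 s of synthetic data
rng = np.random.default_rng(0)
raw_angle = 110 + 20 * np.sin(2 * np.pi * 8 * t)   # 8 Hz "whisking"
raw_angle = raw_angle + rng.normal(0.0, 1.0, t.size)

# Low-order polynomial in a sliding window: smooths occlusion- and
# intersection-driven jitter without introducing phase delay.
smooth = savgol_filter(raw_angle, window_length=31, polyorder=3)

# Upper/lower envelopes from local maxima/minima (linearly interpolated
# here; the paper fits splines to the peaks and valleys).
idx = np.arange(t.size)
peaks = argrelextrema(smooth, np.greater, order=30)[0]
valleys = argrelextrema(smooth, np.less, order=30)[0]
upper = np.interp(idx, peaks, smooth[peaks])
lower = np.interp(idx, valleys, smooth[valleys])

# Normalize to [-1, 1] and map to phase with two arcsine branches so the
# phase increases monotonically through protraction and retraction.
norm = np.clip(2 * (smooth - lower) / (upper - lower) - 1, -1.0, 1.0)
rising = np.gradient(smooth) >= 0
phase = np.where(rising, np.arcsin(norm), np.pi - np.arcsin(norm))

# Frequency: reciprocal of the period between successive same-phase
# points (peak-to-peak spacing used here).
freq = fps / np.mean(np.diff(peaks))
print(round(float(freq), 1))
```

On this synthetic trace the recovered frequency is close to the 8 Hz that was put in, illustrating that the pipeline's output is insensitive to moderate frame-by-frame noise.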
Evaluation of the whisker detection method.
To evaluate the method, we asked two questions: how does it compare with manual tracing in whisker detection performance, and how do any detection errors affect the mean angle output? To carry out the manual tracing, a human expert used a purpose-built graphical user interface to click on several points along the shaft of each whisker in every frame. The points were interpolated with a cubic spline using least-squares approximation to compensate for small pointing errors. The resulting sequence was visually inspected by a second expert and corrected until a consensus was reached. We treated each frame independently; thus, in cases of total overlap of two whiskers, only one was traced by the human expert. The output of the human experts is denoted the “gold standard.”
We applied the comparison between ViSA and the gold standard to two 1,000 fps films. Film 1 was a sequence in which some of the whiskers were trimmed, leaving a total of 15 whiskers intact on the two sides together; an uncompressed 500-frame sequence was analyzed. Film 2 was a sequence in which all whiskers were intact. Because the manual tracing of >30 whiskers per side is laborious, we limited the Film 2 analysis to 50 consecutive frames corresponding to a half-whisk, starting at peak protraction where the whisker spread was minimal (minimal angular difference between the most rostral and the most caudal whisker) and ending at peak retraction where whisker spread was maximal.
We then measured the agreement of the segments automatically detected by ViSA with the gold standard whiskers. Every whisker in the gold standard was converted to a set of coordinates with dense, subpixel sampling. Analogously, every ViSA-detected segment was represented by 30 equally spaced points. The distance was calculated as the mean Euclidean distance between a gold standard whisker and each of the 30 points constituting the nearest ViSA-detected segment. Figure 6, A and B, shows individual frames from the two films in which ViSA-detected segments and the gold standard are compared. We used a signal detection framework (Green and Swets 1966) to quantify performance. Specifically, we counted as “correct detections” those gold standard-ViSA whisker pairs having an average distance below two pixels (Fig. 6, A and B, blue curves and overlapping yellow segments). We counted as “missed detections” any cases where the average distance from a gold standard whisker to the nearest ViSA-detected segment was greater than two pixels (Fig. 6, A and B, green whiskers). Similarly, we counted as “false detections” any cases where the distance from a ViSA-detected segment to the nearest gold standard whisker was greater than two pixels (Fig. 6, A and B, red segments). The category of true negatives (correct logging of the absence of a whisker) was not applicable and, for the purpose of performance formulas, was set to 0. Finally, we summed the counts of correct, missed, and false detections across both film sequences. The results are reported in Table 1.
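The scoring scheme can be sketched as follows (illustrative Python, not the published code): whiskers and segments are point arrays, and a pair matches when the mean nearest-point distance falls below the two-pixel criterion.

```python
import numpy as np

def mean_distance(points_a, points_b):
    """Mean over points_a of the Euclidean distance to the nearest
    point of points_b (both arrays are shaped (N, 2))."""
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def score(gold_whiskers, detected_segments, threshold=2.0):
    """Count correct, missed, and false detections at the two-pixel
    criterion; true negatives are not applicable and are set to 0."""
    correct = sum(
        1 for w in gold_whiskers
        if min(mean_distance(w, s) for s in detected_segments) < threshold)
    missed = len(gold_whiskers) - correct
    false = sum(
        1 for s in detected_segments
        if min(mean_distance(s, w) for w in gold_whiskers) >= threshold)
    return correct, missed, false

# Toy example: one detected segment running 0.5 px from a gold whisker.
gold = [np.column_stack([np.linspace(0, 29, 30), np.zeros(30)])]
detected = [np.column_stack([np.linspace(0, 29, 30), np.full(30, 0.5)])]
print(score(gold, detected))
```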
To assess the impact of the detection errors, we compared the time sequence of mean whisking angles obtained with ViSA to that obtained by the gold standard. As whisking angle varies along the shaft, we “cut” the gold standard whiskers at the same distance from the snout as used by ViSA. The mean angle sequences for Film 1 are illustrated in Fig. 6C. The differences between mean angles were quantified as the squared amplitude difference between the two traces on the respective sides, averaged between the left and right sides; the squared mean angle error for Films 1 and 2 is given in Table 1.
The software was written using MATLAB (MathWorks, Natick, MA), with extensive use of the Image Processing Toolbox and the Spline Toolbox. The Parallel Processing Toolbox is also required whenever a large number of films needs to be processed on a multiple processor machine. The parallelization has not been implemented on a per-frame basis, but on a per-film basis, to allow the execution of the code on separate nodes.
The methods can be operated through the graphical user interface or through scripts. The optimal algorithmic parameters are usually set with the graphical user interface, while high-volume processing is done with batch processing scripts. This second modality requires the user to specify only the folder containing the films and the number of cores/processors dedicated to the processing. The software runs a two-step computation, using the Parallel Processing Toolbox to distribute the “for loops” across different nodes. The first processing step runs the automatic head tracking initialization on each film. This automatically discards trials in which the rodent is completely occluded, as well as empty videos generated by errors in camera triggering. On the valid films, the head position and initial frame data are used in the second step to run the head tracking and the whisker segment detection software. Starting from the automatically detected initial frame, every film is processed backward and forward in time until the HCGF rate drops below a user-selected value or until all available frames are processed.
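The per-film parallel layout might be sketched as below. The actual software uses MATLAB's Parallel Processing Toolbox; here Python's multiprocessing pool stands in, and `initialize_head_tracking`, `track_film`, and the toy validity check are hypothetical placeholders for the two processing steps.

```python
from multiprocessing import Pool

def initialize_head_tracking(film):
    """Step 1: automatic head-tracking initialization.
    Returns None for invalid trials (empty videos from camera-trigger
    errors, or a completely occluded animal)."""
    if "empty" in film:                      # toy validity check
        return None
    return {"film": film, "start_frame": 0}

def track_film(init):
    """Step 2: head tracking plus whisker segment detection, run
    forward and backward in time from the initial frame."""
    return {"film": init["film"], "tracked": True}

def process_folder(films, n_workers=2):
    """Parallelize per film (one film per worker), not per frame."""
    with Pool(n_workers) as pool:
        inits = pool.map(initialize_head_tracking, films)
    valid = [i for i in inits if i is not None]   # discard invalid trials
    with Pool(n_workers) as pool:
        return pool.map(track_film, valid)

if __name__ == "__main__":
    out = process_folder(["trial1.avi", "empty_trial.avi", "trial2.avi"])
    print([r["film"] for r in out])
```

Parallelizing per film rather than per frame keeps each worker's task independent, which is what allows the code to run on separate nodes as described above.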
A typical batch processing operation requires 28 h to process a folder containing 240 compressed videos of 1,000 frames each on an 8-core Intel i7-based workstation with 4 GB of memory. The average processing time on a single core is ∼1 s per frame if the rat possesses two whisker rows and increases slightly on rats with the full whisker pad intact. The code has not been optimized for speed; porting it to C/C++ could reduce the processing time by a factor of 10.
We ran the ViSA program on films recorded during whisker-mediated texture discrimination. We used films taken under nonoptimal lighting conditions to increase the rigorousness of program validation. In a typical frame (Fig. 1A), the whiskers are not in focus and the contrast is low. Moreover, the rat has its full set of vibrissae intact and moves without restraint.
ViSA utilizes adaptive template fitting algorithms (Fig. 1, B-D) to detect the animal's head. The program could reliably detect the entry of the animal into the field of view using automatic head detection routines (Fig. 2). The HCGF provides a measure of how closely the snout shape on each frame fits the template. It is defined as the proportion of vectors normal to the detected contour (Fig. 1C) that meet the template within a fixed distance, usually set to eight pixels. Figure 3A illustrates a typical result where, when the rat's head is fully inside the field of view (frames 200-700), HCGF remains well above 0.9. When films are analyzed with automatic initialization (Fig. 2), the HCGF plot provides a rapid control that the head has been adequately tracked in each film.
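As a rough numerical illustration (not the published code), the HCGF can be computed as the fraction of detected-contour points lying within the fixed distance of the template; the nearest-point search below simplifies the normal-vector probing described in methods.

```python
import numpy as np

def hcgf(contour, template, max_dist=8.0):
    """Fraction of detected-contour points that meet the template
    within max_dist pixels (both arrays are shaped (N, 2))."""
    d = np.linalg.norm(contour[:, None, :] - template[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= max_dist))

# Toy example: a circular "template" vs. a mildly perturbed contour,
# standing in for the fitted head template and the detected snout edge.
theta = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
template = 50.0 * np.column_stack([np.cos(theta), np.sin(theta)])
contour = template + np.random.default_rng(1).normal(0.0, 2.0, template.shape)
print(hcgf(contour, template))
```

A well-fitted frame produces a value near 1, while occlusion or a mispositioned template drives the value down, which is why a drop in HCGF flags inadequately tracked frames.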
Experiments designed to characterize the behavior of rodents in an arena often need to quantify whisking during exploration and in proximity of specific objects. Such experiments may benefit from a variable zoom camera, where the minimum zoom is used to monitor the animal's approach and the maximum zoom to monitor object palpation (E. Ahissar, personal communication). ViSA uses a head tracking method that can scale the template as needed. To test the method, we simulated a variable zoom by scaling the image in steps of 1% per frame (with respect to the original size) from 100 to 200% and back to 100%, five times. In Fig. 3B, the head scale, extracted as the output variable from the head tracking algorithm (black line), is compared with the original image scale (grey line). The graph shows that at the peak of the third cycle the snout outline led to a slight overestimate of the template scale, but the extracted zoom was accurate for the rest of the film. In Fig. 3C, the HCGF is plotted for the original video (grey line) and for the same video after undergoing periodic rescaling (black line). The template fit the contour with a value of >90% for the original video and >80% for the rescaled film, until the final 20 ms, when the rat left the camera's field of view.
The program uses a segment detection routine (Fig. 4) to detect the full complement of whiskers on both sides of the snout (Fig. 5). Like the head contour routine, the whisker segment detection method is not affected by whisker width variations produced by changes in the zoom factor. To illustrate the ViSA output, we provide in Supplemental Material two videos of rats with all whiskers intact. The first video shows the output of the method in processing a low quality sequence with high compression ratio, while the second illustrates a high quality, low compression video.
Performance was tested by comparing the automatic output of ViSA to a gold-standard tracking performed by human experts. The comparison was made on a 500-frame whisking sequence with a trimmed rat (Fig. 6A) and a 50-frame whisking sequence with a rat possessing all whiskers (Fig. 6B). Table 1 reports a correct detection rate of 99% for Film 1 and 97% for Film 2. Precision and accuracy were 98 and 96%, respectively, for Film 1 and 92 and 90% for Film 2. The slightly lower performance on Film 2 is a consequence of the high complexity of the images. The mean angle sequences for Film 1 are illustrated in Fig. 6C, and it is clear that ViSA produced a whisking estimate nearly identical to the expert human trackers. The squared mean angle error for both Films 1 and 2 is given in Table 1.
In the behavioral experiment, the task of the rat was to approach a plate positioned in front of it and to palpate the plate using its whiskers to identify the texture. Upon identification of the texture, the rat withdrew and turned either to the left or right; the reward location was determined by the identity of the texture presented on that trial. The operation of ViSA on a typical video, taken at 1,000 fps, is given in Figs. 7 and 8. The approach of the head, from the left, towards the textured plate (right side) is illustrated in Fig. 7A. In Fig. 7B, head center of gravity (+ symbol) and head orientation (arrow) are presented for the same film. The texture is depicted by the blue line in Fig. 7B, top. The rat approached with its head directed slightly to the right and then straightened to become perpendicular to the texture while it palpated the surface with its whiskers. When it successfully discriminated the texture, the head turned sharply to the right as it began to move toward the reward location. The distance between the forwardmost point of the snout and the texture is shown in Fig. 7C. The closest approach was made in frame 500, and the snout remained adjacent to the texture for 100 ms. Beginning in frame 600, the snout began a rapid withdrawal.
This example demonstrates that through head tracking the experimenter can gain a detailed characterization of animal behavior, including the speed and direction of approach, duration of tactile sampling, and trajectory of withdrawal. Such information can be useful for the quantification of sensorimotor behavior.
Figure 8 shows the characterization of whisking obtained from the same film. Whisking, the well-known rhythmic forward and backward sweeping of the vibrissae, is evident in Fig. 8A. Individual cycles ranged from 30 to 50° of swept angle. The same whisking motion is portrayed as a phase plot in Fig. 8B (extraction of phase is outlined in methods). Figure 8C presents the phase difference between the left and right mean angles. Such measures of symmetry are useful in understanding the central pattern generators that govern the whisking motion (Towal and Hartmann 2006), as well as the behavioral significance of left-right coordination; symmetry, and contact-induced breakdown of symmetry, have been found to be important components of exploratory behavior (Grant et al. 2009; Mitchinson et al. 2007; Towal and Hartmann 2006) and object localization (Knutsen et al. 2006). Figure 8D illustrates the frequency of left and right whisking.
Among the most significant limitations of whisker-related videography is the high cost of high-speed cameras. This motivated us to measure the effect of frame rate on the quality of the whisking parameters. We did so by sampling a 1,000-fps film at progressively lower rates. Figure 9A shows the reconstruction of whisker angle, over time, when the film is sampled at its original rate (1,000 fps) and at 100 and 50 fps. Only small errors, consisting largely of underestimation of the maximum and minimum angles, occur at 100 fps, while very large errors occur at 50 fps. We quantified the error by comparing the whisking phase at a series of progressively higher frame rates to the phase taken at 1,000 fps. Phase error drops to under 10° when the frame rate increases from 50 to 100 fps. Further, smaller decreases in error occur with increases in frame rate up to 500 fps. As another measure of accuracy, we computed the square of the angle error (again, with respect to the original sampling rate of 1,000 fps). A large drop in error occurs between 50 and 100 fps. At higher sampling rates, the accuracy asymptotically approaches that of the original video. These analyses show that reliable estimation of whisking frequency and phase can be achieved with video sampling as low as 100-200 fps. Cameras providing such frame rates are significantly less expensive (sometimes by an order of magnitude) than cameras built to record at 1,000 fps.
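The qualitative effect of frame rate can be reproduced with a toy computation; an 8-Hz sinusoid stands in for real whisking, and linear interpolation back to the 1,000-fps grid stands in for the program's reconstruction.

```python
import numpy as np

fps_full = 1000
t = np.linspace(0.0, 1.0, fps_full + 1)            # 1 s of video, inclusive
angle = 110 + 20 * np.sin(2 * np.pi * 8 * t)       # synthetic 8 Hz whisking

errors = {}
for fps in (500, 200, 100, 50):
    step = fps_full // fps
    t_sub = t[::step]                              # downsampled timestamps
    recon = np.interp(t, t_sub, angle[::step])     # reconstruct at 1,000 fps
    errors[fps] = float(np.sqrt(np.mean((recon - angle) ** 2)))

for fps, err in errors.items():
    print(f"{fps:4d} fps: RMS angle error {err:.2f} deg")
```

Even in this idealized setting the reconstruction error grows sharply below 100 fps and shrinks only marginally above 200 fps, consistent with the frame-rate analysis reported above.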
Application of the program, with no modification, to a mouse video is given in Supplemental Fig. S1.
In conclusion, the ViSA suite of programs allows automatic and accurate reconstruction of head orientation and translation, as well as the whiskers' cyclical motion, on video with frame rates as low as 100–200 fps.
We present an unsupervised method, referred to as ViSA, for extracting head and whisking dynamics in freely behaving rodents with all whiskers intact. ViSA consists of two interlaced sets of algorithms, one for head tracking and one for characterization of whisking. By fitting a template on the head contour, the head tracking method is able to self-initiate (that is, to detect the first appearance of the animal in the video) and then to track the snout position, orientation, and contour shape, even when substantial occlusions occur. The method also is able to track the snout concurrent with changes in the scale of the contour over time, opening up the possibility of filming experiments with variable magnification cameras that zoom in on the snout area when the animal approaches an object of interest.
The automatically extracted whisking parameters include mean angle of all detected whiskers in every frame, whisking frequency, and whisking phase. Since the same parameters are extracted on both sides of the snout, an additional set of parameters arises from left/right comparisons. Comparison to gold-standard whisker tracking shows that the ViSA software produces a result nearly identical to that of expert human observers.
The program was designed to function not only on the video recorded in this laboratory but also on a large data set encompassing experimental conditions from the many laboratories interested in rodent sensorimotor function. To foster this generality, a number of features were incorporated. First is functionality at lower frame rates. We analyzed videos after downsampling from the original frame rate of 1,000 fps and found that the whisker segment detection method, together with the spatial averaging and temporal filtering, allows an accurate estimate of head and whisking dynamics at frame rates as low as 100-200 per second. The ability to extract the mean angle and the related whisking parameters from video at low rates means that laboratories may invest in medium-speed video equipment rather than in costly high-speed systems. Moreover, if high-speed video is already in place, the camera can be operated at lower sampling rates to reduce the bulkiness of video files or to allow longer film segments to be stored.
Second is the functionality on video of modest quality. Even when optics and lighting conditions are well tuned, some trials will yield images where the whiskers are out of focus because the full vibrissal pad spans several centimeters of depth of field. This is especially true in unrestrained animals that often bob their heads up and down. The first elements to be lost are the whisker tips, as their progressive tapering makes them blurry or even transparent (Fig. 5). Our approach was to detect whisker segments at a user-selected distance from the snout. As a result, no trials need to be discarded because of degraded image quality.
The third feature is automated initiation of the analysis. Traditionally, initiating the tracking in a single film is a laborious process that begins with selecting a satisfactory initial frame and manually inputting snout position and orientation (Knutsen et al. 2005). Our method achieves unsupervised snout tracking initialization in an innovative way by automated scanning of the film frame by frame until the HCGF (see methods) reaches a predefined threshold. The HCGF is based on the distance between the detected edges and the optimally positioned head template (Fig. 1, C and D).
The fourth feature incorporated to facilitate general usage is the scoring of accuracy for every frame. ViSA does not require visual inspection of the output on single frames to discard invalid trials because it provides feedback on tracking reliability for the entire film (Fig. 3A). Fixed distractors such as poles and platform edges are eliminated by using a background subtraction process and a narrow mask centered on the snout (Fig. 4B), where whisker shafts are the only elements to be detected. The mask centering is then verified by comparing the head contour to the head template using the HCGF measure.
The set of algorithms presented here differs in several ways from existing methods for mean angle extraction (Gyory et al. 2010; Venkatraman et al. 2009) and whisker tracking (Knutsen et al. 2005; O'Connor et al. 2010; Ritt et al. 2008; Voigts et al. 2008). A summary of the differences between ViSA and previously published programs is given in Table 2. Compared with existing methods of head tracking, our approach based on template fitting does not require head orientation to remain constant during the experiment (Voigts et al. 2008), nor does it require additional light sources to generate eye reflections (Knutsen et al. 2005). It does not require restraint of the body or gluing a marker on the head (Venkatraman et al. 2009). The head template was tested on rats and mice, revealing no difference in effectiveness (Supplemental Fig. S1).
Our approach to mean angle detection differs from existing methods because it does not binarize the intensity image and because it has been successfully tested on rodents possessing all whiskers. The angle tracking method of Venkatraman et al. (2009) provides a real-time angle measure at 100 fps but requires a marker to be glued on the whisker, limiting the analysis to a single whisker at a time. Gyory et al. (2010) presented a method for detecting the mean angle at the base of multiple whiskers. The method converts the input image into a head-centered polar coordinate-based image, where whiskers appear as nearly straight lines. The orientation of these lines is detected by applying the Hough transform to a binarized version of the transformed image. The binary conversion, required for the Hough transform voting scheme, is the weakest point of the procedure, since it rarely finds an optimal set of parameters to select all whiskers.
In relation to existing whisker tracking methods, the detection method used in our approach is more robust to image noise than best-intensity-overlap methods, which establish the whisker position by choosing the spline (Knutsen et al. 2005) or segments (Ritt et al. 2008) that cover the image pixels with the largest intensity sum. It also differs from methods that iteratively join neighboring pixels of similar width and orientation into traces (O'Connor et al. 2010; Voigts et al. 2008), because it selects the best segments using a global optimization that considers all combinations of detected intensity peaks together.
Unlike single-whisker tracking methods (Knutsen et al. 2005; O'Connor et al. 2010; Ritt et al. 2008; Voigts et al. 2008), our method does not provide information about the kinetics of individual whiskers. It is not appropriate for studies concerning how vibrissal mechanics contribute to sensing the properties of contacted objects (Arabzadeh et al. 2005; Hipp et al. 2006; Lottem and Azouz 2008; Ritt et al. 2008; Wolfe et al. 2008). Instead, it allows fast, automatic extraction of the motion of the head as well as the full array of whiskers, bilaterally, from video at medium frame rates and nonoptimal quality. There are many questions that can be addressed with this level of analysis. How do whisking and head movement vary across species (Munz et al. 2010)? What are the motor strategies adopted by animals in the exploration of open space and in the detailed analysis of contacted objects (Towal and Hartmann 2006)? How do motor strategies differ according to the task undertaken by the animal (e.g., discrimination of object shape, size, position, or texture) (Ganguly and Kleinfeld 2004)? Are head and whisker motion sensitive markers for nervous system dysfunction in animal models of substance abuse (Medina and Krahe 2008) or pathology such as Parkinson's disease (Pinna et al. 2010)?
This work was supported by grants from the Human Frontier Science Program (Contract RG0041/2009-C), the European Community (Contract BIOTACT-21590), the Compagnia San Paolo, the Italian Institute of Technology through the BMI Project, and the Italian Ministry of Economic Development.
No conflicts of interest, financial or otherwise, are declared by the author(s).
We are grateful to Ehud Ahissar, Ehud Fonio, and Goren Gordon, from the Weizmann Institute for providing mouse films. Our work benefited from valuable discussions with Goren Gordon and Vladan Rankov from the University of Sheffield. We are grateful to the members of the laboratory, in particular to YanFang Zuo for providing numerous test films with a variety of light settings, and to various outside collaborators for helpful comments. Erik Zorzin, Fabrizio Manzino, Stefano Parusso, and Marco Gigante provided outstanding technical support. We thank the anonymous reviewers who helped us improve the quality of the manuscript.
- Copyright © 2011 the American Physiological Society