This project empowers people to easily set up customized in-home cameras and sensors and to monitor the video and data via the web, primarily to assist family or friends in need of care, or to assist oneself (especially for the hearing or vision impaired). The project enables people to set up customized automated notifications (e.g., a text message or a blinking light) upon detection of critical situations, such as a person not arising in the morning, or to record data to detect longer-term critical trends, such as a person getting less exercise or frequently stumbling. The project's novel methods emphasize people customizing the system to meet their unique and changing needs and situations, including their privacy wishes.
Assistive monitoring analyzes data from cameras and sensors for events of interest and notifies the appropriate persons in response. The ability of the end-user to customize an assistive monitoring system is essential to the system's practical use.
Automated in-home assistive monitoring with privacy-enhanced video
We demonstrate that privacy-enhanced video can be as accurate as raw video for eight in-home assistive monitoring goals:
Energy expenditure estimation
In room too long
Leave but not return at night
Arisen in morning
Not arisen in morning
In region too long
Abnormally inactive during day
Fall detection
Measures:
Fidelity is the correlation between the video-based energy estimate and the BodyBugg energy estimate. A fidelity of 1.0 is ideal.
Accuracy is 1 minus the approximation error. An accuracy of 100% is ideal.
Sensitivity is the ratio of correct fall detections to actual falls, e.g., if 11 of 12 falls were correctly detected, sensitivity is 11/12 = 0.92.
Specificity is the ratio of correct non-fall reports to actual non-falls, e.g., if 10 of 11 non-falls were correctly reported, specificity is 10/11 = 0.91.
Higher measurements are better.
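As a concrete reading of these definitions, the sketch below (our own illustrative Python, not the project's code) computes each measure; fidelity is taken as the Pearson correlation between the two daily energy series, and "approximation error" is assumed to be the mean relative error against the reference.

```python
# Illustrative sketch only (not the project's code): computing the four measures.

def fidelity(video_kcal, reference_kcal):
    """Pearson correlation between video-based and reference (e.g., BodyBugg) energy series; 1.0 is ideal."""
    n = len(video_kcal)
    mv = sum(video_kcal) / n
    mr = sum(reference_kcal) / n
    cov = sum((v - mv) * (r - mr) for v, r in zip(video_kcal, reference_kcal))
    sv = sum((v - mv) ** 2 for v in video_kcal) ** 0.5
    sr = sum((r - mr) ** 2 for r in reference_kcal) ** 0.5
    return cov / (sv * sr)

def accuracy(video_kcal, reference_kcal):
    """1 minus the approximation error (assumed mean relative error); 100% is ideal."""
    errs = [abs(v - r) / r for v, r in zip(video_kcal, reference_kcal)]
    return 1.0 - sum(errs) / len(errs)

def sensitivity(correct_fall_detections, actual_falls):
    """E.g., 11 correctly detected of 12 actual falls -> 11/12 = 0.92."""
    return correct_fall_detections / actual_falls

def specificity(correct_nonfall_reports, actual_nonfalls):
    """E.g., 10 correct non-fall reports of 11 actual non-falls -> 10/11 = 0.91."""
    return correct_nonfall_reports / actual_nonfalls
```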
Privacy enhancement | Energy estimation average accuracy | Energy estimation average fidelity | Fall detection sensitivity | Fall detection specificity
Raw        | 90.9% | 0.994 | 0.91 | 0.92
Blur       | 80.5% | 0.991 | 1.00 | 0.67
Silhouette | 85.0% | 0.995 | 0.91 | 0.75
Oval       | 85.6% | 0.994 | 0.91 | 0.92
Box        | 84.3% | 0.997 | 0.82 | 0.92
For the remaining goals, the sensitivity and specificity of all privacy enhancements were 1.0, except blur, whose room-too-long sensitivity was 0.5.
Adaptive algorithms for compensation of monitoring goal degradation in privacy-enhanced video
Privacy-enhanced video has degraded monitoring goal accuracy compared to raw video. We developed adaptive algorithms that compensate for the degraded accuracy by improving moving-region tracking.
We developed two adaptive algorithms:
Specific-color hunter identifies the most common color of the moving-region (commonly useful for silhouette, bounding-oval, and bounding-box) and then tracks that color.
Edge-void filler runs edge detection and then fills the void of edges (commonly occurring in blur and bounding-box) in each direction, starting from the center of the motion-region; a sketch of the idea appears after the figure below.
Tracked region shown as a red rectangle: before edge-void filler → after edge detection → after filler. Edge-void filler improved moving-region tracking.
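The sketch below illustrates the edge-void-filler idea in Python with OpenCV; it is our own approximation, not the project's implementation, and the Canny thresholds and marching scheme are assumptions.

```python
# Illustrative sketch of the edge-void-filler idea (not the project's
# implementation). frame_gray is a grayscale frame and (cx, cy) is the
# center of the detected motion-region.
import cv2

def edge_void_fill(frame_gray, cx, cy, max_span=400):
    """Grow a rectangle outward from the motion-region center in each
    direction, stopping at the first edge, and return it as (x, y, w, h)."""
    edges = cv2.Canny(frame_gray, 50, 150)        # step 1: edge detection
    h, w = edges.shape

    def march(dx, dy):
        # Step 2: walk through the edge-free void until an edge is reached.
        x, y, steps = cx, cy, 0
        while 0 <= x + dx < w and 0 <= y + dy < h and steps < max_span:
            if edges[y + dy, x + dx]:
                break
            x, y, steps = x + dx, y + dy, steps + 1
        return x, y

    left, _ = march(-1, 0)
    right, _ = march(1, 0)
    _, top = march(0, -1)
    _, bottom = march(0, 1)
    return left, top, right - left, bottom - top   # the region to track
```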
Specific-color hunter results compared to the foregrounding (baseline) results. Specific-color hunter yielded many improvements in fall detection, and improved energy estimation accuracy with bounding-oval.
Privacy enhancement | Energy estimation average accuracy | Energy estimation average fidelity | Fall detection sensitivity/specificity
— | -1.90% | =      | =/=
— | +0.20% | -0.006 | +0.08/=
— | +0.10% | -0.036 | =/+0.09
— | +2.50% | -0.002 | +0.17/+0.17
— | -1.70% | +0.001 | =/+0.17
— | -0.30% | =      | =/=
* Performance remained the same for the remaining monitoring goals: room too long, arisen in morning, region too long, and abnormally inactive.
Edge-void filler results compared to the foregrounding (baseline) results. Edge-void filler yielded large improvements in energy estimation accuracy with blur, bounding-oval, and bounding-box.
Privacy enhancement | Energy estimation average accuracy | Energy estimation average fidelity | Fall detection sensitivity/specificity
— | -1.80% | =      | -0.17/=
— | +6.30% | -0.467 | +0.25/-0.07
— | -3.80% | -0.004 | =/=
— | +1.90% | -0.011 | -0.41/-0.25
— | +4.10% | -0.004 | =/-0.42
— | =      | =      | =/=
* Performance remained the same for the remaining monitoring goals (room too long, arisen in morning, region too long, and abnormally inactive), except with blur, where sensitivity decreased by 0.5 for room too long, region too long, and abnormally inactive, and specificity decreased by 0.17 for abnormally inactive.
Adaptive algorithms can compensate for the degradation. Energy estimation accuracy degraded from 90.9% to 85.2%, but the adaptive algorithms brought the accuracy back up to 87.7%. Similarly, fall detection degraded from 1.0 sensitivity to 0.86 and from 1.0 specificity to 0.79, but the adaptive algorithms brought it back to 0.92 sensitivity and 0.90 specificity.
Accurate and efficient video-based fall detection using moving-regions
State-of-the-art video-based fall detection algorithms suffer from either low accuracy (moving-region-based) or inefficient computation (2D- or 3D-projection-based). We developed an accurate and efficient method using many simple state machines.
A hierarchical breakdown of our method. State machines are the basis of each component.
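To make "many simple state machines" concrete, here is one illustrative state machine of the kind such a method can compose (our own sketch; the state names, the height feature, and the thresholds are assumptions, not the published design).

```python
# Illustrative sketch only: one simple state machine that watches the
# moving-region height per frame and signals a fall when a rapid height
# drop is followed by a sustained low height.
UPRIGHT, DROPPING, ON_GROUND = range(3)

class HeightDropStateMachine:
    def __init__(self, drop_ratio=0.5, still_frames=30):
        self.state = UPRIGHT
        self.baseline = 0            # tallest moving-region height seen (pixels)
        self.still = 0
        self.drop_ratio = drop_ratio
        self.still_frames = still_frames

    def step(self, region_height):
        """Feed one frame's moving-region height; return True when a fall is signaled."""
        self.baseline = max(self.baseline, region_height)
        low = region_height < self.drop_ratio * self.baseline
        if self.state == UPRIGHT and low:
            self.state, self.still = DROPPING, 0
        elif self.state == DROPPING:
            if low:
                self.still += 1
                if self.still >= self.still_frames:   # low and still long enough
                    self.state = ON_GROUND
                    return True
            else:
                self.state = UPRIGHT                  # person recovered
        elif self.state == ON_GROUND and not low:
            self.state = UPRIGHT                      # person got back up
        return False
```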
Comparison of our method to the state-of-the-art methods on fall detection. The format for each cell is: sensitivity/specificity.
# of cameras | Our method  | Hung        | Auvinet     | Rougier     | Anderson    | Miaou       | Thome
1            | 0.960/0.995 |             |             | 0.955/0.964 |             | 0.900/0.860 | 0.820/0.980
2            | 0.990/1.000 | 0.958/1.000 |             |             | 1.000/0.938 |             | 0.980/1.000
3            | 0.998/1.000 |             | 0.806/1.000 |             |             |             |
4            | 1.000/0.995 |             | 0.997/0.998 |             |             |             |
5            | 1.000/0.993 |             | 0.999/1.000 |             |             |             |
6            | 1.000/1.000 |             | 1.000/1.000 |             |             |             |
7            | 1.000/1.000 |             |             |             |             |             |
8            | 1.000/1.000 |             |             |             |             |             |
* An empty cell means unreported or not applicable, such as Hung's algorithm that uses exactly two cameras.
Our method was more accurate than other moving-region-based methods, while being equally efficient. Also, our method was about 10x more efficient than projection-based algorithms, while being more accurate with 3 cameras and equally accurate with 4+ cameras.
2D head tracking vs. moving-region for video-based fall detection
We compared ideal head tracking (manually-performed) versus automated moving-region tracking for fall detection to determine the extent of possible accuracy improvement with head tracking. We made 78 1-minute videos with one actor exhibiting falls or non-falls. Some videos included confounding scenarios such as the actor moving objects like a chair, or throwing hands into the air, which can confuse a moving-region-based detector.
Head tracking
Moving-region tracking
Both the head-based and moving-region-based fall detectors used the same four state machines (tuned to the particular input) to determine fall likelihood. Below is the head-based fall detector. The only difference between the two detectors was their input: the head tracker or the moving-region tracker.
We compared the sensitivity and specificity of the head-based and moving-region-based detectors.
Measure     | Head detector | Moving-region detector
Sensitivity | 0.75          | 0.75
Specificity | 0.71          | 0.43
The difference in specificity was entirely caused by the head-based fall detector knowing that the head was not near the ground, while the moving-region-based fall detector did not. Moving-region-based fall detection is suitable for a variety of scenarios; however, when higher accuracy is necessary and confounding situations are likely, the extra computation cost of head tracking may be justified. We used manual head tracking in this work to determine an upper bound.
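A minimal sketch of the extra check that head tracking enables (our own illustration; confirm_fall_with_head, floor_y, and the 40-pixel margin are assumed names and values):

```python
# Illustrative sketch (our own): a fall candidate from the moving-region
# detector is rejected when the tracked head never comes near the floor.
def confirm_fall_with_head(candidate_fall, head_ys, floor_y, near_floor_px=40):
    """head_ys: head y-coordinate per frame (image coords, larger y = lower)."""
    if not candidate_fall:
        return False
    head_near_floor = any(y >= floor_y - near_floor_px for y in head_ys)
    return head_near_floor   # e.g., moving a chair never brings the head near the floor
```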
Automated fall detection on privacy-enhanced video
Falls are detectable by algorithms that process raw video from in-home cameras, but raw video raises privacy concerns, especially when stored on a local computer or streamed to a remote computer for processing. We developed an algorithm for automated fall detection on both raw and privacy-enhanced video. The key observation is that the moving-region has nearly the same height and width in raw video and in every privacy-enhanced version of the video.
Same fall with raw and privacy-enhanced video
We compared various features of the moving-region for fall detection sensitivity and specificity (defined above).
Feature                                 | Average sensitivity | Average specificity
Width of moving-region in pixels        | 0.91                | 0.92
Height of moving-region in pixels       | 0.31                | 0.30
Height-to-width ratio of moving-region  | 0.44                | 0.50
Width-to-height ratio of moving-region  | 0.64                | 0.67
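All four features can be read off the moving-region bounding box. A minimal sketch, assuming OpenCV 4 and a binary foreground mask (our own code, not the project's):

```python
# Illustrative sketch: computing the compared features from one frame's
# binary foreground (moving-region) mask, where moving pixels are nonzero.
import cv2

def moving_region_features(fg_mask):
    """Return the width, height, and both ratios of the largest moving region,
    or None if no motion is present."""
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(largest)
    return {"width": w, "height": h, "h_to_w": h / w, "w_to_h": w / h}
```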
We compared the sensitivity and specificity for raw video and privacy-enhanced video using the exact same fall detection algorithm, driven by the width of the moving-region in pixels.
Video style | Average sensitivity | Average specificity
Raw        | 0.91 | 0.92
Blur       | 1.00 | 0.67
Silhouette | 0.91 | 0.75
Oval       | 0.91 | 0.92
Box        | 0.82 | 0.92
The automated fall detection algorithm performed well on the privacy-enhanced videos compared to raw video, except perhaps blur, which suffered from the person's color blending with the background, making the moving region harder to identify.
Energy expenditure estimation from video
Automatically estimating a person’s energy expenditure has numerous uses, including determining whether
an elderly person living alone is achieving sufficient levels of daily activity. Sufficient activity has been shown
to significantly delay the onset of dementia, to reduce the likelihood of falls, to improve mood, and more. Energy
expenditure is also important for monitoring of diabetic patients.
A key expected use of video-based energy expenditure estimation is to compare a person’s activity levels across many days, to detect negative trends and thus introduce interventions. As such, a goal of estimation is not necessarily accurate calorie estimation, but rather correct relative estimation of energy expenditure across days, including correct ratios among low/medium/high activity days. Thus, our first experiments sought to determine the fidelity of our video-based energy estimation. We compared our video-based energy expenditure algorithm to a commercially available device, namely the BodyBugg.
Low activity day, medium activity day, and high activity day (daily energy-estimate plots).
The slope changes of the video-based approach are very similar to those of the BodyBugg. Notice that the video-based approach is consistently off by 230 Calories. We exploited this observation to improve Calorie prediction accuracy from 86.4% to 91.1%.
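A minimal sketch of such a constant-offset correction (our own illustration; the function names and the calibration scheme are assumptions):

```python
# Illustrative sketch of a constant-offset correction: the roughly constant
# gap between the video-based and BodyBugg daily estimates is learned on
# calibration days and added to later video-based estimates.
def learn_offset(video_kcal, reference_kcal):
    """Mean gap between reference and video-based daily estimates (Calories)."""
    gaps = [r - v for v, r in zip(video_kcal, reference_kcal)]
    return sum(gaps) / len(gaps)

def corrected_estimate(video_kcal_today, offset):
    return video_kcal_today + offset   # the offset was about 230 Calories in the plots above
```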
A comprehensive list of the video recordings can be found with this link. Here's an example video recording:
Privacy perception and fall detection accuracy with privacy-enhanced video
Video of in-home activity provides valuable information for assistive monitoring but raises privacy concerns. Raw video can be privacy-enhanced by obscuring the appearance of a person. We considered raw video and five privacy enhancements.
We conducted an experiment with 376 non-engineering participants to determine whether there exists a privacy enhancement that provides sufficient perceived privacy while enabling accurate fall detection by humans.
The oval is the best trade-off between sufficient privacy and fall detection accuracy. However, the optimal privacy enhancement depends on the end-user's requirements.
Monitoring and Notification Flow Language (MNFL)
MNFL enables end-user customization of assistive monitoring systems. Data flows from monitoring devices on the left to notification methods on the right. Each graphical block is always-executing, intuitively analogous to objects in the physical world. The always-executing behavior gives instant feedback to the end-user when two blocks are connected, making development fast and rewarding.
Sensors typically output Boolean (on/off) or Integer (78° F) data, while cameras output Video data. Cameras are integrated with sensors via feature extractors, which convert Video data to Boolean or Integer data. A feature extractor determines the amount of some physical phenomenon from Video data, e.g. the amount of rightward motion from the camera's perspective.
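The sketch below illustrates, in Python with OpenCV, how an always-executing feature-extractor block could convert Video data to Integer data (here, the amount of rightward motion estimated with optical flow). The class and port names are our own assumptions; MNFL/EasyNotify itself is a web browser application, not this code.

```python
# Illustrative sketch only (not the MNFL/EasyNotify implementation): a
# feature-extractor block re-evaluated on every frame, Video in, Integer out.
import cv2
import numpy as np

class RightwardMotionExtractor:
    """Converts Video data to an Integer amount of rightward motion."""
    def __init__(self):
        self.prev_gray = None
        self.output = 0                       # Integer value read by connected blocks

    def tick(self, frame_bgr):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        if self.prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(self.prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            rightward = np.clip(flow[..., 0], 0, None)   # keep only positive x-motion
            self.output = int(rightward.sum())
        self.prev_gray = gray
        return self.output
```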
We implemented MNFL as a web browser application called EasyNotify. This example video shows possible solutions to the problem of an Alzheimer's patient leaving at night and not returning for a prolonged period.
We conducted an experiment with 51 non-engineering, non-science undergraduate participants. Participants spent less than 7 minutes per challenge problem.
Publications
A. Edgcomb, F. Vahid. Accurate and Efficient Algorithms that Adapt to Privacy-Enhanced Video for Improved Assistive Monitoring, ACM Transactions on Management Information Systems (TMIS): Special Issue on Informatics for Smart Health and Wellbeing, 2013. (to appear)
A. Edgcomb, F. Vahid. Automated In-Home Assistive Monitoring with Privacy-Enhanced Video, IEEE International Conference on Healthcare Informatics (ICHI), 2013. (to appear)
A. Edgcomb, F. Vahid. Estimating Daily Energy Expenditure from Video for Assistive Monitoring, IEEE International Conference on Healthcare Informatics (ICHI), 2013. (to appear)
A. Edgcomb, F. Vahid. Privacy Perception and Fall Detection Accuracy for In-Home Video Assistive Monitoring with Privacy Enhancements, ACM SIGHIT (Special Interest Group on Health Informatics) Record, 2012.
A. Edgcomb, F. Vahid. Automated Fall Detection on Privacy-Enhanced Video, IEEE Engineering in Medicine & Biology Society, 2012, 4 pages.
A. Edgcomb, F. Vahid. MNFL: The Monitoring and Notification Flow Language for Assistive Monitoring, ACM SIGHIT International Health Informatics Symposium (IHI), 2012.
A. Edgcomb, F. Vahid. Feature Extractors for Integration of Cameras and Sensors during End-User Programming of Assistive Monitoring Systems, Wireless Health, 2011, 2 pages.