This project empowers people to easily set up customized in-home cameras and sensors and to monitor the video and data via the web, primarily to assist family or friends in need of care, or to assist oneself (especially for the hearing or vision impaired). The project enables people to set up customized automated notifications (e.g., a text message or a blinking light) upon detection of critical situations, such as a person not arising in the morning, or to record data to detect longer-term critical trends, such as a person getting less exercise or frequently stumbling. The project's novel methods emphasize people customizing the system to meet their unique and changing needs and situations, including their privacy wishes.
Assistive monitoring analyzes data from cameras and sensors for events of interest and notifies the appropriate persons in response. The ability of the end-user to customize an assistive monitoring system is essential to the system's practical use.
Automated in-home assistive monitoring with privacy-enhanced video
We demonstrate that privacy-enhanced video can be as accurate as raw video for eight in-home assistive monitoring goals:
Energy expenditure estimation
In room too long
Leave but not return at night
Arisen in morning
Not arisen in morning
In region too long
Abnormally inactive during day
Fall detection
Measures:
Fidelity is the correlation between the video-based energy estimate and the BodyBugg energy estimate. A fidelity of 1.0 is ideal.
Accuracy is 1 minus the approximation error. An accuracy of 100% is ideal.
Sensitivity is the ratio of correct fall detections to actual falls, e.g., if 11 of 12 falls were correctly detected, sensitivity is 11/12 = 0.92.
Specificity is the ratio of correct non-fall reports to actual non-falls, e.g., if 10 of 11 non-falls were correctly reported, specificity is 10/11 = 0.91.
Higher measurements are better.
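As a concrete reading of these definitions, the sketch below (our own illustrative Python, not the project's code) computes each measure; fidelity is taken as the Pearson correlation between the two daily energy series, and "approximation error" is assumed to be the mean relative error against the reference.

```python
# Illustrative sketch only (not the project's code): computing the four measures.

def fidelity(video_kcal, reference_kcal):
    """Pearson correlation between video-based and reference (e.g., BodyBugg) energy series; 1.0 is ideal."""
    n = len(video_kcal)
    mv = sum(video_kcal) / n
    mr = sum(reference_kcal) / n
    cov = sum((v - mv) * (r - mr) for v, r in zip(video_kcal, reference_kcal))
    sv = sum((v - mv) ** 2 for v in video_kcal) ** 0.5
    sr = sum((r - mr) ** 2 for r in reference_kcal) ** 0.5
    return cov / (sv * sr)

def accuracy(video_kcal, reference_kcal):
    """1 minus the approximation error (assumed mean relative error); 100% is ideal."""
    errs = [abs(v - r) / r for v, r in zip(video_kcal, reference_kcal)]
    return 1.0 - sum(errs) / len(errs)

def sensitivity(correct_fall_detections, actual_falls):
    """E.g., 11 correctly detected of 12 actual falls -> 11/12 = 0.92."""
    return correct_fall_detections / actual_falls

def specificity(correct_nonfall_reports, actual_nonfalls):
    """E.g., 10 correct non-fall reports of 11 actual non-falls -> 10/11 = 0.91."""
    return correct_nonfall_reports / actual_nonfalls
```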
Privacy enhancement | Energy estimation average accuracy | Energy estimation average fidelity | Fall detection sensitivity | Fall detection specificity
Raw        | 90.9% | 0.994 | 0.91 | 0.92
Blur       | 80.5% | 0.991 | 1.00 | 0.67
Silhouette | 85.0% | 0.995 | 0.91 | 0.75
Oval       | 85.6% | 0.994 | 0.91 | 0.92
Box        | 84.3% | 0.997 | 0.82 | 0.92
For the remaining goals, the sensitivity and specificity of all privacy enhancements were 1.0, except blur, whose room-too-long sensitivity was 0.5.
Adaptive algorithms for compensation of monitoring goal degradation in privacy-enhanced video
Privacy-enhanced video has degraded monitoring goal accuracy compared to raw video. We developed adaptive algorithms that compensate for the degraded accuracy by improving moving-region tracking.
We developed two adaptive algorithms:
Specific-color hunter identifies the most common color of the moving-region (commonly useful for silhouette, bounding-oval, and bounding-box) and then tracks that color.
Edge-void filler runs edge detection and then fills the void of edges (commonly occurring in blur and bounding-box) in each direction, starting from the center of the motion-region; a sketch of the idea appears after the figure below.
Tracked region shown as a red rectangle: before edge-void filler → after edge detection → after filler. Edge-void filler improved moving-region tracking.
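The sketch below illustrates the edge-void-filler idea in Python with OpenCV; it is our own approximation, not the project's implementation, and the Canny thresholds and marching scheme are assumptions.

```python
# Illustrative sketch of the edge-void-filler idea (not the project's
# implementation). frame_gray is a grayscale frame and (cx, cy) is the
# center of the detected motion-region.
import cv2

def edge_void_fill(frame_gray, cx, cy, max_span=400):
    """Grow a rectangle outward from the motion-region center in each
    direction, stopping at the first edge, and return it as (x, y, w, h)."""
    edges = cv2.Canny(frame_gray, 50, 150)        # step 1: edge detection
    h, w = edges.shape

    def march(dx, dy):
        # Step 2: walk through the edge-free void until an edge is reached.
        x, y, steps = cx, cy, 0
        while 0 <= x + dx < w and 0 <= y + dy < h and steps < max_span:
            if edges[y + dy, x + dx]:
                break
            x, y, steps = x + dx, y + dy, steps + 1
        return x, y

    left, _ = march(-1, 0)
    right, _ = march(1, 0)
    _, top = march(0, -1)
    _, bottom = march(0, 1)
    return left, top, right - left, bottom - top   # the region to track
```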
Specific-color hunter results compared to the foregrounding (baseline) results. Specific-color hunter yielded many improvements in fall detection, and improved energy estimation accuracy with bounding-oval.
Privacy enhancement | Energy estimation average accuracy | Energy estimation average fidelity | Fall detection sensitivity/specificity
— | -1.90% | =      | =/=
— | +0.20% | -0.006 | +0.08/=
— | +0.10% | -0.036 | =/+0.09
— | +2.50% | -0.002 | +0.17/+0.17
— | -1.70% | +0.001 | =/+0.17
— | -0.30% | =      | =/=
* Performance remained the same for the remaining monitoring goals: room too long, arisen in morning, region too long, and abnormally inactive.
Edge-void filler results compared to the foregrounding (baseline) results. Edge-void filler yielded large improvements in energy estimation accuracy with blur, bounding-oval, and bounding-box.
Privacy enhancement | Energy estimation average accuracy | Energy estimation average fidelity | Fall detection sensitivity/specificity
— | -1.80% | =      | -0.17/=
— | +6.30% | -0.467 | +0.25/-0.07
— | -3.80% | -0.004 | =/=
— | +1.90% | -0.011 | -0.41/-0.25
— | +4.10% | -0.004 | =/-0.42
— | =      | =      | =/=
* Performance remained the same for the remaining monitoring goals (room too long, arisen in morning, region too long, and abnormally inactive), except with blur, where sensitivity decreased by 0.5 for room too long, region too long, and abnormally inactive, and specificity decreased by 0.17 for abnormally inactive.
Adaptive algorithms can compensate for the degradation. Energy estimation accuracy degraded from 90.9% to 85.2%, but the adaptive algorithms brought the accuracy back up to 87.7%. Similarly, fall detection degraded from 1.0 sensitivity to 0.86 and from 1.0 specificity to 0.79, but the adaptive algorithms brought it back to 0.92 sensitivity and 0.90 specificity.
Accurate and efficient video-based fall detection using moving-regions
State-of-the-art video-based fall detection algorithms suffer from either low accuracy (moving-region-based) or inefficient computation (2D- or 3D-projection-based). We developed an accurate and efficient method using many simple state machines.
A hierarchical breakdown of our method. State machines are the basis of each component.
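To make "many simple state machines" concrete, here is one illustrative state machine of the kind such a method can compose (our own sketch; the state names, the height feature, and the thresholds are assumptions, not the published design).

```python
# Illustrative sketch only: one simple state machine that watches the
# moving-region height per frame and signals a fall when a rapid height
# drop is followed by a sustained low height.
UPRIGHT, DROPPING, ON_GROUND = range(3)

class HeightDropStateMachine:
    def __init__(self, drop_ratio=0.5, still_frames=30):
        self.state = UPRIGHT
        self.baseline = 0            # tallest moving-region height seen (pixels)
        self.still = 0
        self.drop_ratio = drop_ratio
        self.still_frames = still_frames

    def step(self, region_height):
        """Feed one frame's moving-region height; return True when a fall is signaled."""
        self.baseline = max(self.baseline, region_height)
        low = region_height < self.drop_ratio * self.baseline
        if self.state == UPRIGHT and low:
            self.state, self.still = DROPPING, 0
        elif self.state == DROPPING:
            if low:
                self.still += 1
                if self.still >= self.still_frames:   # low and still long enough
                    self.state = ON_GROUND
                    return True
            else:
                self.state = UPRIGHT                  # person recovered
        elif self.state == ON_GROUND and not low:
            self.state = UPRIGHT                      # person got back up
        return False
```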
Comparison of our method to the state-of-the-art methods on fall detection. The format for each cell is: sensitivity/specificity.
# of cameras | Our method  | Hung        | Auvinet     | Rougier     | Anderson    | Miaou       | Thome
1            | 0.960/0.995 |             |             | 0.955/0.964 |             | 0.900/0.860 | 0.820/0.980
2            | 0.990/1.000 | 0.958/1.000 |             |             | 1.000/0.938 |             | 0.980/1.000
3            | 0.998/1.000 |             | 0.806/1.000 |             |             |             |
4            | 1.000/0.995 |             | 0.997/0.998 |             |             |             |
5            | 1.000/0.993 |             | 0.999/1.000 |             |             |             |
6            | 1.000/1.000 |             | 1.000/1.000 |             |             |             |
7            | 1.000/1.000 |             |             |             |             |             |
8            | 1.000/1.000 |             |             |             |             |             |
* An empty cell means unreported or not applicable, such as Hung's algorithm that uses exactly two cameras.
Our method was more accurate than other moving-region-based methods, while being equally efficient. Also, our method was about 10x more efficient than projection-based algorithms, while being more accurate with 3 cameras and equally accurate with 4+ cameras.
2D head tracking vs. moving-region for video-based fall detection
We compared ideal head tracking (manually-performed) versus automated moving-region tracking for fall detection to determine the extent of possible accuracy improvement with head tracking. We made 78 1-minute videos with one actor exhibiting falls or non-falls. Some videos included confounding scenarios such as the actor moving objects like a chair, or throwing hands into the air, which can confuse a moving-region-based detector.
Head tracking
Moving-region tracking
Both the head-based and moving-region-based fall detectors used the same four state machines (tuned to the particular input) to determine fall likelihood. Below is the head-based fall detector. The only difference between the two detectors was their input: the head tracker or the moving-region tracker.
We compared the sensitivity and specificity of the head-based and moving-region-based detectors.
Measure     | Head detector | Moving-region detector
Sensitivity | 0.75          | 0.75
Specificity | 0.71          | 0.43
The difference in specificity was entirely caused by the head-based fall detector knowing that the head was not near the ground, while the moving-region-based fall detector did not. Moving-region-based fall detection is suitable for a variety of scenarios; however, when higher accuracy is necessary and confounding situations are likely, the extra computation cost of head tracking may be justified. We used manual head tracking in this work to determine an upper bound.
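A minimal sketch of the extra check that head tracking enables (our own illustration; confirm_fall_with_head, floor_y, and the 40-pixel margin are assumed names and values):

```python
# Illustrative sketch (our own): a fall candidate from the moving-region
# detector is rejected when the tracked head never comes near the floor.
def confirm_fall_with_head(candidate_fall, head_ys, floor_y, near_floor_px=40):
    """head_ys: head y-coordinate per frame (image coords, larger y = lower)."""
    if not candidate_fall:
        return False
    head_near_floor = any(y >= floor_y - near_floor_px for y in head_ys)
    return head_near_floor   # e.g., moving a chair never brings the head near the floor
```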
Automated fall detection on privacy-enhanced video
Falls are detectable by algorithms that process raw video from in-home cameras, but raw video raises privacy concerns, especially when stored on a local computer or streamed to a remote computer for processing. We developed an algorithm for automated fall detection on both raw and privacy-enhanced video. The key observation is that the moving-region has nearly the same height and width in raw video and in every privacy-enhanced version of the video.
Same fall with raw and privacy-enhanced video
We compared various features of the moving-region for fall detection sensitivity and specificity (defined above).
Feature                                 | Average sensitivity | Average specificity
Width of moving-region in pixels        | 0.91                | 0.92
Height of moving-region in pixels       | 0.31                | 0.30
Height-to-width ratio of moving-region  | 0.44                | 0.50
Width-to-height ratio of moving-region  | 0.64                | 0.67
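All four features can be read off the moving-region bounding box. A minimal sketch, assuming OpenCV 4 and a binary foreground mask (our own code, not the project's):

```python
# Illustrative sketch: computing the compared features from one frame's
# binary foreground (moving-region) mask, where moving pixels are nonzero.
import cv2

def moving_region_features(fg_mask):
    """Return the width, height, and both ratios of the largest moving region,
    or None if no motion is present."""
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(largest)
    return {"width": w, "height": h, "h_to_w": h / w, "w_to_h": w / h}
```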
We compared the sensitivity and specificity for raw video and privacy-enhanced video using the exact same fall detection algorithm, driven by the width of the moving-region in pixels.
Video style | Average sensitivity | Average specificity
Raw        | 0.91 | 0.92
Blur       | 1.00 | 0.67
Silhouette | 0.91 | 0.75
Oval       | 0.91 | 0.92
Box        | 0.82 | 0.92
The automated fall detection algorithm performed well on the privacy-enhanced videos compared to raw video, except perhaps blur, which suffered from the person's color blending with the background, making the moving region harder to identify.
Energy expenditure estimation from video
Automatically estimating a person’s energy expenditure has numerous uses, including determining whether
an elderly person living alone is achieving sufficient levels of daily activity. Sufficient activity has been shown
to significantly delay the onset of dementia, to reduce the likelihood of falls, to improve mood, and more. Energy
expenditure is also important for monitoring of diabetic patients.
A key expected use of video-based energy expenditure estimation is to compare a person’s activity levels across many days, to detect negative trends and thus introduce interventions. As such, a goal of estimation is not necessarily accurate calorie estimation, but rather correct relative estimation of energy expenditure across days, including correct ratios among low/medium/high activity days. Thus, our first experiments sought to determine the fidelity of our video-based energy estimation. We compared our video-based energy expenditure algorithm to a commercially available device, namely the BodyBugg.
Low activity day, medium activity day, and high activity day (daily energy-estimate plots).
The slope changes of the video-based approach are very similar to those of the BodyBugg. Notice that the video-based approach is consistently off by 230 Calories. We exploited this observation to improve Calorie prediction accuracy from 86.4% to 91.1%.
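A minimal sketch of such a constant-offset correction (our own illustration; the function names and the calibration scheme are assumptions):

```python
# Illustrative sketch of a constant-offset correction: the roughly constant
# gap between the video-based and BodyBugg daily estimates is learned on
# calibration days and added to later video-based estimates.
def learn_offset(video_kcal, reference_kcal):
    """Mean gap between reference and video-based daily estimates (Calories)."""
    gaps = [r - v for v, r in zip(video_kcal, reference_kcal)]
    return sum(gaps) / len(gaps)

def corrected_estimate(video_kcal_today, offset):
    return video_kcal_today + offset   # the offset was about 230 Calories in the plots above
```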
A comprehensive list of the video recordings can be found with this link. Here's an example video recording:
Privacy perception and fall detection accuracy with privacy-enhanced video
Video of in-home activity provides valuable information for assistive monitoring but raises privacy concerns. Raw video can be privacy-enhanced by obscuring the appearance of a person. We considered raw video and five privacy enhancements.
We conducted an experiment with 376 non-engineering participants to determine whether there exists a privacy enhancement that provides sufficient perceived privacy while enabling accurate fall detection by humans.
The oval is the best trade-off between sufficient privacy and fall detection accuracy. However, the optimal privacy enhancement depends on the end-user's requirements.
Monitoring and Notification Flow Language (MNFL)
MNFL enables end-user customization of assistive monitoring systems. Data flows from monitoring devices on the left to notification methods on the right. Each graphical block is always-executing, intuitively analogous to objects in the physical world. The always-executing behavior gives instant feedback to the end-user when two blocks are connected, making development fast and rewarding.
Sensors typically output Boolean (on/off) or Integer (78° F) data, while cameras output Video data. Cameras are integrated with sensors via feature extractors, which convert Video data to Boolean or Integer data. A feature extractor determines the amount of some physical phenomenon from Video data, e.g. the amount of rightward motion from the camera's perspective.
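The sketch below illustrates, in Python with OpenCV, how an always-executing feature-extractor block could convert Video data to Integer data (here, the amount of rightward motion estimated with optical flow). The class and port names are our own assumptions; MNFL/EasyNotify itself is a web browser application, not this code.

```python
# Illustrative sketch only (not the MNFL/EasyNotify implementation): a
# feature-extractor block re-evaluated on every frame, Video in, Integer out.
import cv2
import numpy as np

class RightwardMotionExtractor:
    """Converts Video data to an Integer amount of rightward motion."""
    def __init__(self):
        self.prev_gray = None
        self.output = 0                       # Integer value read by connected blocks

    def tick(self, frame_bgr):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        if self.prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(self.prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            rightward = np.clip(flow[..., 0], 0, None)   # keep only positive x-motion
            self.output = int(rightward.sum())
        self.prev_gray = gray
        return self.output
```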
We implemented MNFL as a web browser application called EasyNotify. This example video shows possible solutions to the problem of an Alzheimer's patient leaving at night and not returning for a prolonged period.
We conducted an experiment with 51 non-engineering, non-science undergraduate participants. Participants spent less than 7 minutes per challenge problem.
Publications
A. Edgcomb, F. Vahid. Accurate and Efficient Algorithms that Adapt to Privacy-Enhanced Video for Improved Assistive Monitoring, ACM Transactions on Management Information Systems (TMIS): Special Issue on Informatics for Smart Health and Wellbeing, 2013. (to appear)
A. Edgcomb, F. Vahid. Automated In-Home Assistive Monitoring with Privacy-Enhanced Video, IEEE International Conference on Healthcare Informatics (ICHI), 2013. (to appear)
A. Edgcomb, F. Vahid. Estimating Daily Energy Expenditure from Video for Assistive Monitoring, IEEE International Conference on Healthcare Informatics (ICHI), 2013. (to appear)
A. Edgcomb, F. Vahid. Privacy Perception and Fall Detection Accuracy for In-Home Video Assistive Monitoring with Privacy Enhancements, ACM SIGHIT (Special Interest Group on Health Informatics) Record, 2012.
A. Edgcomb, F. Vahid. Automated Fall Detection on Privacy-Enhanced Video, IEEE Engineering in Medicine & Biology Society, 2012, 4 pages.
A. Edgcomb, F. Vahid. MNFL: The Monitoring and Notification Flow Language for Assistive Monitoring, ACM SIGHIT International Health Informatics Symposium (IHI), 2012.
A. Edgcomb, F. Vahid. Feature Extractors for Integration of Cameras and Sensors during End-User Programming of Assistive Monitoring Systems, Wireless Health, 2011, 2 pages.