Methods for recognizing human movements and actions in video sequences

Denys Soldatov

doi:10.20535/2617-0965.2019.2.3.164709

Published: Jun 28, 2019

Keywords:

motion recognition, video sequence, optical flow, SVM, CNN

Denys Soldatov

National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Abstract

The article states the problem of recognizing object motion in video sequences, outlines the stages of solving it, and analyzes the main methods used at each stage. It examines the key difficulties that arise in solving the task and presents ways of comparing different methods. Existing approaches to motion recognition in video sequences are analyzed, and the characteristics, strengths, weaknesses, and limitations of various feature detection and classification methods are identified. Methods are selected for further study and improvement.

How to cite

Soldatov, D. (2019). Methods for recognizing human movements and actions in video sequences. Electronic and Acoustical Engineering, 2(3), 27–33. https://doi.org/10.20535/2617-0965.2019.2.3.164709

Issue

Vol. 2 No. 3 (2019)

Section

Electronic Systems and Signals

This work is licensed under a Creative Commons Attribution 4.0 International License.
