Multimedia Data Mining: Integration of Multi-modal Data for a Better Retrieval and Analysis (CAREER Award)

Summary:

Large amounts of multi-modal data are already stored in many collections. These collections hold a huge amount of information, but it cannot be accessed or used unless the data is organized efficiently enough to extract its semantics. The proposed research will show how semantics can be learned from loosely related multi-modal data. Learning from such data is important because it is available in large amounts, whereas tightly related data is very hard to obtain because it can only be produced by manual labeling.

We present a new approach for multi-modal data sets, focusing on image and video collections with associated textual information. Learning the relationships between visual and textual information is an interesting example of multimedia data mining, particularly because it is hard to apply data mining techniques to collections of images and videos. The approach is mainly based on modelling the joint distribution of visual and textual features. First, we will focus on collections where visual features are attached to text. These include image collections with annotated keywords, news videos with speech recognition text, and handwritten Ottoman documents with transcripts. We will propose novel methods for object recognition, face recognition, and handwriting recognition. The main contribution of the approach will be the ability to learn a large number of concepts, whereas traditional computer vision approaches are usually limited to a small number. The proposed system will lead to efficient retrieval and browsing, and to some interesting applications including auto-annotation, auto-illustration and auto-documentary.

Another focus of the project will be modelling the information extracted from different sensors, including optical and infrared cameras and microphones. This data will be used to analyse the movements of patients and elderly people in hospitals and nursing homes. Abrupt changes in movement will be detected immediately, and a daily summary report will be provided to the doctor.
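The idea of modelling the joint distribution of visual and textual features from loosely related data can be illustrated with a small sketch. It is not the project's method: it assumes, purely for illustration, that each image has already been reduced to a set of discrete visual tokens (the names `blue_patch`, `green_patch` and so on are hypothetical), and it estimates the joint distribution from plain co-occurrence counts. The function names `cooccurrence_joint` and `annotate` are likewise made up for this example.

```python
from collections import Counter, defaultdict


def cooccurrence_joint(collection):
    """Estimate P(word, token) from co-occurrence counts.

    `collection` is a list of (tokens, words) pairs, one pair per image.
    The pairing is loose: we only know which words and tokens appear
    together in the same image, not which word describes which region.
    """
    counts = Counter()
    for tokens, words in collection:
        for token in tokens:
            for word in words:
                counts[(word, token)] += 1
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


def annotate(joint, tokens, top_k=2):
    """Rank words for an unlabeled image by summed joint probability."""
    scores = defaultdict(float)
    for (word, token), prob in joint.items():
        if token in tokens:
            scores[word] += prob
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Toy collection (hypothetical data): visual tokens with image-level keywords.
collection = [
    (["blue_patch", "green_patch"], ["sky", "grass"]),
    (["blue_patch", "white_patch"], ["sky", "cloud"]),
    (["green_patch", "brown_patch"], ["grass", "tree"]),
]

joint = cooccurrence_joint(collection)
print(annotate(joint, ["blue_patch"], top_k=1))  # prints ['sky']
```

Even with image-level keywords only, "sky" co-occurs with `blue_patch` more often than any other word, so it is ranked first. This is the kind of correspondence that auto-annotation relies on, although a real system would use richer features and a more careful probabilistic model.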

People:

Faculty
- Pınar Duygulu (Principal investigator)
Graduate Students
- Esra Ataer (M.S. student)
- Muhammet Baştan (Ph.D. student)
- Tolga Can (M.S. student)
- Derya Özkan (M.S. student)

Duration:

April 2005 - April 2010

Sponsor:

TUBITAK - Scientific and Technical Research Council of Turkey (Grant no: 104E065)

Budget:

157,000 YTL (~US$90,000)

Publications:

Pinar Duygulu, Muhammet Bastan, "Translating images to words for recognizing objects in large image and video collections," in J. Ponce, M. Hebert, C. Schmid, A. Zisserman, eds., Towards Category-Level Object Recognition, Springer Lecture Notes in Computer Science Series, 2006.
Kobus Barnard, Pinar Duygulu, David Forsyth, "Exploiting Text and Image Feature Co-occurrence Statistics in Large Datasets," in R.C. Veltkamp, H.-P. Kriegel, L. Shapiro, eds., Trends and Advances in Content-Based Image and Video Retrieval, Springer Lecture Notes in Computer Science Series, 2005.
Giridharan Iyengar, Pinar Duygulu, Shaolei Feng, Pavel Ircing, Sanjeev Khudanpur, Dietrich Klakow, Matthew Krause, R. Manmatha, Harriet J. Nock, Desislava Petkova, Brock Pytlik, Paola Virga, "Joint Visual-Text Modeling for Automatic Retrieval of Multimedia Documents," in 13th ACM Multimedia Conference, Singapore, November 6-12, 2005.
Nazli Ikizler, Pinar Duygulu, "Person Search Made Easy," in 4th International Conference on Image and Video Retrieval, Singapore, July 20-22, 2005, also published in Lecture Notes in Computer Science, vol. 3568/2005.
Paola Virga, Pinar Duygulu, "Systematic Evaluation of Machine Translation Methods for Image and Video Annotation," in 4th International Conference on Image and Video Retrieval, Singapore, July 20-22, 2005, also published in Lecture Notes in Computer Science, vol. 3568/2005.
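Nazli Ikizler, Pinar Duygulu, "Haber Videolari icin Yuz Bulma Yontemlerinin Iyilestirilmesi" [Improving Face Detection Methods for News Videos] (in Turkish), in IEEE 13. Sinyal Isleme ve Iletisim Uygulamalari Kurultayi [IEEE 13th Signal Processing and Communications Applications Conference], Kayseri, Turkey, May 16-18, 2005.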