Database/Bioinformatics Lab

The Database/Bioinformatics Laboratory at Chungbuk National University is one of the leading computer science research groups in Korea, distinguished by the high productivity of our research, the excellent quality of our students, and growing worldwide recognition. Since 1986, the lab has pursued the study and development of technologies for spatiotemporal databases, data mining, and bioinformatics, and all 211 of our members are working to advance these areas.

Our research concentrates on the theory and applications of databases, especially spatiotemporal databases, extended with the technologies we have accumulated. The lab is divided into teams by research area: Spatiotemporal Database, Spatiotemporal Mining, and Bioinformatics. Each is described in detail below.

In addition, we have been conducting 56 projects funded by KRF (Korea Research Foundation), KOSEF (Korea Science and Engineering Foundation), KISTEP (Korea Institute of Science & Technology Evaluation and Planning), KAIST (Korea Advanced Institute of Science and Technology), KISTI (Korea Institute of Science and Technology Information), ETRI (Electronics and Telecommunications Research Institute), MIC (Ministry of Information and Communication, Republic of Korea), SMBA (Small and Medium Business Administration), and others.

Spatiotemporal Database Group

Moving Object

The strong growth in wireless communications and the ever-increasing availability of mobile multi-purpose devices have created a global computing environment that plays a key role in the daily activities of millions of people. Conventional data management issues must be rethought and re-evaluated in this rapidly changing environment. Non-traditional issues must also be addressed, including data semantics, location-centric data services, broadcast and multicast delivery, data availability techniques, data security, and privacy.

Major interest topic lists:

  • Data management and indexing for ubiquitous/pervasive/wearable computing
  • Data management in sensor and mobile ad hoc networks
  • Data stream processing in mobile/sensor networks
  • Web access and Internet applications using mobile devices
  • Context-aware computing and location-based services
  • Location tracking of vehicles and moving objects

Temporal GIS

Spatiotemporal database systems, which manage both spatial and temporal information, are an important research direction. Applications that manipulate spatiotemporal objects whose positions or shapes change over time require efficient and effective data management. Spatiotemporal database systems allow users to pose queries that involve both space and time. Our research focuses on efficient processing and optimization techniques for spatiotemporal queries, as well as indexing methods that support efficient query processing. We are also interested in processing objects with uncertainty.

Major interest topic lists:

  • Spatiotemporal Data Model for Indeterminate Objects
  • Spatiotemporal Aggregate Method
  • Multidimensional Indexing Method
  • Spatiotemporal Query Processing, especially Continuous Nearest Neighbor Query
  • Spatiotemporal Aggregate Reasoning (Histogram, Sampling, etc)
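As a rough illustration of the continuous nearest neighbor query listed above, the sketch below assumes that each object moves linearly at a constant velocity from a known position at time 0. It answers the query by sampling timestamps and running a linear scan at each one. The class and function names are hypothetical. An exact continuous NN algorithm would compute the precise time points at which the answer changes, and a practical system would replace the scan with a spatiotemporal index.

```python
import math
from dataclasses import dataclass


@dataclass
class MovingObject:
    """Hypothetical moving object with linear motion from reference time t = 0."""
    oid: str
    x0: float  # x position at t = 0
    y0: float  # y position at t = 0
    vx: float  # constant x velocity (assumption)
    vy: float  # constant y velocity (assumption)

    def position(self, t):
        return (self.x0 + self.vx * t, self.y0 + self.vy * t)


def nearest_neighbor(objects, qx, qy, t):
    """Snapshot NN query: the object closest to (qx, qy) at time t (linear scan)."""
    def distance(obj):
        x, y = obj.position(t)
        return math.hypot(x - qx, y - qy)
    return min(objects, key=distance)


def sampled_continuous_nn(objects, qx, qy, t_start, t_end, step):
    """Approximate a continuous NN query by evaluating snapshot queries every
    `step` time units; returns (t, oid) pairs where the answer changes."""
    changes = []
    last = None
    n_steps = int(round((t_end - t_start) / step))
    for i in range(n_steps + 1):
        t = t_start + i * step  # derived from the index to avoid float drift
        oid = nearest_neighbor(objects, qx, qy, t).oid
        if oid != last:
            changes.append((t, oid))
            last = oid
    return changes


objects = [MovingObject("a", 5, 0, 0, 0), MovingObject("b", 10, 0, 1.5, 0)]
print(sampled_continuous_nn(objects, 8, 0, 0, 3, 1))  # [(0, 'b'), (1, 'a')]
```

Sampling can miss answer changes that occur between two samples. The size of `step` trades accuracy against the number of snapshot queries.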

Database Security

Current intrusion detection systems (IDSs) usually generate a large number of false alerts and cannot fully detect novel attacks or variations of known attacks. In addition, existing IDSs focus on low-level attacks or anomalies and do not capture the logical steps or strategies behind these attacks. When intrusive activity is intensive, actual alerts become mixed with false alerts, and the sheer number of alerts becomes unmanageable. As a result, it is difficult for human users or intrusion response systems to understand the intrusions behind the alerts and take appropriate action.
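A common first step toward managing alert volume is to aggregate near-duplicate alerts before any deeper correlation is attempted. The sketch below illustrates that step under an assumption of its own and is not presented as the lab's method. Alerts that share the same source, destination, and signature are merged when each arrives within a hypothetical `window` of seconds of the previous alert in its group. The `Alert` type, the field names, and the sample addresses are all illustrative. This step reduces alert volume, but on its own it does not recover the attack strategies described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    time: float      # seconds since some reference point
    src: str         # source address
    dst: str         # destination address
    signature: str   # IDS rule or signature name


def aggregate_alerts(alerts, window):
    """Merge alerts sharing (src, dst, signature) when each one arrives within
    `window` seconds of the previous alert in the same group.
    Returns (key, first_time, last_time, count) tuples in order of first arrival."""
    groups = []        # each entry: [key, first_time, last_time, count]
    open_group = {}    # key -> index of the most recent group for that key
    for alert in sorted(alerts, key=lambda a: a.time):
        key = (alert.src, alert.dst, alert.signature)
        idx = open_group.get(key)
        if idx is not None and alert.time - groups[idx][2] <= window:
            groups[idx][2] = alert.time    # extend the existing group
            groups[idx][3] += 1
        else:
            open_group[key] = len(groups)  # start a new group for this key
            groups.append([key, alert.time, alert.time, 1])
    return [tuple(g) for g in groups]


alerts = [
    Alert(0, "10.0.0.1", "10.0.0.9", "PORTSCAN"),
    Alert(1, "10.0.0.2", "10.0.0.9", "SQL_INJECTION"),
    Alert(2, "10.0.0.1", "10.0.0.9", "PORTSCAN"),
    Alert(3, "10.0.0.1", "10.0.0.9", "PORTSCAN"),
    Alert(100, "10.0.0.1", "10.0.0.9", "PORTSCAN"),
]
for group in aggregate_alerts(alerts, window=10):
    print(group)
```

In this example, five raw alerts collapse into three groups. The port-scan alert at time 100 starts a new group because it falls outside the window.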

Major interest topic lists:

  • Data mining for alert correlation analysis of IDS
  • Adaptation Security Model for Network IDS
  • Secure Routing Algorithm in Wireless Sensor Network
  • Privacy preserving in data mining
  • Privacy preserving for location tracking of vehicles and moving objects

Bioinformatics Group

Recently, many laboratories have produced large volumes of biological information, because the techniques for obtaining the sequence and structural information of genomes and proteins improved during the HGP (Human Genome Project). The task in the post-genome era is to identify the functions encoded in this still largely uncharacterized genomic information. To do so, biologists query public biological databases and retrieve information similar to their own data. Our work aims to reduce their effort in homology research, functional analysis, prediction, and related tasks. Our areas of study include structural binding sites, motif finding, bio-ontology, and biological data integration and management based on biological data mining.

Protein Structures

Our research goal is to develop a system that predicts functional relationships among proteins from their surfaces.

Major interest topic lists:

  • Methods to identify the locations of active sites based on the structural and biochemical features of protein surfaces
  • Approaches to predict protein-protein interactions based on the biophysical and chemical characteristics of protein surfaces
  • Strategies to infer functional relations among proteins based on their physical and biochemical characteristics
  • Synthetic methods to predict protein functions using features extracted from protein sequences, protein folding, and protein surfaces

Integration and Transformation of Biological Data

Biologists query public biological databases, retrieve sequences similar to their own, and then use the results in homology research, functional analysis, and prediction. Unfortunately, most biological laboratories have few software packages for handling sequence data, so the data are simply stored as files. Techniques for integrating and managing heterogeneous sequence data from public sequence databases are widely used to derive diverse information and predictions. A database management technique suited to sequence data is therefore required. In particular, analysis across various programs requires an integrated data model that can accommodate changes to both programs and data.

Major interest topic lists:

  • Sequence and spatial mining of motif resources
  • Motif recognition or identification
  • Motif discovery using both sequence and structure information
  • Analysis of the biochemistry, geometry, topology, and properties of motifs
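To make the sequence side of motif discovery concrete, the sketch below shows the simplest possible form: it reports every exact k-mer that occurs in at least `min_support` of the input sequences. The function name and the example sequences are hypothetical. Real motif finders allow mismatches or use probabilistic models such as position weight matrices, and the structure information mentioned in the list above is not used here.

```python
from collections import Counter


def shared_kmers(sequences, k, min_support):
    """Return {k-mer: number of sequences containing it} for every exact k-mer
    that occurs in at least `min_support` of the input sequences."""
    support = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set, so each k-mer counts at most once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: n for kmer, n in support.items() if n >= min_support}


seqs = ["ACGTTGCA", "TTGCAACG", "GGTTGCAT"]
print(shared_kmers(seqs, k=5, min_support=3))  # {'TTGCA': 3}
```

Counting support per sequence rather than total occurrences keeps a single repetitive sequence from dominating the result.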

Data Mining for Biological Data Analysis

Recent progress in biology, medical science, bioinformatics, and biotechnology has led to the accumulation of tremendous amounts of bio-data that demand in-depth analysis. At the same time, progress in data mining research has produced numerous efficient and scalable methods for mining interesting patterns in large databases. We study data mining methods that support bio-data analysis.

Major interest topic lists:

  • Analysis of frequent, sequential, and structured patterns: identification of co-occurring or correlated bio-sequence or bio-structure patterns
  • Classification and cluster analysis methods
  • Modeling of biological networks
  • Interpretation/Evaluation and Visualization
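Frequent-pattern analysis, the first topic above, is often introduced through the classic level-wise Apriori algorithm. The sketch below is a minimal Apriori implementation. It assumes each transaction is a hypothetical set of genes observed together, for example genes up-regulated in the same experiment, and it reports every gene set that co-occurs in at least `min_support` transactions. The gene names and data are made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic level-wise Apriori: return {frozenset(itemset): support_count}
    for every itemset contained in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets and keep a
        # candidate only if all of its (k-1)-subsets are frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: one pass over the transactions per candidate.
        frequent = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                frequent[cand] = n
        result.update(frequent)
        k += 1
    return result


data = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g3"}, {"g2", "g3"}, {"g1", "g2", "g3"}]
for itemset, count in sorted(apriori(data, 3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

With `min_support=3`, every single gene and every pair qualifies. The triple `{g1, g2, g3}` appears in only two transactions, so it is pruned.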

Gene Ontology Research

The Gene Ontology develops three structured, controlled vocabularies (ontologies) that describe gene products in terms of biological process, cellular component, and molecular function in a species-independent manner. It defines common terms for describing gene information and then describes the relations among those terms. It is represented in formats such as RDF, RDF/XML, and OWL. The Gene Ontology makes semantic search across heterogeneous databases possible. We study the relations between the ontology-based similarity of genes and their functional properties.

Major interest topic lists:

  • The relations between ontology-based similarity of genes and functional properties
  • The associations of biologically relevant terms to groups of genes
  • The semantic similarity measures
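One simple family of ontology-based similarity measures compares the ancestor sets of terms in the ontology's directed acyclic graph. The sketch below uses the Jaccard index of ancestor sets, a graph-based measure distinct from information-content measures such as Resnik's. It then combines term scores into a gene-level score with a best-match average. The toy ontology and its term names are hypothetical stand-ins, not real GO terms or IDs.

```python
def ancestors(term, parents):
    """All ancestors of `term` in an is_a DAG, including the term itself."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def term_similarity(t1, t2, parents):
    """Jaccard index of the two terms' ancestor sets (1.0 for identical terms)."""
    a1, a2 = ancestors(t1, parents), ancestors(t2, parents)
    return len(a1 & a2) / len(a1 | a2)


def gene_similarity(terms1, terms2, parents):
    """Best-match average: for each term, take its best match in the other gene's
    annotations, average those scores per gene, then average both directions."""
    def best_match_mean(src, dst):
        return sum(max(term_similarity(s, d, parents) for d in dst) for s in src) / len(src)
    return (best_match_mean(terms1, terms2) + best_match_mean(terms2, terms1)) / 2


# Hypothetical toy ontology: child term -> list of parent terms.
toy_parents = {
    "process": [],
    "metabolism": ["process"],
    "signaling": ["process"],
    "glycolysis": ["metabolism"],
    "lipid_synthesis": ["metabolism"],
}
print(term_similarity("glycolysis", "lipid_synthesis", toy_parents))                     # 0.5
print(gene_similarity(["glycolysis"], ["lipid_synthesis", "signaling"], toy_parents))    # 0.4375
```

Sibling terms under `metabolism` share two of four ancestors, so they score 0.5. Terms that meet only at the root score lower.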

Spatiotemporal Data Mining Group

Since 1999, our research group has studied data mining, which is defined as finding hidden information in large databases. In particular, we have mainly studied spatial and temporal data mining using established methods such as association, classification, clustering, and sequential pattern mining. More recently, we have become interested in wireless sensor networks, which produce large volumes of multivariate stream data. In this area, we study multivariate stream data processing, including stream data aggregation, continuous queries, data reduction (attribute/feature selection), and stream data mining that assumes an incoming data stream is partitioned into sequential chunks of fixed size.
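The chunk-based model mentioned above can be sketched with a generator that cuts an unbounded stream into consecutive fixed-size chunks and summarizes each chunk as it completes. The sketch assumes each reading is a tuple of numeric sensor attributes, such as temperature and humidity, and uses the per-attribute mean as the chunk summary. The function names and sample readings are hypothetical. Only one chunk is held in memory at a time.

```python
from itertools import islice


def chunks(stream, size):
    """Yield consecutive lists of `size` items from any iterable; the last chunk
    may be shorter if the stream ends mid-chunk."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def chunk_means(stream, size):
    """For each chunk of multivariate readings (tuples), yield the per-attribute mean."""
    for chunk in chunks(stream, size):
        n = len(chunk)
        yield tuple(sum(column) / n for column in zip(*chunk))


readings = [(20, 50), (22, 52), (21, 51), (30, 40)]  # hypothetical (temperature, humidity)
print(list(chunk_means(readings, 3)))  # [(21.0, 51.0), (30.0, 40.0)]
```

Because `chunks` accepts any iterator, the same code works on a live sensor feed, with each summary emitted as soon as its chunk fills.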

Spatio-temporal data mining

Spatial data are data that have a spatial or location component; they can be viewed as data about objects located in a physical space. Generally speaking, spatial data mining, or knowledge discovery in spatial databases, is data mining applied to spatial databases or spatial data. Applications of spatial data mining include GIS, geology, environmental science, resource management, agriculture, medicine, and robotics.

Major interest topic lists:

  • Spatial data generalization and specialization
  • Spatial association/clustering/classification rule
  • Visualization for spatial knowledge
  • Temporal association/classification rule
  • Sequential pattern and knowledge representation

Stream data mining

Mining data streams brings unique opportunities but also new challenges. The main challenge is that "data-intensive" mining is constrained by limited time, memory, and sample size. Data mining has traditionally been performed over static datasets, where algorithms can afford to read the input data several times. When the data source is an open-ended stream, not all of the data can be loaded into memory, and offline mining over a fixed-size dataset is no longer feasible.

Major interest topic lists:

  • Aggregation based on wireless sensor networks
  • Query indexing based on wireless sensor networks
  • Stream query processing
  • Stream data classification
  • Attribute/Feature selection measure for multivariate stream data
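A standard example of an algorithm that respects these one-pass, bounded-memory constraints is reservoir sampling (Vitter's Algorithm R). It maintains a uniform random sample of `k` items from a stream of unknown length, and that sample can then feed downstream mining such as classification or feature-selection estimates. It is shown here as a general illustration of the stream constraints, not as one of the lab's own methods. The function name and parameters are illustrative.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: keep a uniform random sample of k items from a stream of
    unknown length in a single pass, using O(k) memory."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # uniform over 0..i inclusive
            if j < k:                # happens with probability k / (i + 1)
                sample[j] = item     # replace a random reservoir slot
    return sample


print(reservoir_sample(range(1_000_000), 5, random.Random(7)))
```

Each item is read exactly once, and after `i + 1` items every item seen so far has the same probability, `k / (i + 1)`, of being in the reservoir.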