
FULL PAPER International Journal of Recent Trends in Engineering, Vol 2, No. 1, November 2009 
EVISTA – Interactive Visual Clustering System 
K. Thangavel1, P. Alagambigai2 
1 Department of Computer Science, Periyar University, Salem, Tamilnadu, India 
Email: [email protected] 
2 Department of Computer Applications, Easwari Engineering College, Chennai, Tamilnadu, India 
 Email: [email protected] 
Abstract—Due to the enormous increase in data, exploring and analyzing it is increasingly important but difficult to achieve. Information visualization and visual data mining can help to deal with this. Visual data exploration has high potential, and many applications such as fraud detection and data mining will use information visualization technology for improved data analysis. The advantage of visual data exploration is that the user is directly involved in the data mining process. A large number of information visualization techniques have been developed over the last decade to support the exploration of large data sets. VISTA is an interactive visual cluster rendering system which invites humans into the clustering process, but it has some limitations in identifying the cluster distribution and in human-computer interaction. In this paper, we propose an Enhanced VISTA (EVISTA) which addresses these drawbacks. EVISTA improves the visualization in two ways. First, it uses weighted vector normalization instead of max-min normalization, which improves the data visualization such that the user can understand the underlying pattern without human intervention. Second, it completely eliminates α tuning, which reduces the complexity of visual distance computation and eases the human-computer interaction. The experimental results show that EVISTA explores the underlying pattern of the dataset effectively and greatly reduces the user's operation burden.
Index Terms—Clustering, EVISTA, Human-computer interaction, Information visualization, Visual data mining.
I. INTRODUCTION
Data visualization is essential for understanding the concept of multidimensional spaces [5]. It allows the user to explore the data in different ways at different levels of abstraction to find the right level of detail. Therefore, techniques are most useful if they are highly interactive, permit direct manipulation and have a rapid response time.
Visualization is defined by Ware as "a graphical representation of data or concepts" which is either an "internal construct of the mind" or an "external artifact supporting decision making". Visualization provides valuable assistance to the human by representing information visually. This assistance may be called cognitive support. Visualization can provide cognitive support through a number of mechanisms, such as grouping related information for easy search and access, representing large volumes of data in a small space, imposing structure on data and tasks to reduce time complexity, and allowing interactive exploration through manipulation of parameter values [11].
Visualization techniques could enhance the current knowledge and data discovery methods by increasing the user involvement in the interactive process. More recently there has been a lot of discussion on visualization for data mining. Visual data mining can be viewed as an integration of data visualization and data mining [5, 15]. Considering visualization as a supporting technology in data mining, four possible approaches are stated in [1]. The first approach is the use of visualization techniques to present the results obtained from mining the data in the database. The second approach applies data mining techniques to visualization by capturing essential semantics visually. The third approach uses visualization techniques to complement the data mining techniques. The fourth approach uses visualization techniques to steer the mining process.
In general, visualization can be used to explore data, to confirm a hypothesis, or to manipulate a view. Exploratory visualization creates a dynamic scenario in which interaction is critical. The user does not necessarily know what he or she is looking for, can search for structures or trends, and is attempting to arrive at some hypothesis. In confirmatory visualization, the system parameters are often predetermined and the visualization tools are used to confirm or refute the hypothesis. Manipulative visualization focuses on refining the visualization to optimize the presentation. Visualization has been categorized into two major areas: (i) scientific visualization, which focuses primarily on physical data such as the human body, and (ii) information visualization, which focuses on abstract nonphysical data such as text, hierarchies and statistical data. Data mining techniques are primarily oriented toward information visualization [4]. Both scientific visualization and information visualization create graphical models and visual representations from data that support direct user interaction for exploring and acquiring insight into useful information embedded in the underlying data [10, 15]. Even though visualization techniques have advantages over automatic methods, they bring up some specific problems, such as limited visibility, visual bias due to mapping the dataset to a 2D/3D representation, and the need for easy-to-use visual interface operations and reliable human-computer interaction. In most visualization methods the human-computer interaction costs more than in automated methods [9]. In general, visual data mining is different from scientific visualization and has the following characteristics:
• Wide range of users
• Wide choice of visualization techniques and
• Important dialog function.
The users of scientific visualization are scientists and engineers who can endure some difficulty in using the system, whereas a visual data mining system must have the
 2009 ACADEMY PUBLISHER

possibility of being used widely and easily by general users [16]. Considering this issue, this paper proposes a novel information visualization technique called the enhanced visual clustering system (EVISTA), an extended version of VISTA [8]. VISTA is a dynamic data visualization model which invites humans into the clustering process. Even though VISTA has proved to be an efficient interactive visual cluster rendering system, it requires complete user interaction throughout the clustering process. When the number of dimensions increases, the human-computer interaction becomes tedious. EVISTA is designed to provide an efficient data visualization such that the user is able to understand the underlying pattern of the given data set without human intervention.
The rest of the paper is organized as follows: Section 2 reviews related work in the domain of information visualization. Section 3 deals with EVISTA. Section 4 discusses the experimental analysis. Section 5 concludes the paper.
II. RELATED WORKS
Various efforts have been made to visualize multidimensional datasets [2, 10, 11, 13]. The early research on general plot-based data visualization is Grand Tour and Projection Pursuit [2]. The purpose of the Grand Tour and Projection Pursuit is to guide the user to find the interesting projections.
L. Yang [2] utilizes the Grand Tour technique to show projections of datasets in an animation. The dimensions are projected onto coordinates in a 3D space. However, when the 3D space is shown on a 2D screen, some axes may be overlapped by other axes, which makes it hard to perform direct interactions on dimensions.
Star coordinates [7] is an interactive visualization model which treats dimensions uniformly, in which data are represented coarsely and by simple and more space-efficient points, which results in less cluttered visualization for large data sets.
Interactive visual clustering (IVC) [10] combines spring-embedded graph layout techniques with user interaction and constrained clustering.
VISTA [8, 9] is a recent visualization model that utilizes the star coordinate system and provides a mapping function similar to that of star coordinates. There are two types of cluster rendering in the VISTA model: unguided rendering and guided rendering.
III. ENHANCED VISUAL CLUSTERING SYSTEM
Enhanced VISTA (EVISTA) is an information visualization framework that employs improved data visualization and reveals the hidden patterns in complex high-dimensional data sets without human intervention. The EVISTA model is designed based on star coordinates. The star coordinate system is a traditional multivariate data visualization technique in which the k axes are defined by an origin O = (x, y). The k coordinates S_1, S_2, ..., S_k represent the k dimensions in 2D space. The k coordinates are equidistantly distributed on the circumference of the circle C, where the unit vectors are
$$S_i = \left(\cos\frac{2\pi i}{k},\ \sin\frac{2\pi i}{k}\right), \quad i = 1, 2, \ldots, k$$
and the 2D point Q(x, y) is obtained by
$$Q(x, y) = \left(\sum_{i=1}^{k} y_i' \cos\frac{2\pi i}{k},\ \sum_{i=1}^{k} y_i' \sin\frac{2\pi i}{k}\right)$$
where x represents the given data object and y_i' is the normalized data value based on the weighted vector w_t [14].
EVISTA employs the design of the VISTA visual cluster rendering proposed by Keke Chen and L. Liu [8], which provides an intuitive way to visualize clusters with interactive feedback to encourage domain experts to participate in the clustering revision and cluster validation process. It allows the user to interactively observe potential clusters in a series of continuously changing visualizations through α. More importantly, it can include algorithmic clustering results and serve as an effective validation and refinement tool for irregularly shaped clusters [9]. The VISTA system has two unique features. First, it implements a linear and reliable visualization model to interactively visualize multi-dimensional datasets in a 2D star-coordinate space. Second, it provides a rich set of user-friendly interactive rendering operations, allowing users to validate and refine the cluster structure based on their visual experience as well as their domain knowledge.
The VISTA visualization model consists of two linear mappings: max-min normalization followed by α-mapping. Max-min normalization, given in Equation (5), is used to normalize the columns in the datasets so as to eliminate the dominating effect of large-valued columns.
$$v' = \frac{2\,(v - \min)}{\max - \min} - 1 \qquad (5)$$
where v is the original and v' is the normalized value. The α-mapping maps k-dimensional points onto the two-dimensional visual space with the convenience of visual parameter tuning.
The proposed visualization model EVISTA utilizes weighted vector normalization, which is performed on rows instead of columns, such that the visualization model defines a reliable position for Q(x, y). EVISTA completely eliminates α tuning, since α-mapping is tedious when the number of dimensions is high and each change in the α values requires a fresh visual distance computation. As the number of dimensions increases, the visual distance computation becomes time-consuming. Similar effects may occur when the number of data objects increases. This makes

the human-computer interaction ineffective and affects the applicability of VISTA.
IV. EXPERIMENTAL ANALYSIS
To illustrate the efficiency of the proposed visualization, empirical analyses are conducted on a number of benchmark data sets available in the UCI machine learning repository. The performance of EVISTA is compared against the VISTA system and the automatic clustering algorithm K-Means. The experiments in VISTA are conducted by setting the α value to 1. The detailed information of the data sets is shown in Table I.
TABLE I. DETAILS OF DATASETS
Attributes Classes
A. Cluster validation
Validation of clusters is very important in cluster analysis, because clustering methods tend to generate clusterings even for fairly homogeneous datasets. The quality of clusters obtained through visual clustering is measured in terms of three classical methods proposed in [3]. The Rand index and Jaccard coefficient validations are based on the agreement between clustering results and the "ground truth".
The classical validity measures are heavily related to the geometry or density nature of clusters, and they do not work well for arbitrarily shaped clusters [8]. In such cases, visual perception plays an important role in deciding the right clusters.
Iris Data: The Iris dataset is a benchmark dataset widely used in pattern recognition and clustering. It consists of 150 four-dimensional instances of three classes of plants, classified according to sepal length and width and petal length and width. The iris dataset consists of three clusters with equal distribution. One cluster is linearly separable from the other two; the latter two are not exactly linearly separable from each other. Figure 1 shows the initial visualization of the iris dataset in the VISTA model, where we observe the possibility of three clusters. It is observed from the figure that one cluster is completely separated from the other two, while the remaining two overlap. After performing interactive visual clustering with suitable α tuning, the visual boundaries between the clusters become clearer. Figure 2 shows the visualization of the iris dataset after α tuning. As the literature on the iris dataset specifies, the two clusters are not linearly separable. In VISTA this can be observed only after fine tuning of α. The small region consisting of the overlapping data points is also observed. More importantly, separating the two clusters is found to be difficult for users.
B. Results and Discussion
Figure 1. Visualization of Iris Dataset using VISTA system
Figure 2. Visualization of Iris Dataset after α-tuning using VISTA system
Figure 3. Visualization of Iris Dataset using EVISTA system
In VISTA, domain knowledge plays a vital role in finding the optimum number of clusters. In general, domain knowledge in the form of labeled items obtained by traditional automatic clustering algorithms such as K-Means can be incorporated into the visual clustering process. A user without domain knowledge may fail to find the optimum clusters, since α tuning changes the data point distribution. Most automated clustering algorithms require the number of clusters to be specified in advance, and that number may not coincide with the real cluster distribution of the dataset. This increases the complexity of the clustering process. EVISTA reduces the complexity of clustering by eliminating the use of α. Figure 3 shows the iris dataset visualization based on EVISTA.
From the results, it is observed that one cluster is completely separated from the others and the visual boundaries between the other two clusters are clearly identified. It is also noticed that only two data points overlap. Since EVISTA has no α tuning, the visual distance computation is completely eliminated, which reduces the time complexity. EVISTA doesn't require domain knowledge in any form, which eases the human-computer interaction, and it visualizes the pattern of the given dataset without human intervention.
Australian Data: The Australian dataset concerns credit card applications. This dataset is interesting because there is a good mix of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values. This data set also has missing values. Suitable statistical computation is applied to fill in the missing values. It has

two classes. The class distribution is 44.5% for class A and 55.5% for class B.
Figure 4 shows the visualization of the Australian data set in VISTA, where possibly one single cluster is observed. During α tuning, the user is able to identify the two clusters. If the α tuning is not performed carefully, the user may get a different pattern, which may lead to confusion. Figure 5 shows the process of α tuning, where a four-cluster distribution is observed. This leads to poor cluster quality. In such a case, domain knowledge is the only aid to identify the optimum number of clusters. Figure 6 shows the cluster distribution using EVISTA, where two potential clusters are observed. Since α tuning is not included in the EVISTA model, the cluster distribution can be clearly visualized. Even when the user lacks domain knowledge such as the number of clusters or the cluster distribution, the visualization model EVISTA suitably identifies the optimum number of clusters.
Figure 4. Visualization of Australian Dataset using VISTA system
Figure 5. Visualization of Australian Dataset using VISTA system with α-tuning
Figure 6. Visualization of Australian Dataset using EVISTA
Pima Data: The Pima dataset is an Indian diabetes database with 768 data objects. It has two classes with a class distribution of 500 and 268. It consists of attributes such as number of times pregnant, plasma glucose concentration, diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), diabetes pedigree function, etc. Figure 7 shows the VISTA visualization of the Pima Indian dataset. When the Pima dataset is visualized using VISTA, one possible cluster is observed. Even suitable α tuning doesn't distinguish the clusters. The boundary regions of the two clusters are possibly not clear.
In contrast, the EVISTA visualization of the Pima dataset clearly shows two potential clusters. From Fig. 8 it is observed that the Pima dataset contains two potential clusters, and a few data objects are scattered around the potential area. Since EVISTA doesn't require α tuning, the user may find it very flexible in finding the underlying pattern of the dataset without human intervention. With suitable geometric transformations such as scaling and rotation, the user may be able to observe the cluster distribution according to their visual perception.
Figure 7. Visualization of Pima Dataset using VISTA system
Figure 8. Visualization of Pima Dataset using EVISTA system
C. Comparative Analysis
This part of the section compares the results of EVISTA with VISTA and the centroid-based automatic clustering algorithm K-Means. In EVISTA the cluster labeling is performed using free-hand drawing. The area with potential data points is covered by a convex hull, and the data points in the convex hull are labeled as one single cluster. The cluster results are evaluated based on the Rand index and Jaccard coefficient and are shown in Table II and Table III. The results of VISTA are obtained by conducting the experiments over several runs, and the average over those runs is reported.
V. CONCLUSION
With the development of data collection technology, effective data visualization models are required to understand the pattern of multidimensional and multivariate data. In this paper, Enhanced VISTA is proposed to improve data visualization. EVISTA is designed with weight vector normalization, which improves the data exploration. The elimination of α tuning in the visualization process reduces the complexity of human-computer interaction. More importantly, EVISTA doesn't require domain knowledge in any form, which improves its applicability. The experimental results show that EVISTA efficiently identifies the cluster distribution and reduces the complexity of the visual distance computation. Specifically, it eases the human-computer interaction.
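To make the star-coordinate projection that both VISTA and EVISTA build on (Section III) more concrete, the following is a minimal Python sketch, not the authors' implementation. It places k unit vectors equidistantly on a circle and sums the normalized attribute values along them. The function names `max_min_normalize` and `star_coordinates` are hypothetical, the [-1, 1] target range of the column normalization is an assumption, and the weighted vector normalization of [14] is not reproduced because its formula is not given in this paper. Passing no `alphas` corresponds to fixing α = 1 on every axis, as in the VISTA experiments of Section IV.

```python
import math


def max_min_normalize(column):
    """Scale one column to [-1, 1] (assumed range); constant columns map to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


def star_coordinates(rows, alphas=None):
    """Project k-dimensional rows to 2D points using star coordinates."""
    k = len(rows[0])
    if alphas is None:
        alphas = [1.0] * k  # alpha fixed at 1 for every axis
    # Normalize column by column (VISTA-style max-min normalization).
    columns = [max_min_normalize([row[i] for row in rows]) for i in range(k)]
    # Unit vectors S_i placed equidistantly on the circle, i = 1..k.
    axes = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
            for i in range(1, k + 1)]
    points = []
    for n in range(len(rows)):
        x = sum(alphas[i] * columns[i][n] * axes[i][0] for i in range(k))
        y = sum(alphas[i] * columns[i][n] * axes[i][1] for i in range(k))
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Three illustrative four-dimensional rows (made-up values).
    data = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]
    for point in star_coordinates(data):
        print(point)
```

Changing a single entry of `alphas` moves every projected point, so any distance computed in the 2D view has to be recomputed after each adjustment. That is the cost the paper attributes to α tuning.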

First author expresses his thanks to University Grants 
Commission for financial support (F-No. 34-105/2008, SR). 
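The Rand index and Jaccard coefficient behind Tables II and III compare a clustering with the ground-truth labels by counting pairs of data objects. The sketch below uses the standard pair-counting definitions; it is an illustration, not the authors' evaluation code, and the function names are hypothetical. It returns values in [0, 1], whereas the tables appear to report percentages.

```python
from itertools import combinations


def pair_counts(labels_pred, labels_true):
    """Count object pairs by whether each labeling puts them in the same group."""
    if len(labels_pred) != len(labels_true):
        raise ValueError("label lists must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_pred = labels_pred[i] == labels_pred[j]
        same_true = labels_true[i] == labels_true[j]
        if same_pred and same_true:
            ss += 1  # together in both
        elif same_pred:
            sd += 1  # together in the clustering only
        elif same_true:
            ds += 1  # together in the ground truth only
        else:
            dd += 1  # separated in both
    return ss, sd, ds, dd


def rand_index(labels_pred, labels_true):
    """Fraction of pairs on which clustering and ground truth agree."""
    ss, sd, ds, dd = pair_counts(labels_pred, labels_true)
    total = ss + sd + ds + dd
    if total == 0:
        raise ValueError("need at least two objects")
    return (ss + dd) / total


def jaccard_coefficient(labels_pred, labels_true):
    """Like the Rand index, but ignoring pairs separated in both labelings."""
    ss, sd, ds, _ = pair_counts(labels_pred, labels_true)
    denom = ss + sd + ds
    # Assumed convention: no pair grouped in either labeling counts as agreement.
    return 1.0 if denom == 0 else ss / denom


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    truth = [0, 0, 0, 1]
    print(rand_index(predicted, truth), jaccard_coefficient(predicted, truth))
```

Only the grouping matters, not the label values, so free-hand cluster labels from a visual tool can be compared directly with class labels.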
TABLE II. COMPARISON OF EVISTA WITH VISTA AND K-MEANS BASED ON RAND INDEX
Visual Clustering
Without α
With α
68.00 65.45 61.71
55.13
Australian 63.46
TABLE III. COMPARISON OF EVISTA WITH VISTA AND K-MEANS BASED ON JACCARD COEFFICIENT
Visual Clustering
With α
α tuning
58.00 64.31 59.05
45.84
Australian 48.82
Figure 9. Comparison based on Rand Index
Figure 10. Comparison based on Jaccard coefficients
REFERENCES
[1] Bhavani Thuraisingham, "Data Mining: Technologies, Techniques, Tools and Trends", CRC Press, London, New York, Washington, 1999.
[2] Cook, D.R., Buja, A., Cabrea, J., and Harley, H., "Grand Tour and Projection Pursuit", J. Computational and Graphical Statistics, v23, (1995).
[3] Daxin Jiang, Chun Tang, Aidong Zhang, "Cluster analysis for gene expression data: a survey", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, 2004.
[4] Daniel, Keim, A., and Hans-Peter (1996), "Visualization Techniques for Mining Large Databases: A Comparison", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No.
[5] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, August 2000, ISBN 1-55860-489-8.
[6] A. K. Jain, M. N. Murty and P. J. Flynn, "Data clustering: A Review", ACM Computing Surveys, 1999.
[7] E. Kandogan, "Visualizing Multi-dimensional Clusters, Trends and Outliers using Star Co-ordinates", Proc. of ACM KDD, 2001.
[8] Keke Chen and L. Liu, "VISTA: Validating and Refining Clusters via Visualization", Information Visualization, Vol. 3, 4,
[9] Keke Chen and L. Liu, "iVIBRATE: Interactive Visualization-Based Framework for Clustering Large Datasets", ACM Transactions on Information Systems, Vol. 24, April 2006,
[10] Marie desJardins, James MacGlashan, Julia Ferraioli, "Interactive visual clustering", Intelligent User Interfaces 2007,
[11] Melanie Tory and Torsten Moller, "Human Factors in Visualization Research", IEEE Transactions on Visualization and Computer Graphics, 10(1), 2004.
[12] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, "Introduction to Data Mining", Pearson Addison Wesley, Boston, 2006.
[13] O. Sourina, D. Liu, "Visual interactive 3-dimensional clustering with implicit functions", Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Volume 1, 1-3 Dec 2004, pp. 382-386.
[14] Thangavel K. and Ashok Kumar D., "Optimization of code book in Vector Quantization", International Journal Annals of Operations Research, Vol. 143, No. 1, 317-325, 2006.
[15] Ye N., "The Hand Book of Data Mining", Lawrence Erlbaum Associates, Publishers, Mahwah, New Jersey, 2003.
[16] Zhen Liu, Shinichi Kamohara, Minyi Guo, "A Scheme of Interactive Data Mining Support System in Parallel and Distributed Environment", ISPA 2003, LNCS 2745, Springer-Verlag, pp. 263-272, 2003.
Source: http://www.academypublisher.com/ijrte/vol02/no01/ijrte02018387.pdf