Wednesday, November 27, 2013
A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data
Posted by AI bot at 5:07 PM
ABSTRACT
Feature selection involves identifying a subset of the most useful features that
produces results compatible with those of the original, entire set of features. A feature
selection algorithm may be evaluated from both the efficiency and effectiveness
points of view. While the efficiency concerns the time required to find a
subset of features, the effectiveness is related to the quality of the subset
of features. Based on these criteria, a fast clustering-based feature selection
algorithm (FAST) is proposed and experimentally evaluated in this paper. The
FAST algorithm works in two steps. In the first step, features are divided into
clusters by using graph-theoretic clustering methods. In the second step, the
most representative feature that is strongly related to target classes is
selected from each cluster to form a subset of features. Because features in different
clusters are relatively independent, the clustering-based strategy of FAST has
a high probability of producing a subset of useful and independent features. To
ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree
(MST) clustering method. The efficiency and effectiveness of the FAST algorithm
are evaluated through an empirical study. Extensive experiments are carried out
to compare FAST and several representative feature selection algorithms,
namely, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types
of well-known classifiers, namely, the probability-based Naive Bayes, the
tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and
after feature selection. The results, on 35 publicly available real-world
high-dimensional image, microarray, and text data sets, demonstrate that FAST
not only produces smaller subsets of features but also improves the
performance of the four types of classifiers.
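The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's exact procedure: it assumes the feature-class relevance scores and pairwise feature correlations are already computed, uses 1 - correlation as the edge distance, and cuts MST edges longer than a fixed threshold, which is a generic MST clustering variant. The names `minimum_spanning_tree`, `select_representatives`, and `cut` are hypothetical.

```python
from typing import Dict, List, Tuple


def minimum_spanning_tree(dist: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on a complete graph given as a symmetric distance matrix."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each node
    parent = [-1] * n
    edges: List[Tuple[int, int, float]] = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return edges


def select_representatives(relevance: List[float],
                           correlation: List[List[float]],
                           cut: float = 0.5) -> List[int]:
    """Step 1: cluster features via an MST. Step 2: keep one feature per cluster."""
    n = len(relevance)
    # Assumption: distance = 1 - correlation, so similar features are close.
    dist = [[1.0 - correlation[i][j] for j in range(n)] for i in range(n)]
    root = list(range(n))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    # Keep only short MST edges; the remaining connected components are clusters.
    for u, v, d in minimum_spanning_tree(dist):
        if d <= cut:
            root[find(u)] = find(v)

    clusters: Dict[int, List[int]] = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    # The representative is the member most strongly related to the class.
    return sorted(max(members, key=lambda i: relevance[i])
                  for members in clusters.values())


relevance = [0.9, 0.8, 0.5, 0.4]
correlation = [
    [1.0, 0.95, 0.1, 0.1],
    [0.95, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
print(select_representatives(relevance, correlation))  # [0, 2]
```

In this toy input, features 0 and 1 form one cluster and features 2 and 3 another, so one feature survives from each pair.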
Existing System
Feature subset selection methods fall into four categories: embedded, wrapper,
filter, and hybrid. The embedded methods incorporate feature selection as a
part of the training process and are usually specific to given learning
algorithms, and therefore may be more efficient than the other three
categories. Traditional machine learning algorithms like decision trees or
artificial neural networks are examples of embedded approaches. The wrapper
methods use the predictive accuracy of a predetermined learning algorithm to
determine the goodness of the selected subsets, so the accuracy of the learning
algorithms is usually high. However, the generality of the selected features is
limited and the computational complexity is large. The filter methods are
independent of learning algorithms and have good generality. Their
computational complexity is low, but the accuracy of the learning algorithms is
not guaranteed. The hybrid methods combine filter and wrapper methods by using
a filter method to reduce the search space that will be considered by the
subsequent wrapper. They mainly aim to achieve the best possible performance
with a particular learning algorithm at a time complexity similar to that of
the filter methods.
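The filter, wrapper, and hybrid ideas can be illustrated with a small sketch. Everything here is an assumption made for illustration: the filter score is a simple gap between class means for binary labels, the wrapper criterion is leave-one-out accuracy of a 1-nearest-neighbour classifier, and the names `mean_gap`, `filter_rank`, `loo_1nn_accuracy`, and `hybrid_select` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Row = Sequence[float]


def mean_gap(column: Sequence[float], labels: Sequence[int]) -> float:
    """Illustrative filter score for binary labels 0/1: gap between class means."""
    zeros = [v for v, c in zip(column, labels) if c == 0]
    ones = [v for v, c in zip(column, labels) if c == 1]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def filter_rank(X: Sequence[Row], y: Sequence[int],
                score: Callable[[Sequence[float], Sequence[int]], float]) -> List[int]:
    """Filter step: rank features by a learner-independent score, best first."""
    return sorted(range(len(X[0])),
                  key=lambda j: score([row[j] for row in X], y),
                  reverse=True)


def loo_1nn_accuracy(X: Sequence[Row], y: Sequence[int], features: List[int]) -> float:
    """Wrapper criterion: leave-one-out accuracy of 1-NN restricted to `features`."""
    correct = 0
    for i, row in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in features))
        correct += y[nearest] == y[i]
    return correct / len(X)


def hybrid_select(X: Sequence[Row], y: Sequence[int],
                  score: Callable[[Sequence[float], Sequence[int]], float],
                  keep: int) -> List[int]:
    """Hybrid: the filter shrinks the search space, then a greedy wrapper searches it."""
    candidates = filter_rank(X, y, score)[:keep]
    selected: List[int] = []
    best_acc = 0.0
    while True:
        best_j: Optional[int] = None
        for j in candidates:
            if j in selected:
                continue
            acc = loo_1nn_accuracy(X, y, selected + [j])
            if acc > best_acc:  # accept only strict improvements
                best_acc, best_j = acc, j
        if best_j is None:
            return selected
        selected.append(best_j)


X = [[0.0, 5.0, 1.0], [0.1, 1.0, 1.0], [0.2, 4.0, 1.0],
     [1.0, 1.2, 1.0], [1.1, 4.5, 1.0], [1.2, 5.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
print(filter_rank(X, y, mean_gap))            # filter ranking of all features
print(hybrid_select(X, y, mean_gap, keep=2))  # wrapper search over the top 2
```

The wrapper step trains and evaluates the learner for every candidate subset, which is where its higher cost comes from; the filter step never consults the learner.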
Disadvantages
1. Wrapper methods: the generality of the selected features is limited and the
computational complexity is large.
2. Filter methods: the computational complexity is low, but the accuracy of the
learning algorithms is not guaranteed.
Proposed System
Feature subset selection can be viewed as the process of identifying and
removing as many irrelevant and redundant features as possible. Irrelevant
features do not contribute to predictive accuracy, and redundant features do
not help produce a better predictor because most of the information they
provide is already present in other feature(s). Of the many feature subset
selection algorithms, some can effectively eliminate irrelevant features but
fail to handle redundant features, while others can eliminate irrelevant
features and also take care of redundant ones. Our proposed FAST algorithm
falls into the second group. Traditionally, feature subset selection research
has focused on searching for relevant features. A well-known example is Relief,
which weighs each feature according to its ability to discriminate instances
under different targets based on a distance-based criterion function. However,
Relief is ineffective at removing redundant features, as two predictive but
highly correlated features are both likely to be highly weighted. Relief-F
extends Relief, enabling the method to work with noisy and incomplete data sets
and to deal with multiclass problems, but it still cannot identify redundant
features.
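Relief's weighting, and its blind spot for redundancy, can be illustrated with a simplified sketch. It assumes numeric features scaled to [0, 1], two classes with at least two instances each, Manhattan distance, and a pass over every instance instead of random sampling; `relief_weights` is a hypothetical name.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Simplified Relief: reward features that differ on the nearest miss
    (other class) and penalize features that differ on the nearest hit (same class)."""
    n, m = len(X), len(X[0])
    weights = [0.0] * m

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        # Assumption: every class has at least two instances, so both lists are non-empty.
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(m):
            weights[j] += (abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])) / n
    return weights


# Feature 1 is an exact copy of feature 0; feature 2 is unrelated to the class.
X = [[0.0, 0.0, 0.3],
     [0.1, 0.1, 0.9],
     [0.9, 0.9, 0.2],
     [1.0, 1.0, 0.8]]
y = [0, 0, 1, 1]
print(relief_weights(X, y))
```

The duplicated feature receives the same high weight as the feature it copies, so ranking by weight alone would keep both.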
Advantages:
1. Good feature subsets contain features highly correlated with (predictive of)
the class, yet uncorrelated with each other.
2. FAST efficiently and effectively deals with both irrelevant and redundant
features, and obtains a good feature subset.
Implementation
Implementation is the stage of the project when the theoretical design is turned
into a working system. It can therefore be considered the most critical stage in
achieving a successful new system and in giving the user confidence that the
new system will work and be effective.
The implementation stage involves careful planning, investigation of the
existing system and its constraints on implementation, design of methods to
achieve changeover, and evaluation of changeover methods.
Main Modules:-
1. User Module :
In this module, users must authenticate before they can access the details
presented in the ontology system. Before accessing or searching the details, a
user must have an account; otherwise, they must register first.
2. Distributional Clustering :
Distributional clustering has been used to cluster words into groups based
either on their participation in particular grammatical relations with other
words (Pereira et al.) or on the distribution of class labels associated with
each word (Baker and McCallum). Because distributional clustering of words is
agglomerative in nature and results in suboptimal word clusters and high
computational cost, a new information-theoretic divisive algorithm for word
clustering was proposed and applied to text classification. Another approach
clusters features using a special distance metric and then uses the resulting
cluster hierarchy to choose the most relevant attributes. Unfortunately, the
cluster evaluation measure based on distance does not identify a feature subset
that allows the classifiers to improve their original performance accuracy.
Furthermore, even compared with other feature selection methods, the obtained
accuracy is lower.
3. Subset Selection Algorithm
Irrelevant features, along with redundant features, severely affect the
accuracy of the learning machines. Thus, feature subset selection should be
able to identify and remove as much of the irrelevant and redundant information
as possible. Moreover, “good feature subsets contain features highly correlated
with (predictive of) the class, yet uncorrelated with (not predictive of) each
other.” Keeping these in mind, we develop a novel algorithm which can
efficiently and effectively deal with both irrelevant and redundant features,
and obtain a good feature subset.
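One way to make "correlated with the class, yet uncorrelated with each other" measurable is symmetric uncertainty. The sketch below assumes that the SU values mentioned in this post denote symmetric uncertainty, defined as 2 * I(X; Y) / (H(X) + H(Y)) for discrete variables; the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2 * mutual_info / (hx + hy)


target = [0, 0, 1, 1]
f_a = [0, 0, 1, 1]          # relevant: mirrors the class
f_b = ["x", "x", "y", "y"]  # relevant, but redundant with f_a
f_c = [0, 1, 0, 1]          # irrelevant: independent of the class

print(symmetric_uncertainty(f_a, target))  # feature-class relevance
print(symmetric_uncertainty(f_c, target))  # near 0 for an irrelevant feature
print(symmetric_uncertainty(f_a, f_b))     # near 1 flags a redundant pair
```

A high score against the class marks a feature as relevant, while a high score between two features marks one of them as redundant.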
4. Time Complexity :
The major amount of work for Algorithm 1 involves the computation of SU values
for T-Relevance and F-Correlation, which has linear complexity in terms of the
number of instances in a given data set. The first part of the algorithm has a
linear time complexity in terms of the number of features m. Assuming k
features are selected as relevant ones in the first part, when k = 1 only one
feature is selected.
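The arithmetic behind this can be sketched by counting score computations. The count below assumes one feature-class score per feature and, as an assumption not spelled out in this post, one pairwise score for every pair of the k relevant features; `su_evaluations` is a hypothetical helper.

```python
def su_evaluations(m: int, k: int) -> int:
    """Count score computations: m feature-class scores plus
    k * (k - 1) / 2 pairwise scores among the k relevant features."""
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= m")
    return m + k * (k - 1) // 2


for k in (1, 10, 1000):
    print(k, su_evaluations(1000, k))
```

With k = 1 the pairwise term vanishes and only the linear part in m remains; the pairwise term grows quadratically as k approaches m.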
Hardware Requirements
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB
Floppy Drive - 1.44 MB
Key Board - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
Software Requirements
Operating System - Windows 95/98/2000/XP
Application Server - Tomcat 5.0/6.X
Front End - HTML, Java, JSP
Scripts - JavaScript
Server side Script - Java Server Pages
Database - MySQL 5.0
Database Connectivity - JDBC