2008

Central Authentication System (CAS) Management Shanmugapriya A (04C82), Sudarson P (04C87), Vinoth kumar S (04C101)

CAS is a single sign-on protocol designed to allow untrusted web applications to authenticate users against a trusted central server. The Lightweight Directory Access Protocol (LDAP) is an application protocol for querying and modifying directory services running over TCP/IP. In our college, various services such as shell access, mail and the Intranet are provided through a single sign-on setup using LDAP directory services. We are restructuring the schema, changing the LDAP backend so that it no longer relies on BDB, and providing a web application to manage the CAS setup in our college. In the Admin front-end module, our deliverable is a secure front-end application to manage the various services that are authenticated from LDAP: creating a new user in the LDAP directory, searching for a particular user in a particular organisational unit, and removing a user from the database.
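The user operations listed above all start from a distinguished name (DN) and, for searches, a filter string. The following is a minimal sketch of how a front-end might build both safely before handing them to an LDAP client library. The base DN, the `uid`/`ou` naming scheme, the `inetOrgPerson` object class and all function names are assumptions made for illustration; none of them come from the project itself.

```python
import re

BASE_DN = "dc=example,dc=edu"  # hypothetical base DN, not taken from the project

# Conservative whitelist: rejecting odd characters avoids DN escaping altogether.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def user_dn(uid, org_unit, base_dn=BASE_DN):
    """Build the DN of a user entry inside one organisational unit."""
    for value in (uid, org_unit):
        if not _SAFE_NAME.fullmatch(value):
            raise ValueError(f"unsafe name component: {value!r}")
    return f"uid={uid},ou={org_unit},{base_dn}"


def escape_filter_value(value):
    """Escape the characters that RFC 4515 reserves inside a filter value."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def user_search_filter(uid):
    """Filter matching one user; the object class is an assumed schema choice."""
    return f"(&(objectClass=inetOrgPerson)(uid={escape_filter_value(uid)}))"


if __name__ == "__main__":
    print(user_dn("alice", "students"))
    print(user_search_filter("al*ce"))
```

Validating names against a whitelist, rather than escaping them, keeps the sketch short; a real deployment would follow whatever naming rules the restructured schema defines.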

Web Hosting Automation Geethapriya R (04C24), Ravisankar Balaji M (04C17).

In the era of Web 2.0, open source software is growing at a fast pace and is conquering the market of innovation. In this scenario, in the spirit of Free/Open Source Software and as a contribution to the open source community, we developed an open source product, Web Hosting Automation. A common form of automation is software automation, where a software program performs scheduled tasks without human intervention. Software automation can increase the performance of a system by avoiding human errors. Web Hosting Automation is web-based software to automate web hosting in TCE. This web-hosting management application automates the creation of the shell account and database account for a web-master and the configuration of the virtual host. The GNU/Linux project is a major contributor to the open source community, and Apache is the most visible open source project, dominating the public internet web server market. We feel proud to have developed our project by exploiting these two giant contributors to the open source community. The software is built using open source technologies such as Python, Cheetah, jQuery, mod_python and the PostgreSQL database.
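The virtual-host configuration step can be pictured as filling a template with per-site values. The sketch below uses Python's standard `string.Template` as a stand-in for Cheetah so that it runs without extra packages. The directory layout, the port and the function name `render_vhost` are assumptions made for illustration, not details of the project.

```python
from string import Template

# Apache virtual-host stanza; the directives are standard, the layout is assumed.
VHOST_TEMPLATE = Template("""\
<VirtualHost *:80>
    ServerName $server_name
    DocumentRoot $document_root
    ServerAdmin $admin_email
</VirtualHost>
""")


def render_vhost(server_name, owner, admin_email, web_root="/home"):
    """Render the virtual-host block for one web-master account.

    The document root is assumed to live under the owner's shell account.
    """
    document_root = f"{web_root}/{owner}/public_html"
    return VHOST_TEMPLATE.substitute(
        server_name=server_name,
        document_root=document_root,
        admin_email=admin_email,
    )


if __name__ == "__main__":
    print(render_vhost("club.example.edu", "webmaster1", "admin@example.edu"))
```

Keeping the stanza in a template means the same account data that creates the shell and database accounts can also drive the web server configuration.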

Domain Name Server Management Application Alagammai N (04C04), Ashok S (04C13).

DNSadmin is a web-based interface to manage the BIND DNS server of the college Intranet. The tool uses Python and Cheetah for application development and user interface design. The power and flexibility of mod_python, a Python interpreter built into Apache, enhances performance. The major functions of DNSadmin are zone addition, zone deletion, record addition and record deletion. The tool eases the cumbersome job of an administrator by providing an easy-to-use user interface. It also generates the reverse zone files automatically, without any user intervention. The DNS records are stored in a database backend, which provides a level of redundancy for the configuration details. The configuration files can therefore be regenerated after an accidental deletion or, say, a denial-of-service attack. The tool was developed using Debian GNU/Linux, the Python programming language, mod_python, Cheetah for templating and PostgreSQL as the database backend.
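Automatic reverse-zone generation amounts to deriving PTR records from the stored forward (A) records. A minimal sketch with Python's standard `ipaddress` module follows. It assumes IPv4 addresses grouped into /24 reverse zones; the host names and the function name `reverse_zones` are hypothetical, and the record lines are simplified compared with a full BIND zone file.

```python
import ipaddress
from collections import defaultdict


def reverse_zones(a_records):
    """Derive PTR lines, grouped by /24 reverse zone, from {hostname: IPv4}."""
    zones = defaultdict(list)
    for hostname, address in sorted(a_records.items()):
        ip = ipaddress.IPv4Address(address)
        pointer = ip.reverse_pointer         # e.g. "5.1.168.192.in-addr.arpa"
        label, zone = pointer.split(".", 1)  # host octet, then the /24 zone name
        zones[zone].append(f"{label} IN PTR {hostname}.")
    return dict(zones)


if __name__ == "__main__":
    records = {"mail.tce.example": "192.168.1.5", "www.tce.example": "192.168.1.10"}
    for zone, lines in reverse_zones(records).items():
        print(zone)
        for line in lines:
            print("  " + line)
```

Because the PTR data is computed from the A records, the two can never drift apart, which is the practical benefit of generating reverse zones instead of editing them by hand.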

Firewall and Proxy Server Management Application Balachandran S (04C16), Venkatapathy M (04C97)

For any network of considerable size, the firewall and proxy servers are a major part, and TCE is no exception. We have separate firewall and proxy servers for protecting our network and optimising web delivery respectively. Maintaining and administering either of these without specialised tools is complex, all the more so because both involve many syntactic rules and require complete mastery of the underlying network architecture. fw-proxy-admin, a fully Free Open Source Software tool, provides a solution. It is a combined web-based interface to configure and maintain both the firewall and the proxy server. The main advantage of this tool is not just the interface but the built-in scripts that come with it. These scripts are sufficient for setting up standard firewall and proxy servers. fw-proxy-admin is developed using Python, an object-oriented programming language; Cheetah, a Python-based templating engine; and a PostgreSQL database for the backend. The Apache module mod_python is used on the server side to serve Python pages to clients.

Hybrid Alert Classification for False Positive Reduction in Intrusion Detection, George Mathew (07CS04)

Intrusion Detection Systems (IDSs) aim at detecting intrusions, that is, actions that attempt to compromise the confidentiality, integrity and availability of computer resources. This work deals with the problem of false positives in intrusion detection. It proposes training an alert classifier using a human analyst’s feedback and shows how to build an efficient alert classifier using machine learning techniques. The advantage of using a hybrid neuro-fuzzy approach to reduce the number of false alarms is demonstrated, and the effectiveness of this approach is compared with that of the RIPPER algorithm. The two approaches were evaluated on the DARPA 1999 network traffic dataset, with the Snort IDS used for generating alerts. It is observed that RIPPER produces optimal rules when there is sufficient background knowledge, whereas the neuro-fuzzy approach reduces false alarms more efficiently than RIPPER when less background knowledge is available.

Task Scheduling-Processor Involvement In Communication, R. Shanmugapriya (07CS21)

Task scheduling is an important aspect of efficient parallel computer utilization. A parallel program can be represented by a node- and edge-weighted directed acyclic graph (DAG), in which the node weights represent task processing times and the edges represent data dependencies, with edge weights giving the communication times between tasks. Scheduling applications modeled by a DAG is a key issue in cluster computing environments, and the problem is known to be NP-hard. A list scheduling approach is proposed which provides accurate and efficient schedules for real systems. The concept of edge scheduling, used in contention-aware scheduling, is extended to the scheduling of edges on the processors in order to reflect the processors' involvement in communication. The implementation was divided into two phases: a) direct scheduling and b) list scheduling. Direct scheduling means scheduling all leaving edges on the source processor directly after the origin node. The scheduling of the edges on the links and on the destination processors can take place when the destination node is scheduled. List scheduling decides which task is to be scheduled next. This is achieved by assigning priorities to the nodes or the edges of the input DAG, so that the task with the highest priority is scheduled next.
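The priority-driven part of list scheduling can be illustrated with a small sketch. It is a plain list scheduler, not the edge-scheduling, contention-aware variant described above: communication is modelled only as a delay that applies when two dependent tasks run on different processors. The priority used here (bottom level, the longest remaining path to an exit node), the example graph and all names are assumptions made for illustration. Task weights are assumed positive, so that sorting by descending bottom level also yields a valid precedence order.

```python
def bottom_levels(tasks, edges):
    """Longest path (computation plus communication) from each node to an exit."""
    succ = {t: [] for t in tasks}
    for (u, v), cost in edges.items():
        succ[u].append((v, cost))
    memo = {}

    def level(t):
        if t not in memo:
            memo[t] = tasks[t] + max((c + level(v) for v, c in succ[t]), default=0)
        return memo[t]

    return {t: level(t) for t in tasks}


def list_schedule(tasks, edges, num_procs):
    """Place each task, highest priority first, where it can start earliest.

    Returns {task: (processor, start, finish)}.
    """
    pred = {t: [] for t in tasks}
    for (u, v), cost in edges.items():
        pred[v].append((u, cost))
    levels = bottom_levels(tasks, edges)
    order = sorted(tasks, key=lambda t: -levels[t])  # priority list
    proc_free = [0] * num_procs
    placed = {}
    for t in order:
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after `cost`.
            ready = max(
                (placed[u][2] + (0 if placed[u][0] == p else cost) for u, cost in pred[t]),
                default=0,
            )
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start, start + tasks[t])
        placed[t] = best
        proc_free[best[0]] = best[2]
    return placed


if __name__ == "__main__":
    tasks = {"A": 2, "B": 3, "C": 3, "D": 1}  # node weights
    edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 2, ("C", "D"): 2}
    for name, slot in list_schedule(tasks, edges, num_procs=2).items():
        print(name, slot)
```

Extending this toward the approach in the abstract would mean also reserving time on the sending and receiving processors, and on the links, for every edge that crosses processors.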

DAG Scheduling on cluster of workstations using hybrid particle swarm optimization, M. Suguna (06CS24)

A well-known strategy for efficient execution of a large application is to partition it into multiple independent tasks and schedule those tasks over a set of available processors. Such a partitioned application can be represented by a Directed Acyclic Graph (DAG). Scheduling applications modeled by a DAG is a key issue in cluster computing environments. The task scheduling problem has been shown to be NP-complete in general as well as in several restricted cases. This paper presents a list scheduling algorithm using Particle Swarm Optimization (PSO) based on the concept of Tabu Search (TS). The approach combines the strengths of both PSO and TS. It differs from existing methods in that the procedure adaptively incorporates information about Tabu lists into the PSO algorithm. The proposed algorithm outperforms other algorithms in terms of performance and scalability. The experimental results show that the proposed hybrid method is effective and efficient in finding near-optimal schedule lengths.

Optimal Text Categorization, Karthikeyan P (06CS08)

Text categorization is the process of assigning documents to a set of previously fixed categories. In this project work, it was found that supervised learning algorithms that use a small number of labeled examples together with many inexpensive unlabeled examples can create high-accuracy text classifiers. By assuming that documents are created by a parametric generative model, Expectation Maximization (EM) finds locally maximum a posteriori models and classifiers from all the documents.

First, the documents are preprocessed. The preprocessing phase consists of normalization and keyword extraction. In the normalization phase, irrelevant words are removed from the documents using stop-word removal, and word stemming is performed. The feature vector is then identified from the most frequently occurring words in the document. These keywords are given to the naive Bayes classifier and to EM. The naive Bayes classifier is built in the standard supervised fashion from the limited amount of labelled training data, and predicts the class label of the documents. It was found that performance improves significantly when selective sampling is used to select high-quality initializations. This work substantially improves classification accuracy, especially when labelled data are limited. It was found that the EM technique improves classification accuracy by 70 percent.
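The combination of a naive Bayes classifier with EM over unlabeled documents can be sketched as follows. This is a minimal illustration assuming a multinomial naive Bayes model with Laplace smoothing and already-tokenised documents; it leaves out stop-word removal, stemming and the selective sampling step, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab):
    """M-step: estimate log priors and log word probabilities from soft labels."""
    prior = {c: 1.0 for c in classes}  # Laplace smoothing
    word = {c: Counter({w: 1.0 for w in vocab}) for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for t in tokens:
                word[c][t] += w[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(word[c].values())
        log_word[c] = {t: math.log(n / denom) for t, n in word[c].items()}
    return log_prior, log_word


def posterior(tokens, model, classes):
    """E-step for one document: class probabilities under the current model."""
    log_prior, log_word = model
    scores = {
        c: log_prior[c] + sum(log_word[c][t] for t in tokens if t in log_word[c])
        for c in classes
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def em_naive_bayes(labeled, unlabeled, iterations=5):
    """Train on labeled data, then alternate E- and M-steps over all documents."""
    classes = sorted({label for _, label in labeled})
    vocab = {t for tokens, _ in labeled for t in tokens}
    vocab |= {t for tokens in unlabeled for t in tokens}
    lab_docs = [tokens for tokens, _ in labeled]
    lab_w = [{c: 1.0 if c == label else 0.0 for c in classes} for _, label in labeled]
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(iterations):
        unl_w = [posterior(tokens, model, classes) for tokens in unlabeled]
        model = train_nb(lab_docs + list(unlabeled), lab_w + unl_w, classes, vocab)
    return model, classes


if __name__ == "__main__":
    labeled = [(["goal", "match"], "sport"), (["vote", "election"], "politics")]
    unlabeled = [["goal", "team"], ["team", "score"], ["vote", "party"], ["party", "minister"]]
    model, classes = em_naive_bayes(labeled, unlabeled)
    print(posterior(["team", "score"], model, classes))
```

In the toy data, the word "team" never appears in a labeled document; it becomes associated with the sport class only because it co-occurs with "goal" in an unlabeled one, which is the effect the unlabeled data is meant to provide.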

Unsupervised Learning of Natural Language Using Grammatical Inference, N. Sheena (07CS22)

The study of language aims to characterize and explain the way humans acquire, produce and understand language based on the main assumption that rules (grammar) are used to structure all three of these processes. But for decades researchers have been trying to devise formal and detailed grammars which would capture the observed regularities of language. All such efforts fall short of their goal, because grammars tend to mutate, depending on how grammaticality is defined and on the purpose to which language is put.

This project addresses the problem of extracting patterns from a corpus of a simple context-free language in order to infer the underlying rules that govern its production. The unsupervised algorithm recursively distills hierarchically structured patterns from the corpus. It relies on a statistical method for pattern extraction and on structured generalisation, two processes that have been implicated in language acquisition. It has been evaluated in the main scheme of grammar induction using artificial context-free grammars.

Optimization Of Frequent Itemset Mining Using FP-Trees, M. Gowri (04C27), K. Jayalakshmi (04C30), V. Karthikeyan (04C38)

Frequent itemset mining plays a major role in the mining of association rules and in various other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-Tree, for storing compressed information about frequent itemsets. In this project, we present a more efficient method for frequent itemset mining that uses the FP-array technique, which greatly reduces the need to traverse FP-Trees and thus significantly improves the performance of FP-Tree based algorithms. Our project focuses on mining all frequent itemsets from a given dataset as well as mining maximal and closed frequent itemsets. Our algorithms use the FP-Tree data structure in combination with the FP-array technique, and efficiently incorporate various optimisation techniques such as MFI and CFI trees. We also present experimental results comparing our methods with existing techniques; this comparison shows that our algorithm is faster than the existing algorithm.
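For readers unfamiliar with the underlying structure, the sketch below builds a basic FP-Tree and extracts the conditional pattern base of an item, which is the traversal that the FP-array technique is meant to reduce. It does not implement the FP-array, MFI or CFI optimisations. The class and function names and the example transactions are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class FPNode:
    """One node of the prefix tree: an item, its count and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Return (root, header, frequent) for the given transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: n for i, n in counts.items() if n >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, frequent


def conditional_pattern_base(item, header):
    """Prefix paths leading to `item`, each paired with that node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base


if __name__ == "__main__":
    data = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
    root, header, frequent = build_fp_tree(data, min_support=2)
    print(frequent)
    print(conditional_pattern_base("e", header))
```

Each call to `conditional_pattern_base` walks parent pointers up the tree; mining repeats this walk for every item, which is why avoiding such traversals pays off.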