Philippe Lenca (IMT Atlantique, philippe.lenca@imt-atlantique.fr)
Thanh-Nghi Do (CTU, dtnghi@cit.ctu.edu.vn)
Decision Tree Toolkit (DT2)
DT2 learns random decision trees for mining imbalanced datasets. The training algorithms offer several split functions (off-centered entropy, asymmetric entropy, Shannon entropy, generalized entropy), local labeling rules, and ensemble methods (bagging, random forest, arc-x4).
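To illustrate the kind of split criterion involved, the following is a minimal sketch of the binary off-centered entropy: the class proportion p is rescaled so that the maximum of the entropy sits at a chosen value theta (e.g. the minority-class prior) instead of 0.5, and Shannon entropy is then applied to the rescaled value. This is a toy illustration of the construction discussed in the references below, not the DT2 code; function and parameter names are ours.

```python
import math

def shannon_entropy(p):
    """Binary Shannon entropy of (p, 1 - p); maximal (= 1 bit) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def off_centered_entropy(p, theta):
    """Off-centered entropy: piecewise-linearly rescale p so that p = theta
    maps to 0.5, then apply Shannon entropy. The criterion thus peaks at the
    chosen class proportion theta rather than at the balanced point 0.5."""
    if p <= theta:
        pi = p / (2 * theta)
    else:
        pi = (p + 1 - 2 * theta) / (2 * (1 - theta))
    return shannon_entropy(pi)

# With theta = 0.1 the maximum moves from p = 0.5 to p = 0.1,
# which rewards splits that isolate a rare (10%) minority class.
print(off_centered_entropy(0.1, 0.1))  # 1.0
print(off_centered_entropy(0.5, 0.1))  # < 1.0, no longer the peak
```

At the boundaries the criterion behaves like Shannon entropy (zero for pure nodes), but its peak follows theta, which is what makes it useful on imbalanced data.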
References
P. Lenca, S. Lallich and B. Vaillant. Construction of an off-centered entropy for the supervised learning of imbalanced classes: Some first results. Communications in Statistics - Theory and Methods, Taylor & Francis, vol. 39(3), pp. 493-507, 2010.
T-N. Do, P. Lenca and S. Lallich. Enhancing network intrusion classification through the Kolmogorov-Smirnov splitting criterion. in Journal of Science and Technology, Special Issue on Theories and Application of Computer Science, Vol.48(4): 50-61, 2010.
P. Lenca, S. Lallich, T-N. Do and N-K. Pham. A comparison of different off-centered entropies to deal with class imbalance for decision trees. in proc. of PAKDD'2008, The Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, 2008, pp. 634-643.
N-K. Pham, T-N. Do, P. Lenca and S. Lallich. Using local node information in decision trees: coupling a local decision rule with an off-centered entropy. in proc. of DMIN'08, Intl Conf. on Data Mining, CSREA Press, 2008, pp. 117-123.
S. Lallich, P. Lenca, B. Vaillant. Construction of an off-centered entropy for supervised learning. in proc. of The XIIth Intl Symposium on Applied Stochastic Models and Data Analysis, Chania, Crete, Greece, 8p., May 29 - June 1 2007.
S. Lallich, P. Lenca, B. Vaillant. Construction d'une entropie décentrée pour l'apprentissage supervisé [Construction of an off-centered entropy for supervised learning]. Atelier Qualité des Données et des Connaissances (workshop associated with the Extraction et Gestion des Connaissances 2007 conference), Namur, Belgium, pp. 45-54, January 23, 2007.
Random Forest of Oblique Decision Trees (RF-ODT)
RF-ODT builds random forests of oblique decision trees for classifying very-high-dimensional datasets.
The main idea is to use linear SVMs to perform multivariate node splits during tree construction, producing individual trees that are stronger than those in classical random forests. RF-ODT handles classification (multi-class, imbalanced datasets), regression, and feature extraction.
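To make the idea of an oblique split concrete, here is a minimal pure-Python sketch of one node: a linear separator w·x + b is learned on the node's samples and the samples are routed left/right by its sign. A plain perceptron stands in for the linear SVM that RF-ODT actually uses; all names here are illustrative, not from the toolkit.

```python
def train_linear_split(X, y, epochs=50, lr=0.1):
    """Learn a separating hyperplane w.x + b = 0 at a tree node.
    A simple perceptron is used here as a stand-in for a linear SVM."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                 # yi in {-1, +1}
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:                  # misclassified: nudge hyperplane
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def oblique_split(X, w, b):
    """Route samples left/right by the sign of w.x + b: a multivariate
    ('oblique') test, versus the single-feature threshold of a classic tree."""
    left = [i for i, xi in enumerate(X)
            if sum(wj * xj for wj, xj in zip(w, xi)) + b <= 0]
    right = [i for i in range(len(X)) if i not in left]
    return left, right

# Toy node: two linearly separable clusters in 2-D.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
y = [-1, -1, +1, +1]
w, b = train_linear_split(X, y)
left, right = oblique_split(X, w, b)
print(left, right)  # [0, 1] [2, 3]
```

Because the test combines all features at once, a single oblique node can separate classes that an axis-parallel tree would need many nested univariate tests to carve out, which is why the individual trees end up stronger.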
References
T-N. Do, P. Lenca, S. Lallich. Classifying Many-Class High Dimensional Fingerprint Datasets Using Random Forest of Oblique Decision Trees. in Vietnam Journal of Computer Science, Vol.2(1): 3-12, Springer, 2015.
T-N. Do, S. Moga and P. Lenca. Random forest of oblique decision trees for ERP semi-automatic configuration. The 6th Asian Conference on Intelligent Information and Database Systems, vol. 551 of SCI (Studies in Computational Intelligence), Springer, pp. 25-34, Bangkok, Thailand, April 7-9, 2014.
T-N. Do, S. Lallich, N-K. Pham and P. Lenca. Classifying very-high-dimensional data with random forests of oblique decision trees. in Advances in Knowledge Discovery and Management, Studies in Computational Intelligence Vol.292: 39-55, Springer-Verlag, 2010.
T-N. Do, S. Lallich, N-K. Pham and P. Lenca. Un nouvel algorithme de forêts aléatoires d'arbres obliques particulièrement adapté à la classification de données en grandes dimensions [A new random forest algorithm of oblique trees particularly suited to classifying high-dimensional data]. in proc. of EGC2009, RNTI-E-15, Revue des Nouvelles Technologies de l'Information - Série Extraction et Gestion des Connaissances, Cépaduès Editions, 2009, pp. 79-90.