texture

OpenML dataset with id 40499

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/4535764/texture.arff

Upload date: 29 July 2016

Dataset Characteristics

Number of classes: 11
Number of features: 41 (numeric: 40, symbolic: 1, binary: 0)
Number of instances: 5,500
Number of instances with missing values: 0
Number of missing values: 0

Description

Author: Laboratory of Image Processing and Pattern Recognition (INPG-LTIRF), Grenoble, France.

Source: ELENA project

Please cite: None

1. Summary

This database was generated by the Laboratory of Image Processing and Pattern Recognition (INPG-LTIRF) in the development of the Esprit project ELENA No. 6891 and the Esprit working group ATHOS No. 6620.

```

(a) Original source:

  P. Brodatz "Textures: A Photographic Album for Artists and Designers",
  Dover Publications,Inc.,New York, 1966.

(b) Creation: Laboratory of Image Processing and Pattern Recognition

  Institut National Polytechnique de Grenoble INPG
  Laboratoire de Traitement d'Image et de Reconnaissance de Formes LTIRF
  Av. Felix Viallet, 46
  F-38031 Grenoble Cedex
  France

(c) Contact: Dr. A. Guerin-Dugue, INPG-LTIRF, guerin@tirf.inpg.fr

```

2. Past Usage:

This database has been used privately at the TIRF laboratory. It was created to study texture discrimination with high-order statistics.

```
A. Guerin-Dugue, C. Aviles-Cruz, "High Order Statistics from Natural Textured Images", in ATHOS workshop on System Identification and High Order Statistics, Sophia-Antipolis, France, September 1993.

Guerin-Dugue, A. and others, Deliverable R3-B4-P - Task B4: Benchmarks, Technical report, Elena-NervesII "Enhanced Learning for Evolutive Neural Architecture", ESPRIT-Basic Research Project Number 6891, June 1995.
```

3. Relevant Information:

The aim is to distinguish between 11 different textures (Grass lawn, Pressed calf leather, Handmade paper, Raffia looped to a high pile, Cotton canvas, ...), each pattern (pixel) being characterised by 40 attributes built by the estimation of fourth order modified moments in four orientations: 0, 45, 90 and 135 degrees.

A statistical method for characterizing natural micro-textures, based on the extraction of fourth order moments and called "fourth order modified moments" (mm4), was developed [Guerin93]. This method measures, for each texture, the deviation from a first-order Gauss-Markov process. The features were estimated in four directions (0, 45, 90 and 135 degrees) to take into account the possible orientations of the textures. Only correlations between the current pixel, the first neighbourhood and the second neighbourhood are taken into account. This small neighbourhood is suited to the fine-grained nature of the textures.

The data set contains 11 classes of 500 instances and each class refers to a type of texture in the Brodatz album.

The database dimension is 40 plus one for the class label. The 40 attributes were built by estimating the following ten fourth order modified moments in each of four orientations (0, 45, 90 and 135 degrees): mm4(000), mm4(001), mm4(002), mm4(011), mm4(012), mm4(022), mm4(111), mm4(112), mm4(122) and mm4(222).
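The count of 40 attributes follows from 10 moments times 4 orientations. The sketch below enumerates them under an assumption: the source does not state the column order, so the orientation-major ordering and the `mm4(ijk)@angle` naming scheme are hypothetical. The ordering is at least consistent with attributes 1, 11, 21 and 31 sharing identical summary statistics, as would be expected if each were the orientation-independent mm4(000).

```python
# Moment indices exactly as listed in the description.
MOMENTS = ["000", "001", "002", "011", "012", "022", "111", "112", "122", "222"]
ORIENTATIONS = [0, 45, 90, 135]  # degrees


def attribute_names():
    """Return 40 hypothetical column names, ASSUMING orientation-major order."""
    return [f"mm4({m})@{angle}" for angle in ORIENTATIONS for m in MOMENTS]


names = attribute_names()
print(len(names))
print(names[0], names[10], names[20], names[30])
```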

!! Patterns are always sorted by class and are presented in the increasing order of their class label in each dataset relative to the texture database (texture.dat, texture_CR.dat, texture_PCA.dat, texture_DFA.dat)
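Because the patterns are sorted by class, taking the first or last rows as a test set would leave out whole classes. A minimal sketch of a per-class shuffled split follows; the function name, the 20% test fraction and the seed are illustrative choices, not part of the original benchmark protocol.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    # Shuffle again so neither subset is ordered by class.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Labels laid out as described: 11 classes of 500 patterns, sorted by class.
CLASS_LABELS = [2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14]
labels = [c for c in CLASS_LABELS for _ in range(500)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```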

4. Class:

The class label is a code for the following classes:

```
Class label   Class                          Brodatz ID
 2            Grass lawn                     (D09)
 3            Pressed calf leather           (D24)
 4            Handmade paper                 (D57)
 6            Raffia looped to a high pile   (D84)
 7            Cotton canvas                  (D77)
 8            Pigskin                        (D92)
 9            Beach sand                     (D28)
 10           Beach sand                     (D29)
 12           Oriental straw cloth           (D53)
 13           Oriental straw cloth           (D78)
 14           Oriental grass fiber cloth     (D79)
```

5. Summary Statistics:

The table below gives, for each attribute of the database, the dynamic range (min and max values), the mean value and the standard deviation.

```
Attribute  Min       Max       Mean     Standard deviation

   1   -1.4495    0.7741   -1.0983    0.2034
   2   -1.2004    0.3297   -0.5867    0.2055
   3   -1.3099    0.3441   -0.5838    0.3135
   4   -1.1104    0.5878   -0.4046    0.2302
   5   -1.0534    0.4387   -0.3307    0.2360
   6   -1.0029    0.4515   -0.2422    0.2225
   7   -1.2076    0.5246   -0.6026    0.2003
   8   -1.0799    0.3980   -0.4322    0.2210
   9   -1.0570    0.4369   -0.3317    0.2361
  10   -1.2580    0.3546   -0.5978    0.3268
  11   -1.4495    0.7741   -1.0983    0.2034
  12   -1.0831    0.3715   -0.5929    0.2056
  13   -1.1194    0.6347   -0.4019    0.3368
  14   -1.0182    0.1573   -0.6270    0.1390
  15   -0.9435    0.1642   -0.4482    0.1952
  16   -0.9944    0.0357   -0.5763    0.1587
  17   -1.1722    0.0201   -0.7331    0.1955
  18   -1.0174    0.1155   -0.4919    0.2335
  19   -1.0044    0.0833   -0.4727    0.2257
  20   -1.1800    0.4392   -0.4831    0.3484
  21   -1.4495    0.7741   -1.0983    0.2034
  22   -1.2275    0.5963   -0.7363    0.2220
  23   -1.3412    0.4464   -0.7771    0.3290
  24   -1.1774    0.6882   -0.5770    0.2646
  25   -1.1369    0.4098   -0.5085    0.2538
  26   -1.1099    0.3725   -0.4038    0.2515
  27   -1.2393    0.6120   -0.7279    0.2278
  28   -1.1540    0.4221   -0.5863    0.2446
  29   -1.1323    0.3916   -0.5090    0.2526
  30   -1.4224    0.4718   -0.7708    0.3264
  31   -1.4495    0.7741   -1.0983    0.2034
  32   -1.1789    0.5647   -0.6463    0.1890
  33   -1.1473    0.6755   -0.4919    0.3304
  34   -1.1228    0.3132   -0.6435    0.1441
  35   -1.0145    0.3396   -0.4918    0.1922
  36   -1.0298    0.1560   -0.5934    0.1704
  37   -1.2534    0.0899   -0.7795    0.1641
  38   -1.0966    0.1944   -0.5541    0.2111
  39   -1.0765    0.2019   -0.5230    0.2015
  40   -1.2155    0.4647   -0.5677    0.3091

```

The attribute values lie in the range [-1.45, 0.775]. The database obtained by centering and reducing each attribute of the Texture database (subtracting the mean and dividing by the standard deviation) is on the ftp server in the `REAL/texture/texture_CR.dat.Z` file.
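Centering and reduction by attribute can be sketched as follows. This is an illustration, not the code used to build texture_CR.dat: the function name and the toy rows are made up, and the use of the population standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import fmean, pstdev


def center_and_reduce(rows):
    """Standardize each column to zero mean and unit standard deviation.

    ASSUMPTION: population standard deviation; a constant column would
    cause a division by zero and is not handled here.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy rows (made up), two attributes each.
toy = [[-1.0, 0.2], [-0.5, 0.4], [0.0, 0.9]]
scaled = center_and_reduce(toy)
for row in scaled:
    print(row)
```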

6. Confusion matrix:

The following confusion matrix of the k-NN classifier was obtained with a leave-one-out error counting method on the texture_CR.dat database. k was set to 1 in order to reach the minimum mean error rate: 1.0 +/- 0.8%.

```

Class    2      3      4      6      7      8      9      10     12     13     14 
2      97.0    1.0    0.4    0.0    0.0    0.0    1.6    0.0    0.0    0.0    0.0   
3       0.2   99.0    0.0    0.0    0.0    0.0    0.4    0.0    0.0    0.0    0.4   
4       1.0    0.0   98.8    0.0    0.0    0.0    0.2    0.0    0.0    0.0    0.0   
6       0.0    0.0    0.0   99.4    0.0    0.0    0.0    0.6    0.0    0.0    0.0   
7       0.0    0.0    0.0    0.0  100.0    0.0    0.0    0.0    0.0    0.0    0.0   
8       0.0    0.0    0.0    0.0    0.0   98.6    0.0    1.4    0.0    0.0    0.0   
9       0.4    0.0    0.2    0.0    0.0    0.2   98.8    0.4    0.0    0.0    0.0   
10      0.0    0.0    0.0    0.0    0.0    1.4    0.0   98.6    0.0    0.0    0.0   
12      0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0  100.0    0.0    0.0   
13      0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   99.8    0.2   
14      0.0    0.4    0.0    0.0    0.0    0.4    0.0    0.0    0.2    0.0   99.0

```
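The leave-one-out procedure with k = 1 can be sketched in a few lines: each pattern is classified by its nearest neighbour among all the other patterns. The function name and the toy points are illustrative, and Euclidean distance is an assumption (the source does not name the metric). The second part averages the diagonal of the matrix above, which gives a mean error of 1.0%; reading "+/- 0.8%" as the spread of the per-class error rates is only one possible interpretation.

```python
import math
from statistics import pstdev


def loo_1nn_error(points, labels):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, (p, label) in enumerate(zip(points, labels)):
        # Nearest neighbour among all OTHER points (Euclidean distance).
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        errors += labels[nearest] != label
    return errors / len(points)


# Toy data (made up): two clusters plus one stray "b" point near cluster "a".
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
labels = ["a", "a", "a", "b", "b", "b"]
toy_error = loo_1nn_error(points, labels)
print(toy_error)

# Diagonal of the confusion matrix above (per-class recognition rates, in %).
diagonal = [97.0, 99.0, 98.8, 99.4, 100.0, 98.6, 98.8, 98.6, 100.0, 99.8, 99.0]
per_class_error = [100.0 - d for d in diagonal]
mean_error = sum(per_class_error) / len(per_class_error)
spread = pstdev(per_class_error)
print(round(mean_error, 2), round(spread, 2))
```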

7. Result of the Principal Component Analysis:

Principal Component Analysis (PCA) is a classical method in pattern recognition [Duda73]. PCA linearly reduces the sample dimension to obtain the best lower-dimensional representation, i.e. the one that keeps the maximum inertia. The best axis for representation, however, is not necessarily the best axis for discrimination. After PCA, features are selected according to the percentage of the initial inertia covered by each axis, and the number of features is determined by the percentage of initial inertia to be kept for the classification process.

This selection method has been applied to the texture_CR database. When quasi-linear correlations exist between some of the initial features, PCA removes these redundant dimensions, and this preprocessing is therefore recommended. In this case, before PCA, the determinant of the data covariance matrix is near zero; the database is thus badly conditioned for all processes that use this information (the quadratic classifier, for example).

The following file is available for the texture database: texture_PCA.dat.Z. It is the projection of the texture_CR database onto its principal components, sorted in decreasing order of the related inertia percentage. To work on the database projected onto its first x principal components, keep only the first x attributes of the texture_PCA.dat database together with the class labels (last attribute).
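Selecting the first x principal components therefore amounts to a column slice. A minimal sketch, assuming each pattern has already been parsed into a list whose last element is the class label (the function name and the toy rows are hypothetical, and file parsing is not shown):

```python
def keep_first_components(rows, x):
    """Keep the first x attributes of each row plus the trailing class label."""
    return [row[:x] + row[-1:] for row in rows]


# Toy rows (made up): four projected attributes followed by a class label.
rows = [[0.9, -0.3, 0.1, 0.0, 2], [1.2, 0.4, -0.2, 0.1, 3]]
reduced = keep_first_components(rows, 2)
print(reduced)
```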

The table below gives the inertia percentages associated with the eigenvalues of the principal component axes, sorted in decreasing order of inertia percentage. 99.85 percent of the total database inertia remains if the first 20 principal components are kept.

```

 #   Eigenvalue   Inertia (%)    Cumulated inertia (%)

 1   30.267500000 75.6687000000  75.6687000000 
 2   3.6512500000  9.1281300000  84.7969000000 
 3   2.2937000000  5.7342400000  90.5311000000 
 4   1.7039700000  4.2599300000  94.7910000000 
 5   0.6716540000  1.6791300000  96.4702000000 
 6   0.5015290000  1.2538200000  97.7240000000 
 7   0.1922830000  0.4807070000  98.2047000000 
 8   0.1561070000  0.3902670000  98.5950000000 
 9   0.1099570000  0.2748920000  98.8699000000 
 10  0.0890891000  0.2227230000  99.0926000000 
 11  0.0656016000  0.1640040000  99.2566000000 
 12  0.0489988000  0.1224970000  99.3791000000 
 13  0.0433819000  0.1084550000  99.4875000000 
 14  0.0345022000  0.0862554000  99.5738000000 
 15  0.0299203000  0.0748007000  99.6486000000 
 16  0.0248857000  0.0622141000  99.7108000000 
 17  0.0167901000  0.0419752000  99.7528000000 
 18  0.0161633000  0.0404083000  99.7932000000 
 19  0.0128898000  0.0322246000  99.8254000000 
 20  0.0113884000  0.0284710000  99.8539000000 
 21  0.0078481400  0.0196204000  99.8735000000 
 22  0.0071527800  0.0178820000  99.8914000000 
 23  0.0067661400  0.0169153000  99.9083000000 
 24  0.0053149500  0.0132874000  99.9216000000 
 25  0.0051102600  0.0127757000  99.9344000000 
 26  0.0047116600  0.0117792000  99.9461000000 
 27  0.0036193700  0.0090484300  99.9552000000 
 28  0.0033222000  0.0083054900  99.9635000000 
 29  0.0030722400  0.0076806100  99.9712000000 
 30  0.0026373300  0.0065933300  99.9778000000 
 31  0.0020996800  0.0052492000  99.9830000000 
 32  0.0019376500  0.0048441200  99.9879000000 
 33  0.0015642300  0.0039105700  99.9918000000 
 34  0.0009679080  0.0024197700  99.9942000000 
 35  0.0009578000  0.0023945000  99.9966000000 
 36  0.0007379780  0.0018449400  99.9984000000 
 37  0.0006280250  0.0015700600  100.000000000
 38  0.0000000040  0.0000000099  100.000000000 
 39  0.0000000001  0.0000000003  100.000000000 
 40  0.0000000008  0.0000000019  100.000000000

```

This matrix can be found in the texture_EV.dat file.
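The inertia percentages can be recomputed from the eigenvalues alone: each percentage is the eigenvalue divided by the sum of all eigenvalues (about 40 here, one unit per centered and reduced attribute). The sketch below copies the eigenvalues from the table; the function names are illustrative.

```python
from itertools import accumulate

# Eigenvalues copied from the table above, in the order listed.
EIGENVALUES = [
    30.2675, 3.65125, 2.2937, 1.70397, 0.671654, 0.501529, 0.192283,
    0.156107, 0.109957, 0.0890891, 0.0656016, 0.0489988, 0.0433819,
    0.0345022, 0.0299203, 0.0248857, 0.0167901, 0.0161633, 0.0128898,
    0.0113884, 0.00784814, 0.00715278, 0.00676614, 0.00531495, 0.00511026,
    0.00471166, 0.00361937, 0.0033222, 0.00307224, 0.00263733, 0.00209968,
    0.00193765, 0.00156423, 0.000967908, 0.0009578, 0.000737978,
    0.000628025, 0.000000004, 0.0000000001, 0.0000000008,
]


def inertia_table(eigenvalues):
    """Return (percentages, cumulated percentages) for the given eigenvalues."""
    total = sum(eigenvalues)
    percentages = [100.0 * ev / total for ev in eigenvalues]
    return percentages, list(accumulate(percentages))


def components_needed(eigenvalues, target_percent):
    """Smallest number of leading components reaching target_percent inertia."""
    _, cumulated = inertia_table(eigenvalues)
    for count, value in enumerate(cumulated, start=1):
        if value >= target_percent:
            return count
    return len(eigenvalues)


percentages, cumulated = inertia_table(EIGENVALUES)
print(round(percentages[0], 4), round(cumulated[19], 4))
print(components_needed(EIGENVALUES, 99.85))
```

Keeping 20 components is the smallest choice that reaches 99.85 percent, in agreement with the figure quoted before the table.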

8. Result of the Discriminant Factorial Analysis:

Discriminant Factorial Analysis (DFA) can be applied to a learning database in which each learning sample belongs to a particular class [Duda73]. The number of discriminant features selected by DFA depends on the number of classes (c) and the number of input dimensions (d): it is equal to the minimum of d and c-1. In the usual case where d is greater than c, the output dimension is the number of classes minus one, and the discriminant axes are selected so as to maximize the between-class variance and minimize the within-class variance.
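The dimension rule can be written as a one-line helper (the function name is hypothetical). With the 11 texture classes, both the full 40-attribute input and an 18-component input give 10 discriminant features.

```python
def dfa_output_dim(d, c):
    """Number of discriminant features: the minimum of d and c - 1."""
    return min(d, c - 1)


print(dfa_output_dim(40, 11))
print(dfa_output_dim(18, 11))
print(dfa_output_dim(5, 11))
```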

The discrimination power (the ratio of the projected between-class variance to the projected within-class variance) is not the same for each discriminant axis: it decreases from one axis to the next. For a problem with many classes, this preprocessing will therefore not always be effective, as the last output features will be less discriminant. This analysis uses the inverse of the global covariance matrix, so the covariance matrix must be well conditioned (for example, a preliminary PCA must be applied to remove the linearly correlated dimensions).
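For a single axis, the discrimination power can be illustrated on values already projected onto that axis. The sketch below uses sums of squared deviations (scatter) for both terms, which is one common convention and an assumption here; the function name and the toy projections are made up.

```python
from statistics import fmean


def discrimination_power(values_by_class):
    """Between-class over within-class scatter of a 1-D projection."""
    all_values = [v for values in values_by_class.values() for v in values]
    grand_mean = fmean(all_values)
    between = 0.0
    within = 0.0
    for values in values_by_class.values():
        class_mean = fmean(values)
        between += len(values) * (class_mean - grand_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in values)
    return between / within


# Toy projections (made up): axis_1 separates the classes, axis_2 barely does.
axis_1 = {"a": [0.0, 0.2], "b": [2.0, 2.2]}
axis_2 = {"a": [0.0, 1.0], "b": [0.5, 1.5]}
power_1 = discrimination_power(axis_1)
power_2 = discrimination_power(axis_2)
print(power_1 > power_2)
```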

DFA has been applied to the first 18 principal components of the texture_PCA database (that is, keeping only the first 18 attributes of this database before applying the DFA preprocessing) in order to build the texture_DFA.dat.Z database file, which has 10 dimensions (the texture database having 11 classes). For the texture database, experiments showed that DFA preprocessing is very useful and most of the time improved classifier performance.

[Duda73] Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, 1973.
