streams/library/text_languagedetect
2024-03-10 08:10:07 +11:00
..
data move php-id3 and text_languagedetect from composer (/vendor) back to /library because their autoloaders are no longer supported in composer and no longer work in some distros. The packages themselves are unmaintained and it is unlikely they will be fixed upstream. This leaves HTMLPurifier to be fixed - which is in progress and should happen in the next several days. 2020-06-15 18:01:20 -07:00
docs fix tests and update languagedetect library (discovered in tests) 2024-03-10 08:10:07 +11:00
tests fix tests and update languagedetect library (discovered in tests) 2024-03-10 08:10:07 +11:00
Text fix tests and update languagedetect library (discovered in tests) 2024-03-10 08:10:07 +11:00
composer.json fix tests and update languagedetect library (discovered in tests) 2024-03-10 08:10:07 +11:00
package.xml move php-id3 and text_languagedetect from composer (/vendor) back to /library because their autoloaders are no longer supported in composer and no longer work in some distros. The packages themselves are unmaintained and it is unlikely they will be fixed upstream. This leaves HTMLPurifier to be fixed - which is in progress and should happen in the next several days. 2020-06-15 18:01:20 -07:00
phpcs.xml fix tests and update languagedetect library (discovered in tests) 2024-03-10 08:10:07 +11:00
README.rst fix tests and update languagedetect library (discovered in tests) 2024-03-10 08:10:07 +11:00

*******************
Text_LanguageDetect
*******************
PHP library to identify human languages from text samples.
Returns confidence scores for each.


Installation
============

PEAR
----
::

    $ pear install Text_LanguageDetect

Composer
--------
::

    $ composer require pear/text_languagedetect


Usage
=====
Also see the examples in the ``docs/`` directory and
the `official documentation`__.

__ http://pear.php.net/package/Text_LanguageDetect/docs

Language detection
------------------
Simple language detection::

    <?php
    require_once 'Text/LanguageDetect.php';

    $text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

    $ld = new Text_LanguageDetect();
    $language = $ld->detectSimple($text);

    echo $language;
    //output: german

Show the three most probable languages with their confidence score::

    <?php
    require_once 'Text/LanguageDetect.php';

    $text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

    $ld = new Text_LanguageDetect();
    //3 most probable languages
    $results = $ld->detect($text, 3);

    foreach ($results as $language => $confidence) {
        echo $language . ': ' . number_format($confidence, 2) . "\n";
    }

    //output:
    //german: 0.35
    //dutch: 0.25
    //swedish: 0.20
    ?>


Language code
-------------
Instead of returning the full language name, ISO 639-2 two and three
letter codes can be returned::

    <?php
    require_once 'Text/LanguageDetect.php';
    $ld = new Text_LanguageDetect();

    //will output the ISO 639-1 two-letter language code
    // "de"
    $ld->setNameMode(2);
    echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";

    //will output the ISO 639-2 three-letter language code
    // "deu"
    $ld->setNameMode(3);
    echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";
    ?>


Supported languages
===================
- albanian
- arabic
- azeri
- bengali
- bulgarian
- cebuano
- croatian
- czech
- danish
- dutch
- english
- estonian
- farsi
- finnish
- french
- german
- hausa
- hawaiian
- hindi
- hungarian
- icelandic
- indonesian
- italian
- kazakh
- kyrgyz
- latin
- latvian
- lithuanian
- macedonian
- mongolian
- nepali
- norwegian
- pashto
- pidgin
- polish
- portuguese
- romanian
- russian
- serbian
- slovak
- slovene
- somali
- spanish
- swahili
- swedish
- tagalog
- turkish
- ukrainian
- urdu
- uzbek
- vietnamese
- welsh


Links
=====
Homepage
  http://pear.php.net/package/Text_LanguageDetect
Bug tracker
  http://pear.php.net/bugs/search.php?cmd=display&package_name[]=Text_LanguageDetect
Documentation
  http://pear.php.net/package/Text_LanguageDetect/docs
Unit test status
  https://travis-ci.org/pear/Text_LanguageDetect

  .. image:: https://travis-ci.org/pear/Text_LanguageDetect.svg?branch=master
     :target: https://travis-ci.org/pear/Text_LanguageDetect


Notes
=====
Where are the data from?

 I don't recall where I got the original data set.
 It's just the frequencies of 3-letter combinations in each supported language.
 It could be generated from a few random wikipedia pages from each language.