Language, Speech and Multimedia Technologies Observatory
10/28/2014 - 01:11

Ben Woods of The Next Web reports, “Google has joined forces with the University of Oxford in the UK in order to better study the potential of artificial intelligence (AI) in the areas of image recognition and natural language processing. The hope is that by joining forces with an esteemed academic institution, the research will progress more rapidly than going it alone for its DeepMind project. In total, Google has hired seven individuals (who also happen to be world experts in deep learning for natural language understanding), three of which will remain as professors holding joint appointments at Oxford University.” continued…
10/10/2014 - 00:11

Watson is an artificial intelligence system that uses, among other things, natural language processing (of everyday English) and machine learning. Machine learning is used in driverless cars, speech recognition software and spam filters. Natural language processing uses machine learning to improve its performance. In 2011, to show how far artificial intelligence had come, Watson took part in the US quiz show Jeopardy! and won. Contestants were given clues on general-knowledge topics, each corresponding to a question they had to formulate. For example, a contestant was told: Milorad Cavic almost spoiled his 2008 Olympics, losing to him by one hundredth of a second. The question to be guessed was: "Who is Michael Phelps?"

The clues given on Jeopardy! contain wordplay, jargon, double meanings and linguistic nuances that ordinary computers struggle to understand.

After seeing the result of that contest, some scientists were skeptical, saying that Watson is essentially a text-search algorithm connected to a database, much like Google's search engine, and that it has no intelligence.

That has not worried IBM's engineers much, and they have turned to another question: how a powerful general-purpose tool can be turned into something of interest as a commercial product.

IBM has been selling Watson to healthcare organizations and financial companies since 2012, but it took a step forward in January 2014 when it created the Watson Group, a dedicated unit for commercializing Watson. The Watson Group is aimed mainly at the healthcare, finance, retail and public administration sectors. All of them use enormous amounts of data, but 80% of that data tends to be unstructured. That is what makes Watson interesting, because it can find hidden patterns and correlations by means of natural language processing.

In fact Watson is more complex than that, since it is a combination of many technologies. Still, looking at practical applications, Watson can help, for example, customer service departments answer customers' questions, or suggest recipes to chefs.
10/03/2014 - 00:11

When I was an undergrad, probably my favorite CS class was algorithms. I liked it (a) because my background was math so it was the closest match to what I knew and (b) because even though it was "theory," a lot of the stuff we learned was really relevant. Over time, it seemed like the area had distilled worthwhile algorithms from interesting-in-theory-but-you'll-never-actually-use algorithms.

In fact, I think this is a large part of why most undergraduate CS degrees today require a course in algorithms. You have these very nice, clearly defined statements, and very elegant solutions to those statements that in most cases (at the UG level) are known to be optimal.

Fast forward N years.

My claim today---and I'm speaking really as an NLP person, which is how I self-identify---is that machine learning is the new core. Everything that algorithms was to computer science 15 years ago, machine learning is today. That's not to say it won't move in another 10 years, but that's how I see it.


For the most part, algorithms (especially as taught at the UG level) is the study of one thing: Given a perfect input, how do I most efficiently compute the optimal output?

The problem is the "perfect input" part.

All of my experience in the past N years has told me that you never have a perfect input, and that it's far far far more important to be able to synthesize information from a large number of sources and reason about it than it is to find the exact-right-solution to some problem that exists only to Plato.

Even within machine learning you see this effect. Lots of numerical analysis people have worked on good algorithms for getting that last little bit of precision out of optimization algorithms. Does it matter? Nope! Model specification, parameter tuning, features, and data matter infinitely more than that last little bit of precision. (In some fields, for instance, scientific computing, that last little bit of precision may matter. I don't know enough to know one way or the other.)

Let's play a thought game. Say you're an UG CS major. You graduate and get a job in CS (not grad school). Which are you more likely to use: (1) a weighted cost flow algorithm or (2) a perceptron/decision tree?

Clearly I think the answer is (2). And I loved flow algorithms when I was an undergrad and have actually spent since 2006 trying to figure out how I can use them for a problem I want to solve. No dice.

I would actually go further. Suppose you have a problem whose inputs are ill-specified (as they always are when dealing with data), and whose structure actually does look like a flow problem. There are two CS students trying to solve this problem. Akiko knows about machine learning but not flows; Bob knows about flows but not machine learning. Bob tries to massage his data by hand into the input to an optimal flow algorithm, and then solves it exactly. Akiko uses machine learning to get good edge weights and hacks together some greedy algorithm for flows, not even knowing it's called a flow. Whose solution works better? I would put almost any amount of money on Akiko.

Full disclosure: those who know about my research in structured prediction will recognize this as a recurring theme in my own research agenda: fancy algorithms always lose to better models.

There's another big difference between N years ago and today: almost every algorithm you could possibly care about (or learn about as an UG) is implemented in a library for any reasonable programming language. That's not to say that it's unimportant to know how things work in order to use them, but I would argue it's much less important in a field like algorithms whose knowledge is comparatively stable, versus a field like machine learning where things are still changing and there is no "one right answer" to the "machine learning problem." In a field that's still a bit of an art rather than a science, understanding how things work under the hood feels a lot more important. Quicksort, heaps, minimum spanning trees, ... these are all here to stay.

Okay, so now I've convinced myself that we should yank algorithms out as an UG requirement and replace it with machine learning.

But wait, I can hear my colleagues yelling, taking algorithms isn't about learning algorithms: it's about learning how to think! But that's also what I think is great about machine learning: the distance between theory and algorithms is actually usually quite small (I try to get this across at various points in CiML, to varying degrees of success). If the only point of an algorithms class (I've heard exactly this argument made about automata theory, for instance) is to teach students how to think, I think we could do much better.

Okay, so I've thrown down the gauntlet. Someone should come smack me with theirs :P!

Edit after some comments:

I think I probably wrote badly and as a result my main point got lost. I'll try to restate it here briefly and then I'll edit the main post.

Main point: I feel like for 15 years, algorithms has been at the heart of most of what computer science does. I feel like that coveted position has now changed to machine learning or, more generically, statistical reasoning. I feel this way because figuring out how to map a real world problem into something an "algorithm" can consume, especially when that needs statistical modeling of various inputs, is (IMO) a more important and harder problem than details about flow algorithms. 


let me give a concrete example that may actually be a real world example, but i don't know (though see this paper). that of path finding for taxis or cars. the world is a graph and given directed edge costs we can run dijkstra or whatever to find LEAST-TIME (shortest) paths. this is basically google maps/etc.

of course, we never know the true time to travel some segment. we might know it now, but by the time the driver gets to some road (5 or 10 minutes from now) the conditions may have changed. and of course we have historical data on traffic from which we can predict what the condition of the road will be like in 10 minutes.

so here, "foo" is a function that takes the time of day, historical traffic data, weather and whathaveyou, and maps it to edge costs.

"bar" is dijkstra's algorithm or whatever shortest path algorithm you like.

my claim is that if you really want to solve this problem, it's much more important to understand how to create foo than how to create bar. in particular, if i gave you a greedy or near greedy approach to bar, combined with a really good foo, i bet this would be significantly better than an optimal bar and a crappy foo.
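To make the foo/bar split concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the toy road graph, the rush-hour multiplier standing in for a learned traffic model, and the function signatures. Here "foo" is a crude hand-written predictor rather than real machine learning, and "bar" is plain Dijkstra.

```python
import heapq


def foo(base_minutes, congestion_prone, hour, rush_hours=(8, 17), rush_factor=1.5):
    """Hypothetical stand-in for a learned model: predict edge travel times.

    A real "foo" would be fit to historical traffic, weather, and so on.
    This toy version just slows down congestion-prone edges at rush hour.
    """
    costs = {}
    for edge, minutes in base_minutes.items():
        slow = hour in rush_hours and edge in congestion_prone
        costs[edge] = minutes * (rush_factor if slow else 1.0)
    return costs


def bar(edge_costs, source, target):
    """Dijkstra over {(u, v): cost}; returns (total_cost, path)."""
    graph = {}
    for (u, v), cost in edge_costs.items():
        graph.setdefault(u, []).append((v, cost))

    best = {source: 0.0}   # cheapest known cost to each node
    prev = {}              # back-pointers for path reconstruction
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            cand = dist + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))

    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Toy graph (made up): a highway A-B-D and a side road A-C-D, in minutes.
base = {("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 12, ("C", "D"): 12}
highway = {("A", "B"), ("B", "D")}

print(bar(foo(base, highway, hour=3), "A", "D"))  # off-peak
print(bar(foo(base, highway, hour=8), "A", "D"))  # rush hour
```

In this toy setup the same "bar" picks the highway off-peak and the side road at rush hour. The route changes only because "foo" changed the edge costs, which is the sense in which the quality of foo drives the quality of the answer.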
06/24/2014 - 00:11


The system called MARTTI (My Accessible Real-Time Trusted Interpreter) makes communication easier between healthcare professionals and patients when they do not speak the same language. MARTTI is being rolled out in the hospitals of Methodist Health System in the US. 35 such systems have already been installed in its hospitals.

Through MARTTI, patients and healthcare professionals are connected by video with interpreters for 210 languages from the Language Access Network. These interpreters are specially trained in healthcare.
06/13/2014 - 00:11
Source: Overview

Overview is an open-source tool that helps journalists organize and review information gathered from the many information sources available today. These document sets can run to hundreds of thousands of pages, and it is impossible to find something you do not know is there. To solve that problem, Overview uses natural language processing algorithms to sort documents automatically by topic, and it creates a view that helps you explore everything in a document set.

Sometimes we know exactly what we want to find, and then Overview is an excellent tool. At other times, though, what we want to find is not so precisely defined. For those cases, Overview's topic tree shows us the structure of the document set; from there we can use our own category tags to mark folders, and then export those tags to create views.

If, on the other hand, every document has to be read, Overview is very helpful. It is optimized for browsing document sets. Above all it saves a lot of time when similar documents are grouped together, because no time is wasted reading duplicates.
The application can handle plain text, that is, text with no special structure, written in English, French, German, Swedish and Spanish.

No installation is needed; it is enough to use the free web application, and it can also be installed on a server if greater security is wanted.

Overview is a project of The Associated Press.
06/11/2014 - 00:11

The EAMT Best Thesis Award for 2013 has been awarded to Gennadi Lembersky, University of Haifa, Israel, for his thesis titled The Effect of Translationese on Statistical Machine Translation.
06/07/2014 - 00:11

Source: Multilizer Translation Blog

The quality of machine translation depends above all on the languages being translated. Translating from English into German does not give results of the same quality as translating from English into Finnish, for example. The reason is clear: developing machine translation is a business, and working on translation between widely spoken languages is more profitable than developing it for other language pairs. Widely spoken languages have many users, and the number of translations produced is also larger. Having many users also matters to those who offer translation for free, because they earn their revenue from advertising. The number of translations produced, in turn, affects how well translation techniques based on statistical methods perform.

Just as translation between widely spoken languages tends to be better than translation between less widely spoken ones, translation between related languages also tends to be better than translation between very different languages, because related languages are more alike in their rules and vocabulary.

Languages with little translation data are caught in a vicious circle: when quality is poor there are few users, and with few users the scarcity of data keeps quality poor.
04/03/2014 - 00:11

Press Release - Immediate - Paris, France, April 3, 2014
Opening of the ISLRN Portal
ELRA, LDC, and AFNLP/Oriental-COCOSDA announce the opening of the ISLRN Portal.
Further to the establishment of the International Standard Language Resource Number (ISLRN) as a unique and universal identification schema for Language Resources on November 18, 2013, ELRA, LDC and AFNLP/Oriental-COCOSDA now announce the opening of the ISLRN Portal. As a service free of charge (…)

International Standard Language Resource Number (ISLRN)
03/08/2014 - 01:11

Here’s a short-ish introduction to the Lucene search engine which shows you how to use the current API to develop search over a collection of texts. Most of this post is excerpted from Text Processing in Java, Chapter 7, Text Search with Lucene.

Lucene Overview

Apache Lucene is a search library written in Java. It’s […]
09/10/2013 - 00:11

The International Association for Machine Translation presented the IAMT 2013 Lifetime Achievement Award to John Hutchins....
