달력

092017  이전 다음

  •  
  •  
  •  
  •  
  •  
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30

[원문] http://psoug.org/reference/undocumented.html


CREATE TABLE t (
col1 VARCHAR2(5),
col2 VARCHAR2(20));

INSERT INTO t VALUES (111, 'This');
INSERT INTO t VALUES (111, 'is');
INSERT INTO t VALUES (111, 'a');
INSERT INTO t VALUES (111, 'test');
INSERT INTO t VALUES (222, 'This is not');

SELECT * FROM t;

col concat format a40

SELECT col1, wmsys.wm_concat(col2) CONCAT
FROM t
GROUP BY col1;

SELECT col1, TRANSLATE(wmsys.wm_concat(col2), 'A,', 'A ') CONCAT
FROM t

GROUP BY col1; 


신고
Posted by Tornado tornado
 

/*********************************************************************

 

SQL SERVER 2005 이상의 64BIT 머신에서 링크드 서버연결시

아래의 메시지가 출력되면서 오류가 발생함.

 

*********************************************************************/

 

연결된 서버 "APP"의 OLE DB 공급자 "SQLNCLI10"이(가) 메시지 "지정되지 않은 오류입니다."을(를) 반환했습니다.

연결된 서버 "APP"의 OLE DB 공급자 "SQLNCLI10"이(가) 메시지 "이 작업을 완료하는 데 필요한 저장 프로시저가 서버에 없습니다. 시스템 관리자에게 문의하십시오."을(를) 반환했습니다.

Msg 7311, Level 16, State 2, Line 1

연결된 서버 "APP"에 대한 OLE DB 공급자 "SQLNCLI10"의 스키마 행 집합 "DBSCHEMA_TABLES_INFO"을(를) 가져올 수 없습니다. 공급자가 인터페이스를 지원하지만 이 인터페이스를 사용하면 오류 코드가 반환됩니다.

SQL Server 구문 분석 및 컴파일 시간:

   CPU 시간 = 0ms, 경과 시간 = 0ms.

 

 

 

/*********************************************************************

 

아래의 프로시저를 오류가 발생한 링크드 서버의 MASTER 데이터베이스에

생성함으로 문제 해결됨.

 

*********************************************************************/

create procedure sp_tables_info_rowset_64

      @table_name sysname,

      @table_schema     sysname = null,  

      @table_type nvarchar(255) = null

as

 

declare @Result int set @Result = 0

 

exec @Result = sp_tables_info_rowset @table_name, @table_schema, @table_type

 

신고
Posted by Tornado tornado


Save (Not Permitted) Dialog Box

SQL Server 2008 R2

The Save (Not Permitted) dialog box warns you that saving changes is not permitted because the changes you have made require the listed tables to be dropped and re-created.

The following actions might require a table to be re-created:

  • Adding a new column to the middle of the table

  • Dropping a column

  • Changing column nullability

  • Changing the order of the columns

  • Changing the data type of a column

To change this option, on the Tools menu, click Options, expand Designers, and then click Table and Database Designers. Select or clear the Prevent saving changes that require the table to be re-created check box.

신고
Posted by Tornado tornado
Sql Server Management Studio 를 설치할 때 Express 버전을 깔면 Profiler 가 안깔림. 

또는 Visual Studio 설치를 먼저 하면 또 Profiler 가 안깔릴 수도 있음. 왜 Express 버전을 넣어놔서 T.T

프로파일러를 사용할 다른 방법이 없나.. 찾아보니, 역시 있음.

http://sites.google.com/site/sqlprofiler/ 

 별도 어플로 깔리는데, 사용하는 방법은 동일하다.

 
신고
Posted by Tornado tornado

[출처] http://jekyung.com/tc/entry/MSSQL-%C5%D7%C0%CC%BA%ED%BA%B0-size-%B9%D7-row-%B1%B8%C7%CF%B1%E2



1. 개요
 가. MSSQL DB 의 테이블별 size 및 row 를 보여주는 쿼리문입니다.

 나. 테스트환경

2. MSSQL 테이블별 size 및 row 구하기
 가. 테이블 size 구하는 쿼리문
       SELECT
             table_name = convert(varchar(30), min(o.name)), table_size = ltrim(str(sum
             (reserved) * 8192 / 1024.,15,0) + 'KB')
       FROM sysindexes i
             INNER JOIN sysobjects o on (o.id = i.id)
       WHERE i.indid in (0, 1, 255)
             and o.xtype = 'U'
       GROUP BY i.id
       order by table_size desc

       (큰사이즈의 테이블로 인한 오버플로우 오류 발생시 수식 부분을 수정하셔야
         합니다.)

 나. 테이블별 row 구하는 쿼리문
       SELECT
             o.name
             , i.rows
       FROM sysindexes i
              INNER JOIN sysobjects o ON i.id = o.id
       WHERE i.indid < 2
              AND o.xtype = 'U'
       ORDER BY i.rows desc
신고
Posted by Tornado tornado

[링크] http://www.devholic.net/1000495

MS-SQL Server 2005 에서 외부 조인 연산자("*=" 또는 "=*")를 사용할때 다음과 같은 오류가 발생한다.

 

쿼리에서 ANSI 형식이 아닌 외부 조인 연산자("*=" 또는 "=*")를 사용합니다.

이 쿼리를 수정하지 않고 실행하려면 저장 프로시저 sp_dbcmptlevel을 사용하여 현재 데이터베이스의 호환성 수준을 80 이하로 설정하십시오.

가장 좋은 방법은 ANSI 외부 조인 연산자(LEFT OUTER JOIN, RIGHT OUTER JOIN)를 사용하여 쿼리를 다시 작성하는 것입니다.

SQL Server의 다음 버전에서는 ANSI 형식이 아닌 조인 연산자는 역호환성 모드에서도 지원되지 않습니다.

 

메세지 내용처럼 호환성 수준을 80으로 설정하려면

view source
print?
1.-- 현재 호환성 수준 확인하기
2.sp_dbcmptlevel 'DB명'
3.   
4.-- 호환성을 80으로 설정하기
5.sp_dbcmptlevel 'DB명', 80

 

허나 기왕이면 LEFT OUTER JOIN, RIGHT OUTER JOIN을 사용하길 권장한다.

신고
Posted by Tornado tornado
  • 오라클 서버

    오라클 서버를 설치하기 위해서는 리눅스 서버를 새로 구축한다.

     

    • Linux Server 신규 구축

      리눅스 서버는 위의 VirtualBox 만드는 과정을 통해 순수하게 리눅스만 설치한다.

      파티션은 아래와 같이 나눈다.

       

파티션

용량

Swap

2GB

/

나머지 용량 전부(여유있게 20GB 잡을것)

 

 

  • 방화벽 설정

    SELinux 는 Disable 시키고 작업한다

    콘솔에서 아래의 명령을 실행 후 그림과 같이 한다.

# setup

 

Firewall configuration 으로 이동 후 엔터(탭 키를 누른다)

 

 

Security Level 을 Disable 시킨 후 탭을 이동하여 OK 한다.

다시 Quit 을 선택하여 종료한다.

 

  • 오라클 다운로드

     

    다운로드 링크 :

    http://www.oracle.com/technology/software/products/database/index.html

     

    위 주소에서 Oracle Database 10g Release 2 (10.2.0.1.0) for Linux x86 를 다운로드

    한다(회원 인증 필요함)

    다운로드 후 /oracle10g 디렉토리에 압축을 해제한다.

    ZIP 파일일 경우 아래와 같이 압축을 해제한다.

# unzip 10201_database_linux32.zip

# mkdir /oracle10g

# mv database /oracle10g

 

  • 오라클 설치를 위한 환경 설정 및 설치
    • /etc/redhat-release 파일 수정하기.

      Vi 에디터로 /etc/redhat-release 파일의 내용을 아래와 같이 고친다.

       

# vi /etc/redhat-release

 

기존 내용

 

 

 

 

바뀐 내용

- Cent OS 는 레드햇 리눅스의 변종(?) 이므로 레드햇으로 수정해야 오라클이 설치된다.

 

  • /etc/security/limits.conf 파일 수정

    위 파일을 vi 에디터로 열고 제일 하단에 아래의 내용을 추가한다.

     

* soft nproc 2047

* hard nproc 16384

* soft nofile 1024

* hard nofile 65536

 

  • /etc/sysctl.conf 파일 수정

    Vi 에디터로 파일을 열고 아래와 같이 수정한다.

    제일 아래의 ########## 아래의 내용은 신규 추가한다.

     

# Kernel sysctl configuration file for Red Hat Linux

#

# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and

# sysctl.conf(5) for more details.

 

# Controls IP packet forwarding

net.ipv4.ip_forward = 0

 

# Controls source route verification

net.ipv4.conf.default.rp_filter = 1

 

# Do not accept source routing

net.ipv4.conf.default.accept_source_route = 0

 

# Controls the System Request debugging functionality of the kernel

kernel.sysrq = 0

 

# Controls whether core dumps will append the PID to the core filename

# Useful for debugging multi-threaded applications

kernel.core_uses_pid = 1

 

# Controls the use of TCP syncookies

net.ipv4.tcp_syncookies = 1

 

# Controls the maximum size of a message, in bytes

kernel.msgmnb = 65536

 

# Controls the default maxmimum size of a mesage queue

kernel.msgmax = 65536

 

# Controls the maximum shared segment size, in bytes

kernel.shmmax = 4294967295

 

# Controls the maximum number of shared memory segments, in pages

kernel.shmall = 268435456

 

####################

kernel.shmmni = 4096

kernel.sem = 250 32000 100 128

fs.file-max = 65536

net.ipv4.ip_local_port_range = 1024 65000

net.core.rmem_default=262144

net.core.rmem_max=262144

net.core.wmem_default=262144

net.core.wmem_max=262144

 

수정 후 아래의 명령을 실행하여 오류가 있는지 체크(설정 내용이 올바로 출력되면 OK)

/sbin/sysctl -p

 

 

  • /etc/pam.d/login 파일 수정

    Vi 에디터로 오픈하여 아래의 내용을 하단에 추가.

session        required    /lib/security/pam_limits.so

 

  • 설치에 필요한 필수 파일을 yum 으로 업데이트 한다.

    아래의 명령으로 설치에 필요한 필수 파일을 업데이트 한다. 네트워크가 반드시 연결 되어 있어야 한다.

yum -y install make setarch glibc* libaio cpp compat* gcc libXp openmotif elfutils-libelf* unixODBC* sysstat binutils

 

  • 오라클 설치 계정 추가

# groupadd oinstall    // oinstall 그룹 추가

# groupadd dba     // dba 그룹 추가

# useradd -g oinstall -G dba oracle    // oracle 계정을 oinstall 그룹에 -m -g 옵션으로 추가, 동시에 dba 그룹에 -G 옵션으로 추가

# passwd oracle // 비밀번호 지정(111111)

 

  • 오라클 설치 디렉토리 생성

# mkdir -p /u01/app/oracle/product/10.2.0/db_1

# chown -R oracle.oinstall /u01

# mkdir -p /u02/oradata

# chown -R oracle.oinstall /u02

 

  • /hosts 파일 수정

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1 localhost

IP주소 ccu_oracle

 

  • /etc/sysconfig/network 리눅스 호스트 이름 변경

    vi 에디터로 오픈하여 HOSTNAME 부분을 변경한다

NETWORKING=yes

NETWORKING_IPV6=no

HOSTNAME=ccu_oracle

 

  • 오라클 사용자 계정 환경변수 설정

# su – oracle

$ cd ~

$ vi .bash_profile

 

아래의 내용을 파일 하단에 추가한다.

TMP=/tmp; export TMP

TMPDIR=$TMP; export TMPDIR

ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE

ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1; export ORACLE_HOME

ORACLE_SID=orcl; export ORACLE_SID

ORACLE_TERM=xterm; export ORACLE_TERM

PATH=/usr/sbin:$PATH; export PATH

PATH=$ORACLE_HOME/bin:$PATH; export PATH

export DISPLAY=:0.0

export LANG=C

 

수정이 다 되었다면 저장 후 아래의 명령을 실행하여 적용시킨다.

$ source .bash_profile

        

        위 설정이 다 완료 되었다면 시스템을 한번 리부팅 해준다.        

# shutdown –r now

 

  • 오라클 설치

    주의 : 반드시 oracle 계정으로 설치해야 한다.

     

    오라클 압축 해제 디렉토리(/oracle10g/database) 로 이동하여 runInstaller 를 실행한다.

     

# xhost + ß 반드시 루트 권한에서 실행해야 함

# su – oracle

$ cd /oracle10g/database

$ ./runInstaller

 

오라클 인스톨러 화면

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

오라클 설치중 화면

 

 

 

 

데이터베이스 생성 화면

 

데이터베이스 생성 완료 화면.

 

 

 

 

 

설치가 거의 완료 되면 아래와 같이 ROOT 권한으로 실행할 파일을 알리는 창이 뜬다. 새로운 터미널을 열어서 root 권한으로 해당 스크립트를 실행한다.

# cd /u01/app/oracle/oraInventory/

# ./orainstRoot.sh

# cd /u01/app/oracle/product/10.2.0/db_1/

# ./root.sh ß 이 명령 실행 후 local bin 디렉토리를 묻는데 기본값으로 하면 된다.

 

 

  • 오라클 구동 및 상태보기.

오라클의 구동 및 설정은 oracle 유저로 해야 한다.

 

  • 오라클 상태 보기

# su – oracle

$ lsnrctl status

 

오라클 SID 는 orcl 로 설정하였으며, orcl 리스너가 ready 상태이여야 한다.

 

  • 오라클 리스너 시작/종료

     

리스너 시작

$ lsnrctl start

리스너 종료

$ lsnrctl stop

리스너 상태보기

$ lsnrctl status

 

 

 

 

 

 

 

 

 

 

 

 

  • 오라클 데이터베이스 구동

# su – oracle

$ sqlplus /nolog

SQL> connect sys/password as sysdba

SQL> startup

 

 

  • 오라클 emctl 시작(오라클 엔터프라이즈 매니저)

$ emctl start dbconsole

 


웹브라우저 에서 http://localhost:1158/em/console/aboutApplication 를 오픈하여 엔터프라이즈 매니저로 접근 되는지 확인.

 

아이디 패스워드는 sys/password 이다.

 

 

 

신고
Posted by Tornado tornado

QUARTS 로 했었는데... 이것도 편하군.

###################################################################
--- 테이블 생성

CREATE TABLE TB_TEST
(
 ID VARCHAR(10),
 NAME VARCHAR(10)
);


--- 패키지 헤더 생성

CREATE OR REPLACE PACKAGE PKG_TORNADO

AS

PROCEDURE UP_TEST_INSERT (v_id IN TB_TEST.ID%type, v_name IN TB_TEST.NAME%type);

END;
/


--- 패키지 바디 생성

CREATE OR REPLACE PACKAGE BODY PKG_TORNADO
AS
PROCEDURE UP_TEST_INSERT
(
 v_id IN TB_TEST.ID%type,
 v_name IN TB_TEST.NAME%type
)
IS

BEGIN

 INSERT INTO TB_TEST (ID, NAME)
 VALUES (v_id, v_name );

 commit;
END;

END;
/


--- ORACLE JOB 생성


VAR JOBNO NUMBER;
BEGIN
DBMS_JOB.SUBMIT 
(:JOBNO
,'PKG_TORNADO.UP_TEST_INSERT(''HEESUNG'', ''TORNADO'');'
,SYSDATE
,'SYSDATE + 1');
COMMIT WORK;
END;
/

 

-- JOBNO 확인

PRINT JOBNO

     JOBNO
----------
        21

-- USER_JOBS 상세 확인

SELECT JOB, LAST_DATE, NEXT_DATE, BROKEN, WHAT FROM USER_JOBS;

       JOB LAST_DATE NEXT_DATE B
---------- --------- --------- -
        21           09-APR-09 N


-- JOB 시작

EXEC DBMS_JOB.RUN(21);

-- JOB 수행 결과 보기.

SELECT * FROM TB_TEST

 


 

신고
Posted by Tornado tornado

[출처] http://docstore.mik.ua/orelly/oracle/bipack/ch13_04.htm




13.4 DBMS_JOB Examples

The DBMS_JOB package has all kinds of useful applications waiting to be discovered. DBAs can schedule jobs that look for problem conditions in the database or track and record resource utilization. Developers can schedule large batch operations at off hours without requiring operator intervention.

13.4.1 Tracking Space in Tablespaces

I decided to implement a very simple tracking system that can be used to track the growth of data in tablespaces. Such a system could be used for capacity planning or to trigger an alert of impending space problems.

The system consists of a table called db_space, a view called tbs_space, and a procedure called space_logger. Here is the source code for the system:

/* Filename on companion disk: job6.sql */*
CREATE TABLE db_space
   (tablespace_name   VARCHAR(30)  NOT NULL
   ,calc_date      DATE    NOT NULL
   ,total_bytes    NUMBER  NOT NULL
   ,free_bytes     NUMBER  NOT NULL);

CREATE OR REPLACE VIEW tbs_space
   (tablespace_name
   ,total_bytes
   ,free_bytes)
AS
   SELECT  DF.tbsname         tablespace_name
          ,DF.totbytes        total_bytes
          ,FS.freebytes       free_bytes
     FROM 
          (SELECT  tablespace_name     tbsname
                  ,SUM(bytes)          totbytes
             FROM  dba_data_files
            GROUP BY tablespace_name
          ) DF
         ,(SELECT  tablespace_name     tbsname
                  ,SUM(bytes)          freebytes
             FROM  dba_free_space
            GROUP BY tablespace_name
          ) FS
    WHERE
          DF.tbsname = FS.tbsname;


CREATE OR REPLACE PROCEDURE space_logger
AS
   /*
   || records total size and free space for all
   || tablespaces in table db_space
   ||
   || Author:  John Beresniewicz, Savant Corp
   ||
   || 01/26/98: created
   ||
   || Compilation requirements:
   ||
   || SELECT on TBS_SPACE view
   || INSERT on DB_SPACE table
   */
   CURSOR tbs_space_cur
   IS
   SELECT tablespace_name, total_bytes, free_bytes
     FROM tbs_space;
     
BEGIN
   FOR tbs_space_rec IN tbs_space_cur
   LOOP
      INSERT INTO db_space VALUES
         (tbs_space_rec.tablespace_name
         ,SYSDATE
         ,tbs_space_rec.total_bytes
         ,tbs_space_rec.free_bytes);
   END LOOP;
   COMMIT;
END space_logger;

To set the system in motion, the space_logger procedure can be submitted to the job queue for regular execution as follows:

DECLARE
   jobno   NUMBER;
BEGIN
   DBMS_JOB.SUBMIT
      (job  => jobno
      ,what => 'begin space_logger; end;'
      ,next_date => SYSDATE
      ,interval  => 'SYSDATE+1/24');
   COMMIT;
END;
/

Each time space_logger executes, it records total space, free space, tablespace name, and a timestamp for each tablespace in the database. Adjusting the interval parameter for the job adjusts the frequency of data collection.

13.4.2 Fixing Broken Jobs Automatically

Charles Dye recommended the next example, probably based on his experiences with replication. When jobs have relatively complex execution requirements in terms of the database objects on which they depend, they can easily become broken by incurring multiple execution failures. Perhaps the DBA has modified some database links or recreated tables or views, and the job's definition has been temporarily compromised. Well, it's a pain to manually reset the broken flag for these "not really broken" jobs, so why not have a job that regularly tries to unbreak jobs? Sounds good to me; here is a procedure called job_fixer to do just that:

/* Filename on companion disk: job5.sql */*
CREATE OR REPLACE PROCEDURE job_fixer
AS
   /*
   || calls DBMS_JOB.BROKEN to try and set
   || any broken jobs to unbroken
   */
   
   /* cursor selects user's broken jobs */
   CURSOR broken_jobs_cur
   IS
   SELECT job
     FROM user_jobs
    WHERE broken = 'Y';
    
BEGIN
   FOR job_rec IN broken_jobs_cur
   LOOP
      DBMS_JOB.BROKEN(job_rec.job,FALSE);
   END LOOP;
END job_fixer;

The job_fixer procedure works only on a user's own jobs, so each user submitting jobs to the queue will need a separate job_fixer in the queue.

13.4.3 Self-Modifying and Self-Aware Jobs

The ability to reference the job, next_date, and broken parameters in the job definition allows the procedure executed to alter its own job characteristics. Thus, a job could remove itself from the job queue, or assign its own next execution date based on some criteria decided at runtime by the procedure itself. I've written a small skeleton procedure that demonstrates this capability. It is called smart_job, and makes use of all three of the referenceable parameters when submitted as a job.

When submitted to the job queue, smart_job uses the job definition parameters to modify itself in the following ways:

  • Reschedules itself to parm1_IN minutes after finishing if parm2_IN = "RESTART"

  • Sets next_date NULL causing automatic removal from queue if parm2_IN != "RESTART"

  • Flags itself as broken if any exceptions are encountered

  • Uses the job number to raise an exception

Pay close attention to how the smart_job procedure modifies itself. It uses the fact that the next_date and broken parameters support both IN and OUT modes when referenced by the job definition. Thus, when the broken_out parameter of smart_job has the broken parameter assigned to it in the call to DBMS_JOB.SUBMIT, the value set for broken_out by the procedure gets set for the job by the job queue when the job completes. In this way, smart_job changes its job characteristics without calling any DBMS_JOB procedures.

Here is the source code for smart_job:

/* Filename on companion disk: job4.sql */*
PROCEDURE smart_job
   (parm1_IN IN INTEGER
   ,parm2_IN IN VARCHAR2
   ,next_date_OUT IN OUT DATE
   ,broken_OUT IN OUT BOOLEAN
   ,job_IN IN INTEGER := -1)
IS
   /* declare an exception for testing */
   JOB_LESSTHAN_100  EXCEPTION;

BEGIN
   /* 
   || Do the procedure main line functions
   */
   null;

   /* 
   || use job_IN to branch to exception handler
   || for testing self-breaking
   */
   IF job_IN < 100
   THEN 
      RAISE JOB_LESSTHAN_100;
   END IF;

   /*
   || After main processing is finished, job decides
   || if it should re-execute and determines its own
   || next execution date by adding parm1_IN minutes to 
   || the current time
   */
   IF parm2_IN = 'RESTART'
   THEN
      next_date_OUT := SYSDATE + parm1_IN/1440;
   ELSE
      /* 
      || NULL next_date will cause automatic removal of
      || job from queue
      */
      next_date_OUT := NULL;
   END IF;

EXCEPTION
   /* 
   || job "breaks" itself if unexpected error occurs
   */
   WHEN OTHERS
   THEN
      broken_OUT := TRUE;

END smart_job;

The following test script exercises smart_job:

/* Filename on companion disk: job4.sql */*
var jobno NUMBER
BEGIN
   /*
   || Test the ability to modify next_date.
   */
   DBMS_JOB.SUBMIT
      (:jobno
      ,'smart_job(180,''RESTART'',next_date,broken,job);'
      ,SYSDATE + 1/1440
      ,'SYSDATE + 1');

   COMMIT WORK;
END;
/
print jobno

BEGIN
   /*
   || Test the ability to autoremove
   */
   DBMS_JOB.SUBMIT
      (:jobno
      ,'smart_job(180,''NO_RESTART'',next_date,broken,job);'
      ,SYSDATE + 1/1440
      ,'SYSDATE + 1');

   COMMIT WORK;
END;
/
print jobno

BEGIN
   /*
   || Test the ability to break itself.
   */
   DBMS_JOB.ISUBMIT
      (99
      ,'smart_job(180,''RESTART'',next_date,broken,job);'
      ,SYSDATE + 1/1440
      ,'SYSDATE + 1');

   COMMIT WORK;
END;
/

After executing the test script in SQL*Plus, the following jobs are in the queue:

SQL> SELECT job,last_date,next_date,broken FROM user_jobs;


JOB       LAST_DATE           NEXT_DATE           B
--------- ------------------- ------------------- -
      307                     1997:11:25:11:50:39 N
      308                     1997:11:25:11:50:39 N
       99                     1997:11:25:11:50:40 N

A few minutes later, the job queue looks like this:

SQL> SELECT job,last_date,next_date,broken FROM user_jobs;

JOB       LAST_DATE           NEXT_DATE           B
--------- ------------------- ------------------- -
      307 1997:11:25:11:50:42 1997:11:25:14:50:42 N
       99 1997:11:25:11:50:42 1997:11:26:11:50:42 Y

The tests worked! Job 308 ran once and was removed from the queue for having a NULL next_date. Job 307 ran and rescheduled itself three hours later, which is different from the interval specified in the call to DBMS_JOB.SUBMIT. Finally, job 99 set itself to broken status because its job number was less than 100.

신고
Posted by Tornado tornado

(펌)http://cafe202.daum.net/_c21_/bbs_search_read?grpid=zchT&fldid=EchX&contentval=00005zzzzzzzzzzzzzzzzzzzzzzzzz&nenc=8SNCt.UzXo8cFfsmTecuug00&dataid=5&fenc=5TjRNwaVVNk0


여기서 설명할 것은 패키지하는 방법과 사용법이다.
일반적으로 그냥 프로시저만을 사용하는 경우가 많으나(나도 그렇지만..), 패키지하여 사용하는 것이
여러 부분에서 우월하므로 왠만하면 패키지화하는 습관을 들이는 것이 좋겠다. 뒷부분에 설명하겠다.


함수의 생성과 사용
프로시저는 뭐고 함수는 또 뭔가?
함수는 리턴값이 있고, 프로시저는 없다. 사실 프로시저도 리턴값을 패러미터로 받을 순 있다.
C언어에서의 call by reference정도로 생각하면 되겠다.

현재시간을 알아보게 싶게 2002-9-19과 같이 리턴하는 함수를 만들어 보자.

SQL> create or replace function print_date return varchar is
  2   cur_date varchar(15);
  3  begin
  4   select to_char(sysdate, 'YYYY-MM-DD') into cur_date from dual;
  5   return cur_date;
  6  end print_date;
  7  /

함수가 생성되었습니다.

SQL> select print_date() from dual;

PRINT_DATE()
-------------------------------------
2002-09-19


함수는 위와 같이 사용하면 되겠다. 별다른 설명은 필요없으리라 본다.
그럼 프로시저를 보자.


프로시저의 생성과 사용
test라는 테이블에 데이터를 insert하는 프로시저를 생성하자.
프로시저는 insert, update, delete등에 사용하면 적절하겠다.

SQL> create table test
  2  (id varchar(10), name varchar(10));

테이블이 생성되었습니다.

SQL> create or replace procedure insert_test
  2  (
  3   v_id    IN TEST.ID%type,  -- ID
  4   v_name  IN TEST.NAME%type  -- 이름
  5  )
  6  is
  7  begin
  8   insert into test(id, name) values(v_id, v_name);
  9   commit;
10  end;
11  /

프로시저가 생성되었습니다.

SQL> exec insert_test('maddog', '강명규');

PL/SQL 처리가 정상적으로 완료되었습니다.

SQL> select * from test;

ID         NAME
---------- ----------
maddog     강명규

SQL> rollback;

롤백이 완료되었습니다.

SQL> select * from test;

선택된 레코드가 없습니다.

SQL>

데이터입력값의 유효성체크와 이에 따른 commit,rollback등 에러처리 추가해 주면 더욱 좋을 것이다.
전체 테이블컬럼을 참조한다면 위의 type대신 rowtype을 사용하는 것도 고려해볼만 하겠다.
예제라 간단하게 했는데, 부족함을 느낀다면 .. 음.. 스스로 하도록 하자.


패키지의 사용
위에서 만든 함수와 프로시저를 하나의 패키지에 담도록 하자.
사실 관련있는 놈들끼리 패키지화 해야 하는데.. 어디까지나 예제이므로 그냥 해 보겠다.

SQL> create or replace package pkg_dbakorea
  2  as
  3  procedure insert_test(v_id IN TEST.ID%type, v_name IN TEST.NAME%type);
  4  function print_date return varchar;
  5  end;
  6  /

패키지가 생성되었습니다.

SQL> create or replace package body pkg_dbakorea
  2  as
  3   procedure insert_test
  4   (                                               
  5    v_id    IN TEST.ID%type,  -- ID                
  6    v_name  IN TEST.NAME%type  -- 이름             
  7   )                                               
  8   is                                              
  9   begin                                           
10    insert into test(id, name) values(v_id, v_name);
11    commit;                                        
12   end;                                            
13  
14   function print_date return varchar is
15    cur_date varchar(15);                                        
16   begin                                                         
17    select to_char(sysdate, 'YYYY-MM-DD') into cur_date from dual;
18    return cur_date;                                             
19   end print_date;                                               
20 
21  end;
22  /

패키지 본체가 생성되었습니다.

그럼, 생성된 패키지내의 프로시저와 함수를 어떻게 사용하는지 보자.

SQL> exec pkg_dbakorea.insert_test('dbakorea','강명규');

PL/SQL 처리가 정상적으로 완료되었습니다.

SQL> select * from test;

ID         NAME
---------- ----------
maddog     강명규
dbakorea   강명규

SQL> select pkg_dbakorea.print_date() from dual;

PKG_DBAKOREA.PRINT_DATE()
---------------------------------------------------
2002-09-26

SQL>




오라클 매거진 영문판 2002 1월/2월호를 참고했습니다.

프로시저에 비해 패키지가 왜 우월한가 의문을 가지고 따질 사람이 있을 수 있으므로 함 따져보자.

모듈화
애플리케이션 디자인 용이함
Information Hiding
추가된 기능성
더 좋은 성능


프로시저A는 프로시저B를 호출하는 코드를 가진다고 하자.
그런데 프로시저B의 코드가 변경되면 어떻게 되는가?
프로시저A는 프로시저B의 변경에 영향을 받으므로 재컴파일되어야 한다. 귀찮겠다.
그럼, 패키지를 사용하면 어떻길래? 따지냐.. --; 이주일씨의 명복을 빕니다.

프로시저A를 포함한 패키지PKG_A와 프로시저B를 가진 패키지PKG_B가 있고
PKG_A는 PKG_B의 스펙에 대한 의존성을 가진다고 하자.
이 경우에서, PKG_B의 스펙이나 인터페이스에 대한 변경이 아니라면 PKG_B의 BODY는 PKG_A에 상관없이 변경될 수 있다.

SQL> create or replace procedure B
  2  as
  3  begin
  4   null;
  5  end;
  6  /

프로시저가 생성되었습니다.

SQL> create or replace procedure A
  2  as
  3  begin
  4   B;
  5  end;
  6  /

프로시저가 생성되었습니다.

SQL>


위에서 보듯, 프로시저A는 프로시저B를 호출하고 있다. 즉, 의존성을 가지고 있겠다.
당연히, 프로시저B를 먼저 생성한 다음 프로시저A를 생성해야 한다. 반대로 해볼까?


SQL> drop procedure B;

프로시저가 삭제되었습니다.

SQL> drop procedure A;

프로시저가 삭제되었습니다.

SQL> create or replace procedure A
  2  as
  3  begin
  4   B;
  5  end;
  6  /

경고: 프로시저 생성시 컴파일 오류가 발생했습니다.

SQL> drop procedure a;

프로시저가 삭제되었습니다.

SQL>


위에서 보듯이 안된다.
그럼.. 무엇을 말하려고? 음.. 패키지는 그렇지 않다는 것이다.


SQL> create or replace package pkg_a
  2  as
  3   procedure a;
  4  end;
  5  /

패키지가 생성되었습니다.

SQL> create or replace package pkg_b
  2  as
  3   procedure b;
  4  end;
  5  /

패키지가 생성되었습니다.

SQL>


위에서 패키지에 대한 스펙을 생성했다.
스펙의 생성순서는 상관이 없고 서로 의존성도 가지지 않는다.
그럼 패키지 body를 생성해 보자.

SQL> create or replace package body pkg_a
  2  as
  3   procedure a
  4   is
  5   begin
  6    pkg_b.b;
  7   end;
  8  end;
  9  /

패키지 본체가 생성되었습니다.

SQL> create or replace package body pkg_b
  2  as
  3   procedure b
  4   is
  5   begin
  6    null;
  7   end;
  8  end;
  9  /

패키지 본체가 생성되었습니다.


설명 필요없겠다. 위에서 보듯 생성순서는 상관이 없다.
패키지A에 있는 프로시저A는 패키지B의 프로시저B를 호출하고 있지만,
이 시점에서 패키지B의 프로시저B는 존재하지 않는다. 하지만, 상관없다는 것을 알 수 있다.
다시 말하면, 패키지의 스펙에 의존하지만, 패키지의 body에는 의존하지 않음을 알 수 있다.


SQL> col object_type format a15
SQL> col object_name format a10
SQL> col status format a10
SQL> select object_type, object_name, status
  2  from user_objects
  3  where status = 'INVALID'
  4  order by object_type, object_name;

선택된 레코드가 없습니다.

SQL>


그럼 실제 프로시저와 패키지가 변경해서 위에서 말한 내용이 사실인지 확인해 보자.


SQL> create or replace procedure B
  2  as
  3  begin
  4   null;
  5  end;
  6  /

프로시저가 생성되었습니다.

SQL> create or replace procedure A
  2  as
  3  begin
  4   B;
  5  end;
  6  /

프로시저가 생성되었습니다.

SQL> create or replace procedure b
  2  as
  3  begin
  4   null;
  5  end;
  6  /

프로시저가 생성되었습니다.

SQL> select object_type, object_name, status
  2  from user_objects
  3  where status = 'INVALID'
  4  order by object_type, object_name;

OBJECT_TYPE     OBJECT_NAM STATUS
--------------- ---------- ----------
PROCEDURE       A          INVALID

SQL> create or replace package body pkg_b
  2  as
  3   procedure b
  4   is
  5   begin
  6    null;
  7   end;
  8  end;
  9  /

패키지 본체가 생성되었습니다.

SQL> select object_type, object_name, status
  2  from user_objects
  3  where status = 'INVALID'
  4  order by object_type, object_name;

OBJECT_TYPE     OBJECT_NAM STATUS
--------------- ---------- ----------
PROCEDURE       A          INVALID

SQL>

위에서 보듯이 프로시저는 변경에 대해 민감하고(?), 패키지는 둔하다.
고로 패키지가 더 flexible하다 할 수 있겠다.
신고
Posted by Tornado tornado
출처 : http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html


Get It Done With MySQL 5&6, Chapter 20. Copyright © Peter Brawley and Arthur Fuller 2008. All rights reserved.

TOC    Previous    Next

Working with Graphs in MySQL

Graphs and SQL   Edge list   Edge-adjacency list model of a tree
   Nested set model of a tree   Edge-list model of a network   Parts explosions

Most non-trivial data is hierarchical. Customers have orders, which have line items, which refer to products, which have prices. Population samples have subjects, who take tests, which give results, which have sub-results and norms. Web sites have pages, which have links, which collect hits, which distribute across dates and times. With such data, we know the depth of the hierarchy before we sit down to write a query. The fixed depth of the hierarchy logically limits the number of JOINs needed in a query.

But if our data describes a family tree, or a browsing history, or a bill of materials, then hierarchical depth depends on the data. We no longer know how many JOINs our query will require. We need a different data model. 

That model is the graph (Fig 1), which is a set of nodes (vertices) and the edges (lines or arcs) that connect them. This chapter is about how to model and query graphs in a MySQL database.

Graph theory is a branch of topology. It is the study of geometric relations which aren't changed by stretching and compression—rubber sheet geometry, some call it. Graph theory is ideal for modelling hierarchies—like family trees, browsing histories, search trees and bills of materials—whose shape and size we can't know in advance.

Let the set of nodes in Fig 1 be N, the set of edges be L, and the graph be G. Then G is the tuple or ordered pair {N,L}:

    N = {A,B,C,D,E,F}
    L = {AC,CD,CF,BE}
    G = {N,L}

If the edges are directed, the graph is a digraph or directed graph. A mixed graph has both directed and undirected edges.

Examples of graphs are organisational charts; itineraries; route maps; parts explosions; massively multiplayer games; language rules; chat histories; network and link analysis in a wide variety of fields, for example search engines, forensics, epidemiology and telecommunications; data mining; models of chemical structure hierarchies; and biochemical processes.

Graph characteristics and models

Nodes and edges : Two nodes are adjacent if there is an edge between them. Two edges are adjacent if they connect to a common node. In a complete graph, all nodes are adjacent to all other nodes.

In a digraph or directed graph, the number of edges entering a node is its indegree; the number leaving is its outdegree. A node of indegree zero is a root node, a node of outdegree zero is a leaf node.

In a weighted graph, used for example to solve the travelling salesman problem, edges have a weight attribute. A digraph with weighted edges is a network.

Paths and cycles: A connected sequence of edges is a path, its length the number of edges traversed. Two nodes are connected if there is a path between them. If there is a path connecting every pair of nodes, the graph is a connected graph.

A path in which no node repeats is a simple path. A path which returns to its own origin without crossing itself is a cycle or circuit. A graph with multiple paths between at least one pair of nodes is reconvergent. A reconvergent graph may be cyclic or acyclic. A unit length cycle is a loop.

If a graph's edges intersect only at nodes, it is planar. Two paths having no node in common are independent.

Traversing graphs: There are two main approaches, breadth-first and depth-first. Breadth-first traversal visits all a node's siblings before moving on to the next level, and typically uses a queue. Depth-first traversal follows edges down to leaves and back before proceeding to siblings, and typically uses a stack.

Sparsity: A graph where the size of E approaches the maximum N2 is dense. When the multiple is much smaller than N, the graph is considered sparse.

Trees: A tree is a connected graph with no cycles. It is also a graph where the indegree of the root node is 0, and the indegree of every other node is 1. A tree where every node is of outdegree <=2 is a binary tree.  A forest is a graph in which every connected component is a tree.

Euler paths: A path which traverses every edge in a graph exactly once is an Euler path. An Euler path which is a circuit is an Euler circuit.

If and only if every node of a connected graph has even degree, it has an Euler circuit (which is why the good people of Königsberg cannot go for a walk crossing each of their seven bridges exactly once). If and only if a connected graph has exactly 2 nodes with odd degree, it has a non-circuit Euler path. The degree of an endpoint of a non-cycle Euler path is 1 + twice the number of times the path passes through that node, so it is always odd.

Models for computing graphs

Traditionally, computer science textbooks have offered edge lists, adjacency lists and adjacency matrices as data structures for graphs, with algorithms implemented in languages like C, C++ and Java. More recently other models and tools have been suggested, including query languages customised for graphs.

Edge list: The simplest way to represent a graph is to list its edges: for Fig 1, the edge list is {AC,CD,CF,BE}. It is easy to add an edge to the list; deletion is a little harder.

Table 1
Nodes Adjacent nodes
A C
B E
C F,D,A
D C
E B
F C
Adjacency list: An adjacency list is a ragged array: for each node it lists all adjacent nodes. Thus it represents a directed graph of n nodes as a list of n lists where list i contains node j if the graph has an edge from node i to node j.

An undirected graph may be represented by having node j in the list for node i, and node i in the list for node j. Table 1 shows the adjacency list of the graph in Fig 1 interpreted as undirected.

Adjacency matrix: An adjacency matrix represents a graph with n nodes as an n x n matrix, where the entry at (i,j) is 1 if there is an edge from node i to node j, or zero if there is not.

An adjacency matrix can represent a weighted graph using the weight as the entry, and can represent an undirected graph by using the same entry in both (i,j) and (j,i), or by using an upper triangular matrix.

There are useful glossaries here and here.

Graphs and SQL

Often standard SQL has been thought cumbersome for graph problems. Craig Mullins once wrote that "the set-based nature of SQL is not simple to master and is anathema to the OO techniques practiced by Java developers."

A few years after Mullins wrote that, SQL is everywhere, and it is increasingly applied to graph problems. DB2 has a WITH operator for processing recursive sets. Oracle has a CONNECT BY operator for graphs that are trees. SQL Server has recursive unions. MySQL has no such enhancements for graphs, but Joe Celko and Scott Stephens, among others, have published general SQL graph problem solutions that are simpler and smaller than equivalent C++, C# or Java code. Here we implement some of these ideas in MySQL.

Beware that in ports of edge list and adjacency list methods to SQL, there has been name slippage. What's often called the adjacency list model in the SQL world is actually an edge list model. If you follow the now-common practice in the SQL world of referring to edge lists as adjacency lists, don't be surprised to find that the model isn't quite like the adjacency list in Table 1. Here we waffle. We call them edge-adjacency lists.

There are also two newer kinds of models: what Joe Celko called the nested sets modealso known as the interval modelwhich uses greater-than/less-than arithmetic to encode tree relationships and modified preorder tree traversal (MPTT) to query them, and Tropashko's materialised path model, where each node is stored with its path to the root. So we have four main possibilities for modelling graphs in MySQL:

  • edge-adjacency lists: based on an adaptation by EF Codd of the logic of linked lists to table structures and queries,
  • adjacency matrices,
  • nested sets for trees simplify some queries, but insertion and deletion are cumbersome, and
  • materialised paths.

Here we work out how to implement edge-adjacency, nested set and materialised path models— or parts of them—in MySQL 5&6.

The edge list

The edge list is the simplest way to represent a graph: minimally, a one-column table of nodes and a two-column table of edges. The edges table can be thought of as a nodes-nodes bridge table, each row containing pointers to origin and destination nodes. Other details of nodes and edges can be encoded in extra nodes and edges columns, or in child tables.

In the real world, the nodes table might be a table of personnel, or assembly parts, or locations on a map. It might have many other columms of data. The edges table might also have additional columns for edge properties. The key integers of both tables might be BIGINTs.

To model Fig 1, though, we keep things as simple as possible:

Listing 1
CREATE TABLE nodes(
  nodeID CHAR(1) PRIMARY KEY
);
CREATE TABLE edges(
  childID CHAR(1) NOT NULL,
  parentID CHAR(1) NOT NULL,
  PRIMARY KEY(childID,parentID)
);
INSERT INTO nodes VALUES('A'), ('B'), ('C'), ('D'), ('E'), ('F');
INSERT INTO edges VALUES ('A','C'), ('C','D'), ('C','F'), ('B','E');
SELECT * FROM edges;
+---------+----------+
| childID | parentID |
+---------+----------+
| A       | C        |
| B       | E        |
| C       | D        |
| C       | F        |
+---------+----------+

Now, without any assumptions whatever about whether the graph is connected, whether it is directed, whether it is a tree, or whatever, how hard is it to write a reachability procedure, a procedure which tells us where we can get to from here, wherever 'here' is?

A simple approach is a breadth-first search:

  1. Seed the list with the starting node,
  2. Add, but do not duplicate, nodes which are children of nodes in the list,
  3. Add, but do not duplicate, nodes which are parents of nodes in the list,
  4. Repeat steps 2 and 3 until there are no more nodes to add.

Here it is as a MySQL stored procedure. It avoids duplicate nodes by defining reached.nodeID as a primary key and adding reachable nodes with INSERT IGNORE.

If you are running MySQL 5.0.6 through 5.0.15, you will have to make two changes to this and subsequent procedures: move CREATE TABLE statements outside the procedures, and set log_bin_trust_routine_creators=TRUE.

Listing 2
DROP PROCEDURE IF EXISTS ListReached;
DELIMITER |

CREATE PROCEDURE ListReached( IN root CHAR(1) )
BEGIN
  DECLARE rows SMALLINT DEFAULT 0;
  DROP TABLE IF EXISTS reached;
  CREATE TABLE reached (
    nodeID CHAR(1) PRIMARY KEY
  ) ENGINE=HEAP;
  INSERT INTO reached VALUES (root );
  SET rows = ROW_COUNT();
  WHILE rows > 0 DO
    INSERT IGNORE INTO reached
      SELECT DISTINCT childID
      FROM edges AS e
      INNER JOIN reached AS p ON e.parentID = p.nodeID;
    SET rows = ROW_COUNT();
    INSERT IGNORE INTO reached
      SELECT DISTINCT parentID
      FROM edges AS e
      INNER JOIN reached AS p ON e.childID = p.nodeID;
    SET rows = rows + ROW_COUNT();
  END WHILE;
  SELECT * FROM reached;
  DROP TABLE reached;
END;
|
DELIMITER ;
CALL ListReached('A');
+--------+
| nodeID |
+--------+
| A      |
| C      |
| D      |
| F      |
+--------+

To make the procedure more versatile, give it input parameters which tell it whether to list child, parent or all connections, and whether to recognise loops (for example C to C).

To give the model referential integrity, use InnoDB and make edges.childID and edges.parentID foreign keys. To add or delete a node, add or delete desired single rows in nodes and edges. To change an edge, edit it. The model does not require the graph to be connected or treelike, and does not presume direction.

The edge list is basic to what SQLers often call the adjacency list model.

Edge-adjacency list model of a tree

Writers in the SQL graph literature often give solutions using single denormalised tables. Denormalisation can cost, big time. The bigger the table, the bigger the cost. You cannot edit nodes and edges separately. Carrying extra node information during edge computation slows performance. With nodes and edges denormalised to one table, you have to represent the root node with a NULL.

Normalisation banishes these difficulties. For William Shakespeare's family tree (Fig 2) we again use two tables, a family table describing family members (nodes), and a familytree table with a row for each tie that binds (edges). Later, when we use a different tree model, we won't have to mess with the data being modelled.

Listing 3
-- Base data:
CREATE TABLE family (
  ID smallint(6) PRIMARY KEY AUTO_INCREMENT,
  name char(20) default '',
  siborder tinyint(4) default NULL,
  born smallint(4) unsigned default NULL,
  died smallint(4) unsigned default NULL
);
INSERT INTO family VALUES (1, 'Richard Shakespeare', NULL, NULL, 1561);
INSERT INTO family VALUES (2, 'Henry Shakespeare', 1, NULL, 1569);
INSERT INTO family VALUES (3, 'John Shakespeare', 2, 1530, 1601);
INSERT INTO family VALUES (4, 'Joan Shakespeare', 1, 1558, NULL);
INSERT INTO family VALUES (5, 'Margaret Shakespeare', 2, 1562, 1563);
INSERT INTO family VALUES (6, 'William Shakespeare', 3, 1564, 1616);
INSERT INTO family VALUES (7, 'Gilbert Shakespeare', 4, 1566, 1612);
INSERT INTO family VALUES (8, 'Joan Shakespeare', 5, 1568, 1646);
INSERT INTO family VALUES (9, 'Anne Shakespeare', 6, 1571, 1579);
INSERT INTO family VALUES (10, 'Richard Shakespeare', 7, 1574, 1613);
INSERT INTO family VALUES (11, 'Edmond Shakespeare', 8, 1580, 1607);
INSERT INTO family VALUES (12, 'Susana Shakespeare', 1, 1583, 1649);
INSERT INTO family VALUES (13, 'Hamnet Shakespeare', 1, 1585, 1596);
INSERT INTO family VALUES (14, 'Judith Shakespeare', 1, 1585, 1662);
INSERT INTO family VALUES (15, 'William Hart', 1, 1600, 1639);
INSERT INTO family VALUES (16, 'Mary Hart', 2, 1603, 1607);
INSERT INTO family VALUES (17, 'Thomas Hart', 3, 1605, 1670);
INSERT INTO family VALUES (18, 'Michael Hart', 1, 1608, 1618);
INSERT INTO family VALUES (19, 'Elizabeth Hall', 1, 1608, 1670);
INSERT INTO family VALUES (20, 'Shakespeare Quiney', 1, 1616, 1617);
INSERT INTO family VALUES (21, 'Richard Quiney', 2, 1618, 1639);
INSERT INTO family VALUES (22, 'Thomas Quiney', 3, 1620, 1639);
INSERT INTO family VALUES (23, 'John Bernard', 1, NULL, 1674);

-- Table which models the tree:
CREATE TABLE familytree (
  childID smallint NOT NULL,
  parentID smallint NOT NULL,
  PRIMARY KEY (childID, parentID);
);
INSERT INTO familytree VALUES
  (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 2), (8, 2), (9, 2),
  (10, 2), (11, 2), (12, 6), (13, 6), (14, 6), (15, 8), (16, 8),
  (17, 8), (18, 8), (19, 12), (20, 14), (21, 14), (22, 14), (23, 19);

(The family PK is auto-increment, but the listing is more reader-friendly when the ID values are shown.)

It will be useful to have a function that returns the family.name for a pointer in familytree. Listing 4 shows that function, and simple queries which call it to retrieve child and parent names:

Listing 4
-- 5.0.16 OR LATER:
SET GLOBAL log_bin_trust_function_creators=TRUE;

DROP FUNCTION IF EXISTS PersonName;
DELIMITER |

CREATE FUNCTION PersonName( personID SMALLINT )
RETURNS CHAR(20)
BEGIN
  DECLARE pname CHAR(20) DEFAULT '';
  SELECT name INTO pname FROM family WHERE ID=personID;
  RETURN pname;
END;
|
DELIMITER ;

SELECT PersonName( parentID ) AS 'Parent of William'
FROM familytree
WHERE childID = 6;
+-------------------+
| Parent of William |
+-------------------+
| Henry Shakespeare |
+-------------------+
SELECT PersonName( childID ) AS 'Children of William'
FROM familytree
WHERE parentID = (
  SELECT ID FROM family
  WHERE name = 'William Shakespeare'
);
+---------------------+
| Children of William |
+---------------------+
| Susana Shakespeare  |
| Hamnet Shakespeare  |
| Judith Shakespeare  |
+---------------------+
SELECT
  PersonName(childID) AS child,
  PersonName(parentID) AS parent
FROM familytree;
+----------------------+---------------------+
| child                | parent              |
+----------------------+---------------------+
| Henry Shakespeare    | Richard Shakespeare |
| John Shakespeare     | Richard Shakespeare |
| Joan Shakespeare     | Henry Shakespeare   |
| Margaret Shakespeare | Henry Shakespeare   |
| William Shakespeare  | Henry Shakespeare   |
| Gilbert Shakespeare  | Henry Shakespeare   |
| Joan Shakespeare     | Henry Shakespeare   |
| Anne Shakespeare     | Henry Shakespeare   |
| Richard Shakespeare  | Henry Shakespeare   |
| Edmond Shakespeare   | Henry Shakespeare   |
| Susana Shakespeare   | William Shakespeare |
| Hamnet Shakespeare   | William Shakespeare |
| Judith Shakespeare   | William Shakespeare |
| William Hart         | Joan Shakespeare    |
| Mary Hart            | Joan Shakespeare    |
| Thomas Hart          | Joan Shakespeare    |
| Michael Hart         | Joan Shakespeare    |
| Elizabeth Hall       | Susana Shakespeare  |
| Shakespeare Quiney   | Judith Shakespeare  |
| Richard Quiney       | Judith Shakespeare  |
| Thomas Quiney        | Judith Shakespeare  |
| John Bernard         | Elizabeth Hall      |
+----------------------+---------------------+

GROUP_CONCAT() simplifies grouping parents with their children:

Listing 5
SELECT
  PersonName(parentID) AS Parent,
  GROUP_CONCAT( PersonName(childID) SEPARATOR ', ' ) AS Children
FROM familytree
GROUP BY parentID;
+---------------------+----------------------------------------------------------------------------------+
| Parent              | Children                                                                         |
+---------------------+----------------------------------------------------------------------------------+
| Richard Shakespeare | Henry Shakespeare, John Shakespeare                                              |
| Henry Shakespeare   | Edmond Shakespeare, Richard Shakespeare, Anne Shakespeare, Joan Shakespeare,     |
|                     | Gilbert Shakespeare, William Shakespeare, Margaret Shakespeare, Joan Shakespeare |
| William Shakespeare | Judith Shakespeare, Hamnet Shakespeare, Susana Shakespeare                       |
| Joan Shakespeare    | William Hart, Mary Hart, Thomas Hart, Michael Hart                               |
| Susana Shakespeare  | Elizabeth Hall                                                                   |
| Judith Shakespeare  | Shakespeare Quiney, Richard Quiney, Thomas Quiney                                |
| Elizabeth Hall      | John Bernard                                                                     |
+---------------------+----------------------------------------------------------------------------------+

The paterfamilias is the root node, individuals with no children are the leaf nodes, and queries to retrieve subtree statistics are straightforward:

Listing 6
SELECT
  PersonName(ID) AS Paterfamilias,
  IFNULL(born,'?') AS Born,
  IFNULL(died,'?') AS Died
FROM family AS f
LEFT JOIN familytree AS t ON f.ID=t.childID
WHERE t.childID IS NULL;
+---------------------+------+------+
| Paterfamilias       | Born | Died |
+---------------------+------+------+
| Richard Shakespeare | ?    | 1561 |
+---------------------+------+------+

SELECT
  PersonName(ID) AS Childless,
  IFNULL(born,'?') AS Born,
  IFNULL(died,'?') AS Died
FROM family AS f
LEFT JOIN familytree AS t ON f.ID=t.parentID
WHERE t.parentID IS NULL;
+----------------------+------+------+
| Childless            | Born | Died |
+----------------------+------+------+
| John Shakespeare     | 1530 | 1601 |
| Joan Shakespeare     | 1558 | ?    |
| Margaret Shakespeare | 1562 | 1563 |
| Gilbert Shakespeare  | 1566 | 1612 |
| Anne Shakespeare     | 1571 | 1579 |
| Richard Shakespeare  | 1574 | 1613 |
| Edmond Shakespeare   | 1580 | 1607 |
| Hamnet Shakespeare   | 1585 | 1596 |
| William Hart         | 1600 | 1639 |
| Mary Hart            | 1603 | 1607 |
| Thomas Hart          | 1605 | 1670 |
| Michael Hart         | 1608 | 1618 |
| Shakespeare Quiney   | 1616 | 1617 |
| Richard Quiney       | 1618 | 1639 |
| Thomas Quiney        | 1620 | 1639 |
| John Bernard         | ?    | 1674 |
+----------------------+------+------+

SELECT ROUND(AVG(died-born),2) AS 'Longevity of the childless'
FROM family AS f
LEFT JOIN familytree AS t ON f.ID=t.parentID
WHERE t.parentID IS NULL;
+----------------------------+
| Longevity of the childless |
+----------------------------+
|                      25.86 |
+----------------------------+

In marked contrast with Celko's nested sets model, inserting a new item in this model requires no revision of existing rows. We just add a new family row, then a new familytree row with IDs specifying who is the parent of whom. Deletion is also a two-step: delete the familytree row which documents the child-parent link, then delete the family row for that child.

Retrieving subtrees in the edge-adjacency list model

Retrieving subtrees is what gives the edge-adjacency list model its reputation for difficulty. We can't know in advance, except in the simplest of trees, how many levels of parent and child have to be queried, so we need recursion or a logically equivalent loop.

It's a natural problem for a stored procedure. Here is a breadth-first algorithm that is straightforward, though not optimised:

  1. Create a table for the results, descendants,
  2. Create a HEAP table, nextparents, for the next parents whose children are to be found , and another HEAP table for use as a temp copy,
  3. Seed nextparents with the name of the ancestor whose descendants we wish to list,
  4. Add to descendants all children of rows in nextparents,
  5. Seed nextparents with those children's IDs, and if there are any, loop back to 4.

As a convenience, the procedure accepts either a name or numeric ID argument. (With MySQL 5.0.6 or 5.0.7, you will have to move the CREATE TABLE calls outside the procedure, to step around a MySQL bug that was fixed in 5.0.9.)

Listing 7
DROP PROCEDURE IF EXISTS ListDescendants;
DELIMITER |

CREATE PROCEDURE ListDescendants( ancestor CHAR(20) )
BEGIN
  DECLARE rows, iLevel, iMode INT DEFAULT 0;
  -- create temp tables
  DROP TEMPORARY TABLE IF EXISTS descendants,nextparents,prevparents;
  CREATE TEMPORARY TABLE descendants( childID INT, parentID INT, level INT );
  CREATE TEMPORARY TABLE IF NOT EXISTS nextparents ( parentID SMALLINT) ENGINE=MEMORY;
  CREATE TEMPORARY TABLE prevparents LIKE nextparents;
  -- seed nextparents
  IF ancestor RLIKE '[:alpha:]+' THEN      -- ancestor passed as a string
    INSERT INTO nextparents SELECT id FROM family WHERE name=ancestor;
  ELSE
    SET iMode = 1;                         -- ancestor passed as a numeric
    INSERT INTO nextparents VALUES( CAST( ancestor AS UNSIGNED ));
  END IF;
  SET rows = ROW_COUNT();
  WHILE rows > 0 DO
    -- add children of nextparents
    SET iLevel = iLevel + 1;
    INSERT INTO descendants
      SELECT t.childID, t.parentID, iLevel
      FROM familytree AS t
      INNER JOIN nextparents USING(parentID);
    SET rows = ROW_COUNT();
    -- save nextparents to prevparents
    TRUNCATE prevparents;
    INSERT INTO prevparents
      SELECT * FROM nextparents;
    -- next parents are children of these parents:
    TRUNCATE nextparents;
    INSERT INTO nextparents
      SELECT childID FROM familytree
      INNER JOIN prevparents USING (parentID);
    SET rows = rows + ROW_COUNT();
  END WHILE;  
  -- result
  IF iMode = 1 THEN
    SELECT CONCAT(REPEAT( ' ', level), parentID ) As Parent, GROUP_CONCAT(childID) AS Child
    FROM descendants GROUP BY parentID ORDER BY level;
  ELSE
    SELECT CONCAT(REPEAT( ' ', level), PersonName(parentID) ) As Parent, PersonName(childID) AS Child
    FROM descendants;
  END IF;
  DROP TEMPORARY TABLE descendants,nextparents,prevparents;
END;
|
DELIMITER ;
CALL ListDescendants( 1 );
+---------+-------------------+
| Parent  | Child             |
+---------+-------------------+
|  1      | 2,3               |
|   2     | 11,10,9,8,7,6,5,4 |
|    6    | 14,13,12          |
|    8    | 15,16,17,18       |
|     12  | 19                |
|     14  | 20,21,22          |
|      19 | 23                |
+---------+-------------------+
CALL ListDescendants( 'Richard Shakespeare' );
+------------------------+----------------------+
| Parent                 | Child                |
+------------------------+----------------------+
|  Richard Shakespeare   | Henry Shakespeare    |
|  Richard Shakespeare   | John Shakespeare     |
|   Henry Shakespeare    | Joan Shakespeare     |
|   Henry Shakespeare    | Margaret Shakespeare |
|   Henry Shakespeare    | William Shakespeare  |
|   Henry Shakespeare    | Gilbert Shakespeare  |
|   Henry Shakespeare    | Joan Shakespeare     |
|   Henry Shakespeare    | Anne Shakespeare     |
|   Henry Shakespeare    | Richard Shakespeare  |
|   Henry Shakespeare    | Edmond Shakespeare   |
|    William Shakespeare | Susana Shakespeare   |
|    William Shakespeare | Hamnet Shakespeare   |
|    William Shakespeare | Judith Shakespeare   |
|    Joan Shakespeare    | William Hart         |
|    Joan Shakespeare    | Mary Hart            |
|    Joan Shakespeare    | Thomas Hart          |
|    Joan Shakespeare    | Michael Hart         |
|     Susana Shakespeare | Elizabeth Hall       |
|     Judith Shakespeare | Shakespeare Quiney   |
|     Judith Shakespeare | Richard Quiney       |
|     Judith Shakespeare | Thomas Quiney        |
|      Elizabeth Hall    | John Bernard         |
+------------------------+----------------------+

Not too bad. Notice how little of the code is specific to the Shakespeare example: the logic will port easily to any other edge list. In fact let's prove that right now by rewriting ListDescendants() for the general case. We need six parameters to drive three PREPAREd queries:

  • the name and key column of the data table (family, ID in the Shakespeare example),
  • the name of the edge table, its ID column and its parentID column (familytree, childID, parentID),
  • a parameter for the ancestorID whose subtree is to be listed:
Listing 7a: General-purpose edge list subtree walker
DROP PROCEDURE IF EXISTS GenericSubtree;
DELIMITER |
CREATE PROCEDURE GenericSubtree( 
  dataTable CHAR(64),
  dataIDcol CHAR(64),
  edgeTable CHAR(64),
  edgeIDcol CHAR(64),
  edgeParentIDcol CHAR(64),
  ancestorID INT
)
BEGIN
  DECLARE rows INT DEFAULT 0;
  -- create temp tables
  DROP TEMPORARY TABLE IF EXISTS descendants,nextparents,prevparents;
  CREATE TEMPORARY TABLE descendants( id INT, parentID INT );
  CREATE TEMPORARY TABLE IF NOT EXISTS nextparents ( parentID INT ) ENGINE=MEMORY;
  CREATE TEMPORARY TABLE prevparents LIKE nextparents;
  -- seed nextparents table with ancestorID
  INSERT INTO nextparents VALUES (ancestorID);
  SET rows = ROW_COUNT();
  WHILE rows > 0 DO
    -- insert children of nextparents into descendants
    SET @sql = CONCAT( 'INSERT INTO descendants SELECT t.', edgeIDcol, ',t.', 
                       edgeParentIDcol, ' FROM ', edgeTable, ' AS t JOIN nextparents n ON t.', 
                       edgeParentIDcol, ' = n.parentID'
                     );
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    SET rows = ROW_COUNT();
    DROP PREPARE stmt;
    -- save copy of nextparents to prevparents
    TRUNCATE prevparents;
    INSERT INTO prevparents SELECT * FROM nextparents;
    -- insert the children of these parents into nextparents:
    TRUNCATE nextparents;
    SET @sql = CONCAT( 'INSERT INTO nextparents SELECT ', edgeIDcol, ' FROM ', edgeTable,
                       ' t JOIN prevparents p ON t.', edgeParentIDcol, ' = p.parentID'
                     );
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    SET rows = rows + ROW_COUNT();
    DROP PREPARE stmt;
  END WHILE;
  -- show the result
  SET @sql = CONCAT( 'SELECT t.* FROM descendants d',
                     ' JOIN ', edgeTable, ' e ON d.ID = e.', edgeIDCol,
                     ' JOIN ', dataTable, ' t ON e.', edgeIDcol, '=t.', dataIDcol
                   );
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  -- clean up
  DROP TEMPORARY TABLE descendants,nextparents,prevparents;
END;
|
DELIMITER ;

Does it work? Let's have William's descendants again:

CALL GenericSubtree( 'family', 'ID', 'familytree', 'childID', 'parentID', 
                     (SELECT ID from family where name='William Shakespeare') );
+----+--------------------+----------+------+------+
| ID | name               | siborder | born | died |
+----+--------------------+----------+------+------+
| 12 | Susana Shakespeare |        1 | 1583 | 1649 |
| 13 | Hamnet Shakespeare |        1 | 1585 | 1596 |
| 14 | Judith Shakespeare |        1 | 1585 | 1662 |
| 19 | Elizabeth Hall     |        1 | 1608 | 1670 |
| 20 | Shakespeare Quiney |        1 | 1616 | 1617 |
| 21 | Richard Quiney     |        2 | 1618 | 1639 |
| 22 | Thomas Quiney      |        3 | 1620 | 1639 |
| 23 | John Bernard       |        1 | NULL | 1674 |
+----+--------------------+----------+------+------+

Is it fast? No. But once we have this algorithm, pruning subtrees is no harder than calling GenericSubtree() then deleting the listed rows. Better still, write a generic tree pruner from Listing 7a replacing the final SELECT with a DELETE command. To insert a subtree, prepare a table of new rows, point its top edge at an existing node as parent, and INSERT it. 

For speed, try recursion! Here is a recursive depth-first PHP treewalk for the  familytree and family tables:

Listing 7b: Recursive edge list subtree in PHP
$info = recursivesubtree( 1, $a = array(), 0 );
foreach( $info as $row ) 
  echo str_repeat( "&nbsp;", 2*$row[4] ), ( $row[3] > 0 ) ? "<b>{$row[1]}</b>" : $row[1], "<br/>";

function recursivesubtree( $rootID, $a, $level ) {
  $childcountqry = "(SELECT COUNT(*) FROM familytree WHERE parentID=t.childID) AS childcount";
  $qry = "SELECT t.childid,f.name,t.parentid,$childcountqry,$level " .
         "FROM familytree t JOIN family f ON t.childID=f.ID " .
         "WHERE parentid=$rootID ORDER BY childcount<>0,t.childID";
  $res = mysql_qry( $qry );
  while( $row = mysql_fetch_row( $res )) {
    $a[] = $row;
    if( $row[3] > 0 ) $a = recursivesubtree( $row[0], $a, $level+1 );    // down before right
  }
  return $a;
}

A query with a subquery, a fetch loop, and a recursive call--that's all there is to it. To port this to MySQL, you must have set maximum recursion depth in my.cnf/ini or in your client:

Listing 7c: Recursive edge list subtree in MySQL
SET @@SESSION.max_sp_recursion_depth=25;
DROP PROCEDURE IF EXISTS recursivesubtree;
DELIMITER |
CREATE PROCEDURE recursivesubtree( iroot INT, ilevel INT )
BEGIN
  DECLARE ichildid, iparentid,ichildcount,done INT DEFAULT 0;
  DECLARE cname VARCHAR(64);
  DECLARE cur CURSOR FOR
  SELECT 
    t.childid,t.parentid,f.name,
    (SELECT COUNT(*) FROM familytree WHERE parentID=t.childID) AS childcount
  FROM familytree t JOIN family f ON t.childID=f.ID
  WHERE parentid=iroot ORDER BY childcount<>0,t.childID;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1;
  IF ilevel = 0 THEN
    DROP TEMPORARY TABLE IF EXISTS descendants;
    CREATE TEMPORARY TABLE descendants(
      childID INT,parentID INT,name VARCHAR(64),childcount INT,level INT
    );
  END IF;
  OPEN cur;
  WHILE NOT done DO
    FETCH cur INTO ichildid,iparentid,cname,ichildcount;
    IF NOT done THEN
      INSERT INTO descendants VALUES(ichildid,iparentid,cname,ichildcount,ilevel );
      IF ichildcount > 0 THEN
        CALL recursivesubtree( ichildid, ilevel + 1 );
      END IF;
    END IF;
  END WHILE;
  CLOSE cur;
END;
|
DELIMITER ;
CALL recursivesubtree(1,0);
SELECT CONCAT(REPEAT(' ',2*level),IF(childcount,UPPER(name),name)) AS 'Richard\'s Descendants'
FROM descendants;
+--------------------------+
| Richard's Descendants    |
+--------------------------+
| John Shakespeare         |
| HENRY SHAKESPEARE        |
|   Joan Shakespeare       |
|   Margaret Shakespeare   |
|   Gilbert Shakespeare    |
|   Anne Shakespeare       |
|   Richard Shakespeare    |
|   Edmond Shakespeare     |
|   WILLIAM SHAKESPEARE    |
|     Hamnet Shakespeare   |
|     SUSANA SHAKESPEARE   |
|       ELIZABETH HALL     |
|         John Bernard     |
|     JUDITH SHAKESPEARE   |
|       Shakespeare Quiney |
|       Richard Quiney     |
|       Thomas Quiney      |
|   JOAN SHAKESPEARE       |
|     William Hart         |
|     Mary Hart            |
|     Thomas Hart          |
|     Michael Hart         |
+--------------------------+

The recursive depth-first treewalk is faster than the breadth-first procedures. It is also faster than a MySQL version of Kendall Willet's depth-first edge list subtree algorithm:

Listing 7d: Depth-first edge list subtree 
CREATE PROCEDURE depthfirstsubtree( iroot INT )
BEGIN
  DECLARE ilastvisited, inxt, ilastord INT;
  SET ilastvisited = iroot;
  SET ilastord = 1;
  DROP TABLE IF EXISTS descendants;
  CREATE TABLE descendants SELECT childID,parentID,-1 AS ord FROM familytree;
  UPDATE descendants SET ord=1 WHERE childID=iroot;
  this: LOOP
    SET inxt = NULL;
    SELECT MIN(childID) INTO inxt FROM descendants   -- go down
    WHERE parentID = ilastvisited AND ord = -1 ;
    IF inxt IS NULL THEN                             -- nothing down, so go right   
      SELECT MIN(d2.childID) INTO inxt 
      FROM descendants d1
      JOIN descendants d2 ON d1.parentID = d2.parentID AND d1.childID < d2.childID
      WHERE d1.childID = ilastvisited;
    END IF;
    IF inxt IS NULL THEN                             -- nothing right. so go up        
      SELECT parentID INTO inxt FROM descendants
      WHERE childID = ilastvisited AND parentID IS NOT NULL;
    END IF;
    UPDATE descendants SET ord = ilastord + 1
    WHERE childID = inxt AND ord = -1;
    IF ROW_COUNT() > 0 THEN
      SET ilastord = ilastord + 1;
    END IF;
    IF inxt IS NULL THEN
      LEAVE this;
    END IF;
    SET ilastvisited = inxt;
  END LOOP;
END;

One reason Willet's is slower is that MySQL does not permit use of temporary tables with its queries; when all algorithms are denied temp tables, though, this algorithm is still slower than recursion.

Edge list subtrees are easier to query than their reputation suggests. And edge tables are flexible. For a tree describing a parts explosion rather than a family, just add columns for weight, quantity, assembly time, cost, price and so on. Reports need only aggregate column values and sums. We'll revisit this near the end of the chapter.

Enumerating paths in an edge-adjacency list

Path enumeration in an edge list tree is almost as easy as depth-first subtree traversal:

  • create a table for paths,
  • seed it with paths of unit length from the tree table,
  • iteratively add paths till there are no more to add.

MySQL's INSERT IGNORE command simplifies the code by removing the need for a NOT EXISTS(...) clause in the INSERT ... SELECT statement. Since adjacencies are logically symmetrical, we make path direction the caller's choice, UP or DOWN:

Listing 8
DROP PROCEDURE IF EXISTS ListAdjacencyPaths;
DELIMITER |
CREATE PROCEDURE ListAdjacencyPaths( IN direction CHAR(5) )
BEGIN
  DROP TABLE IF EXISTS paths;
  CREATE TABLE paths(
    start SMALLINT,
    stop SMALLINT,
    len SMALLINT,
    PRIMARY KEY(start,stop)
  ) ENGINE=HEAP;
  IF direction = 'UP' THEN
    INSERT INTO paths
      SELECT childID,parentID,1
      FROM familytree;
  ELSE
    INSERT INTO paths
      SELECT parentID,childID,1
      FROM familytree;
  END IF;
  WHILE ROW_COUNT() > 0 DO
    INSERT IGNORE INTO paths
      SELECT DISTINCT
        p1.start,
        p2.stop,
        p1.len + p2.len
      FROM paths AS p1 INNER JOIN paths AS p2 ON p1.stop = p2.start;
  END WHILE;
  SELECT start, stop, len
  FROM paths
  ORDER BY start, stop;
  DROP TABLE paths;
END;
|
DELIMITER ;

To find the paths from just one node, seed the paths table with paths from the starting node, then iteratively search a JOIN of familytree and paths for edges which will extend existing paths in the user-specified direction:

Listing 9
DROP PROCEDURE IF EXISTS ListAdjacencyPathsOfNode;
DELIMITER |
CREATE PROCEDURE ListAdjacencyPathsOfNode( IN node SMALLINT, IN direction CHAR(5) )
BEGIN
  TRUNCATE paths;
  IF direction = 'UP' THEN
    INSERT INTO paths
      SELECT childID,parentID,1
      FROM familytree
      WHERE childID = node;
  ELSE
    INSERT INTO paths
      SELECT parentID,childID,1
      FROM familytree
      WHERE parentID = node;
  END IF;
  WHILE ROW_COUNT() > 0 DO
    IF direction = 'UP' THEN
      INSERT IGNORE INTO paths
        SELECT DISTINCT
          paths.start,
          familytree.parentID,
          paths.len + 1
        FROM paths
          INNER JOIN familytree ON paths.stop = familytree.childID;
    ELSE
      INSERT IGNORE INTO paths
        SELECT DISTINCT
          paths.start,
          familytree.childID,
          paths.len + 1
        FROM paths
          INNER JOIN familytree ON paths.stop = familytree.parentID;

    END IF;
  END WHILE;
    SELECT start, stop, len
    FROM paths
    ORDER BY start, stop;
END;
|
DELIMITER ;

CALL ListAdjacencyPathsOfNode(1,'DOWN');
+-------+------+------+
| start | stop | len  |
+-------+------+------+
|     1 |    2 |    1 |
|     1 |    3 |    1 |
|     1 |    4 |    2 |
|     1 |    5 |    2 |
|     1 |    6 |    2 |
|     1 |    7 |    2 |
|     1 |    8 |    2 |
|     1 |    9 |    2 |
|     1 |   10 |    2 |
|     1 |   11 |    2 |
|     1 |   12 |    3 |
|     1 |   13 |    3 |
|     1 |   14 |    3 |
|     1 |   15 |    3 |
|     1 |   16 |    3 |
|     1 |   17 |    3 |
|     1 |   18 |    3 |
|     1 |   19 |    4 |
|     1 |   20 |    4 |
|     1 |   21 |    4 |
|     1 |   22 |    4 |
|     1 |   23 |    5 |
+-------+------+------+

These algorithms don't bend the brain. They perform acceptably with large trees. Querying edge-adjacency lists for subtrees and paths is less daunting than their reputation suggests.

Nested sets model of a tree

Imagine an oval drawn round every leaf and every subtree in Fig 2, and a final oval round the entire tree. The tree is a set. Each subtree is a subset. That's the basic idea of the nested sets model.

The advantage of the nested sets model is that root, leaves, subtrees, levels, tree height, ancestors, descendants and paths can be retrieved without recursion or application language code. The disadvantages are:

  • initial setup of the tree table can be difficult,
  • queries for parents (immediate superiors) and children (immediate subordinates) are more complicated than with an edge list model,
  • insertion, updates and deletion are extremely cumbersome since they may require updates to much of the tree.

The nested sets model depends on using a modified preorder tree traversal (MPTT) depth-first algorithm to assign each node left and right integers which define the node's tree position. All nodes of a subtree have

  • left values greater than the subtree parent's left value, and
  • right values smaller than that of the subtree parent's right value.

so queries for subtrees are dead simple. If the numbering scheme is integer-sequential as in Fig 3, the root node receives a left value of 1 and a right value equal to twice the item count.

To see how to code nested sets using MPTT, trace the ascending integers in Fig 3, starting with 1 on the left side of the root node (Richard Shakespeare). Following edges downward and leftward, the left side of each box gets the next integer. When you reach a leaf (Joan, left=3), the right side of that box gets the next integer (4). If there is another node to the right on the same level, continue in that direction; otherwise continue up the right side of the subtree you just descended. When you arrive back at the root on the right side, you're done. Down, right and up.

A serious problem with this scheme jumps out right away: after you've written the Fig 3 tree to a table, what if historians discover an older brother or sister of Henry and John? Every row in the tree table must be updated!

Celko and others have proposed alternative numbering schemes to get round this problem, but the logical difficulty remains: inserts and updates can invalidate many or all rows, and no SQL CHECK or CONSTRAINT can prevent it. The nested sets model is not good for trees which require frequent updates, and is pretty much unsupportable for large updatable trees that will be accessed by many concurrent users. But as we'll see in a moment, it can be very useful indeed for reporting a tree.

How to build a nested set representation from an edge list

Obviously, numbering a tree by hand will be error-prone, and seriously impractical for large trees. It's usually best, therefore, to code the tree initially as an edge-adjacency list, then use a stored procedure to translate the edge list to a nested sets model. We can write a stored procedure that uses Celko's pushdown stack method to translate any edge list into a nested sets tree:

  1. create a table for the tree: node, leftedge, rightedge, and a stack pointer (top),
  2. seed that table, nestedsettree, with the root node of the adjacency tree,
  3. set a counter to 1 plus the left value of the root node, i.e. 2,
  4. while that counter is less than the right value of the root node ...
    • insert rows for children of parent rows into nestedsettree, or
    • update existing right integers in nestedsettree.
Listing 10
DROP PROCEDURE IF EXISTS EdgeListToNestedSet;
DELIMITER |
CREATE PROCEDURE EdgeListToNestedSet( edgeTable CHAR(64), idCol CHAR(64), parentCol CHAR(64) )
BEGIN
  DECLARE maxrightedge, rows SMALLINT DEFAULT 0;
  DECLARE current SMALLINT DEFAULT 1;
  DECLARE nextedge SMALLINT DEFAULT 2;
  -- create working tree table as a copy of edgeTable
  DROP TEMPORARY TABLE IF EXISTS tree;
  CREATE TEMPORARY TABLE tree( childID INT, parentID INT );
  SET @sql = CONCAT( 'INSERT INTO tree SELECT ', idCol, ',', parentCol, ' FROM ', edgeTable );
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  -- initialise result table
  DROP TABLE IF EXISTS nestedsettree;
  CREATE TABLE nestedsettree (
    top SMALLINT, nodeID SMALLINT, leftedge SMALLINT, rightedge SMALLINT
  ) ENGINE=HEAP;
  -- find root of tree (or: add an sproc param for the desired root value)
  SET @sql = CONCAT( 'SELECT DISTINCT f.', parentCol, ' INTO @root FROM ',
                     edgeTable, ' AS f LEFT JOIN ', edgeTable, ' AS t ON f.',
                     parentCol, '=', 't.', idCol, ' WHERE t.', idCol, ' IS NULL'
                   );
  -- For the familytree table, the above query parses to:
  -- SELECT DISTINCT f.parentid INTO @root 
  -- FROM familytree AS f LEFT JOIN familytree AS t ON f.parentid=t.childid 
  -- WHERE t.childid IS NULL |
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  -- build nested sets tree
  SET maxrightedge = 2 * (1 + (SELECT + COUNT(*) FROM tree));
  INSERT INTO nestedsettree VALUES( 1, @root, 1, maxrightedge );
  WHILE nextedge < maxrightedge DO
    SELECT *
    FROM nestedsettree AS s
    JOIN tree AS t ON s.nodeID=t.parentID AND s.top=current;
    SET rows = FOUND_ROWS();
    IF rows > 0 THEN
      BEGIN
        INSERT INTO nestedsettree
          SELECT current+1, MIN(t.childID), nextedge, NULL
          FROM nestedsettree AS s
          JOIN tree AS t ON s.nodeID = t.parentID AND s.top = current;
        DELETE FROM tree
        WHERE childID = (SELECT nodeID FROM nestedsettree WHERE top=(current+1));
        SET nextedge = nextedge + 1;
        SET current = current + 1;
       END;
     ELSE
       BEGIN
         UPDATE nestedsettree
         SET rightedge=nextedge, top = -top
         WHERE top=current;
         SET nextedge=nextedge+1;
         SET current=current-1;
       END;
    END IF;
  END WHILE;
  SET rows := ( SELECT COUNT(*) FROM tree );
  DROP TEMPORARY TABLE tree;
  -- show result
  IF rows > 0 THEN
    SELECT 'Orphaned rows remain';
  ELSE
    SELECT nodeID, leftedge AS 'Left', rightedge AS 'Right'
    FROM nestedsettree ORDER BY nodeID;
  END IF;
END |
DELIMITER ;
CALL EdgeListToNestedSet( 'familytree', 'childID', 'parentID' );
+--------+------+-------+
| nodeID | Left | Right |
+--------+------+-------+
|      1 |    1 |    46 |
|      2 |    2 |    43 |
|      3 |   44 |    45 |
|      4 |    3 |     4 |
|      5 |    5 |     6 |
|      6 |    7 |    24 |
|      7 |   25 |    26 |
|      8 |   27 |    36 |
|      9 |   37 |    38 |
|     10 |   39 |    40 |
|     11 |   41 |    42 |
|     12 |    8 |    13 |
|     13 |   14 |    15 |
|     14 |   16 |    23 |
|     15 |   28 |    29 |
|     16 |   30 |    31 |
|     17 |   32 |    33 |
|     18 |   34 |    35 |
|     19 |    9 |    12 |
|     20 |   17 |    18 |
|     21 |   19 |    20 |
|     22 |   21 |    22 |
|     23 |   10 |    11 |
+--------+------+-------+
Finding a node's parent and children

A nested sets tree can be queried without recursion, but using the model's arithmetic in a query requires a bit of thought. For example, if we INNER JOIN the tree table AS t1 to itself AS t2 ON t2.leftedge BETWEEN t1.leftedge AND t1.rightedge, and if we scope the query to the descendants of a particular node, we get a list of t2.nodeID values in which the children (one level down) occur once, the grandkids (two levels down) occur twice, and so on:

Listing 11a
SELECT GROUP_CONCAT(t2.nodeID ORDER BY t2.nodeID) AS 'Descendants of William'
FROM nestedsettree AS t1
INNER JOIN nestedsettree AS t2
  ON t2.leftedge BETWEEN t1.leftedge AND t1.rightedge
WHERE t1.leftedge > 7 AND t1.rightedge < 24;
+-------------------------------------------+
| Descendants of William                    |
+-------------------------------------------+
| 12,13,14,19,19,20,20,21,21,22,22,23,23,23 |
+-------------------------------------------+

Therefore HAVING COUNT(t2.nodeID)=1 will scope listed descendants to those who are one level down:

Listing 11b
DROP PROCEDURE IF EXISTS ListNestedSetChildNodes;
DELIMITER |
CREATE PROCEDURE ListNestedSetChildNodes( node SMALLINT )
BEGIN
  DECLARE thisleft, thisright SMALLINT DEFAULT 0;
  SELECT leftedge, rightedge
    INTO thisleft, thisright
  FROM nestedsettree
  WHERE nodeID = node;
  SELECT
    PersonName(t2.nodeid) AS Children
  FROM nestedsettree AS t1
    INNER JOIN nestedsettree AS t2
    ON t2.leftedge BETWEEN t1.leftedge AND t1.rightedge
  WHERE t1.leftedge > thisleft AND t1.rightedge < thisright
  GROUP BY t2.nodeid
  HAVING COUNT(t2.nodeid) = 1
  ORDER BY t2.leftedge;
END;
|
DELIMITER ;

CALL ListNestedSetChildNodes(6);
+--------------------+
| Children           |
+--------------------+
| Susana Shakespeare |
| Hamnet Shakespeare |
| Judith Shakespeare |
+--------------------+

Similar logic gets us the parent of a node:

  1. retrieve its leftedge and rightedge values,
  2. compute its level,
  3. find the node which is one level up and which has edge values outside the node's leftedge and rightedge values.
Listing 12
DROP PROCEDURE IF EXISTS ShowNestedSetParent;
DELIMITER |
CREATE PROCEDURE ShowNestedSetParent( node SMALLINT )
BEGIN
  DECLARE level, thisleft, thisright SMALLINT DEFAULT 0;
  -- find node edges
  SELECT leftedge, rightedge
    INTO thisleft, thisright
  FROM nestedsettree
  WHERE nodeID = node;
  -- find node level
  SELECT COUNT(n1.nodeid)
    INTO level
  FROM nestedsettree AS n1
    INNER JOIN nestedsettree AS n2
    ON n2.leftedge BETWEEN n1.leftedge AND n1.rightedge
  WHERE n2.nodeid = node
  GROUP BY n2.nodeid;
  -- find parent
  SELECT
    PersonName(n2.nodeid) AS Parent
  FROM nestedsettree AS n1
    INNER JOIN nestedsettree AS n2
    ON n2.leftedge BETWEEN n1.leftedge AND n1.rightedge
  WHERE n2.leftedge < thisleft AND n2.rightedge > thisright
  GROUP BY n2.nodeid
  HAVING COUNT(n1.nodeid)=level-1;
END;
|
DELIMITER ;
CALL ShowNestedSetParent(6);
+-------------------+
| Child             |
+-------------------+
| Henry Shakespeare |
+-------------------+
Other queries

For some queries, adjacency list and nested sets queries are equivalently simple. For example to find the tree root and leaves, compare Listing 6 with:

Listing 13
SELECT
  name AS Paterfamilias,
  IFNULL(born,'?') AS Born,
  IFNULL(died,'?') AS Died
FROM nestedsettree AS t
INNER JOIN family AS f ON t.nodeID=f.ID
WHERE leftedge = 1;
+---------------------+------+------+
| Paterfamilias       | Born | Died |
+---------------------+------+------+
| Richard Shakespeare | ?    | 1561 |
+---------------------+------+------+

SELECT
  name AS 'Childless Shakespeares',
  IFNULL(born,'?') AS Born,
  IFNULL(died,'?') AS Died
FROM nestedsettree AS t
INNER JOIN family AS f ON t.nodeID=f.ID
WHERE rightedge = leftedge + 1;
+------------------------+------+------+
| Childless Shakespeares | Born | Died |
+------------------------+------+------+
| Joan Shakespeare       | 1558 | ?    |
| Margaret Shakespeare   | 1562 | 1563 |
| John Bernard           | ?    | 1674 |
| Hamnet Shakespeare     | 1585 | 1596 |
| Shakespeare Quiney     | 1616 | 1617 |
| Richard Quiney         | 1618 | 1639 |
| Thomas Quiney          | 1620 | 1639 |
| Gilbert Shakespeare    | 1566 | 1612 |
| William Hart           | 1600 | 1639 |
| Mary Hart              | 1603 | 1607 |
| Thomas Hart            | 1605 | 1670 |
| Michael Hart           | 1608 | 1618 |
| Anne Shakespeare       | 1571 | 1579 |
| Richard Shakespeare    | 1574 | 1613 |
| Edmond Shakespeare     | 1580 | 1607 |
| John Shakespeare       | 1530 | 1601 |
+------------------------+------+------+

But in contrast to edge list models, finding subtrees in a nested sets model requires no twisted code, no stored procedure. To retrieve the nestedsettree nodes in William's subtree, simply ask for the nodes whose leftedge values are greater, and whose rightedge values are smaller than William's:

Listing 14
SELECT
  PersonName(t.nodeID) AS Descendant
FROM nestedsettree AS s
  INNER JOIN nestedsettree AS t
  ON s.leftedge < t.leftedge AND s.rightedge > t.rightedge
WHERE s.nodeID = (
  SELECT ID FROM family
  WHERE name='William Shakespeare'
);

Finding a single path in the nested sets model is about as complicated as path enumeration (Listings 8, 9) with edge-adjacency lists:

Listing 15
SELECT
  t2.nodeID AS Node,
  PersonName(t2.nodeID) AS Person,
  (SELECT COUNT(*)
   FROM nestedsettree AS t4
   WHERE t4.leftedge BETWEEN t1.leftedge AND t1.rightedge
     AND t2.leftedge BETWEEN t4.leftedge AND t4.rightedge
   ) AS Path
FROM nestedsettree AS t1
  INNER JOIN nestedsettree AS t2 ON t2.leftedge BETWEEN t1.leftedge AND t1.rightedge
  INNER JOIN nestedsettree AS t3 ON t3.leftedge BETWEEN t2.leftedge AND t2.rightedge
WHERE t1.nodeID=(SELECT ID FROM family WHERE name='William Shakespeare')
  AND t3.nodeID=(SELECT ID FROM family WHERE name='John Bernard');
+------+---------------------+------+
| Node | Person              | Path |
+------+---------------------+------+
|    6 | William Shakespeare |    1 |
|   12 | Susana Shakespeare  |    2 |
|   19 | Elizabeth Hall      |    3 |
|   23 | John Bernard        |    4 |
+------+---------------------+------+
Displaying the tree

Here the nested sets model shines. The arithmetic that was used to build the tree makes short work of summary queries. For example to retrieve a node list which preserves all parent-child relations, we need just two facts:

  • listing order is the order taken in the node walk that created the tree, i.e. leftedge,
  • a node's indentation depth is simply the JOIN (edge) count from root to node:
Listing 16
SELECT
  CONCAT( SPACE(2*COUNT(n1.nodeid)-2), PersonName(n2.nodeid) )
  AS 'The Shakespeare Family Tree'
FROM nestedsettree AS n1
  INNER JOIN nestedsettree n2
  ON n2.leftedge BETWEEN n1.leftedge AND n1.rightedge
GROUP BY n2.nodeid
ORDER BY n2.leftedge;
+-----------------------------+
| The Shakespeare Family Tree |
+-----------------------------+
| Richard Shakespeare         |
|   Henry Shakespeare         |
|     Joan Shakespeare        |
|     Margaret Shakespeare    |
|     William Shakespeare     |
|       Susana Shakespeare    |
|         Elizabeth Hall      |
|           John Bernard      |
|       Hamnet Shakespeare    |
|       Judith Shakespeare    |
|         Shakespeare Quiney  |
|         Richard Quiney      |
|         Thomas Quiney       |
|     Gilbert Shakespeare     |
|     Joan Shakespeare        |
|       William Hart          |
|       Mary Hart             |
|       Thomas Hart           |
|       Michael Hart          |
|     Anne Shakespeare        |
|     Richard Shakespeare     |
|     Edmond Shakespeare      |
|   John Shakespeare          |
+-----------------------------+

To retrieve only a subtree, add a query clause which restricts nodes to those whose edges are within the range of the parent node's left and right edge values, for example for William and his descendants...

WHERE n1.leftedge >= 7 AND n1.rightedge <=24

Node insertions, updates and deletions

Nested set arithmetic also helps with insertions, updates and deletions, but the problem remains that changing just one node can require changing much of the tree.

Inserting a node in the tree requires at least two pieces of information: the nodeID to insert, and the nodeID of its parent. The model is normalised so the nodeID first must have been added to the parent (family) table. The algorithm for adding the node to the tree is:

  1. increment leftedge by 2 in nodes where the rightedge value is greater than the parent's rightedge,
  2. increment rightedge by 2 in nodes where the leftedge value is greater than the parent's leftedge,
  3. insert a row with the given nodeID, leftedge = 1 + parent's leftedge, rightedge = 2 + parent's leftedge.

That's not difficult, but all rows will have to be updated!

Listing 17
DROP PROCEDURE IF EXISTS InsertNestedSetNode;
DELIMITER |
CREATE PROCEDURE InsertNestedSetNode( IN node SMALLINT, IN parent SMALLINT )
BEGIN
  DECLARE parentleft, parentright SMALLINT DEFAULT 0;
  SELECT leftedge, rightedge
    INTO parentleft, parentright
  FROM nestedsettree
  WHERE nodeID = parent;
  IF FOUND_ROWS() = 1 THEN
    BEGIN
      UPDATE nestedsettree
        SET rightedge = rightedge + 2
      WHERE rightedge > parentleft;
      UPDATE nestedsettree
        SET leftedge = leftedge + 2
      WHERE leftedge > parentleft;
      INSERT INTO nestedsettree
        VALUES ( 0, node, parentleft + 1, parentleft + 2 );
    END;
  END IF;
END;
|
DELIMITER ;

"Sibline" or horizontal order is obviously significant in family trees, but may not be significant in other trees. Listing 17 adds the new node at the left edge of the sibline. To specify another position, modify the procedure to accept a third parameter for the nodeID which is to be to the left or right of the insertion point.

Updating a node in place requires nothing more than editing nodeID to point at a different parent row.

Deleting a node raises the problem of how to repair links severed by the deletion. In tree models of parts explosions, the item to be deleted is often replaced by a new item, so it can be treated like a simple nodeID update. In organisational boss-employee charts, though, does a colleague move over, does a subordinate get promoted, does everybody in the subtree move up a level, or does something else happen? No formula can catch all the possibilities. Listing 18 illustrates how to handle two common scenarios, move everyone up, and move someone over. All possibilities except simple replacement of the nodeID involve changes elsewhere in the tree.

Listing 18
DROP PROCEDURE IF EXISTS DeleteNestedSetNode;
DELIMITER |
CREATE PROCEDURE DeleteNestedSetNode( IN mode CHAR(7), IN node SMALLINT )
BEGIN
  DECLARE thisleft, thisright SMALLINT DEFAULT 0;
  SELECT leftedge, rightedge
    INTO thisleft, thisright
  FROM nestedsettree
  WHERE nodeID = node;
  IF mode = 'PROMOTE' THEN
    BEGIN                                                         -- Ian Holsman found these two bugs
      DELETE FROM nestedsettree
      WHERE nodeID = node;
      UPDATE nestedsettree
        SET leftedge = leftedge - 1, rightedge = rightedge - 1    -- rather than = thisleft
      WHERE leftedge BETWEEN thisleft AND thisright;
      UPDATE nestedsettree
        SET rightedge = rightedge - 2
      WHERE rightedge > thisright;
      UPDATE nestedsettree
        SET leftedge = leftedge - 2
      WHERE leftedge > thisright;                                 -- rather than > thisleft
    END;
  ELSEIF mode = 'REPLACE' THEN
    BEGIN
      UPDATE nestedsettree
        SET leftedge = thisleft - 1, rightedge = thisright
      WHERE leftedge = thisleft + 1;
      UPDATE nestedsettree
        SET rightedge = rightedge - 2
      WHERE rightedge > thisleft;
      UPDATE nestedsettree
        SET leftedge = leftedge - 2
      WHERE leftedge > thisleft;
      DELETE FROM nestedsettree
      WHERE nodeID = node;
    END;
  END IF;
END;
|
DELIMITER ;
Nested set model summary

Given the concurrency nightmare which nested sets impose for inserts and deletions, it is reasonable to reserve the nested set model for fairly static trees whose users are mostly interested in querying subtrees. You could think of the nested set model as an OLAP tool: maintain an OLTP tree in an edge-adjacency list representation, and build a nested sets OLAP table when the report is needed.

If you will be using the nested sets model, you may be converting back and forth with edge list models, so here is a simple query which generates an edge list from a nested set tree:

Listing 19
SELECT
  p.nodeID AS parentID,
  c.nodeID AS childID
FROM nestedsettree AS p
  INNER JOIN nestedsettree AS c
  ON p.leftedge = (SELECT MAX(s.leftedge)
                   FROM nestedsettree AS s
                   WHERE c.leftedge > s.leftedge
                     AND c.leftedge < s.rightedge)
ORDER BY p.nodeID;

Edge list model of a non-tree graph

Many graphs are not trees. Imagine a small airline which has just acquired licences for flights no longer than 6,000 km between Los Angeles (LAX), New York (JFK), Heathrow in London, Charles de Gaulle in Paris, Amsterdam-Schiphol, Arlanda in Sweden, and Helsinki-Vantaa. You have been asked to compute the shortest possible one-way routes that do not deviate more than 90° from the direction of the first hop—roughly, one-way routes and no circuits.

Airports are nodes, flights are edges, routes are paths. We will need three tables.

Airports (nodes)

To identify an airport we need its code, location name, latitude and longitude. Latitude and longitude are usually given as degrees, minutes and seconds, north or south of the equator, east or west of Greenwich. To hide details that aren't directly relevant to nodes and edges, code latitude and longitude as simple reals where longitudes west of Greenwich and latitudes south of the equator are negative, whilst longitudes east of Greenwich and latitudes north of the equator are positive:

Listing 20
CREATE TABLE airports (
  code char(3) NOT NULL,
  city char(100) default NULL,
  latitude float NOT NULL,
  longitude float NOT NULL,
  PRIMARY KEY (code)
) ENGINE=MyISAM;

INSERT INTO airports VALUES ('JFK', 'New York, NY', 40.75, -73.97);
INSERT INTO airports VALUES ('LAX', 'Los Angeles, CA', 34.05, -118.22);
INSERT INTO airports VALUES ('LHR', 'London, England', 51.5, -0.45);
INSERT INTO airports VALUES ('HEL', 'Helsinki, Finland', 60.17, 24.97);
INSERT INTO airports VALUES ('CDG', 'Paris, France', 48.86, 2.33);
INSERT INTO airports VALUES ('STL', 'St Louis, MO', 38.63, -90.2);
INSERT INTO airports VALUES ('ARN', 'Stockholm, Sweden', 59.33, 18.05);
Flights (edges)

The model attaches two weights to flights: distance and direction.

We need a method of calculating the Great Circle Distance—the geographical distance between any two cities - another natural job for a stored function. The distance calculation

  • converts to radians the degree coordinates of any two points on the earth's surface,
  • calculates the angle of the arc subtended by the two points, and
  • converts the result, also in radians, to surface (circumferential) kilometres (1 radian=6,378.388 km).
Listing 21
SET GLOBAL log_bin_trust_function_creators=TRUE;   -- since 5.0.16
DROP FUNCTION IF EXISTS GeoDistKM;
DELIMITER |
CREATE FUNCTION GeoDistKM( lat1 FLOAT, lon1 FLOAT, lat2 FLOAT, lon2 FLOAT ) RETURNS float
BEGIN
  DECLARE pi, q1, q2, q3 FLOAT;
  SET pi = PI();
  SET lat1 = lat1 * pi / 180;
  SET lon1 = lon1 * pi / 180;
  SET lat2 = lat2 * pi / 180;
  SET lon2 = lon2 * pi / 180;
  SET q1 = COS(lon1-lon2);
  SET q2 = COS(lat1-lat2);
  SET q3 = COS(lat1+lat2);
  SET rads = ACOS( 0.5*((1.0+q1)*q2 - (1.0-q1)*q3) );
  RETURN 6378.388 * rads;
END;
|
DELIMITER ;

That takes care of flight distances. Flight direction is, approximately, the arctangent (ATAN) of the difference between flights.depart and flights.arrive latitudes and longitudes. Now we can seed the airline's flights table with one-hop flights up to 6,000 km long:

Listing 22
CREATE TABLE flights (
  id INT PRIMARY KEY AUTO_INCREMENT,
  depart CHAR(3),
  arrive CHAR(3),
  distance DECIMAL(10,2),
  direction DECIMAL(10,2)
) ENGINE=MYISAM;

INSERT INTO flights
  SELECT
  NULL,
  depart.code,
  arrive.code,
  ROUND(GeoDistKM(depart.latitude,depart.longitude,arrive.latitude,arrive.longitude),2),
  ROUND(DEGREES(ATAN(arrive.latitude-depart.latitude,arrive.longitude-depart.longitude)),2)
  FROM airports AS depart
  INNER JOIN airports AS arrive ON depart.code <> arrive.code
  HAVING Km <= 6000;

SELECT * FROM flights;
+----+--------+--------+----------+-----------+
| id | depart | arrive | distance | direction |
+----+--------+--------+----------+-----------+
|  1 | LAX    | JFK    | 3941.18  | 8.61      |
|  2 | LHR    | JFK    | 5550.77  | -171.68   |
|  3 | CDG    | JFK    | 5837.46  | -173.93   |
|  4 | STL    | JFK    | 1408.11  | 7.44      |
|  5 | JFK    | LAX    | 3941.18  | -171.39   |
|  6 | STL    | LAX    | 2553.37  | -170.72   |
|  7 | JFK    | LHR    | 5550.77  | 8.32      |
|  8 | HEL    | LHR    | 1841.91  | -161.17   |
|  9 | CDG    | LHR    | 354.41   | 136.48    |
| 10 | ARN    | LHR    | 1450.12  | -157.06   |
| 11 | LHR    | HEL    | 1841.91  | 18.83     |
| 12 | CDG    | HEL    | 1912.96  | 26.54     |
| 13 | ARN    | HEL    | 398.99   | 6.92      |
| 14 | JFK    | CDG    | 5837.46  | 6.07      |
| 15 | LHR    | CDG    | 354.41   | -43.52    |
| 16 | HEL    | CDG    | 1912.96  | -153.46   |
| 17 | ARN    | CDG    | 1545.23  | -146.34   |
| 18 | JFK    | STL    | 1408.11  | -172.56   |
| 19 | LAX    | STL    | 2553.37  | 9.28      |
| 20 | LHR    | ARN    | 1450.12  | 22.94     |
| 21 | HEL    | ARN    | 398.99   | -173.08   |
| 22 | CDG    | ARN    | 1545.23  | 33.66     |
+----+--------+--------+----------+-----------+

The distances agree approximately with public information sources for flight lengths. For a pair of airports A and B not very near the poles, the error in calculating direction using ATAN(), is small. To remove that error, instead of ATAN() use a formula from spherical trigonometry (for example one of the formulas at http://www.dynagen.co.za/eugene/where/formula.html).

Routes (paths)

A route is a path along one or more of these edges, so flights:routes is a 1:many relationship. For simplicity, though, we denormalise our representation of routes with a variation of the materialised path model to store all the hops of one route as a list of flights in one routes column. The column routes.route is the sequence of airports, from first departure to final arrival, the column routes.hops is the number of hops in that route, and the column routes.direction is the direction:

Listing 23
CREATE TABLE routes (
  id INT PRIMARY KEY AUTO_INCREMENT,
  depart CHAR(3),
  arrive CHAR(3),
  hops SMALLINT,
  route CHAR(50),
  distance DECIMAL(10,2),
  direction DECIMAL(10,2)
) ENGINE=MYISAM;

Starting with an empty routes table, how do we populate it with the shortest routes between all ordered pairs of airports?

  1. Insert all 1-hop flights from the flights table.
  2. Add in the set of shortest multi-hop routes for all pairs of airports which don't have rows in the flights table.

For 1-hop flights we just write

Listing 24
INSERT INTO routes
  SELECT
    NULL,
    depart,
    arrive,
    1,
    CONCAT(depart,',',arrive),
    distance,
    direction
  FROM flights;

NULL being the placeholder for the auto-incrementing id column.

For multi-hop routes, we iteratively add in sets of all allowed 2-hop, 3-hop, ... n-hop routes, replacing longer routes by shorter routes as we find them, until there is nothing more to add or replace. That also breaks down to two logical steps: add hops to build the set of next allowed routes, and update longer routes with shorter ones.

Next allowed routes

The set of next allowed routes is the set of shortest routes that can be built by adding, to existing routes, flights which leave from the last arrival airport of an existing route, which arrive at an airport which is not yet in the given route, and which stay within ± 90° of the route's initial compass direction. That is, every new route is a JOIN between routes and flights in which

  • depart = routes.depart,
  • arrive = flights.arrive,
  • flights.depart = routes.arrive,
  • distance = MIN(routes.distance + flights.distance),
  • LOCATE( flights.arrive,routes.route) = 0,
  • flights.direction+360 > routes.direction+270 AND flights.direction+360 < routes.direction+450

This is a natural logical unit of work for a VIEW:

Listing 25
CREATE OR REPLACE VIEW nextroutes AS
  SELECT
    routes.depart,
    flights.arrive,
    routes.hops+1 AS hops,
    CONCAT(routes.route, ',', flights.arrive) AS route,
    MIN(routes.distance + flights.distance) AS distance,
    routes.direction
  FROM routes INNER JOIN flights
    ON routes.arrive = flights.depart
    AND LOCATE(flights.arrive,routes.route) = 0
  WHERE flights.direction+360>routes.direction+270 
    AND flights.direction+360<routes.direction+450
  GROUP BY depart,arrive;

How to add these new hops to routes? In standard SQL, this variant on a query by Scott Stephens should do it...

Listing 26
INSERT INTO routes
  SELECT NULL,depart,arrive,hops,route,distance,direction FROM nextroutes
  WHERE (nextroutes.depart,nextroutes.arrive) NOT IN (
    SELECT depart,arrive FROM routes
  );

but MySQL does not yet support subqueries on the table being updated. We have to use a subquery-less (and faster) version of that logic:

Listing 27
INSERT INTO routes
  SELECT
    NULL,
    nextroutes.depart,
    nextroutes.arrive,
    nextroutes.hops,
    nextroutes.route,
    nextroutes.distance,
    nextroutes.direction
  FROM nextroutes
  LEFT JOIN routes ON nextroutes.depart = routes.depart
        AND nextroutes.arrive = routes.arrive
  WHERE routes.depart IS NULL AND routes.arrive IS NULL;

Running that code right after the initial seeding from flights gives ...

SELECT * FROM routes;
+----+--------+--------+------+-------------+----------+-----------+
| id | depart | arrive | hops | route       | distance | direction |
+----+--------+--------+------+-------------+----------+-----------+
|  1 | LAX    | JFK    |    1 | LAX,JFK     | 3941.18  | 8.61      |
|  2 | LHR    | JFK    |    1 | LHR,JFK     | 5550.77  | -171.68   |
|  3 | CDG    | JFK    |    1 | CDG,JFK     | 5837.46  | -173.93   |
|  4 | STL    | JFK    |    1 | STL,JFK     | 1408.11  | 7.44      |
|  5 | JFK    | LAX    |    1 | JFK,LAX     | 3941.18  | -171.39   |
|  6 | STL    | LAX    |    1 | STL,LAX     | 2553.37  | -170.72   |
|  7 | JFK    | LHR    |    1 | JFK,LHR     | 5550.77  | 8.32      |
|  8 | HEL    | LHR    |    1 | HEL,LHR     | 1841.91  | -161.17   |
|  9 | CDG    | LHR    |    1 | CDG,LHR     | 354.41   | 136.48    |
| 10 | ARN    | LHR    |    1 | ARN,LHR     | 1450.12  | -157.06   |
| 11 | LHR    | HEL    |    1 | LHR,HEL     | 1841.91  | 18.83     |
| 12 | CDG    | HEL    |    1 | CDG,HEL     | 1912.96  | 26.54     |
| 13 | ARN    | HEL    |    1 | ARN,HEL     | 398.99   | 6.92      |
| 14 | JFK    | CDG    |    1 | JFK,CDG     | 5837.46  | 6.07      |
| 15 | LHR    | CDG    |    1 | LHR,CDG     | 354.41   | -43.52    |
| 16 | HEL    | CDG    |    1 | HEL,CDG     | 1912.96  | -153.46   |
| 17 | ARN    | CDG    |    1 | ARN,CDG     | 1545.23  | -146.34   |
| 18 | JFK    | STL    |    1 | JFK,STL     | 1408.11  | -172.56   |
| 19 | LAX    | STL    |    1 | LAX,STL     | 2553.37  | 9.28      |
| 20 | LHR    | ARN    |    1 | LHR,ARN     | 1450.12  | 22.94     |
| 21 | HEL    | ARN    |    1 | HEL,ARN     | 398.99   | -173.08   |
| 22 | CDG    | ARN    |    1 | CDG,ARN     | 1545.23  | 33.66     |
| 23 | ARN    | JFK    |    2 | ARN,LHR,JFK | 7000.89  | -157.06   |
| 24 | CDG    | LAX    |    2 | CDG,JFK,LAX | 9778.64  | -173.93   |
| 25 | CDG    | STL    |    2 | CDG,JFK,STL | 7245.57  | -173.93   |
| 26 | HEL    | JFK    |    2 | HEL,LHR,JFK | 7392.68  | -161.17   |
| 27 | JFK    | ARN    |    2 | JFK,LHR,ARN | 7000.89  | 8.32      |
| 28 | JFK    | HEL    |    2 | JFK,LHR,HEL | 7392.68  | 8.32      |
| 29 | LAX    | CDG    |    2 | LAX,JFK,CDG | 9778.64  | 8.61      |
| 30 | LAX    | LHR    |    2 | LAX,JFK,LHR | 9491.95  | 8.61      |
| 31 | LHR    | LAX    |    2 | LHR,JFK,LAX | 9491.95  | -171.68   |
| 32 | LHR    | STL    |    2 | LHR,JFK,STL | 6958.88  | -171.68   |
| 33 | STL    | CDG    |    2 | STL,JFK,CDG | 7245.57  | 7.44      |
| 34 | STL    | LHR    |    2 | STL,JFK,LHR | 6958.88  | 7.44      |
+----+--------+--------+------+-------------+----------+-----------+

... adding 12 two-hop rows.

Replace longer routes with shorter ones

As we build routes with more hops, it is logically possible that the nextroutes view will find shorter routes for an existing routes pair of depart and arrive. Standard SQL for replacing existing routes rows with nextroutes rows which match (depart, arrive) and have shorter distance values would be:

Listing 28
UPDATE routes SET (hops,route,distance,direction) = (
  SELECT hops, route, distance, direction
  FROM nextroutes
  WHERE nextroutes.depart = routes.depart AND nextroutes.arrive = routes.arrive
)
WHERE (depart,arrive) IN (
  SELECT depart,arrive FROM nextroutes
  WHERE nextroutes.distance < routes.distance
);

but MySQL does not support SET(col1,...) syntax, and as with Listing 7, MySQL does not yet accept subqueries referencing the table being updated, so we have to write more literal SQL:

Listing 29
UPDATE routes, nextroutes
SET
  routes.hops=nextroutes.hops,
  routes.route=nextroutes.route,
  routes.distance=nextroutes.distance,
  routes.direction=nextroutes.direction
WHERE routes.arrive=nextroutes.arrive
  AND routes.depart=nextroutes.depart
  AND nextroutes.distance < routes.distance;

Running this code right after the first run of Listing 27 updates no rows. To test the logic of iteration, continue running Listings 27 and 29 until no rows are being added or changed. The final result is:

SELECT * FROM ROUTES;
+----+--------+--------+------+-----------------+----------+-----------+
| id | depart | arrive | hops | route           | distance | direction |
+----+--------+--------+------+-----------------+----------+-----------+
|  1 | LAX    | JFK    |    1 | LAX,JFK         | 3941.18  | 8.61      |
|  2 | LHR    | JFK    |    1 | LHR,JFK         | 5550.77  | -171.68   |
|  3 | CDG    | JFK    |    1 | CDG,JFK         | 5837.46  | -173.93   |
|  4 | STL    | JFK    |    1 | STL,JFK         | 1408.11  | 7.44      |
|  5 | JFK    | LAX    |    1 | JFK,LAX         | 3941.18  | -171.39   |
|  6 | STL    | LAX    |    1 | STL,LAX         | 2553.37  | -170.72   |
|  7 | JFK    | LHR    |    1 | JFK,LHR         | 5550.77  | 8.32      |
|  8 | HEL    | LHR    |    1 | HEL,LHR         | 1841.91  | -161.17   |
|  9 | CDG    | LHR    |    1 | CDG,LHR         | 354.41   | 136.48    |
| 10 | ARN    | LHR    |    1 | ARN,LHR         | 1450.12  | -157.06   |
| 11 | LHR    | HEL    |    1 | LHR,HEL         | 1841.91  | 18.83     |
| 12 | CDG    | HEL    |    1 | CDG,HEL         | 1912.96  | 26.54     |
| 13 | ARN    | HEL    |    1 | ARN,HEL         | 398.99   | 6.92      |
| 14 | JFK    | CDG    |    1 | JFK,CDG         | 5837.46  | 6.07      |
| 15 | LHR    | CDG    |    1 | LHR,CDG         | 354.41   | -43.52    |
| 16 | HEL    | CDG    |    1 | HEL,CDG         | 1912.96  | -153.46   |
| 17 | ARN    | CDG    |    1 | ARN,CDG         | 1545.23  | -146.34   |
| 18 | JFK    | STL    |    1 | JFK,STL         | 1408.11  | -172.56   |
| 19 | LAX    | STL    |    1 | LAX,STL         | 2553.37  | 9.28      |
| 20 | LHR    | ARN    |    1 | LHR,ARN         | 1450.12  | 22.94     |
| 21 | HEL    | ARN    |    1 | HEL,ARN         | 398.99   | -173.08   |
| 22 | CDG    | ARN    |    1 | CDG,ARN         | 1545.23  | 33.66     |
| 23 | ARN    | JFK    |    2 | ARN,LHR,JFK     | 7000.89  | -157.06   |
| 24 | CDG    | LAX    |    2 | CDG,JFK,LAX     | 9778.64  | -173.93   |
| 25 | CDG    | STL    |    2 | CDG,JFK,STL     | 7245.57  | -173.93   |
| 26 | HEL    | JFK    |    2 | HEL,LHR,JFK     | 7392.68  | -161.17   |
| 27 | JFK    | ARN    |    2 | JFK,LHR,ARN     | 7000.89  | 8.32      |
| 28 | JFK    | HEL    |    2 | JFK,LHR,HEL     | 7392.68  | 8.32      |
| 29 | LAX    | CDG    |    2 | LAX,JFK,CDG     | 9778.64  | 8.61      |
| 30 | LAX    | LHR    |    2 | LAX,JFK,LHR     | 9491.95  | 8.61      |
| 31 | LHR    | LAX    |    2 | LHR,JFK,LAX     | 9491.95  | -171.68   |
| 32 | LHR    | STL    |    2 | LHR,JFK,STL     | 6958.88  | -171.68   |
| 33 | STL    | CDG    |    2 | STL,JFK,CDG     | 7245.57  | 7.44      |
| 34 | STL    | LHR    |    2 | STL,JFK,LHR     | 6958.88  | 7.44      |
| 35 | ARN    | LAX    |    3 | ARN,LHR,JFK,LAX | 10942.07 | -157.06   |
| 36 | ARN    | STL    |    3 | ARN,LHR,JFK,STL | 8409.00  | -157.06   |
| 37 | HEL    | LAX    |    3 | HEL,LHR,JFK,LAX | 11333.86 | -161.17   |
| 38 | HEL    | STL    |    3 | HEL,LHR,JFK,STL | 8800.79  | -161.17   |
| 39 | LAX    | ARN    |    3 | LAX,JFK,CDG,ARN | 10942.07 | 8.61      |
| 40 | LAX    | HEL    |    3 | LAX,JFK,CDG,HEL | 11333.86 | 8.61      |
| 41 | STL    | ARN    |    3 | STL,JFK,CDG,ARN | 8409.00  | 7.44      |
| 42 | STL    | HEL    |    3 | STL,JFK,CDG,HEL | 8800.79  | 7.44      |
+----+--------+--------+------+-----------------+----------+-----------+

All that's left to do is to assemble the code in a stored procedure:

Listing 30
DROP PROCEDURE IF EXISTS BuildRoutes;
DELIMITER |
CREATE PROCEDURE BuildRoutes()
BEGIN
  DECLARE rows INT DEFAULT 0;
  TRUNCATE routes;

  -- STEP 1, LISTING 24: SEED ROUTES WITH 1-HOP FLIGHTS
  INSERT INTO routes
    SELECT
      NULL,
      depart,
      arrive,
      1,
      CONCAT(depart,',',arrive),
      distance,
      direction
  FROM flights;
  SET rows = ROW_COUNT();

  WHILE (rows > 0) DO

    -- STEP 2, LISTINGS 25, 27: ADD NEXT SET OF ROUTES
    INSERT INTO routes
      SELECT
        NULL,
        nextroutes.depart,
        nextroutes.arrive,
        nextroutes.hops,
        nextroutes.route,
        nextroutes.distance,
        nextroutes.direction
      FROM nextroutes
      LEFT JOIN routes ON nextroutes.depart = routes.depart
            AND nextroutes.arrive = routes.arrive
      WHERE routes.depart IS NULL AND routes.arrive IS NULL;
    SET rows = ROW_COUNT();

    -- STEP 3, LISTING 29: UPDATE WITH SHORTER nextroutes ROUTES IF ANY
    UPDATE routes,nextroutes SET
      routes.hops=nextroutes.hops,
      routes.route=nextroutes.route,
      routes.distance=nextroutes.distance,
      routes.direction=nextroutes.direction
    WHERE routes.arrive=nextroutes.arrive
      AND routes.depart=nextroutes.depart
      AND nextroutes.distance < routes.distance;
    SET rows = rows + ROW_COUNT();

  END WHILE;

END;
|
DELIMITER ;

If you are running MySQL 5.0.6 or 5.0.7, BuildRoutes() fails to insert four rows:

+--------+--------+-----------------+------+----------+-----------+
| depart | arrive | route           | hops | distance | direction |
+--------+--------+-----------------+------+----------+-----------+
| ARN    | LAX    | ARN,LHR,JFK,LAX |    3 | 10942.07 | -157.06   |
| ARN    | STL    | ARN,LHR,JFK,STL |    3 | 8409.00  | -157.06   |
| HEL    | LAX    | HEL,LHR,JFK,LAX |    3 | 11333.86 | -161.17   |
| HEL    | STL    | HEL,LHR,JFK,STL |    3 | 8800.79  | -161.17   |
+--------+--------+-----------------+------+----------+-----------+

That MySQL bug (#11302) was corrected in 5.0.9.

Route queries

Route queries are straightforward. How do we check that the algorithm produced no duplicate depart-arrive pairs? The following query should yield zero rows,

Listing 31
SELECT depart, arrive, COUNT(*)
FROM routes
GROUP BY depart,arrive
HAVING COUNT(*) > 1;

and does. Reachability queries are just as simple, for example where can we fly to from Helsinki?

Listing 32
SELECT *
FROM routes
WHERE depart='HEL'
ORDER BY distance;
+----+--------+--------+------+-----------------+----------+-----------+
| id | depart | arrive | hops | route           | distance | direction |
+----+--------+--------+------+-----------------+----------+-----------+
| 21 | HEL    | ARN    |    1 | HEL,ARN         | 398.99   | -173.08   |
|  8 | HEL    | LHR    |    1 | HEL,LHR         | 1841.91  | -161.17   |
| 16 | HEL    | CDG    |    1 | HEL,CDG         | 1912.96  | -153.46   |
| 26 | HEL    | JFK    |    2 | HEL,LHR,JFK     | 7392.68  | -161.17   |
| 38 | HEL    | STL    |    3 | HEL,LHR,JFK,STL | 8800.79  | -161.17   |
| 37 | HEL    | LAX    |    3 | HEL,LHR,JFK,LAX | 11333.86 | -161.17   |
+----+--------+--------+------+-----------------+----------+-----------+

An extended edge list model is simple to implement, gracefully accepts extended attributes for nodes, edge and paths, does not unduly penalise updates, and responds to queries with reasonable speed.

Parts explosions

A bill of materials for a house would include the cement block, lumber, shingles, doors, wallboard, windows, plumbing, electrical system, heating system, and so on. Each subassembly also has a bill of materials; the heating system has a furnace, ducts, and so on. A bill of materials implosion links component pieces to a major assembly. A bill of materials explosion breaks apart assemblies and subassemblies into their component parts.

Which graph model best handles a parts explosion? Combining edge list and "nested sets" algorithms provides a clean solution.

Imagine a new company that plans to make variously sized bookcases, either packaged as do-it-yourself kits of, or assembled from sides, shelves, shelf brackets, backboards, feet and screws. Shelves and sides are cut from planks. Backboards are trimmed from laminated sheeting. Feet are machine-carved from readycut blocks. Screws and shelf brackets are purchased in bulk. Here are the elements of one bookcase:

  1 backboard, 2 x 1 m
    1 laminate
    8 screws
  2 sides 2m x 30 cm
    1 plank length 4m
    12 screws
  8 shelves 1 m x 30 cm (incl. top and bottom)
    2 planks
    24 shelf brackets
  4 feet 4cm x 4cm
    4 cubes
    16 screws

which may be packaged in a box for sale at one price, or assembled as a finished product at a different price. At any time we need to be able to answer questions like

  • Do we have enough parts to make the bookcases on order?
  • What assemblies or packages would be most profitable to make given the current inventory?

There is no reason to break the normalising rule that item detail belongs in a nodes table, and graph logic belongs in an edges table. Edges also require a quantity attribute, for example a shelf includes four shelf brackets. Nodes and edges may also have costs and prices:

  • item purchase cost,
  • item assembly cost,
  • assembly cost,
  • assembly selling price.

In many parts problems like this one, items occur in multiple assemblies and subassemblies. The graph is not a tree. Also, it is often desirable to model multiple graphs without the table glut that would arise from giving each graph its own edges table. A simple way to solve this problem is to represent multiple graphs (assemblies) in the edges table by giving every row not only childID and parentID pointers, but a pointer which identifies the root itemID of the graph to which the row belongs.

So the data model is just two tables, for items (nodes) and for product graphs or assemblies (edges). Assume that the company begins with a plan to sell the 2m x 1m bookcase in two forms, assembled and kit, and that the purchasing department has bought quantities of raw materials (laminate, planks, shelf supports, screws, wood cubes, boxes). Here are the nodes (items) and edges (assemblies):

Listing 33
CREATE TABLE items (
  itemID INT PRIMARY KEY AUTO_INCREMENT,
  name CHAR(20) NOT NULL,
  onhand INT NOT NULL DEFAULT 0,
  reserved INT NOT NULL DEFAULT 0,
  purchasecost DECIMAL(10,2) NOT NULL DEFAULT 0,
  assemblycost DECIMAL(10,2) NOT NULL DEFAULT 0,
  price DECIMAL(10,2) NOT NULL DEFAULT 0
);
CREATE TABLE assemblies (
  assemblyID INT NOT NULL,
  assemblyroot INT NOT NULL,
  childID INT NOT NULL,
  parentID INT NOT NULL,
  quantity DECIMAL(10,2) NOT NULL,
  assemblycost DECIMAL(10,2) NOT NULL,
  PRIMARY KEY(assemblyID,childID,parentID)
);
INSERT INTO items VALUES    -- inventory
  (1,'laminate',40,0,4,0,8),
  (2,'screw',1000,0,0.1,0,.2),
  (3,'plank',200,0,10,0,20),
  (4,'shelf bracket',400,0,0.20,0,.4),
  (5,'wood cube',100,0,0.5,0,1),
  (6,'box',40,0,1,0,2),
  (7,'backboard',0,0,0,3,0),
  (8,'side',0,0,0,8,0),
  (9,'shelf',0,0,0,4,0),
  (10,'foot',0,0,0,1,0),
  (11,'bookcase2x30',0,0,0,10,0),
  (12,'bookcase2x30 kit',0,0,0,2,0);
INSERT INTO assemblies VALUES
  (1,11,1,7,1,0),      -- laminate to backboard
  (2,11,2,7,8,0),      -- screws to backboard
  (3,11,3,8,.5,0),     -- planks to side
  (4,11,2,8,6,0),      -- screws to side
  (5,11,3,9,0.25,0),   -- planks to shelf
  (6,11,4,9,4,0),      -- shelf brackets to shelf
  (7,11,5,10,1,0),     -- wood cubes to foot
  (8,11,2,10,1,0),     -- screws to foot
  (9,11,7,11,1,0),     -- backboard to bookcase
  (10,11,8,11,2,0),    -- sides to bookcase
  (11,11,9,11,8,0),    -- shelves to bookcase
  (12,11,10,11,4,0),   -- feet to bookcase
  (13,12,1,7,1,0),     -- laminate to backboard
  (14,12,2,7,8,0),     -- screws to backboard
  (15,12,3,8,0.5,0),   -- planks to side
  (16,12,2,8,6,0),     -- screws to sides
  (17,12,3,9,0.25,0),  -- planks to shelf
  (18,12,4,9,4,0),     -- shelf brackets to shelves
  (19,12,5,10,1,0),    -- wood cubes to foot
  (20,12,2,10,1,0),    -- screws to foot
  (21,12,7,12,1,0),    -- backboard to bookcase kit
  (22,12,8,12,2,0),    -- sides to bookcase kit
  (23,12,9,12,8,0),    -- shelves to bookcase kit
  (24,12,10,12,4,0),   -- feet to bookcase kit
  (25,12,6,12,1,0);    -- container box to bookcase kit

Now, we want a parts list, a bill of materials, which will list show parent-child relationships and quantities, and sum the costs. Could we adapt the depth-first "nested sets" treewalk algorithm (Listing 10) to this problem even though our graph is not a tree and our sets are not properly nested? Yes indeed. We just have to modify the treewalk to handle multiple parent nodes for any child node, and add code to percolate costs and quantities up the graph. Navigation remains simple using leftedge and rightedge values. This is just the sort of problem the Celko algorithm is good for: reporting!

Listing 34
DROP PROCEDURE IF EXISTS ShowBOM;
DELIMITER |
CREATE PROCEDURE ShowBOM( IN root INT )
BEGIN
  DECLARE thischild, thisparent, rows, maxrightedge INT DEFAULT 0;
  DECLARE thislevel, nextedgenum INT DEFAULT 1;
  DECLARE thisqty, thiscost DECIMAL(10,2);

  -- Create and seed intermediate table:
  DROP TABLE IF EXISTS edges;
  CREATE TABLE edges (
    childID smallint NOT NULL,
    parentID smallint NOT NULL,
    PRIMARY KEY (childID, parentID)
  ) ENGINE=HEAP;
  INSERT INTO edges
    SELECT childID,parentID
    FROM assemblies
    WHERE assemblyRoot = root;
  SET maxrightedge = 2 * (1 + (SELECT COUNT(*) FROM edges));
  -- Create and seed result table:
  DROP TABLE IF EXISTS bom;
  CREATE TABLE bom (
    level SMALLINT,
    nodeID SMALLINT,
    parentID SMALLINT,
    qty DECIMAL(10,2),
    cost DECIMAL(10,2),
    leftedge SMALLINT,
    rightedge SMALLINT
  ) ENGINE=HEAP;
  INSERT INTO bom
    VALUES( thislevel, root, 0, 0, 0, nextedgenum, maxrightedge );
  SET nextedgenum = nextedgenum + 1;
  WHILE nextedgenum < maxrightedge DO
    -- How many children of this node remain in the edges table?
    SET rows = (
      SELECT COUNT(*)
      FROM bom AS s
      INNER JOIN edges AS t ON s.nodeID=t.parentID AND s.level=thislevel
    );
    IF rows > 0 THEN
      -- There is at least one child edge.
      -- Compute qty and cost, insert into bom, delete from edges.
      BEGIN
        -- Alas MySQL nulls MIN(t.childid) when we combine the next two queries
        SET thischild = (
          SELECT MIN(t.childID)
          FROM bom AS s
          INNER JOIN edges AS t ON s.nodeID=t.parentID AND s.level=thislevel
        );
        SET thisparent = (
          SELECT DISTINCT t.parentID
          FROM bom AS s
          INNER JOIN edges AS t ON s.nodeID=t.parentID AND s.level=thislevel
        );
        SET thisqty = (
          SELECT quantity FROM assemblies
          WHERE assemblyroot = root
            AND childID = thischild
            AND parentID = thisparent
        );
        SET thiscost = (
          SELECT a.assemblycost + (thisqty * (i.purchasecost + i.assemblycost ))
          FROM assemblies AS a
          INNER JOIN items AS i ON a.childID = i.itemID
          WHERE assemblyroot = root
            AND a.parentID = thisparent
            AND a.childID = thischild
        );
        INSERT INTO bom
          VALUES(thislevel+1, thischild, thisparent, thisqty, thiscost, nextedgenum, NULL);
        DELETE FROM edges
        WHERE childID = thischild AND parentID=thisparent;
        SET thislevel = thislevel + 1;
        SET nextedgenum = nextedgenum + 1;
      END;
    ELSE
      BEGIN
        -- Set rightedge, remove item from edges
        UPDATE bom
        SET rightedge=nextedgenum, level = -level
        WHERE level = thislevel;
        SET thislevel = thislevel - 1;
        SET nextedgenum = nextedgenum + 1;
      END;
    END IF;
  END WHILE;
  SET rows := ( SELECT COUNT(*) FROM edges );
  IF rows > 0 THEN
    SELECT 'Orphaned rows remain';
  ELSE
    -- Total
    SET thiscost = (SELECT SUM(qty*cost) FROM bom);
    UPDATE bom
    SET qty = 1, cost = thiscost
    WHERE nodeID = root;
    -- Show the result
    SELECT
      CONCAT(Space(Abs(level)*2), ItemName(nodeid,root)) AS Item,
      ROUND(qty,2) AS Qty,
      ROUND(cost, 2) AS Cost
    FROM bom
    ORDER BY leftedge;
  END IF;
END;
|
DELIMITER ;

-- Function used by ShowBOM() to retrieve bom item names:
DROP FUNCTION IF EXISTS ItemName;
SET GLOBAL log_bin_trust_function_creators=TRUE;
DELIMITER |
CREATE FUNCTION ItemName( id INT, root INT ) RETURNS CHAR(20)
BEGIN
  DECLARE s CHAR(20) DEFAULT '';
  SELECT name INTO s FROM items WHERE itemid=id;
  RETURN IF( id = root, UCASE(s), s );
END;
|
DELIMITER ;
CALL ShowBOM(11);
+---------------------+------+--------+

| Item                | Qty  | Cost   |

+---------------------+------+--------+

|   BOOKCASE2X30      |  1.0 | 327.93 |

|     backboard       |  1.0 |   3.00 |

|       laminate      |  1.0 |   4.00 |

|       screw         |  8.0 |   0.80 |

|     side            |  2.0 |  16.00 |

|       screw         |  6.0 |   0.60 |

|       plank         |  0.5 |   5.00 |

|     shelf           |  8.0 |  32.00 |

|       plank         |  0.3 |   2.50 |

|       shelf bracket |  4.0 |   0.80 |

|     foot            |  4.0 |   4.00 |

|       screw         |  1.0 |   0.10 |

|       wood cube     |  1.0 |   0.50 |

+---------------------+------+--------+


With ShowBOM() in hand, it's easy to compare costs of assemblies and subassemblies. By adding price columns, we can do the same for prices and profit margins. And now that MySQL has re-enabled prepared statements in stored procedures, it will be relatively easy to write a more general version of ShowBOM(). We leave that to you.

Shorter and sweeter

But ShowBOM() is not the small, efficient bit of nested sets reporting code we were hoping for. There is a simpler solution: hide graph cycles from the edges table by making them references to rows in a nodes table, so we can treat the edges table like a tree; then apply a breadth-first edge-list subtree algorithm to generate the Bill of Materials. Again assume a cabinetmaking company making bookcases (with a different costing model). For clarity, skip inventory tracking for now. An items table ww_nodes tracks purchased and assembled bookcase elements with their individual costs, and an assemblies/edges ww_edges table tracks sets of edges that combine to make products.

Listing 35: DDL for a simpler parts explosion
DROP TABLE IF EXISTS ww_nodes;
CREATE TABLE ww_nodes (
  nodeID int,
  description CHAR(50),
  cost decimal(10,2)
);
INSERT INTO ww_nodes VALUES (1,'finished bookcase',10);
INSERT INTO ww_nodes VALUES (2,'backboard2x1',1);
INSERT INTO ww_nodes VALUES (3,'laminate2x1',8);
INSERT INTO ww_nodes VALUES (4,'screw',.10);
INSERT INTO ww_nodes VALUES (5,'side',4);
INSERT INTO ww_nodes VALUES (6,'plank',20);
INSERT INTO ww_nodes VALUES (7,'shelf',4);
INSERT INTO ww_nodes VALUES (8,'shelf bracket',.5);
INSERT INTO ww_nodes VALUES (9,'feet',1);
INSERT INTO ww_nodes VALUES (10,'cube4cmx4cm',1);
INSERT INTO ww_nodes VALUES (11,'bookcase kit',2);
INSERT INTO ww_nodes VALUES (12,'carton',1);
 
DROP TABLE IF EXISTS ww_edges;
CREATE TABLE ww_edges (
  rootID INT,
  nodeID int,
  parentnodeID int,
  qty decimal(10,2)
);
INSERT INTO ww_edges VALUES (1,1,null,1);
INSERT INTO ww_edges VALUES (1,2,1,1);
INSERT INTO ww_edges VALUES (1,3,2,1);
INSERT INTO ww_edges VALUES (1,4,2,8);
INSERT INTO ww_edges VALUES (1,5,1,2);
INSERT INTO ww_edges VALUES (1,6,5,1);
INSERT INTO ww_edges VALUES (1,4,5,12);
INSERT INTO ww_edges VALUES (1,7,1,8);
INSERT INTO ww_edges VALUES (1,6,7,.5);
INSERT INTO ww_edges VALUES (1,8,7,4);
INSERT INTO ww_edges VALUES (1,9,1,4);
INSERT INTO ww_edges VALUES (1,10,9,1);
INSERT INTO ww_edges VALUES (1,4,9,1);
 
INSERT INTO ww_edges VALUES (11,11,null,1);
INSERT INTO ww_edges VALUES (11,2,11,1);
INSERT INTO ww_edges VALUES (11,3,2,1);
INSERT INTO ww_edges VALUES (11,4,2,8);
INSERT INTO ww_edges VALUES (11,5,11,2);
INSERT INTO ww_edges VALUES (11,6,5,1);
INSERT INTO ww_edges VALUES (11,4,5,12);
INSERT INTO ww_edges VALUES (11,7,11,8);
INSERT INTO ww_edges VALUES (11,6,7,.5);
INSERT INTO ww_edges VALUES (11,8,7,4);
INSERT INTO ww_edges VALUES (11,9,11,4);
INSERT INTO ww_edges VALUES (11,10,9,1);
INSERT INTO ww_edges VALUES (11,4,9,11);
INSERT INTO ww_edges VALUES (11,12,11,1);

Here is an adaptation of the breadth-first edge list algorithm to retrieve a Bill of Materials for a product identified by a rootID:

·  Initialise a level-tracking variable to zero.

·  Seed a temp reporting table with the rootID of the desired product.

·  While rows are being retrieved, increment the level tracking variable and add rows to the temp table whose parentnodeIDs are nodes at the current level.

·   Print the BOM ordered by path with indentation proportional to tree level.

Listing 36: A simpler parts explosion
DROP PROCEDURE IF EXISTS ww_bom;
DELIMITER |
CREATE PROCEDURE ww_bom( root INT )
BEGIN
  DECLARE lev INT DEFAULT 0;
  DECLARE totalcost DECIMAL( 10,2);
  DROP TABLE IF EXISTS temp;
  CREATE TABLE temp                                 -- initialise temp table with root node
  SELECT
    e.nodeID AS nodeID,
    n.description AS Item,
    e.parentnodeID,
    e.qty,
    n.cost AS nodecost,
    e.qty * n.cost AS cost,
    0 as level,                                     -- tree level
    CONCAT(e.nodeID,'') AS path                     -- path to this node as a string
  FROM ww_nodes n
  JOIN ww_edges e USING(nodeID)                     -- root node
  WHERE e.nodeID = root AND e.parentnodeID IS NULL;
  WHILE FOUND_ROWS() > 0 DO 
    BEGIN
      SET lev = lev+1;                              -- increment level
      INSERT INTO temp                              -- add children of this level
      SELECT 
        e.nodeID,
        n.description AS Item,
        e.parentnodeID,
        e.qty,
        n.cost AS nodecost,
        e.qty * n.cost AS cost,
        lev,                                
        CONCAT(t.path,',',e.nodeID)
      FROM ww_nodes n
      JOIN ww_edges e USING(nodeID)
      JOIN temp t ON e.parentnodeID = t.nodeID
      WHERE e.rootID = root AND t.level = lev-1;
    END;
  END WHILE;
  WHILE lev > 0 DO                                  -- percolate costs up the graph
    BEGIN
      SET lev = lev - 1;
      DROP TABLE IF EXISTS tempcost;
      CREATE TABLE tempcost                         -- compute child cost
      SELECT p.nodeID, SUM(c.nodecost*c.qty) AS childcost
      FROM temp p 
      JOIN temp c ON p.nodeid=c.parentnodeid
      WHERE c.level=lev
      GROUP by p.nodeid;
      UPDATE temp JOIN tempcost USING(nodeID)       -- update parent item cost
      SET nodecost = nodecost + tempcost.childcost;
      UPDATE temp SET cost = qty * nodecost         -- update parent cost
      WHERE level=lev-1;
    END;
  END WHILE;
  SELECT                                            -- list BoM
    CONCAT(SPACE(level*2),Item) AS Item,
    ROUND(nodecost,2) AS 'Unit Cost',
    ROUND(Qty,0) AS Qty,ROUND(cost,2) AS Cost FROM temp
  ORDER by path;  
END |
DELIMITER ;
CALL ww_bom( 1 );
+-------------------+-----------+------+--------+
| Item              | Unit Cost | Qty  | Cost   |
+-------------------+-----------+------+--------+
| finished bookcase |    206.60 |  1.0 | 206.60 |
|   backboard2x1    |      9.80 |  1.0 |   9.80 |
|     laminate2x1   |      8.00 |  1.0 |   8.00 |
|     screw         |      0.10 |  8.0 |   0.80 |
|   side            |     25.20 |  2.0 |  50.40 |
|     screw         |      0.10 | 12.0 |   1.20 |
|     plank         |     20.00 |  1.0 |  20.00 |
|   shelf           |     16.00 |  8.0 | 128.00 |
|     plank         |     20.00 |  0.5 |  10.00 |
|     shelf bracket |      0.50 |  4.0 |   2.00 |
|   foot            |      2.10 |  4.0 |   8.40 |
|     cube4cmx4cm   |      1.00 |  1.0 |   1.00 |
|     screw         |      0.10 |  1.0 |   0.10 |
+-------------------+-----------+------+--------+

Summary

Stored procedures, stored functions and Views make it possible to implement edge list graph models, nested sets graph models, and breadth-first and depth-first graph search algorithms in MySQL 5&6.

Further Reading

Celko J, "Trees and Hierarchies in SQL For Smarties", Morgan Kaufman, San Francisco, 2004.

Codersource.net, "Branch and Bound Algorithm in C#", http://www.codersource.net/csharp_branch_and_bound_algorithm_implementation.aspx.

Math Forum, "Euler's Solution: The Degree of a Vertex", http://mathforum.org/isaac/problems/bridges2.html

Muhammad RB, "Trees", http://www.personal.kent.edu/~rmuhamma/GraphTheory/MyGraphTheory/trees.htm.

Mullins C, "The Future of SQL", http://www.craigsmullins.com/idug_sql.htm.

Rodrigue J-P, "Graph Theory: Definition and Properties", http://people.hofstra.edu/geotrans/eng/ch2en/meth2en/ch2m1en.html.

Santry P, "Recursive SQL User Defined Functions", http://www.wwwcoder.com/main/parentid/191/site/1857/68/default.aspx.

Shasha D, Wang JTL, and Giugno R, "Algorithmics and applications of tree and graph searching", In Symposium on Principles of Database Systems, 2002, p 39--52.

Stephens S, "Solving directed graph problems with SQL, Part I", http://builder.com.com/5100-6388_14-5245017.html.

Stephens, S, "Solving directed graph problems with SQL, Part II", http://builder.com.com/5100-6388_14-5253701.html.

Steinbach T, "Migrating Recursive SQL from Oracle to DB2 UDB", http://www-106.ibm.com/developerworks/db2/library/techarticle/0307steinbach/0307steinbach.html.

Tropashko V, "Nested Intervals Tree Encoding in SQL, http://www.sigmod.org/sigmod/record/issues/0506/p47-article-tropashko.pdf

Van Tulder G, "Storing Hierarchical Data in a Database", http://www.sitepoint.com/print/hierarchical-data-database.

Venagalla S, "Expanding Recursive Opportunities with SQL UDFs in DB2 v7.2", http://www-106.ibm.com/developerworks/db2/library/techarticle/0203venigalla/0203venigalla.html.

Wikipedia, "Graph Theory", http://en.wikipedia.org/wiki/Graph_theory.

Willets K, "SQL Graph Algorithms", http://willets.org/sqlgraphs.html.



Last updated 20 Aug 2007

신고
Posted by Tornado tornado
TAG mysql

Sybase 실행계획 보기

SQL 2008.08.07 22:43

set showplan on

하고 쿼리 날리면 끝

신고
Posted by Tornado tornado

[출처] http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc38151_1270/html/iqref/Update.htm

example 3 의 진하게 표시해둔 부분이다.

---------------------------------------------------------------------------

Chapter 6: SQL Statements


UPDATE statement

Description

Modifies existing rows of a single table, or a view that contains only one table.

Syntax

UPDATE table
...  SET [column-name  = expression, ...
...  [ FROM table-expression, ]
...  [ WHERE search-condition ]
...  [ ORDER BY expression [ ASC | DESC ] ,... ]
FROM table-expression
table-expression:
table-spec | table-expression join-type table-spec [ ON condition ] | table-expression, ...

Examples

Example 1

  • Transfers employee Philip Chin (employee 129) from the sales department to the marketing department:

UPDATE employee
SET dept_id = 400
WHERE emp_id = 129 ;

Example 2

  • The Marketing Department (400) increases bonuses from 4% to 6% of each employee’s base salary:

UPDATE employee
SET bonus = base * 6/100
WHERE dept_id=400;

Example 3

  • Each employee gets a pay increase with the department bonus:

UPDATE employee
SET emp.salary = emp.salary + dept.bonus
FROM employee emp, department dept
WHERE emp.deptnum = dept.deptnum;

Example 4

  • Another way to give each employee a pay increase with the department bonus:

UPDATE employee
SET emp.salary = emp.salary + dept.bonus
FROM employee emp JOIN department dept
ON emp.deptnum = dept.deptnum;

Usage

The table on which you use UPDATE may be a base table or a temporary table.

NoteThe base table cannot be part of any join index.

Each named column is set to the value of the expression on the right-hand side of the equal sign. Even column-name can be used in the expression—the old value is used.

The FROM clause can contain multiple tables with join conditions and returns all the columns from all the tables specified and filtered by the join condition and/or WHERE condition.

Using the wrong join condition in a FROM clause causes unpredictable results. If the FROM clause specifies a one-to-many join and the SET clause references a cell from the “many” side of the join, the cell is updated from the first value selected. In other words, if the join condition causes multiple rows of the table to be updated per row ID, the first row returned becomes the update result. For example:

UPDATE T1 
SET T1.c2 = T2.c2
FROM T1 JOIN TO T2
ON T1.c1 = T2.c1

If table T2 has more than one row per T2.c1, results might be as follows:

T2.c1              T2.c2              T2.c3
1                  4                  3
1                  8                  1
1                  6                  4
1                  5                  2

With no ORDER BY clause, T1.c2 may be 4, 6, 8, or 9.

  • With ORDER BY T2.c3, T1.c2 is updated to 8.

  • With ORDER BY T2.c3 DESC, T1.c2 is updated to 6.

Sybase IQ rejects any UPDATE statement in which the table being updated is on the null-supplying side of an outer join. In other words:

  • In a left outer join, the table on the left side of the join cannot be missing any rows on joined columns.

  • In a right outer join, the table on the right side of the join cannot be missing any rows on joined columns.

  • In a full outer join, neither table can be missing any rows on joined columns.

For example, in the following statement, table T1 is on the left side of a left outer join, and thus cannot contain be missing any rows:

UPDATE T1 
SET T1.c2 = T2.c4
FROM T1 LEFT OUTER JOIN T2
ON T1.rowid = T2.rowid

Normally, the order in which rows are updated does not matter. However, in conjunction with the NUMBER(*) function, an ordering can be useful to get increasing numbers added to the rows in some specified order. If you are not using the NUMBER(*) function, avoid using the ORDER BY clause, because the UPDATE statement performs better without it.

In an UPDATE statement, if the NUMBER(*) function is used in the SET clause and the FROM clause specifies a one-to-many join, NUMBER(*) generates unique numbers that increase, but do not increment sequentially due to row elimination. For more information about the NUMBER(*) function, see “NUMBER function [Miscellaneous]”.

You can use the ORDER BY clause to control the result from an UPDATE when the FROM clause contains multiple joined tables.

Sybase IQ ignores the ORDER BY clause in searched UPDATE and returns a message that the syntax is not valid ANSI syntax.

If no WHERE clause is specified, every row is updated. If you specify a WHERE clause, Sybase IQ updates only rows satisfying the search condition.

The left side of each SET clause must be a column in a base table.

Views can be updated provided the SELECT statement defining the view does not contain a GROUP BY clause or an aggregate function, or involve a UNION operation. The view should contain only one table.

Character strings inserted into tables are always stored in the case they are entered, regardless of whether the database is case sensitive or not. Thus a character data type column updated with a string Value is always held in the database with an upper-case V and the remainder of the letters lowercase. SELECT statements return the string as Value. If the database is not case sensitive, however, all comparisons make Value the same as value, VALUE, and so on. Further, if a single-column primary key already contains an entry Value, an INSERT of value is rejected, as it would make the primary key not unique.

If the update violates any check constraints, the whole statement is rolled back.

Sybase IQ supports scalar subqueries within the SET clause, for example:

UPDATE r
SET	r.o= (SELECT MAX(t.o) FROM t ... WHERE t.y = r.y),		r.s= (SELECT SUM(x.s) FROM x ... WHERE x.x = r.x)
WHERE r.a = 10

Sybase IQ supports DEFAULT column values in UPDATE statements. If a column has a DEFAULT value, this DEFAULT value is used as the value of the column in any UPDATE statement that does not explicitly modify the value for the column.

For detailed information on the use of column DEFAULT values, see “Using column defaults” in Chapter 9, “Ensuring Data Integrity” in the Sybase IQ System Administration Guide.

See the CREATE TABLE statement for details about updating IDENTITY/AUTOINCREMENT columns, which are another type of DEFAULT column.


 

Side effects

None.

Standards

  • SQL92 Vendor extension.

  • Sybase With the following exceptions, syntax of the IQ UPDATE statement is generally compatible with the Adaptive Server Enterprise UPDATE statement Syntax 1: Sybase IQ supports multiple tables with join conditions in the FROM clause.

    The Transact-SQL ROWCOUNT option has no effect on UPDATE operations in Sybase IQ.

    Updates of remote tables are limited to Sybase IQ syntax supported by CIS, as described in Chapter 17, “Server Classes for Remote Data Access” and Chapter 16, “Accessing Remote Data” in the Sybase IQ System Administration Guide.

Permissions

Must have UPDATE permission for the columns being modified.

신고
Posted by Tornado tornado

아... sybase 처음 써봄.
데이터가 겁나게 많아서 오더링 잘못 내리면 겁나 느리다 ㅠㅠ
페이지 분할해서 봐야 하는데 mssql 이나 오라클, mysql 처럼 쉽지는 않지만...
예전 MSSQL 2000 에서 rowcount 를 이용해서 페이지 분할 하는 방법 비슷하게
작업할 수 있다.
그러나~~ 테이블 설계가 좀 이상하면 속도.. 겁나느리다. 테이블 수정하고 인덱스점 어케 건들고 싶지만 내 영역도 아니고, 서로 밥먹고 살아야 하기 때문에 나는 오로지 세션빈만 개발할란다~~
--------------------------------------------------------------------------------


[급:질문]top , rownum과 같이 라인수 주는 방법있나요?
작성자 강선규 작성일 2005-01-31 11:41:22 조회수 735
안녕하세요^^

지금 사이베이스로 개발을 하고 있는데,

방대한 데이터를 처리하는데(약 100,000 정도 되는거 같네여...ㅡ.,ㅡ;),

100개의 데이터씩 볼수 있게 하는

top , rownum과 같이 라인수 주는 방법있나요?

부탁드립니다.
 

이 글에 대한 댓글이 총 3건 있습니다.
 

프리챌 DW개발자모임 커뮤니테에 있는 글을 올립니다. 도움이 되었으면 합니다.


[1] SET ROWCOUNT nn 사용하기

1> set rowcount 10
2> go
1> select foo
2> from bar
3> order by barID desc
4> go

그렇다면, 100번째부터 150번째까지를 가져오려면??
[2] 임시테이블을 사용한다면,
set rowcount 150

select rowid = identity(3),
       col1,
       col2
  into #tempA
  from masterTable
 where clause...
 order by 2,3

처럼 rowid라는 칼럼을 identity(3)으로 생성하면 1번부터 번호가 부여된다.
그런 다음,
select * from #tmepAwhere rowid between 100 and 150
하면 된다.
사용한 temp 테이블은 세션이 끊기면 자동으로 제거되지만
drop table #tempA를 해주는것이 좋겠다.
[3] 150건을 먼저 temp에 넣고 100건을 삭제하는 건 어쩔까?
set rowcount 150

select col1,
       col2
  into #tempA
  from masterTable
 where clause...

set rowcount 100
delete #tempAselect * from #tempA [4] 임시 테이블을 사용하지 않고 처리하는 방법??테이블에 identity 칼럼을 만들어 놓고,다음과 같은 조회용 프로시저를 만들어보자.사용법?-- 레코드 100번부터 50건 조회한다면exec proc_my_view(50, 3)  CREATE PROCEDURE proc_my_view
    @perpage    INT,
    @pagenumber INT
WITH RECOMPILE
AS

   -- 페이지 번호가 1인 경우는 아주 쉽네요. set rowcount 사용하면 되니까
   IF @pagenumber = 1
   BEGIN
      SET ROWCOUNT @perpage

      SELECT ...
      RETURN
   END

   -- 페이지가 1보다 큰 경우에는 시작할 레코드 번호를 구해서
   DECLARE @min_postid NUMERIC( 8, 0 ),
           @position   INT

   SELECT @position = @perpage * ( @pagenumber - 1 ) + 1

   SET ROWCOUNT @position

   -- 해당 레코드까지 이동해 가자. 물론, set rowcount를 이용해서
   SELECT @min_postid = postid
     FROM post
    WHERE ...
    ORDER BY postid ASC

   SET ROWCOUNT @perpage

   -- we know where we want to go (say the 28th post in a set of 50).
    SELECT ...
      FROM post
     WHERE postid >= @min_postid
           ...
  ORDER BY postid ASC

장형운(좋은구름)님이 2005-02-01 09:45:31에 작성한 댓글입니다.
 

다음은 직접 구현하여 테스트한 예제입니다.

MYTABLE 이라는 테이블이 있습니다.

create table MYTABLE

(

seqnum numeric(10,0) identity not null,

subject varchar(100) null,

context varchar(250) null,

writer char(10) null

)


이 테이블은 identity 속성에 의해 seqnum 컬럼에 데이터 입력시 자동으로 순번이 들어가겠지요.


다음은 proc_mytable_view라는 프로시저를 만들어서

게시판 조회할 때 호출합니다.


CREATE PROCEDURE proc_mytable_view
    @perpage    INT,
    @pagenumber INT
WITH RECOMPILE
AS

   -- 페이지 번호가 1인 경우는 아주 쉽네요. set rowcount 사용하면 되니까
   IF @pagenumber = 1
   BEGIN
      SET ROWCOUNT @perpage

      SELECT * from MYTABLE order by seqnum
      RETURN
   END


   -- 페이지가 1보다 큰 경우에는 시작할 레코드 번호를 구해서
   DECLARE @min_seqnum NUMERIC(8, 0),
           @position   INT

   SELECT @position = @perpage * ( @pagenumber - 1 ) + 1


   SET ROWCOUNT @position


   -- 해당 레코드까지 이동해 가자. 물론, set rowcount를 이용해서
   SELECT @min_seqnum = seqnum
     FROM MYTABLT
    ORDER BY seqnum ASC


   SET ROWCOUNT @perpage


    SELECT *
      FROM MYTABLE
     WHERE seqnum >= @min_seqnum
    ORDER BY seqnum ASC
go


이제 테스트해 볼까요?

-- 10건씩 첫 페이지

exec proc_mytable_view  10, 1

-- 10건씩 2 페이지

exec proc_mytable_view  10, 2


장형운(좋은구름)님이 2005-02-01 10:01:39에 작성한 댓글입니다.
이 댓글은 2005-02-01 10:07:05에 마지막으로 수정되었습니다.
장형운님이 작성하신게...모범답안이 될것같네요..


sybase ase는 조회page가 뒤쪽(제일 먼저 입력된 자료)일수록


page가 많은게 문제죠....(게시판에서...)


oracle의 rownum의 기능이 아쉽죠...




12.5.3부터는 top fuction을 제공을 합니다....


그걸 활용해 보시는 것도.....(뭐 rowcount랑 별차이는 없는데요....그게 많이 익숙할실듯 해서...)

지연님이 2005-02-01 11:18:22에 작성한 댓글입니다. Edit X
[출처] oracle-sybase conversion: ROWNUM|작성자 라붐

신고
Posted by Tornado tornado
처음 보는 에러...


당황...


하드디스크 용량 봤더니 100% ㅠㅠ

로그 지웠더니 하드 용량의 절반을 찾음..

로그 쉘을 잘못 짰나... 고쳐야겠다.
신고
Posted by Tornado tornado



declare @money money

set @money = 123456 

select replace(convert(varchar(15),convert(money,@money),1),'.00','')

 

-.-

 

Sql money 타입으로 캐스팅 하고 그것을 다시 convert 1 주면 1,000,000.00 같이 나오는데

그중에 .00 공백으로 치환하니 간단히 되는군요

신고
Posted by Tornado tornado
원글 : http://sqlmania.net/board/view.asp?boardid=7&idx=24&page=1&srchKey=&srchVal=


출처:MSDN
---------------------------------------------------------------------------------------------------

Text, Ntext, Image에 대한 간략한 자습서

Joseph Gama

Text, Ntext, Image 데이터는 오래도록 여러 가지로 사용되어 왔지만 서로 간의 차이를 간과하기가 쉽습니다. Joseph Gama는 이 자습서에서 이러한 특수 데이터 형식의 구현과 사용에 대한 개요를 간략하게 설명합니다. Joseph Gama는 특별히 지도해 준 Metropolitan State College of Denver의 Aaron Gordon, Earl Hasz, Jerry Shultz, Shahar Boneh 박사님들과 아낌없이 지원해 준 Adam과 Karen Schwartz에게 이 기사를 바칩니다.

데이터베이스의 크기와 복잡성이 커지고 있는 원인이 부분적으로는 오늘날의 하드웨어와 소프트웨어가 멀티미디어 및 문서 데이터를 비롯하여 엄청난 양의 데이터를 저장할 수 있게 되었기 때문이라고 볼 수 있습니다. JPG, PNG, MP3, DOC/RTF, HTML, 유니코드, XML 데이터를 SQL Server 데이터베이스에 image, text 또는 ntext로 모두 저장할 수 있습니다.

일반적으로 큰 ASCII 문자열을 저장하는 데는 text를 사용하고 유니코드 문자열에는 ntext, 이진 이미지 데이터에는 image를 사용합니다. 크기의 경우 Text는 최대 2^31 - 1(2,147,483,647)자의 가변 길이 비 유니코드 문자를 제공하고 ntext는 최대 2^30 - 1(1,073,741,823)자, image는 최대 2^31 - 1(2,147,483,647)바이트를 제공합니다. ntext의 실제 저장소 크기(바이트 단위)는 입력한 문자 수의 두 배입니다. ntext의 SQL-92 동의어는 national text입니다.

이제 이러한 형식이 어떻게 작동하는지 살펴보겠습니다. 데이터를 참조할 때는 포인터를 사용하며 특수 함수에서 포인터를 통해 데이터를 그러한 형식에서 추가, 추출 또는 제거할 수 있습니다. 표 1에서는 각각의 장점과 제한 사항을 요약합니다.

표 1. text, ntext, image 데이터 형식의 장점과 제한 사항

가능

불가능

저장 프로시저의 입력 또는 출력 매개 변수로 사용됩니다.

DECLARE, SET, FETCH와 작동합니다. 즉, 다른 데이터 형식과 똑같이 변수로 사용할 수 없습니다.

UDF에 대한 입력으로 사용됩니다.

UDF의 반환 데이터 형식이 됩니다(타임스탬프도 불가능).

최대 8,000바이트를 다른 데이터 형식으로 변환합니다.

변환을 수행하지 않을 경우 FETCH로 커서에서 읽어들여집니다.

UNION ALL 절에 사용합니다.

sql_variant와 작동합니다.

GROUP BY 절에서 비교, 저장 또는 사용됩니다. 유일한 예외는 IS NULL 또는 LIKE를 사용할 경우입니다. 쉽게 해결할 수 있는 방법은 다른 데이터 형식을 반환하는 사용자 정의 함수나 CONVERT를 사용하는 것입니다.

이 형식의 UNION은 DISTINCT 절에 해당하므로 UNION 절에서 사용되며 text, ntext, image 데이터 형식은 정렬할 수 없으므로 오류가 발생합니다.

Text 및 image 함수

여기서는 여러 가지 text 함수와 image 함수에 대해 설명하면서 해당 구문과 각각의 출력 예제를 보여 줍니다.

TEXTPTR
TEXTPTR은 text, ntext 또는 image 열을 참조하는 text-pointer 값인 16바이트 길이의 varbinary를 반환합니다.

구문: TEXTPTR ( column )

--TEXTPTR 샘플, text-pointer를 만들고 해당 값을 봅니다. create table #t (n ntext) insert #t values('abcdef') DECLARE @ptrval binary(16) SELECT @ptrval = TEXTPTR(n) FROM #t print @ptrval drop table #t

결과:

0xFFFF6900000000004D00000001000000

TEXTVALID
TEXTVALID는 text-pointer가 유효할 경우 값 1이 되고 그렇지 않으면 0이 되는 int를 반환합니다.

구문: TEXTVALID ( 'table.column' , text_ptr )

--TEXTPTR 샘플, text-pointer를 만들고 해당 값을 테스트합니다. create table #t (n ntext) insert #t values('abxyef') DECLARE @ptrval binary(16), @ptrval2 binary(16) SELECT @ptrval = TEXTPTR(n) FROM #t if TEXTVALID('#t.n',@ptrval)=1 print '@ptrval has a valid text pointer.' else print '@ptrval has an invalid text pointer.' if TEXTVALID('#t.n',@ptrval2)=1 print '@ptrval2 has a valid text pointer.' else print '@ptrval2 has an invalid text pointer.' drop table #t

결과:

@ptrval has a valid text pointer. @ptrval2 has an invalid text pointer.

SET TEXTSIZE
SET TEXTSIZE는 SELECT 문을 사용할 때 반환되는 text와 ntext 데이터의 크기인 int 값을 설정합니다.

구문: SET TEXTSIZE { number }

--SET TEXTSIZE 샘플 create table #t (n ntext) insert #t values('abcdefghijk') SET TEXTSIZE 10--ntext는 유니코드 2바이트/문자임 select * from #t SET TEXTSIZE 20--ntext는 유니코드 2바이트/문자임 select * from #t drop table #t

결과:

abcde abcdefghij

@@TEXTSIZE
@@TEXTSIZE는 SELECT 문을 사용할 때 반환되는 text와 ntext 데이터의 크기인 int 값을 반환합니다. 이 값은 SET TEXTSIZE를 사용하여 설정됩니다.

구문: @@TEXTSIZE

--@@TEXTSIZE 샘플 SET TEXTSIZE 10--ntext는 유니코드 2바이트/문자임 print @@TEXTSIZE SET TEXTSIZE 20--ntext는 유니코드 2바이트/문자임 print @@TEXTSIZE

결과:

10 20

WRITETEXT
WRITETEXT는 text, ntext, image 열의 데이터를 덮어씁니다.

구문: WRITETEXT { table.column text_ptr } [ WITH LOG ] { data }

--WRITETEXT 샘플 create table #t (n ntext) insert #t values('abc') DECLARE @ptrval binary(16) SELECT @ptrval = TEXTPTR(n) FROM #t WRITETEXT #t.n @ptrval 'def' select * from #t drop table #t

결과:

def

UPDATETEXT
UPDATETEXT는 기존 text, ntext, image 열의 데이터를 변경합니다.

구문: UPDATETEXT { table_name.dest_column_name dest_text_ptr } { NULL | insert_offset } { NULL | delete_length } [ WITH LOG ] [ inserted_data | { table_name.src_column_name src_text_ptr } ]

--UPDATETEXT 샘플 삽입 전용 create table #t (n ntext) insert #t values('bd') DECLARE @ptrval binary(16), @i int SELECT @ptrval = TEXTPTR(n) FROM #t UPDATETEXT #t.n @ptrval 0 0 'a'--맨 앞에 삽입 select * from #t UPDATETEXT #t.n @ptrval 2 0 'c'--중간에 삽입 select * from #t set @i=(select DATALENGTH(n) from #t)/2 --/ntext일 경우만 2, 2바이트/문자 print @i UPDATETEXT #t.n @ptrval @i 0 'e'--맨 끝에 삽입 select * from #t drop table #t

결과:

abd abcd abcde

샘플 삭제 및 삽입:

--UPDATETEXT 샘플 삭제+삽입 create table #t (n ntext) insert #t values('abxyef') DECLARE @ptrval binary(16), @i int SELECT @ptrval = TEXTPTR(n) FROM #t UPDATETEXT #t.n @ptrval 2 2 'cd'--2 삽입, 2 삭제 --위치 2에서 시작하는 문자 select * from #t drop table #t

결과:

abcdef

READTEXT
READTEXT는 text, ntext, image 열의 데이터에서 일정량을 읽습니다.

구문: READTEXT { table.column text_ptr offset size } [ HOLDLOCK ]

--READTEXT 샘플 create table #t (n ntext) insert #t values('abcdefghijk') DECLARE @ptrval binary(16) SELECT @ptrval = TEXTPTR(n) FROM #t READTEXT #t.n @ptrval 3 8 --위치 3에서 시작하여 여덟 문자를 읽습니다. drop table #t

결과:

defghijk

DATALENGTH
DATALENGTH는 text, ntext, image 열의 크기(바이트 수)를 반환합니다.

구문: DATALENGTH ( expression )

--DATALENGTH 샘플 create table #t (n ntext) insert #t values('1234567890') DECLARE @i int set @i=(select DATALENGTH(n) from #t) --길이(바이트 단위)=2*UNICODE 길이로 반환해야 합니다. PRINT @i drop table #t

결과:

20

PATINDEX
PATINDEX는 text, ntext, image 열에서 패턴이 처음 나오는 위치를 int 값으로 반환하거나 패턴이 없으면 0을 반환합니다.

구문: PATINDEX ( '%pattern%' , expression )

--PATINDEX 샘플 create table #t (n ntext) insert #t values('Hello Tim, long time no see!') SELECT PATINDEX('%tim%', n) FROM #t SELECT PATINDEX('%time%', n) FROM #t drop table #t

결과:

7 17

CONVERT
CONVERT는 다른 데이터 형식으로 변환된 식을 반환합니다.

구문: CONVERT ( data_type [ ( length ) ] , expression [ , style ] )

--CONVERT 샘플 create table #t (n ntext) insert #t values('Hello Tim, long time no see!') DECLARE @c nvarchar(5) SET @c=(select convert(nvarchar(5),n) from #t) print @c drop table #t

결과:

Hello

CAST
CAST는 다른 데이터 형식으로 캐스트된(변환된) 식을 반환합니다.

구문: CAST ( expression AS data_type )

--CAST 샘플 create table #t (n ntext) insert #t values('Hello Tim, long time no see!') DECLARE @c nvarchar(5) SET @c=(select CAST ( n AS nvarchar(5) ) from #t) print @c drop table #t

결과:

Hello

일반적인 구현

이제 배경 지식이 좀 생겼으니 일반적인 사용 방법을 살펴보겠습니다.

text, ntext, image 열을 파일에 저장
text, ntext 또는 image 형식의 열 하나를 파일에 저장할 수 있는 방법을 보여 주는 저장 프로시저를 각각 세 개 만들었습니다. 이러한 코드와 다른 모든 예제를 파일 다운로드를 통해 볼 수 있습니다.

--saveText2file 샘플 create table ##t (n text) insert ##t values('Hello Tim, long time no see!') EXEC saveText2file 'c:\test.txt', '##t','n', '' drop table ##t --saveNtext2file 샘플 create table ##t (n ntext) insert ##t values('Hello Tim, long time no see!') EXEC saveNtext2file 'c:\test.txt', '##t','n', '' drop table ##t --saveImage2file 샘플 exec saveImage2file 'c:\Category1.bak', 'Northwind..Categories', 'Picture', 'where categoryid=1'

파일의 text, ntext, image 열 업데이트
TEXTPTR, WRITETEXT, UPDATETEXT의 경우 테이블 또는 열 매개 변수를 정의할 때 변수 이름을 사용할 수 없으므로 파일 내용을 열에 읽어들려면 동적 SQL을 사용해야 합니다. 저장 프로시저 readImageFromfile에서는 데이터를 이진 데이터로 읽고 임시 테이블을 사용하지 않은 상태로 데이터를 기록하므로 image와 varchar 데이터 형식을 모두 처리할 수 있습니다. Ntext는 readNtextFromfile을 사용하여 읽을 수 있습니다.

--readImageFromfile 샘플 --파일의 text 열 읽기 create table ##t (n text) insert ##t values('Hi Tim, long time no see!') EXEC readImageFromfile 'c:\hello.txt', '##t','n', '' select * from ##t drop table ##t

결과:

Hello

동일한 코드가 image 열에 대해 작동합니다. 이 저장 프로시저에서는 변환이 이루어지지 않으므로 ntext 열에는 ASCII 텍스트 파일의 데이터를 받아서는 안 되고 text 열에는 유니코드 데이터를 받아서는 안 된다는 것만 기억하면 됩니다. 데이터는 원시 형식으로 제공됩니다.

저장 프로시저 readImageFromfile2는 원래 내용을 바꾸지 않고 데이터를 추가하며 readNtextFromfile은 Ntext(유니코드) 데이터를 읽습니다. 업데이트는 원래 데이터를 새 데이터의 첫째 블록으로 먼저 바꾼 다음 새로운 데이터 블록을 연속적으로 추가하여 수행되지만 추가 시에는 첫째 단계가 필요 없으므로 데이터를 추가하는 것이 업데이트하는 것보다 쉽습니다. ntext 읽기는 유니코드가 Little Endian이 되어야 하는 것 외에는 기본적으로 다른 데이터 형식을 읽는 것과 동일합니다. 텍스트 프로세서는 0xFFFE를 유니코드 텍스트 파일의 맨 앞에 추가하는 경우가 많습니다. 저장 프로시저 readNtextFromfile에는 유니코드의 Big Endian과 Little Endian 변형을 구분할 때 사용하는 "바이트 순서 표시"(BOM)인 0xFFFEm을 제거하는 추가 코드가 약간 있습니다.

파일에서 유니코드 Big Endian을 읽어서 Little Endian으로 변환하는 readNtextFromfileBigEndian이라는 또 다른 저장 프로시저를 보려면 파일 다운로드를 참고하십시오.

개체의 텍스트를 파일로 저장
ntext 열이 포함된 임시 테이블을 사용하면 규칙, 기본값, 암호화되지 않은 저장 프로시저, UDF, 트리거 또는 뷰의 텍스트를 파일로 읽을 수 있습니다. 임시 테이블이 만들어진 다음 INSERT EXEC 기술로 새 레코드가 삽입됩니다. 이 레코드에는 시스템 저장 프로시저 sp_helptext로 sysobjects에서 읽어들인 개체의 텍스트가 포함됩니다. 다음 코드가 포함된 저장 프로시저 saveobj에서 수행됩니다.

create table #temp(s nvarchar(4000)) insert #temp exec sp_helptext @object

저장 프로시저 saveobj는 규칙, 기본값, 암호화되지 않은 저장 프로시저, UDF, 트리거 또는 뷰를 텍스트 파일에 저장하여 백업 및 문서화에 아주 유용하도록 할 수 있습니다.

--saveobj 샘플, hello 개체 저장 exec saveobj 'c:\hello.sql', 'hello'

파일에서 개체 텍스트 읽기
8,000바이트가 넘는 파일을 읽으려면 많은 버퍼가 파일 크기에 따라 채워진 다음 나중에 실행되므로 동적 SQL을 사용해야 합니다. 저장 프로시저 readobj는 루프를 사용하여 동적 SQL을 만들어서 만들 임시 변수의 개수를 확인하는 루프에 대해 작동합니다. 각 임시 변수에는 8,000바이트의 데이터가 들어가므로 가져오는 파일의 크기가 제한되지 않습니다. 문자 한 개와 일련 번호가 포함된 임시 변수 이름을 사용하여 구분하면 원하는 대로 됩니다.

--readobj 샘플, hello 개체 읽기 exec readobj 'c:\hello.sql'

8,000바이트 이상의 INSERT 또는 UPDATE 문
저장 프로시저 readobj는 텍스트 파일에서 규칙, 기본값, 암호화되지 않은 저장 프로시저, UDF, 트리거 또는 뷰 개체를 만들 수 있습니다. INSERT, UPDATE, DELETE 또는 8,000바이트가 넘는 그 밖의 문들도 사용할 수 있습니다. 문이 텍스트 파일에는 없고 큰 SQL 문자열에 있는 경우에는 다음과 같은 간단한 저장 프로시저로 해당 문자열을 실행할 수 있습니다.

CREATE PROCEDURE exec_ntext (@SQL ntext) --큰 SQL 문을 실행합니다. AS EXEC (@SQL)

사용자 정의 함수를 사용하여 text, ntext, image 열 비교
두 열을 가장 빠르고 쉽게 비교할 수 있는 방법은 크기를 비교하는 것이지만, 두 열이 내용은 다르지만 길이가 같을 수도 있기 때문에 이 방법에는 확실히 한계가 있습니다.

CREATE FUNCTION testlength (@a ntext, @b ntext) --두 입력 내용의 길이가 같으면 true를 반환합니다. RETURNS bit AS BEGIN declare @temp_bit bit if datalength(@a)=datalength(@b) set @temp_bit= 1 else set @temp_bit=0 return @temp_bit END

앞의 예제에서는 입력 내용이 ntext였지만 text나 image가 될 수도 있습니다.

두 열을 비교하는 가장 좋은 방법은 전체 내용이나 그 안의 일부 내용을 비교하는 것입니다.

CREATE FUNCTION testequality (@a ntext, @b ntext) --비교 결과가 true이면 1을 반환합니다. RETURNS bit AS BEGIN declare @temp_bit bit if @a like @b set @temp_bit= 1 else set @temp_bit=0 return @temp_bit END

와일드카드 문자를 사용하여 정의한 패턴 일치(둘째 입력 변수)를 통해 이 기술이 더욱 강력해집니다. SQL Server의 와일드카드 문자는 다음과 같습니다.

  • % 크기에 관계없는 문자열(빈 문자열도 허용)
  • _ 길이가 한 자리인 문자열
  • [ ] 문자 집합([fhsvf]) 또는 문자 범위([k-t]) 중에 있는 한 문자
  • [^] 문자 집합([^fhsvf]) 또는 문자 범위([^k-t]) 중에 포함되지 않은 한 문자

이미지의 형식 확인
이미지가 비트맵, JPEG, PNG, TIFF 중 어떤 형식인지 알아내는 간단한 방법은 이미지의 첫째 바이트를 검사하여 해당 형식의 헤더 명세에 맞는지 여부를 확인하는 것입니다. 원시 데이터의 경우 실패할 수 있기 때문에 이 방법이 절대 안전하지는 않지만 더욱 정교한 식별 기능의 첫 단계가 됩니다.

CREATE FUNCTION getImageType (@a image) --이미지 형식의 유형을 반환합니다. RETURNS varchar(4) AS BEGIN declare @out varchar(4), @temp varbinary(8) SET @temp=convert(varbinary(8), @a) SET @out=CASE WHEN LEFT(@temp,2)=0x424D THEN 'BMP' WHEN LEFT(@temp,2)=0xFFD8 THEN 'JPG' WHEN LEFT(@temp,8)=0x89504E470D0A1A0A THEN 'PNG' WHEN (LEFT(@temp,2)=0x4949)OR(LEFT(@temp,2)=0x4D4D) THEN 'TIFF' ELSE '' END return @out END

이러한 샘플과 그 밖의 샘플들을 파일 다운로드를 통해 사용할 수 있습니다. 여러 가지 샘플을 검토해 보고 응용 프로그램에 직접 활용해 보십시오.

NTEXTCODE.exe



신고
Posted by Tornado tornado

http://www.oracle.com/technology/software/products/sql/index.html


무료라는거~

신고
Posted by Tornado tornado
출처 : swynk.com

내용 : Information Schema Views에 대한 종류와 그 내용. 샘플을 다룹니다.

Microsoft SQL Server ships with some information schema views that are very helpful for
getting information about the meta-data. And Microsoft suggests to use these views
instead of querying the system tables like systypes, sysreferences, sysindexes etc. since
they can change from release to release.
I personally prefer to have a good knowledge of the system tables and occasionally use
them for database administration purposes but for the application it's good if your SQL
queries are based off these views so you are sure that when you migrate from one
version to the other of SQL Server, you are not going to be breaking something.
All these information_schema views exist in SQL Server 2000 and detailed information on
all of them is available in BOL so I wont' go into details of what each view does. Here is
the list of the views:

Information_Schema.Check_Constraints
Information_Schema.Column_Domani_Usage
Information_Schema.Column_Privleges
Information_Schema.Columns
Information_Schema.Constraint_Column_Usage
Information_Schema.Constraint_Table_Usage
Information_Schema.Domain_Constraints
Information_Schema.Domains
Information_Schema.Key_Column_Usage
Information_Schema.Parameters
Information_Schema.Referential_Constraints
Information_Schema.Ruotine_Columns
Information_Schema.Routines
Information_Schema.Schemata
Information_Schema.Table_Constraints
Information_Schema.Table_Constraints
Information_Schema.Table_Privileges
Information_Schema.Tables
Information_Schema.Views
Information_Schema.View_Column_Usage
Information_Schema.View_Table_Usage
Some examples:
1)For getting a list of constraints, column_names and their position in the constraint:


SELECT COLS.CONSTRAINT_NAME,COLS.COLUMN_NAME,COLS.ORDINAL_POSITION
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS COLS
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS CONS ON
COLS.CONSTRAINT_NAME = CONS.CONSTRAINT_NAME
WHERE COLS.CONSTRAINT_CATALOG = DB_NAME()
AND COLS.TABLE_NAME = 'TASK_DTL'
AND CONS.CONSTRAINT_TYPE = 'PRIMARY KEY'
ORDER BY COLS.CONSTRAINT_NAME, COLS.ORDINAL_POSITION

You can replace the name of the table to whatever you want and can also replace the
constraint_type if you want to search for constraints other than the Primary Key.


2) SELECT CONSTRAINT_NAME,
COLUMN_NAME,
ORDINAL_POSITION
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE CONSTRAINT_CATALOG = DB_NAME()
AND TABLE_NAME = 'X'
ORDER BY CONSTRAINT_NAME, ORDINAL_POSITION

Subsitute x with the name of the table and you will get the name of the constraints defined
on the table (Primary Key as well as the Foreign key constraints), the names of the
columns involved and the position of the columns in the constraint, for example, which
column is number 1 in the case of a covered primary key.

3) For a list of tables, their columns, data-type for the columns, NULL/Not NULL criteria:


SELECT A.TABLE_NAME, B.COLUMN_NAME, B.DATA_TYPE, B.IS_NULLABLE
FROM INFORMATION_SCHEMA.TABLES A, INFORMATION_SCHEMA.COLUMNS B
WHERE A.TABLE_NAME = B.TABLE_NAME
ORDER BY A.TABLE_NAME
신고
Posted by Tornado tornado
[MySQL] mysqld_multi 사용법

- 작성자 : 김칠봉 <san2(at)linuxchannel.net>
- 작성일 : 2002.02.09(오타수정, 보완)
         : 2003.01.29
- 분 류  : mysql
- 수 준  : 초중급이상
- 내 용  : 하나의 서버에서 여러개의 mysqld 데몬을 띄우는 방법
- 키워드 : mysql, mysqld, mysqld_multi, safe_mysqld, mysqld_safe

*주1)
이 문서에 대한 최신 내용은 아래 URL에서 확인할 수 있습니다.

http://www.linuxchannel.net/docs/mysqld_multi.txt

*주2)
이 문서의 내용은 같은 하나의 MySQL 서버에서 여러개의 mysqld 데몬을
띄우는 방법에 대해서 설명합니다.
mysql 설치나 실제 실무에서 다루는 고차원적이고 상세한 기술적인
부분은 다루지 않습니다.

*주3)
`mysqld_multi` 프로그램은 MySQL 3.23.29 이상 버전용입니다.

reference :
- http://www.mysql.com/doc/en/mysqld_multi.html
- http://www.mysql.com/doc/en/Multiple_servers.html

---------------------------------------------------------
목차
1. mysqld multi 란?
2. 서로 다른 여러개의 mysqld를 띄우는 방법(기본)
3. mysqld_multi 을 이용하여 여러개의 mysqld 띄우는 방법
4. MySQL 3.23.x와 MySQL 4.0.x 을 동시에 구동하는 방법
5. 공유한 하나의 DB에 랜덤하게 접속하는 방법
6. 후기
---------------------------------------------------------

1. mysqld multi 란?

`mysqld_multi'라는 PERL 스크립트(3.23.29 이상)를 이용하여
같은 서버에서 여러개의 독자적인 mysqld를 운영하는 방법을 말합니다.

정확히 말해서, mysql 포트나 소켓을 서로 달리하여 여러개의 mysqld
프로세스를 띄워 운영하는 방법을 말합니다.

httpd에서 80, 8080 포트로 운영하는 것이나 또는 sendmail에서
daemonport를 서로 달리하여 운영하는 방법과 비슷한 원리입니다.


2. 서로 다른 여러개의 mysqld를 띄우는 방법(기본)

<주의>
이 절의 내용은 여러개의 mysqld 데몬을 띄우는 기본 사항에 대해서
다룹니다. 실제로 이렇게 운영할 수도 있지만 약간 불편하기 때문에
권장방법은 아닙니다.
</주의>

따로 mysql을 추가로 컴파일할 필요는 없습니다. 기존의 safe_mysqld
스크립트를 이용하면 됩니다.

이때 중요한 점은 반드시 최소한 pid-file, socket 위치를 서로 다르게
지정해야 합니다.

*반드시 다르게 설정해야할 옵션)

--pid-file=
--socket=
--port= or --skip-network

나머지 옵션은 mysql을 컴파일할때 기본값으로 사용하는 값을 따르거나
my.cnf 파일의 [mysqld] 섹션에 설정한 옵션을 따릅니다.

<팁> 외부 MySQL client에서 접속을 원천 봉쇄하려면?
mysql.host 테이블과 상관없이 아예 mysql 포트를 열지않도록
--skip-network 옵션을 주고 mysqld를 시작하면 됩니다.
(`netstat -atnp`로 확인).
</팁>

이하 설명은 외부 mysql client에서 접속해 오는 경우가 아닌 내부 client
에서 접속하는 경우로 가정하겠습니다.

만약 외부 mysql client(대부분 웹서버)에서 접속해야만 하는 경우라면
--skip-network 대신 --port=# 옵션으로 바꾸어야 합니다.


[첫번째 mysqld 구동]

shell> safe_mysqld &
or
shell> safe_mysqld --defaults-file=/root/.my.cnf &

[두번째 추가적인 mysqld 구동]

shell> safe_mysqld \
  --pid-file=/usr/local/mysql/var/`hostname`.pid2 \
  --socket=/tmp/mysql.sock2 \
  --skip-network & /*** or --port=3307 ***/
or
shell> safe_mysqld \
  --defaults-file=/root/.my.cnf \
  --pid-file=/usr/local/mysql/var/`hostname`.pid2 \
  --socket=/tmp/mysql.sock2 \
  --skip-network & /*** or --port=3307 ***/

두번째 mysqld에서 사용되는 나머지 옵션은 첫번째 mysqld의 기본 옵션을
그대로 따릅니다. 즉 basedir, datadir 와 같은 옵션을 따로 지정하지
않았기 때문에 어떤 값을 사용하는지를 알아보려면 다음과 같이 확인해
봅니다.

[기본적으로 사용되는 값 알아보기]

shell> /usr/local/mysql/libexec/mysqld --help
...
basedir:     /usr/local/mysql/
datadir:     /usr/local/mysql/var/
tmpdir:      /tmp/
language:    /usr/local/mysql/share/mysql/english/
pid file:    /usr/local/mysql/var/home.pid
TCP port:    3306
Unix socket: /tmp/mysql.sock
...
Possible variables for option --set-variable (-O) are:
back_log              current value: 50
binlog_cache_size     current value: 32768
connect_timeout       current value: 5
...


[두번째 mysqld에 접속해 보기]

현재 두번째 mysqld의 datadir 옵션이 없기 때문에 첫번째 mysqld의 datadir
와 같습니다. 즉 하나의 datadir를 공유한 셈이 됩니다.

*프로세스 확인)
shell> ps -ef --cols 400 | grep mysqld

*소켓 확인)
shell> ls /tmp/mysql.sock*
/tmp/mysql.sock  /tmp/mysql.sock2

*접속해 보기)
shell> mysql -u username -p -S /tmp/mysql.sock2 db_name

'-S'(또는 --socket) 옵션으로 두번째 mysqld의 소켓 위치를 입력해 주면
됩니다.


[두번째 mysqld 종료하기]

이와 같이 두번째, 세번째 mysqld를 구동하는 방법은 매번 옵션을 적어줘야
하는 불편함이 있습니다(또한 관리도 썩 매끄럽지 못하고).

일단 두번째 mysqld 구동에 성공하고 접속 테스트까지 끝냈다면 이제는
필요없으니 두번째 mysqld를 종료해 봅시다.

종료도 간단합니다. 단지 소켓 위치만 추가 옵션을 주면 됩니다.

shell> mysqladmin -u root -p -S /tmp/mysql.sock2 shutdown


3. mysqld_multi 을 이용하여 여러개의 mysqld 띄우는 방법

mysqld_multi 프로그램은 PERL로 짜여져 있으며, PREFIX/bin/mysqld_multi에
위치합니다. (MySQL 3.23.29 버전부터 추가되었군요.)

mysql.server 또는 safe_mysqld 스크립트는 기본적으로 /etc/my.cnf 또는
~/.my.cnf 또는 PREFIX/var/my.cnf 설정 파일에서 [mysqld] 섹션의 옵션 내용
을 읽어들여 mysqld 를 구동합니다.

<주의>
MySQL 4.0.x 버전은 safe_mysqld가 아니라 mysqld_safe으로 스크립트 이름이
변경되었습니다.
</주의>

이하 mysql 설정파일(my.cnf)은 /root/.my.cnf으로 통일합니다.

반면 mysqld_multi 스크립트는 [mysqld_multi], [mysqld1], [mysqld2], ...,
[mysqld] 의 섹션을 참고합니다.

정리하면,

  `safe_mysqld'  <-- [mysqld] <-- defaults options

  `mysqld_multi' <-- [mysqld_multi]
                       |-- [mysqld1] <-- ([mysqld]) <-- defaults options
                       |-- [mysqld2] <-- ([mysqld]) <-- defaults options
                       |-- [mysqld3] <-- ([mysqld]) <-- defaults options
                       `-- [mysqld#] <-- ([mysqld]) <-- defaults options

이와 같은 위계로 각 섹션을 참고하여 mysqld를 구동합니다.
(괄호안의 섹션은 그 섹션이 있다면 참고한다는 옵션입니다.)

mysqld_multi 를 사용하기 위해서 /root/.my.cnf 파일에 다음과 같이
추가해야 합니다(없다면 파일 생성).

shell> /usr/local/mysql/bin/mysqld_multi --example

하면 그 예제가 출력됩니다.

-- /root/.my.cnf 또는 /etc/my.cnf (수정전) ----------
[client]
password	= 안 갈쳐조오..
host		= localhost

[mysql]
ignore-spaces

[mysqld]
default-character-set = euc_kr
safe-show-database
skip-name-resolve
#port		= 3306
skip-network
log
#log-update
user		= mysql
set-variable 	= key_buffer=256M
set-variable 	= thread_cache_size=8
set-variable 	= table_cache=256

[myisamchk]
set-variable 	= key_buffer=100M
set-variable 	= sort_buffer=100M
set-variable 	= read_buffer=3M
set-variable 	= write_buffer=3M
#set-variable 	= sort_key_blocks=20
#set-variable 	= decode_bits=10

[mysqladmin]
#set-variable 	= connect_timeout=100
#set-variable 	= shutdown_timeout=1

[mysqldump]
user		= root
-----------------------------------------------------

-- /root/.my.cnf 또는 /etc/my.cnf (수정후) ----------
[client]
(생략)...

[mysql]
(생략)...

[mysqld]
default-character-set = euc_kr
skip-name-resolve
skip-network	## only localhost access
language	= /usr/local/mysql/share/mysql/english
(생략)...

[mysqld_multi]
mysqld		= /usr/local/mysql/bin/safe_mysqld
mysqladmin	= /usr/local/mysql/bin/mysqladmin
#user		= root

[mysqld1]
socket		= /tmp/mysql.sock1
#port		= 3307
pid-file	= /usr/local/mysql/var/mysqld1.pid
datadir		= /usr/local/mysql/var
log		= /usr/local/mysql/var/mysqld1.log
user		= mysql

[mysqld2]
socket		= /tmp/mysql.sock2
#port		= 3308
pid-file	= /usr/local/mysql/var2/mysqld2.pid
datadir		= /usr/local/mysql/var2
log		= /usr/local/mysql/var2/mysqld2.log
user		= mysql

[myisamchk]
(생략)...

[mysqladmin]
(생략)...

[mysqldump]
(생략)...
-----------------------------------------------------

앞의 설정에서 [mysqld1]은 datadir이 기존의 [mysqld] 섹션과 동일하므로
하나의 datadir을 공유하겠다는 의미입니다.

[mysqld2]의 datadir은 서로 다르므로 아주 독자적인 데이터베이스를 운영
하겠다는 의미입니다.

예) [mysqld2]용 datadir 만들기
shell> mkdir /usr/local/mysql/var2
shell> /usr/local/mysql/bin/mysql_install_db \
  --datadir=/usr/local/mysql/var2
shell> chown mysql.mysql -R /usr/local/mysql/var2

이와 같이 기본 mysql 데이터베이스를 만들어 주면 됩니다.

MySQL 영문 매뉴얼에 의하면 [mysqld_multi] 섹션에 multi_admin 유저를
다음과 같이 기존의 MySQL에 추가하여 관리하도록 하는 예제가 있습니다.

shell> mysql -u root -S /tmp/mysql.sock -proot_password -e \
  "GRANT SHUTDOWN ON *.* TO multi_admin@localhost \
  IDENTIFIED BY 'multipass'"

그러나 반드시 이와 같이 multi_admin 유저를 추가할 필요는 없고
기존의 root 유저를 이용하면 그만입니다.

<참고>
[mysqld1] 섹션에서 user는 해당 mysqld의 프로세스 유저를
의미합니다. /etc/passwd 파일에 존재한 다른 유저로도 설정가능합니다.
만약 하나의 유저에 대한 어떤 제한(ulimt)이 있다면 mysql 유저 대신
다른 유저를 /etc/passwd 파일에 추가하고 해당 유저를 사용하면 됩니다.
</참고>

그럼 두번째 [mysqld1], 세번째 [mysqld2] mysqld를 구동해 봅시다.

shell> /usr/local/mysql/bin/mysqld_multi --help
...
Usage: mysqld_multi [OPTIONS] {start|stop|report} [GNR,GNR,GNR...]
or     mysqld_multi [OPTIONS] {start|stop|report} [GNR-GNR,GNR,GNR-GNR,...]
..

GNR은 'Group NumbeR'를 의미하며 [mysqld#]에서 # 부분을 말합니다.
즉 정수 단위로 1부터 가능하며 GNR 옵션이 없다면 모든 GNR을 기본값으로
사용합니다.

*mysqld 구동하기)
shell> /usr/local/mysql/bin/mysqld_multi start
or
shell> /usr/local/mysql/bin/mysqld_multi start 1,2

*구동확인)
shell> /usr/local/mysql/bin/mysqld_multi report
Reporting MySQL servers
MySQL server from group: mysqld1 is running
MySQL server from group: mysqld2 is running

*두번째 mysqld에 접속하기
shell> mysql [OPTIONS] -S /tmp/mysql.sock1 db_name

*세번째 mysqld에 접속하기
shell> mysql [OPTIONS] -S /tmp/mysql.sock2 db_name

*두번째 mysqld 종료하기)
shell> /usr/local/mysql/bin/mysqld_multi stop 1

*두번째, 세번째 mysqld 종료하기)
shell> /usr/local/mysql/bin/mysqld_multi stop 1,2

*모든 mysqld multi 종료하기)
shell> /usr/local/mysql/bin/mysqld_multi stop
or
shell> /usr/local/mysql/bin/mysqld_multi stop 1,2

mysqld_multi는 safe_mysqld와 별개입니다. 어떤 옵션을 주고 구동하느냐에
따라서 두개가 서로 연계를 갖을 수 있지만 정확히 말하면 두개의 독립된 별개의
mysqld 입니다.

즉,

shell> safe_mysqld &
shell> mysqld_multi start
shell> mysqladmin shutdown
shell> mysqld_multi stop

이와 같이 별도로 각각 구별하여 시작/종료할 수 있습니다.

만약 이와 같이 각각 서로 구별하여 운영하고 싶지 않다면
기존의 safe_mysql의 [mysqld] 섹션을 mysqld_multi 용의 [mysqld1]으로
수정하여 기본 mysqld로 운영할 수 있습니다.

<참고> 부팅시 자동 시작하기(/etc/rc.d/rc.local)
/usr/local/mysql/bin/mysqld_multi \
  --defaults-file=/root/.my.cnf start
</참고>


4. MySQL 3.23.x와 MySQL 4.0.x 을 동시에 구동하는 방법

지금까지의 내용을 이해했다면 두개의 서로 다른 MySQL 버전을 구동하는
방법은 아주 쉽습니다.

  MySQL 3.23.x : /usr/local/mysql
  MySQL 4.0.x  : /usr/local/mysql4

에 각각 설치했다는 가정입니다.

<주의>
MySQL 4.0.x 버전은 safe_mysqld가 아니라 mysqld_safe으로 스크립트 이름이
변경되었습니다.
</주의>

-- /root/.my.cnf 또는 /etc/my.cnf ----------------
[client]
(생략)...

[mysql]
(생략)...

[mysqld]
(생략)...

[mysqld_multi]
mysqld		= /usr/local/mysql4/bin/mysqld_safe ## <-- 주의
mysqladmin	= /usr/local/mysql4/bin/mysqladmin
#user		= root

[mysqld1]
socket		= /tmp/mysql4.sock1
#port		= 3307
skip-name-resolve
skip-network	## only localhost access
default-character-set = euc_kr
pid-file	= /usr/local/mysql4/var/mysqld1.pid
datadir		= /usr/local/mysql4/var
language	= /usr/local/mysql4/share/mysql/english
log		= /usr/local/mysql4/var/mysqld1.log
user		= mysql

[myisamchk]
(생략)...

[mysqladmin]
(생략)...

[mysqldump]
(생략)...
-----------------------------------------------------

*MySQL 3.23.x 구동
shell> safe_mysqld &

*MySQL 4.0.x 구동
shell> mysqld_multi start

필자가 이 방법을 소개한 이유는 MySQL 4.0.x 버전으로 업그레이드할 경우
관리자가 조용하게(?) 테스트해 보아, 이상이 없다면 아무도 모르게(?)
조용하게 업그레이드할 수 있다는 점입니다.

물론 다음의 URL을 먼저 알아두어야 합니다.

  - http://www.mysql.com/doc/en/Upgrading-from-3.23.html


5. 공유한 하나의 DB에 랜덤하게 접속하는 방법

이 방법의 서로 다른 여러개의 mysqld 데몬의 datadir을 모두 동일하게
설정하여 운영하는 방법입니다.

여러개의 mysqld 데몬을 띄어야 하므로 시스템에 충분한 메모리가 있어야
합니다(512M 이상).

*권장 하는 경우)
1. 시스템 부하가 극도로 높지 않는 경우.
2. 하나(socket)의 mysqld로 시스템을 운영하기에는 자원이 남는 경우
3. 하나의 mysqld로 서비스가 포화 상태인 경우
4. 기타 테스트로 전보다 낫은 성능을 낼 경우


<주의>
필자가 테스트해 본 바로는 시스템 부하(CPU 사용량 95%이상)가 심한
경우, 좋은 대안이 될 수 없습니다(큰 효과가 없음).
(여러개의 mysqld 데몬을 띄우지 않아도 얼마든지 높은 퍼포먼스를
낼 수 있습니다)
다만, 이 방법은 여러가지 더 많은 테스트를 해보아, 전보다 좀더 낫은
성능을 낼 경우에만 사용하도록 하세요.
</주의>

-- /root/.my.cnf 또는 /etc/my.cnf ----------------
[client]
(생략)...

[mysql]
(생략)...

[mysqld]
default-character-set = euc_kr
skip-name-resolve
skip-network	## only localhost access
datadir		= /usr/local/mysql/var
language	= /usr/local/mysql/share/mysql/english
user		= mysql
(생략)...

[mysqld_multi]
mysqld		= /usr/local/mysql/bin/safe_mysqld
mysqladmin	= /usr/local/mysql/bin/mysqladmin
#user		= root

[mysqld1]
socket		= /tmp/mysql.sock1
#port		= 3307
pid-file	= /usr/local/mysql/var/mysqld1.pid
log		= /usr/local/mysql/var/mysqld1.log

[mysqld2]
socket		= /tmp/mysql.sock2
#port		= 3308
pid-file	= /usr/local/mysql/var/mysqld2.pid
log		= /usr/local/mysql/var/mysqld2.log

[mysqld3]
socket		= /tmp/mysql.sock3
#port		= 3309
pid-file	= /usr/local/mysql/var/mysqld3.pid
log		= /usr/local/mysql/var/mysqld3.log

[myisamchk]
(생략)...

[mysqladmin]
(생략)...

[mysqldump]
(생략)...
-----------------------------------------------------

앞의 설정을 요약하면 각 그룹 [mysqld#]는 datadir 옵션이 없으므로
[mysqld] 섹션의 datadir를 그대로 따릅니다.
즉 모두 같은 하나의 datadir을 공유하는 결과가 됩니다.

또한 log 옵션을 생략하면 기본 [mysqld] 섹션의 log 옵션을 따릅니다.
앞의 예제는 접속 확인을 알아보기 위해서 각각 다른 log 파일에 기록
하도록 했을 뿐입니다.

만약 하나의 유저로 모두 구동하고 싶지 않다면 각 [mysqld#] 섹션에
해당 user 옵션을 설정해 주면 됩니다.
(하나의 유저당 파일 제한수가 적을 경우)

shell> mysqld_multi start

이렇게 명령어를 내리면 추가로 3개의 각각 독립된 mysqld가 구동되며,
모두 동일한 하나의 datadir를 공유합니다.

만약 1, 2번 mysqld만 시작하도록 하고 싶다면

shell> mysqld_multi start 1,2

이렇게 추가 옵션을 주면 됩니다.

shell> mysqld_multi report

앞의 명령어로 3개 모두 구동하는지 확인해 보도록 하세요.

이제는 랜덤하게 1, 2, 3번에 접속하도록 socket 파일만 랜덤하게 선택해
주면 됩니다.

만약 PHP로 접근한다면(검색에만 적용),

----------------------------------------------------
<?php
##
## 검색모드에만 적용
if('검색모드인경우') {
  $sock = '/tmp/mysql.sock' . rand(1,3);
  ini_set('mysql.default_socket',$sock);
}

(생략: 나머지는 기존과 동일)

?>
----------------------------------------------------

이렇게 접속하는 기본 socket 파일 위치를 랜덤하게 만들어 주면 됩니다.

만약 각각 datadir이 서로 다르고 독자적으로 모두 별개의 MySQL 서비스를
한다면, 관리자 측면에서 다음과 같이 설정할 수 있습니다.

-- httpd.conf --------------------------------------
...
<VirtualHost xxxx>
  ...
  php_value mysql.default_socket "/tmp/mysql.sock1"
</VirtualHost>

<VirtualHost xxxx>
  ...
  php_value mysql.default_socket "/tmp/mysql.sock2"
</VirtualHost>

<VirtualHost xxxx>
  ...
  php_value mysql.default_socket "/tmp/mysql.sock3"
</VirtualHost>
----------------------------------------------------

or

-- DOCUMENT_ROOT/.htaccess -------------------------
...
php_value mysql.default_socket "/tmp/mysql.sock2"
----------------------------------------------------


6. 후기

(생략)


EOF
신고
Posted by Tornado tornado